  1. Sep 2025
    1. Reviewer #4 (Public review):

      Summary:

      In this study Tateishi et al. used TnSeq to identify 131 shared essential or growth defect-associated genes in eight clinical MAC-PD isolates and the type strain ATCC13950 of Mycobacterium intracellulare which are proposed as potential drug targets. Genes involved in gluconeogenesis and the type VII secretion system which are required for hypoxic pellicle-type biofilm formation in ATCC13950 also showed increased requirement in clinical strains under standard growth conditions. These findings were further confirmed in a mouse lung infection model.

      Strengths:

      This study conducted TnSeq experiments in the reference strain and 8 different clinical isolates of M. intracellulare, thus producing a large number of datasets, which in itself is a rare accomplishment and will greatly benefit the research community.

      Weaknesses:

      (1) A comparative growth study of pure and mixed cultures of clinical and reference strains under hypoxia will be helpful in supporting the claim that clinical strains adapt better to such conditions. This should be mentioned as future directions in the discussion section along with testing the phenotype of individual knockout strains.

      (2) Authors should provide the quantitative value of read counts for classifying a gene as "essential" or "non-essential" or "growth-defect" or "growth-advantage". Merely mentioning "no insertions in all or most of their TA sites" or "unusually low read counts" or "unusually high low read counts" is not clear.

      (3) One of the major limitations of this study is the lack of validation of TnSeq results with individual gene knockouts. Authors should mention this in the discussion section.

    2. Reviewer #5 (Public review):

      Summary:

      In the research article "Functional genomics reveals strain-specific genetic requirements conferring hypoxic growth in Mycobacterium intracellulare", Tateishi et al. focussed their research on pulmonary disease caused by Mycobacterium avium-intracellulare complex, which has recently become a major health concern. The authors were interested in identifying the genetic requirements necessary for growth/survival within the host and used hypoxia and biofilm conditions that partly replicate some of the stress conditions experienced by bacteria in vivo. An important finding of this analysis was the observation that genes involved in gluconeogenesis, the type VII secretion system and cysteine desulphurase were crucial for the clinical isolates during standard culture, while the same genes were necessary during hypoxia in the ATCC type strain.

      Strength of the study:

      Transposon mutagenesis has been a powerful genetic tool to identify essential genes/pathways necessary for bacteria under various in vitro stress conditions and for in vivo survival. The authors extended the TnSeq methodology not only to the ATCC strain but also to recent clinical isolates to identify the differences between the two categories of bacterial strains. Using this approach they dissected the similarities and differences in the genetic requirements for bacterial survival between the ATCC type strain and clinical isolates. They observed that the clinical strains performed much better in terms of growth during hypoxia than the type strain. These in vitro findings were further extended to mouse infection models, and similar outcomes were observed in vivo, further emphasising the relevance of hypoxic adaptation for the clinical strains, which could be explored as a source of potential drug targets.

      Weakness:

      The authors have performed extensive TnSeq analysis but fail to present the data coherently. The data could have been better presented in both the figures and the text. In my view this is one of the major weaknesses of the study.

    3. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Tateishi et al. report a Tn-seq-based analysis of genetic requirements for growth and fitness in 8 clinical strains of Mycobacterium intracellulare (Mi), and compare the findings with a type strain ATCC13950. The study finds a core set of 131 genes that are essential in all nine strains, and therefore are reasonably argued as potential drug targets. Multiple other genes required for fitness in clinical isolates have been found to be important for hypoxic growth in the type strain.

      Strengths:

      The study has generated a large volume of Tn-seq datasets of multiple clinical strains of Mi from multiple growth conditions, including from mouse lungs. The dataset can serve as an important resource for future studies on Mi, which despite being clinically significant remains a relatively understudied species of mycobacteria.

      Thank you for reviewing our manuscript and finding the significance of our data.

      Weaknesses:

      The paper lacks clarity in data presentation and organization. For example, some of the key data on cfu counts of clinical Mi strains in a mouse model can be presented along with the Tn-seq dataset in Figure 6, the visualization of which can be improved with volcano plots, etc. Improvement in data visualization is perhaps necessary throughout the paper.

      Thank you for the comment on the data presentation of in vivo studies. We previously reported the time-course data on CFUs, animal survival, and tissue pathology from the pure strains (Tateishi Y. BMC Microbiol. 2023; new Ref#22). Based on these data, we assumed that we would be able to harvest a sufficient number of colonies from mice infected with M.i.27 or M.i.198, and we performed in vivo TnSeq studies using these two strains. In the revised manuscript, we have referred to our previous publication (new Ref#22) on the virulence in mice of the MAC-PD strains used in this study (page 12, line 212).

      The CFU count data are shown in new Supplementary Fig. 3b. In the manuscript text, we explained as follows (page 12, lines 212-216): “The time course of the changes in the bacterial burden showed a pattern similar to those of the wild-type strains M.i.198 and M.i.27, respectively, except that it was not possible to harvest sufficient colonies (as few as 104/mouse) in the few mice infected with the M.i.27 Tn mutant strain in week 8 and week 16 (new Supplementary Fig. 3b, new Supplementary Table 8)”.

      Regarding the suggestion to include volcano plots, we appreciate the proposal but chose not to adopt this format, as the main aim of this study was to identify genes commonly required for in vitro and in vivo fitness across multiple M. intracellulare strains, rather than to highlight differential genetic requirements within a single strain. Volcano plots are useful for visualizing differential values and significance for a single dataset but are less suited for cross-strain comparisons of shared gene sets. Our approach is aligned with the methodology used by Carey et al. (PLoS Pathog. 2018; new Ref#8), who similarly focused on identifying conserved genetic requirements across M. tuberculosis genotypes without employing volcano plots.

      [References]

      Tateishi, Y. et al. Virulence of Mycobacterium intracellulare clinical strains in a mouse model of lung infection - role of neutrophilic inflammation in disease severity. BMC Microbiol 23, 94 (2023).

      Carey, A.F. et al. TnSeq of Mycobacterium tuberculosis clinical isolates reveals strain-specific antibiotic liabilities. PLoS Pathog 14, e1006939 (2018).

      The primary claim of the study that the clinical strains are better adapted for hypoxic growth is not well-supported by the data presented in Figure 7.

      Thank you for the comments on the difference in adaptation for hypoxic growth between ATCC13950 and clinical MAC-PD strains. To clarify, growth rates shown in Figure 7 were calculated at the inflection point (midpoint) of the growth curves, which were modeled using a four-parameter logistic (4P logistic) model. As described in the Discussion, we identified a pattern of hypoxic adaptation characteristic of the clinical MAC-PD strains from the shapes of the growth curves. Taking into consideration the impact of growing bacteria on the disease progression of MAC-PD, slow growth with early entry into log phase implies a continuous impact on the infected hosts during logarithmic bacterial growth, which may be involved in the persistent and steadily progressive illness of MAC-PD over years in humans.

      Unlike in a time-lapse imaging assay, completely continuous sampling of cultures for CFU assays is impossible. Nevertheless, we collected sufficient timepoints to allow reliable curve fitting with the 4P logistic model and thus consider our growth data to represent a valid approximation of continuous growth dynamics.
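
      As an illustration only (this is not the analysis code used in the study), the sketch below shows how a growth rate at the midpoint follows from a four-parameter logistic curve. It assumes one common parameterization, y(t) = lower + (upper - lower) / (1 + exp(-growth_rate * (t - midpoint))); the parameterization actually used for Figure 7 may differ. The function names and the parameter values are hypothetical.

```python
import math


def logistic_4p(t, growth_rate, midpoint, lower, upper):
    """Four-parameter logistic curve (assumed parameterization, see lead-in)."""
    return lower + (upper - lower) / (1.0 + math.exp(-growth_rate * (t - midpoint)))


def slope_at_midpoint(growth_rate, lower, upper):
    """Analytic derivative dy/dt evaluated at t = midpoint.

    At the inflection point the exponential term equals 1, so the
    derivative reduces to growth_rate * (upper - lower) / 4.
    """
    return growth_rate * (upper - lower) / 4.0


def numeric_slope(t, h=1e-6, **params):
    """Central finite-difference estimate of dy/dt, used as a cross-check."""
    return (logistic_4p(t + h, **params) - logistic_4p(t - h, **params)) / (2.0 * h)


# Hypothetical parameters (e.g. log10 CFU/ml against days); not study data.
params = dict(growth_rate=0.8, midpoint=6.0, lower=4.0, upper=9.0)

analytic = slope_at_midpoint(params["growth_rate"], params["lower"], params["upper"])
numeric = numeric_slope(params["midpoint"], **params)
print(f"time at midpoint: {params['midpoint']}")
print(f"growth rate at midpoint (analytic): {analytic:.4f}")
print(f"growth rate at midpoint (numeric):  {numeric:.4f}")
```

      Under this assumed form, the "time at midpoint" is the fitted midpoint parameter itself, and the "growth rate at midpoint" depends on both the steepness parameter and the distance between the two asymptotes.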

      Regarding the suggestion of mixed culture experiments, we agree that such studies could be informative. However, co-culture conditions introduce additional variables, including inter-strain competition or synergy, which can obscure the specific contributions of hypoxic adaptation in each strain. Therefore, we believe that the current approach using monoculture growth curves under defined oxygen conditions offers a clearer interpretation of strain-specific hypoxic responses.

      The title of the paper is misleading as the study doesn't provide any mechanistic aspect of hypoxic adaptation in Mi.

      Thank you for the comment on the article title. We admit that this paper does not directly reveal the mechanism of hypoxic adaptation in M. intracellulare strains but provides data on the different patterns of hypoxic adaptation between M. intracellulare strains in relation to differences in genetic requirements. Therefore, we revised the title to “Functional genomics reveals strain-specific genetic requirements conferring hypoxic growth in Mycobacterium intracellulare”.

      Reviewer #2 (Public Review):

      Summary:

      In the study titled "Functional genomics reveals the mechanism of hypoxic adaptation in nontuberculous mycobacteria" by Tateishi et al., the authors have used TnSeq to identify the common essential and growth-defect-associated genes that represent the genomic diversity of clinical M. intracellulare strains in comparison to the reference type strain. By estimating the frequency of Tn insertion, the authors speculate that genes involved in gluconeogenesis, the type VII secretion system, and cysteine desulfurase are relatively more critical in the clinical MAC-PD strains than in the type strain, both for extracellular survival and in a mouse lung infection model.

      Based on their analysis, the authors proposed to identify the mechanism of hypoxic adaptation in nontuberculous mycobacteria (NTM) which offer promising drug targets in the strains causing clinical Mycobacterium avium-intracellulare complex pulmonary disease (MAC-PD).

      Strengths:

      A major strength of the manuscript is the performance of the exhaustive set of TnSeq experiments with multiple strains of M. intracellulare during in vitro growth and animal infection.

      Thank you for reviewing our manuscript and acknowledging the performance of producing datasets in this study.

      Weaknesses:

      (1) The study suffers from the authors' preconceived bias toward a small subset of genes involved in hypoxic pellicle formation in ATCC13950.

      Thank you for the comment regarding a potential bias toward a small subset of genes involved in hypoxic pellicle formation in ATCC13950. The rationale for the importance of hypoxic pellicle genes in clinical MAC-PD strains is that the profiles of genetic requirements in each bacterial strain reflect the adaptation to the environment in which each strain lives. When the strains are placed in a special environment, they can adapt to the situation by altering the profiles of genetic requirements, resulting in the remodeling of metabolic pathways.

      In this study, we found that several of these pellicle-associated genes also showed increased genetic requirement in the clinical MAC-PD strains, suggesting a possible overlap in hypoxic adaptation mechanisms. We did not claim that clinical MAC-PD strains showed an increase in genetic requirements for all genes involved in hypoxic pellicle formation. Except for the gene sets involved in hypoxic pellicle formation in ATCC13950, almost no global information has been revealed on the pathogenesis of nontuberculous mycobacterial disease, which differs from the case of tuberculosis. Along with this finding, we investigated the effect of gene silencing on bacterial growth and the preferential hypoxic adaptation observed by growth kinetics in clinical MAC-PD strains compared to ATCC13950. At first glance, focusing on the gene sets of hypoxic pellicle formation may seem “biased”, but we proceeded with this research step by step based on our previous achievements. We consider that these data provide valuable information on the pathogenesis of MAC-PD by clinical MAC-PD strains.

      We have added the description of the rationale for the importance of hypoxic pellicle genes in clinical MAC-PD strains in the revised manuscript (page 9, lines 148-155).

      (2) An important set of data with the ATCC13950 reference strain is missing in the mouse infection study. In the absence of this, it is difficult to establish whether the identified genes are critical for infection/intracellular proliferation, specifically in the clinical isolates that are relatively more adapted for hypoxia.

      Thank you for the comment on the necessity of setting ATCC13950 as a control strain in the mouse TnSeq experiment. Setting ATCC13950 as a control strain in mouse infection experiments would be ideal. However, we showed that ATCC13950 is eliminated within 4 weeks of infection (Tateishi Y. BMC Microbiol. 2023; new Ref#22). This means that it is impossible to perform an in vivo TnSeq study with this strain, because a sufficient number of colonies cannot be harvested.

      [Reference]

      Tateishi, Y. et al. Virulence of Mycobacterium intracellulare clinical strains in a mouse model of lung infection - role of neutrophilic inflammation in disease severity. BMC Microbiol 23, 94 (2023).

      (3) Statistical enrichment analysis of gene sets by GSEA wrongly involves genes required for hypoxic pellicle formation in ATCC13950 together with the gene sets found essential in the clinical MAC-PD strains, to claim that a significant % of genes belong to hypoxia-adaptation pathways. It could be factually incorrect because a majority of these might overlap with those found critical for the in vitro survival of MAC-PD strains (and may not be related to hypoxia).

      Thank you for the suggestion on the re-analysis of gene enrichment analysis of genes required for M.i.27 and M.i.198 in vivo infection, individually with genes involved in hypoxic pellicle formation in ATCC13950 and with those showing increased genetic requirements in clinical MAC-PD strains compared to ATCC13950.

      About 50% (92 and 94 out of 181 genes from Day 1 to Week 16 and from Week 4 to Week 16 of infection) and 40% (70 and 79 genes out of 179 from Day 1 to Week 16 and from Week 4 to Week 16 of infection) of genes required for hypoxic pellicle formation in ATCC13950 were listed as enriched in genes required for mouse lung infection in M.i.27 and M.i.198, respectively. In addition, about 42% (54 and 56 out of 128 genes from Day 1 to Week 16 and from Week 4 to Week 16 of infection) and 40% (79 and 68 out of 179 genes from Day 1 to Week 16 and from Week 4 to Week 16 of infection) of genes showing increased requirements in clinical MAC-PD strains compared to ATCC13950 were listed as enriched in genes required for mouse lung infection in M.i.27 and M.i.198, respectively.

      These data indicate that about 40-50% of the genes required for in vitro hypoxic pellicle formation are shared with the genes required for in vivo bacterial growth, and that about 40% of the strain-dependent/accessory essential genes are shared with the genes required for in vivo bacterial growth. Thus, the genes required for the growth of M.i.27 and M.i.198 in mouse lungs are enriched individually with those involved in hypoxic pellicle formation in ATCC13950, and with the gene sets found critical for in vitro growth. We have added the description of the reanalyzed GSEA data in the manuscript (pages 16-17, lines 287-290). The details of the reanalyzed GSEA data are shown in Supplementary Fig. 5 and 6 as well as Supplementary Tables 15 and 16.
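
      For readers who wish to re-derive the approximate percentages quoted above, the following minimal sketch recomputes them from the counts exactly as stated in this response. The labels and the helper name are ours and purely illustrative; no values beyond those quoted in the text are used.

```python
def percent(hits, total):
    """Return hits as a percentage of total."""
    return 100.0 * hits / total


# (label, hits, total) copied from the counts quoted in the response text.
overlaps = [
    ("pellicle genes, M.i.27, Day 1 to Week 16", 92, 181),
    ("pellicle genes, M.i.27, Week 4 to Week 16", 94, 181),
    ("pellicle genes, M.i.198, Day 1 to Week 16", 70, 179),
    ("pellicle genes, M.i.198, Week 4 to Week 16", 79, 179),
    ("increased-requirement genes, M.i.27, Day 1 to Week 16", 54, 128),
    ("increased-requirement genes, M.i.27, Week 4 to Week 16", 56, 128),
    ("increased-requirement genes, M.i.198, Day 1 to Week 16", 79, 179),
    ("increased-requirement genes, M.i.198, Week 4 to Week 16", 68, 179),
]

for label, hits, total in overlaps:
    print(f"{label}: {hits}/{total} = {percent(hits, total):.1f}%")
```

      The computed values fall between roughly 38% and 52%, in line with the rounded figures of about 40-50% given in the text.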

      (4) Validation of mouse infection experiments with individual mutants is missing.

      Thank you for the suggestion on the validation of the TnSeq hit genes on the in vivo survival. We acknowledge the importance of validating the TnSeq-hit genes by constructing knockout mutants. We have recently succeeded in constructing the vectors for making knockout strains of M. intracellulare (Tateishi. Microbiol Immunol. 2024). We will proceed to the infection experiment of knockout mutants by using our system for constructing them.

      [Reference]

      Tateishi Y. et al. Construction of knockout mutants in Mycobacterium intracellulare ATCC13950 strain using a thermosensitive plasmid containing negative selection marker rpsL. Microbiol Immunol 68, 339-347 (2024).

      (5) Phenotypes with TnSeq and CRISPRi-based KD exhibit poor correlation with misleading justifications by the authors.

      Thank you for the comment on the issue of inconsistent results between TnSeq and CRISPR-i based knockdown. We acknowledge that some inconsistencies were observed, particularly among strain-dependent/accessory essential or growth-defect-associated genes. By contrast, we found consistent data between TnSeq and CRISPR-i based knockdown results among universal essential genes such as glcB, inhA, gyrB and embB. Although the mechanism has not been fully proven yet, we consider that such inconsistent phenotypes between TnSeq and CRISPR-i based knockdown may be related to the recently revealed bypass mechanism of gene essentiality, which is characteristically observed in strain-specific/accessory essential genes (Rosconi F. Nat Microbiol. 2022; new Ref#14). Rosconi et al. suggested this bypass mechanism of gene essentiality in strain-dependent/accessory essential or growth-defect-associated genes from ‘forced-evolution experiments’ of 36 clinical Streptococcus pneumoniae strains. For example, knockout mutants were successfully recovered from transformation experiments targeting genes identified as strain-specific/accessory essential in TnSeq, such as cytidine monophosphate kinase cmk, formate tetrahydrofolate ligase fhs and farnesyl-diphosphate synthase fpp. The bypassing of gene essentiality can be suggested by observing suppressor mutations and synthetic lethality in knockout strains. By contrast, universal essential genes fulfill the following three criteria: i) high levels of conservation within and often across species, ii) limited genetic diversity, and iii) high and stable expression levels. Consequently, the universal essential genes are rigid, largely immutable key components of an organism’s survival. For the universal essential genes, knockout recovery fails, as shown by the absence of colonies or the appearance of only merodiploids.

      Taking into consideration such a bypass mechanism of gene essentiality in strain-dependent/accessory essential genes, the weaker effect on bacterial growth of silencing strain-dependent/accessory essential genes may reflect pathway rewiring that supports bacterial growth when expression of the target gene is suppressed.

      We have added a description of the possible reason for the inconsistency between TnSeq and CRISPR-i results to the Results and Discussion sections of the revised manuscript (page 21, lines 367-376; pages 28-29, lines 489-519).

      [Reference]

      Rosconi, F. et al. A bacterial pan-genome makes gene essentiality strain-dependent and evolvable. Nat Microbiol 7, 1580–1592 (2022).

      In summary, this study is unable to provide mechanistic insights into why and how different MAC-PD mutant strains exhibit differential survival (in vitro and in animals) and adaptation to hypoxia. It remains to understand why the clinical strains show better adaptation to hypoxia and what is the impact of other stresses on their growth rates.

      Thank you for the comments on the issue of being unable to prove the mechanism of MAC-PD pathogenesis and adaptation to hypoxia. We admit that the original manuscript did not provide a clear reason for, or mechanism of, MAC-PD pathogenesis and adaptation to hypoxia. Following the comment, we have modified the manuscript title to “Functional genomics reveals strain-specific genetic requirements conferring hypoxic growth in Mycobacterium intracellulare”.

      However, we revealed the diversity of genetic requirements within the species M. intracellulare, including the type strain ATCC13950 and clinical MAC-PD strains. We characterized the genetic requirements of clinical MAC-PD strains as increased requirements for gluconeogenesis, the type VII secretion system and cysteine desulfurase, the former two of which are also required for hypoxic pellicle formation in ATCC13950. Along with this, we demonstrated the difference in growth behavior under hypoxia between clinical MAC-PD strains and ATCC13950. Overall, we consider that we have provided basic information suggesting that differences in genetic requirements among strains are involved in the pathogenesis of MAC-PD.

      Reviewer #3 (Public Review):

      Summary:

      The study by Tateishi et al. utilized TnSeq in nine genetically diverse M. intracellulare strains, identifying 131 common essential and growth-defect-associated genes across those strains, which could serve as potential drug targets. The authors also provided an overview of the differences in gene essentiality required for hypoxic growth between the reference strain and the clinical strains. Furthermore, they validated the universal and accessory/strain-dependent essential genes by knocking down their expression using the CRISPRi technique. Overall, this study offers a comprehensive assessment of gene requirements in different clinical strains of M. intracellulare.

      Thank you for reviewing our manuscript and finding the significance of our data.

      (1) The rationale for using ATCC13950 versus clinical strains needs to be clarified. The reference strain ATCC13950 was obtained from the abdominal lymph node of a patient around 10 years ago and is therefore considered a clinical strain that has undergone passages in vitro. How many mutations have accumulated during these in vitro passages? Are these mutations significant enough to cause the behavior of ATCC13950 to differ from other recently sampled clinical strains? From the phylogenetic tree, ATCC13950 is located between M018 and M.i.27. Did the authors observe a similarity in gene essentiality between ATCC13950 and its neighbor strains? What is the key feature that separates ATCC13950 from these clinical strains? The authors should provide a strong rationale for how to interpret the results of this comparison in a clinical or biological context.

      Thank you for the comments on the rationale for using ATCC13950 versus clinical strains and the key feature that separates ATCC13950 from clinical MAC-PD strains.

      ATCC13950 was isolated in 1949 (not around 10 years ago) from an abdominal lymph node of a 34-month-old female patient (Cuttino. Am J Pathol 1949; new Ref#11). Of note, the clinical background of the patient infected with ATCC13950 is quite different from that of patients with MAC-pulmonary disease (MAC-PD), the incidence rate of which has been increasing worldwide in patients without predisposing immunological disorders. ATCC13950 has historically been regarded as the type strain of the species M. intracellulare, and it was the first M. intracellulare strain to be sequenced, in 2012 (Kim. J Bacteriol 2012; new Ref#59).

      The rationale for using ATCC13950 versus clinical MAC-PD strains is to answer the question of whether the essential genes and genetic requirements are similar or different between clinical MAC-PD strains and the historical type strain ATCC13950. So far, there are two TnSeq reports that compare genetic requirements between clinical mycobacterial strains and the type strains, one on M. tuberculosis (Carey AF. PLoS Pathogens. 2018; new Ref#8) and the other on M. abscessus (Akusobi C. mBio. 2025; new Ref#9, published after submission of our manuscript). They reported the difference and diversity of genetic requirements between clinical strains and type strains such as M. tuberculosis H37Rv and M. abscessus ATCC19977. We have added a mention of these previous reports to explain the rationale for setting the type strain ATCC13950 as a referential control strain (page 5, lines 83-87).

      Genetic and functional analysis of clinical MAC-PD strains was not conducted for a long time. In 2021, we revealed the genomic diversity between clinical MAC-PD strains and ATCC13950 by comparative genomic analysis (Tateishi BMC Microbiol. 2021; new Ref#5). Except for our TnSeq study of ATCC13950 (Tateishi Sci Rep 2020; new Ref#10), no functional analysis has been conducted in clinical M. intracellulare strains. In line with our ongoing research on clinical MAC-PD strains, we expected that we could reveal the functional genomic characteristics of clinical MAC-PD strains by setting ATCC13950 as a referential control strain for analyzing TnSeq data.

      The relationship between the mutations accumulated by ATCC13950 during prolonged in vitro passage since its first isolation and the phenotypic differences between ATCC13950 and recently sampled clinical MAC-PD strains is an interesting viewpoint. However, there are no time-course samples of ATCC13950 isolates available. Therefore, we can neither investigate how many mutations have accumulated over time, nor evaluate how much the accumulated mutations influence the phenotype of ATCC13950. The accumulation of mutations in vitro may well cause the behavior of ATCC13950 to differ from that of clinical MAC-PD strains. However, it remains to be elucidated which specific factors contribute to the characteristics of ATCC13950 that differ from those of clinical MAC-PD strains.

      Investigating the similarity of gene essentiality between genetically neighboring strains is also an interesting viewpoint. However, we focused on an overview of the profiles of gene essentiality in clinical MAC-PD strains compared to ATCC13950. Thus, it was out of scope to elucidate the details of gene essentiality in each phylogenetic clade to which the clinical MAC-PD strains belong. For an overview of the phylogenetic trees, please refer to our previous publication on the comparative genomic analysis of 55 strains (Tateishi Y. BMC Microbiol. 2021; new Ref#5, new Supplementary Fig. 1); Fig. 1 shows the extracted phylogenetic tree of the subject strains. To elucidate the details of gene essentiality in each genetic clade, it would be necessary to include a considerable number of the strains that we used for comparative genomic analysis in 2021 (Tateishi Y. BMC Microbiol. 2021; new Ref#5). Furthermore, it would be necessary to set a referential control strain other than ATCC13950 for comparing gene essentiality. At present, elucidating the similarity of gene essentiality between phylogenetic clades in detail is not our highest priority, and such an investigation will be planned as a future study.

      The key features that separate ATCC13950 from clinical MAC-PD strains have not been established yet, in contrast to the case of M. tuberculosis, for which features such as mutations in the gene of the response regulator PhoPR distinguish the type strain H37Rv from most clinical strains. However, the features that separate ATCC13950 from clinical MAC-PD strains may not be explained by a single genetic factor but rather by more complex factors, such as epigenetic and/or regulatory factors. For example, the weakened virulence of H37Ra compared to H37Rv could not be explained by simple genetic differences (Brosch R. Infect Immun. 1999).

      In summary, we set the historical type strain ATCC13950 which is derived from infant abdominal lymphadenitis as a referential control strain for TnSeq analysis, because we intended to reveal the characteristics of clinical MAC-PD strains in terms of the gene essentiality and genetic requirements by comparing the clinical MAC-PD strains with the non-MAC-PD reference strain. We consider that the profiles of gene essentiality and genetic requirements specific to clinical MAC-PD strains confer the pathogenesis in an increasing number of MAC-PD patients worldwide without predisposing immunological disorders.

      [References]

      Cuttino, J.T. & Mc, C.A. Pure granulomatous nocardiosis, a new fungus disease distinguished by intracellular parasitism; a description of a new disease in man due to a hitherto undescribed organism, Nocardia intracellularis, n. sp., including a study of the biologic and pathogenic properties of this species. Am J Pathol 25, 1-47 (1949).

      Kim, B.J. et al. Complete genome sequence of Mycobacterium intracellulare clinical strain MOTT-64, belonging to the INT1 genotype. J Bacteriol 194, 3268 (2012).

      Carey, A.F. et al. TnSeq of Mycobacterium tuberculosis clinical isolates reveals strain-specific antibiotic liabilities. PLoS Pathog 14, e1006939 (2018).

      Akusobi, C. et al. Transposon-sequencing across multiple Mycobacterium abscessus isolates reveals significant functional genomic diversity among strains. mBio 6, e0337624 (2025).

      Tateishi, Y. et al. Comparative genomic analysis of Mycobacterium intracellulare: implications for clinical taxonomic classification in pulmonary Mycobacterium avium-intracellulare complex disease. BMC Microbiol 21, 103 (2021).

      Tateishi, Y. et al. Genome-wide identification of essential genes in Mycobacterium intracellulare by transposon sequencing - Implication for metabolic remodeling. Sci Rep 10, 5449 (2020).

      Brosch, R. et al. Genomic analysis reveals variation between Mycobacterium tuberculosis H37Rv and the attenuated M. tuberculosis H37Ra strain. Infect Immun 67, 5768-5774 (1999).

      (2) Regarding the 'nine representative strains of M. intracellulare with diverse genotypes in this study,' how were these nine strains selected? To what extent do they represent the genetic diversity of the M. intracellulare population? A phylogenetic tree illustrating the global genetic diversity of the M. intracellulare population, with these strains marked on it, would be important to demonstrate their genetic representativeness.

      Thank you for the comments on the selection of 9 subject strains. We selected the 9 strains based on the phylogenetic tree we published in 2021 (BMC Microbiol 2021; new Ref#5). We have shown the global phylogenetic tree of the M. intracellulare population in new supplementary Fig. 1. We have selected 4 or 5 strains from the major two groups (typical M. intracellulare group and M. paraintracellulare-M. indicus pranii group) for this TnSeq study, respectively.

      [Reference]

      Tateishi, Y. et al. Comparative genomic analysis of Mycobacterium intracellulare: implications for clinical taxonomic classification in pulmonary Mycobacterium avium-intracellulare complex disease. BMC Microbiol 21, 103 (2021).

      (3) The authors observed a considerable amount of differential gene requirements in clinical strains. However, the genetic underpinning underlying the differential requirement of genes in clinical strains was not investigated or discussed. Because M. intracellulare has a huge number of accessory genes, the authors should at least check whether the differential requirement could be explained by the existence of a second copy of functional analogous genes or duplications.

      Thank you for the comments on the effect of gene duplication on the change of genetic requirements between strains. Following the comments, we conducted a BLAST search for the 162 genes showing increased Tn insertion reads in each subject strain. We found that M019 has duplicate copies of OCU_RS44705, which encodes adenosylhomocysteinase (LOCUS_42940: ahcY_1, LOCUS_21000: ahcY_2). However, no duplicates were found for the remaining 161 genes showing increased Tn insertion reads.

      From these results, we consider that gene duplication has minor effects on the change of genetic requirements between strains. Rather, sequence differences and accessory genes may play a key role in determining the difference of genetic requirements.

      We have added a description of the above-mentioned result in the Result section (pages11-12, lines 191-199).

      (4) Growth in aerobic and hypoxic conditions: The authors concluded that clinical strains are better adapted to hypoxia, as reflected by their earlier entry into the log phase. They presented the 'Time at midpoint' and 'Growth rate at midpoint.' However, after reviewing the growth curves, I noticed that ATCC13950 had a longer lag phase compared to other strains under hypoxic conditions, and its phylogenetic neighbor M018 also had a longer lag phase. Hence, I do not believe a conclusion can be drawn that clinical strains are better adapted to hypoxia, as this behavior could be specific to a particular clade. It's also possible that the ATCC13950 strain has adapted to aerobic growth. I would suggest that the authors include growth curves in the main figures. The difference in 'Time at midpoint' could be attributed to several factors, and visualizing the growth curves would provide additional context and clarity.

      Thank you for the comments on the possibility of genotypes as a determinant of growth pattern in M. intracellulare. Following the comments, we performed aerobic and hypoxic growth assays in the two strains (M005 and M016) that neighbor ATCC13950.

      Author response image 1.

      The phylogenetic relationship between M005, M016 and ATCC13950. The former two strains are squared in blue.

      M005 reached midpoint later than ATCC13950 both in aerobic and hypoxic conditions. By contrast, M016 reached midpoint three quarters earlier than ATCC13950 under hypoxic conditions. The growth rate was not significantly different between M005, M016 or ATCC13950 under either aerobic or hypoxic conditions, although P-value of M005 vs ATCC13950 was 0.0512 under aerobic conditions on Steel’s multiple comparisons test.

      From the data of growth pattern in M005 and M016, we suggest that in addition to gene essentiality, genotypes may have some impact on the bacterial growth pattern under hypoxia; however, since there was a significant difference in the timing of hypoxic adaptation among ATCC13950 and its neighbor strains, bacterial growth pattern under hypoxia is considered to be determined by multiple factors such as genetic requirements and unproven regulatory systems. Taking into consideration that there are lots of genetically diverse strains other than ATCC13950 clade, many clinical MAC-PD strains are possibly better adapted to hypoxia.

      Responding to the reviewer’s recommendation, we have added the description of the above-mentioned result in the revised manuscript (page 18, lines 313-322). And we have shown the data of growth curves of the original 9 subject strains in the new Fig 7. And we have added the data of the growth curves of M005 and M016 in new Supplementary Fig 7. Additionally, we have corrected the label of y-axis in new Fig. 7a and new Supplementary Fig. 7a and added the description as “Data are represented as CFUs in 4 μl sample at each timepoint.” in the Figure legends. (page 58, lines 1027-1028 and Supplementary Fig. 7a legend)

      (5) Lack of statistical statement: The authors emphasized the role of pellicle-formation-associated genes in strain-dependent essential and accessory essential genes. Additionally, the authors observed that 10% of the genes required for mouse infection are also required for hypoxic pellicle formation. However, these are merely descriptive statements. There is no enrichment analysis to justify whether pellicle-formation-associated genes are significantly enriched in these groups.

      Thank you for the comments on the enrichment of pellicle-formation-associated genes in strain-dependent essential and accessory essential genes. We performed GSEA and found that 9.1% of the genes (16 out of 175) were hit as core enrichment. Of these, 4 genes were also hit as genes showing increased genetic requirements in the resampling plus HMM analyses: phosphoenolpyruvate carboxykinase pckA (OCU_RS48660), type VII secretion-associated serine protease mycP5 (OCU_RS38275), type VII secretion protein eccC5 (OCU_RS38345) and glycine cleavage system amino-methyltransferase gcvT (OCU_RS35955).

      Author response image 2.

      We have added the description of GSEA result in the revised manuscript (page 8, lines 137-144; Supplementary Fig. 2; Supplementary Table 5).

      Reviewer #1 (Recommendations For The Authors):

      Tn-seq and hypoxia adaption in clinical isolates of M. intracellulare (Mi): The authors claim that clinical strains are better adapted to hypoxia because their genetic requirements for optimum fitness overlap with genetic requirements for fitness of the type strain under hypoxia. This is a reasonable hypothesis, but it has not been well-supported by the data presented in Figure 7. The growth rates (Figure 7b) of most of the clinical strains under hypoxia appear to be less than the type strain, although they all seem to grow better than the type strain under normoxia. Perhaps a continuous growth curve of each strain, both as pure and mixed cultures under these conditions will provide a clearer picture.

      Thank you for the comments on the difference of adaptation of hypoxic growth between ATCC13950 and MAC-PD strains. To clarify, growth rates shown in Figure 7 were calculated at the inflection point (midpoint) of the growth curves, which were modeled using a four-parameter logistic (4P logistic) model. As described in the Discussion, we found the pattern of hypoxic adaptation characteristics of the clinical MAC-PD strains from the growth curve forms. Taking into consideration the impact of growing bacteria on the disease progression of MAC-PD, the slow growth with early entry to log phase implicates the continuous impact on the infected hosts during logarithmic bacterial growth, which may be involved in the persistent and steadily progressive illness of MAC-PD for years in humans.
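
      As an illustration of the two quantities discussed above (this is not the authors' code), the following minimal sketch shows how "time at midpoint" and "growth rate at midpoint" follow from a fitted 4P logistic curve. It assumes the common sigmoid parameterization y(t) = lower + (upper - lower) / (1 + exp(-k (t - t_mid))); the parameterization, the function names and the numerical values are assumptions chosen for illustration and are not taken from the study.

```python
import math


def logistic_4p(t, lower, upper, k, t_mid):
    """Four-parameter logistic curve (assumed parameterization).

    lower, upper: asymptotes (e.g. log10 CFU)
    k: steepness of the curve
    t_mid: time of the inflection point (midpoint)
    """
    return lower + (upper - lower) / (1.0 + math.exp(-k * (t - t_mid)))


def slope_at_midpoint(lower, upper, k):
    """Analytical derivative of the curve evaluated at t = t_mid."""
    return k * (upper - lower) / 4.0


# Hypothetical parameters, for illustration only.
params = dict(lower=4.0, upper=8.0, k=0.5, t_mid=6.0)

# At the midpoint the curve is halfway between the two asymptotes.
midpoint_value = logistic_4p(params["t_mid"], **params)
growth_rate = slope_at_midpoint(params["lower"], params["upper"], params["k"])
```

      Under this parameterization an earlier t_mid corresponds to earlier entry into log phase, while the slope at t_mid depends only on the steepness and the distance between the asymptotes, so the two quantities can vary independently between strains.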

      Unlike time-lapse imaging assay, the completely seamless sampling of cultures for CFU assay is impossible. Nevertheless, we collected sufficient timepoints to allow reliable curve fitting with the 4P logistic model, and thus consider our growth data to represent a valid approximation of continuous growth dynamics.

      Regarding the suggestion of mixed culture experiments, we agree that such studies could be informative. However, co-culture conditions introduce additional variables, including inter-strain competition or synergy, which can obscure the specific contributions of hypoxic adaptation in each strain. Therefore, we believe that the current approach using monoculture growth curves under defined oxygen conditions offers a clearer interpretation of strain-specific hypoxic responses.

      In vivo studies: It is unclear how virulent the two clinical strains, Mi27 and Mi198 are in the mouse model. The CFU data in Figure S1b reports the bacterial burden of the Tn libraries of the two strains, of which the overall population of Mi27 library seems to be declining. Without any information on the CFU, animal survival, and tissue pathology from the pure strains, data from the library will have limited implications.

      Thank you for the comments on the data presentation of in vivo studies. We previously revealed the time-course data on CFUs, animal survival, and tissue pathology from the pure strains (Tateishi Y. BMC Microbiol. 2023; new Ref#22). Based on these data, we assumed that we would be able to harvest sufficient number of colonies from mice infected with M.i.27 or M.i.198, and we performed in vivo TnSeq studies using these two strains. We have referred to our previous publication on the virulence of MAC-PD pure strains used in this study for mice in the revised manuscript (page 12, line 212; new Ref #22).

      The data of CFU counts are shown in new Supplementary Figure 3b. In the manuscript text, we explained as follows (page 12, lines 212-216): “The time course of the changes in the bacterial burden showed a pattern similar to those of the wild-type strains M.i.198 and M.i.27, respectively (Tateishi Y. BMC Microbiol. 2023; new Ref#22), except that it was not possible to harvest sufficient colonies (as few as 104/mouse) in the few mice infected with the M.i.27 Tn mutant strain in week 8 and week 16 (new Supplementary Fig. 3b, new Supplementary Table 8)”. The decline of the overall population of the M.i.27 Tn mutant library in the infected lungs can be explained by the lower virulence of the M.i.27 pure strain, which shows an intermediate-virulence phenotype, compared with M.i.198, which shows a high-virulence phenotype.

      [References]

      Tateishi, Y. et al. Virulence of Mycobacterium intracellulare clinical strains in a mouse model of lung infection - role of neutrophilic inflammation in disease severity. BMC Microbiol 23, 94 (2023).

      Data presentation: The manuscript suffers from a lack of clarity in data visualization and presentation, especially the Tn-Seq datasets. Panels describe the experimental workflow with a densely-worded paragraph, making it difficult to navigate through the major findings.

      Thank you for the comments on the issue of Fig. 1b. Following the suggestion, we have modified the new Fig. 1b entitled “Strategy of the procedure of TnSeq analyses”.

      Language: The paper should be extensively revised for language. Often the authors have mixed-up the terms like 'core' and 'accessory' 'genes' in lines 116-119 with 'core and accessory genomes' in Figure 2c, which is not even mentioned in the paper. It is further unclear how they identified 3153 and 5824 core and accessory genes, respectively, from 55 different strains of Mi. Line 251: ..."involved by confer..." needs revision. The terms "increased gene essentiality" and 'growth-defected associated genes" are very confusing. The essentiality of a gene is either absolute or conditional but is not quantitative. Similarly, 'growth-defect associated' can be replaced with a better phrase that alludes to fitness loss in the clone. Additional typos were found throughout the paper that need to be fixed.

      Thank you for the comments on the issue of scientific words including “core and accessory genomes” and “gene essentiality” used in this study.

      Based on Rosconi’s paper (Panel C of Fig. 1 in Nat Microbiol. 2022; new Ref#14), we used the phrases “accessory genome and core genome” to mean the whole sets of genes belonging to the accessory and core genes. To avoid confusion and keep consistency, we replaced the term “genomes” with “genes” in the revised manuscript.

      In our previous comparative genomic analysis, we identified 3153 and 5824 core and accessory genes, respectively, from 55 different strains of M. intracellulare (Tateishi Y. BMC Microbiol. 2021; new Ref #5). To perform pangenomic analysis, we used the software Bacterial Pan-Genome Analysis tool (BPGA) (Narendrakumar NM. Sci Rep 2016).

      We admit that gene essentiality is a qualitative but not a quantitative trait. We have corrected the term "increased gene essentiality" as "increased genetic requirements" throughout the manuscript.

      We have used the term “growth-defect (GD)” based on the classification of gene essentiality calculated by the Hidden Markov Model (HMM) implemented in the TRANSIT software (DeJesus. PLoS Comput Biol. 2015; new Ref#12). The HMM classifies genes as essential (ES), GD, non-essential (NE), or growth-advantage (GA). GD means difficulty of growth (growth deficiency) under aerobic conditions in vitro, because Tn insertions are less frequent. The suggested phrases “fitness loss” or “less fit” may imply a comparison between two different conditions, such as different culture conditions applied to a single bacterial strain. Since the HMM analysis is performed on data from a single strain in a specific condition, we consider that a phrase including “fitness” is somewhat unsuitable for describing the classification of gene essentiality. Thus, it is difficult for us to rephrase GD with a term that implies fitness levels between two conditions in a single bacterial strain.

      [References]

      Rosconi, F. et al. A bacterial pan-genome makes gene essentiality strain-dependent and evolvable. Nat Microbiol 7, 1580–1592 (2022).

      Tateishi, Y. et al. Comparative genomic analysis of Mycobacterium intracellulare: implications for clinical taxonomic classification in pulmonary Mycobacterium avium-intracellulare complex disease. BMC Microbiol 21, 103 (2021).

      Narendrakumar NM et al. BPGA- an ultra-fast pan-genome analysis pipeline. Sci Rep 6, 24373 (2016).

      DeJesus, M.A. et al. TRANSIT--A Software Tool for Himar1 TnSeq Analysis. PLoS Comput Biol 11, e1004401 (2015).

      Reviewer #2 (Recommendations For The Authors):

      Major Comments:

      (1) Result 1 (Page 6-7): Common essential and growth-defect-associated genes representing the genomic diversity of M. intracellulare strains.

      (1a) From Table S1, it is observed that the numbers of Tn-inserted TA sites significantly vary (p >0.05) among biological replicates for each strain when compared with the reference strain ATCC13950. The authors should provide an explanation of how they overcame this variation in their analysis.

      Thank you for the comment on the issue of a variable number of Tn-inserted TA sites among biological replicates for each strain of MAC-PD.

      On TRANSIT software, we set the replicate option as Sum to combine read counts. And we used Beta-Geometric correction (BGC) to normalize the datasets to fit an “ideal” geometric distribution with a variable probability parameter ρ.

      Following the comment, we have added the description on which option we used for handling the replicate data and normalization (page 36, lines 640-643).

      (1b) Importantly, saturation in most of the strains is only ~50-60%. In such a case, there will be a high probability that Tn will not hit a nonessential region due to chance instead of selection (See DeJesus et al., mBio, 2017). It has been observed that the sequence pattern (GC)GNTANC(GC) is strongly associated with non-permissiveness. As shown earlier, the authors need to carefully look for the potential non-permissive sites before concluding the fate of a gene. Also, they should acknowledge the potential limitations of their approach due to the suboptimal level of saturation.

      Thank you for the comment on the saturation of Tn mutant libraries. Our method of comparing genetic requirements between strains is the same as that of a previous report that used duplicate Tn mutant libraries of clinical Mtb strains of different genotypes and triplicate Tn mutant libraries of H37Rv for identifying increased genetic requirements of clinical Mtb strains (Carey. PLoS Pathog 2018; new Ref#8). Our method is also based on the coauthor’s TnSeq study on H37Rv (Minato Y. mSystems 2019; new Ref#61). Moreover, by combining replicates, the saturation of our Tn mutant libraries became 62-79% as follows: ATCC13950: 67.6%, M001: 72.9%, M003: 63.0%, M018: 62.4%, M019: 74.5%, M.i.27: 76.6%, M.i.198: 68.0%, MOTT64: 77.6%, M021: 79.9%. That is, we calculated gene essentiality from the Tn mutant libraries with 62-79% saturation in each strain. The level of saturation of the transposon libraries in our study is similar to that in the very recent TnSeq analysis by Akusobi, where 52-80% saturation libraries (so-called “high-density” transposon libraries) were used for HMM and resampling analyses (Supplemental Methods Table 1[merged saturation] in Akusobi. mBio. 2025; new Ref#9). The saturation of Tn insertion in individual replicates of our libraries is also comparable to that reported by DeJesus (Table S1 in mBio 2017; new Ref#57). Thus, we consider that our TnSeq method of identifying essential genes and detecting the difference of genetic requirements between clinical MAC-PD strains and ATCC13950 is acceptable.
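
      To make the effect of combining replicates concrete, the sketch below (an illustration only, not TRANSIT's implementation) computes saturation as the fraction of TA sites carrying at least one insertion read, before and after summing replicate read counts site by site. The function names and the read counts are hypothetical.

```python
def saturation(counts):
    """Fraction of TA sites with at least one transposon insertion read."""
    return sum(1 for c in counts if c > 0) / len(counts)


def combine_by_sum(replicates):
    """Sum read counts site by site across replicates of equal length."""
    return [sum(site) for site in zip(*replicates)]


# Hypothetical read counts at 10 TA sites in two replicate libraries.
rep1 = [0, 12, 0, 3, 0, 0, 7, 0, 1, 0]
rep2 = [5, 0, 0, 8, 0, 2, 0, 0, 4, 0]

combined = combine_by_sum([rep1, rep2])

sat_rep1 = saturation(rep1)
sat_rep2 = saturation(rep2)
sat_combined = saturation(combined)
```

      In this toy example each replicate covers 4 of 10 sites, whereas the summed library covers 6 of 10, because the replicates hit partly different sites. Sites that remain empty in every replicate are the ones that bear on essentiality calls.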

      As the Reviewer indicates, there is a non-permissive sequence pattern in mariner transposon mutagenesis. Using more than 10 replicates of Tn mutant libraries is quite an accurate method for detecting essential genes among nonstructural small genes such as small regulatory RNAs. However, as DeJesus shows, the number of essential genes identified by TnSeq is comparable between 2 and 14 TnSeq datasets for large genes possessing more than 10 TA sites, most of which seem to be structural genes (Supplementary Fig 2 in mBio 2017; new Ref#57). Thus, we do not consider that we made a serious mistake in the classification of essentiality for most of the structural genes that encode proteins. With respect to the coverage of non-permissive sites, our TnSeq method might need to be improved if it is intended to classify gene essentiality accurately for small genes, including small regulatory RNAs.

      We investigated the non-permissive TA sites in ATCC13950. There are 4136 (6.43% of total ORFs) nonpermissive TA sites in ATCC13950, which is less than in H37Rv (9% of total ORFs) (DeJesus MA. mBio 2017; new Ref#57) and in M. abscessus ATCC19977 (8.1% of total ORFs) (Rifat D. mBio. 2021; new Ref#58). As for larger ORFs (TA sites ≥ 10), there are nonpermissive TA sites in 89 genes (ORFs) of common “essential (ES)” or “growth-defect-associated (GD)” (4.82% of a total of 1844 larger ORFs in ATCC13950). As for small ORFs (2-9 TA sites), there are nonpermissive TA sites in 41 genes (ORFs) of common ES or GD (1.35% of a total of 3021 smaller ORFs in ATCC13950).
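
      For illustration, a scan of this kind can be sketched as follows. The snippet flags TA sites embedded in the (GC)GNTANC(GC) pattern quoted by the reviewer; it is a minimal example under stated assumptions (uppercase A/C/G/T input, hypothetical function names and a made-up sequence) and is not the procedure used in the study.

```python
import re

# Motif quoted by the reviewer: (GC)GNTANC(GC), i.e. [GC] G N T A N C [GC].
# The motif equals its own reverse complement, so scanning one strand suffices.
# A lookahead is used so that overlapping matches are all reported.
NONPERMISSIVE = re.compile(r"(?=([GC]G[ACGT]TA[ACGT]C[GC]))")


def ta_sites(seq):
    """0-based positions of the T in every TA dinucleotide."""
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "TA"]


def nonpermissive_ta_sites(seq):
    """0-based positions of TA sites embedded in the non-permissive motif."""
    # The TA dinucleotide starts 3 bases into the 8-base motif.
    return sorted({m.start() + 3 for m in NONPERMISSIVE.finditer(seq)})


# Hypothetical sequence, for illustration only.
seq = "AATACCGGTAGCGTTTAAGCGATACCG"

all_sites = ta_sites(seq)
blocked_sites = nonpermissive_ta_sites(seq)
permissive_sites = [p for p in all_sites if p not in blocked_sites]
```

      Excluding the flagged positions before counting empty TA sites keeps a gene from looking under-inserted merely because some of its sites are poor transposition targets.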

      We appreciate the idea of concluding the fate of gene essentiality by the presence/absence of non-permissive TA sites. However, we cannot conclude the gene essentiality classification only from the presence/absence of potential non-permissive sites because, strictly speaking, it is impossible to establish gene essentiality without functional analysis using gene manipulation. More precisely, TnSeq can “predict” gene essentiality but cannot guarantee functional significance. At present, however, most recent TnSeq studies have been published on the basis of TnSeq analysis alone, without functional analysis using gene manipulation strains for all the targets they identified. Taking such limitations of TnSeq, including non-permissive sites, into consideration, we consider that the essentiality of the detected genes should be determined in further studies, mainly biological experiments such as functional studies using gene manipulation strains.

      We have added the above-mentioned contents in the revised manuscript (pages 32-33, lines 559-580).

      [References]

      Carey, A.F. et al. TnSeq of Mycobacterium tuberculosis clinical isolates reveals strain-specific antibiotic liabilities. PLoS Pathog 14, e1006939 (2018).

      Minato, Y. et al. Genomewide assessment of Mycobacterium tuberculosis conditionally essential metabolic pathways. mSystems 4, e00070-19 (2019).

      Akusobi, C. et al. Transposon-sequencing across multiple Mycobacterium abscessus isolates reveals significant functional genomic diversity among strains. mBio 6, e0337624 (2025).

      DeJesus, M.A. et al. Comprehensive essentiality analysis of the Mycobacterium tuberculosis genome via saturating transposon mutagenesis. mBio 8, e02133-16 (2017).

      Rifat, D., Chen, L., Kreiswirth, B.N. & Nuermberger, E.L. Genome-wide essentiality analysis of Mycobacterium abscessus by saturated transposon mutagenesis and deep sequencing. mBio 12, e0104921 (2021).

      (1c) Line 100: Authors report a total of 131 genes identified as essential or growth-defect-associated with the HMM analysis across all M. intracellulare strains. It should be explained in more detail how gene essentiality was determined (see above comment in (1b)). Furthermore, in Table S3 authors should mention the essential and growth defective trait of each of the 131 genes.

      Thank you for the comment on how to classify the 131 genes as essential or growth-defect-associated with the HMM analysis across all M. intracellulare strains. As replied in (1b), the average saturation of Tn insertion of our libraries became 62-79% when combining duplicate or triplicate data in each strain. The level of saturation of the transposon libraries in our study is similar to that in the very recent TnSeq analysis by Akusobi, where 52-80% saturation libraries (so-called “high-density” transposon libraries) were used for HMM and resampling analyses, and most of the triplicate libraries range from 70 to 79% saturation (Supplemental Methods Table 1[merged saturation] in Akusobi. mBio. 2025; new Ref#9). The saturation of Tn insertion in individual replicates of our libraries is also comparable to that reported by DeJesus (Table S1 in mBio 2017; new Ref#57). Thus, we consider that our TnSeq libraries are acceptable for identifying essential genes and growth-defect-associated genes by the HMM method.

      We used the HMM method as reported by DeJesus (DeJesus. PLoS Comput Biol. 2015; new Ref#12). The HMM method categorizes gene essentiality throughout the genome as “Essential”, “Growth-defect”, “Non-essential” or “Growth-advantage”. “Essential” genes are defined as regions with no insertions in all or most of their TA sites. “Non-essential” genes are defined as regions that have usual read counts. “Growth-defect” genes are defined as regions that have unusually low read counts. “Growth-advantage” genes are defined as regions that have unusually high read counts.

      Following the previous report (Carey AF. PLoS Pathog 2018; new Ref#8), the annotation for the clinical MAC-PD strains was adapted from that of ATCC13950 by adjusting the START and END coordinates of each ORF in the clinical MAC-PD strains according to their alignment with the corresponding ORFs of ATCC13950. By using an adjusted annotation table, gene essentiality was classified by the HMM analysis.

      We have added the explanation of how we identified essential and growth-defect-associated genes in the Methods (pages 35-36, lines 620-632). And following the comment, we have added the data of classification of gene essentiality in the 131 genes in the new Supplementary Table 3 in the revised manuscript.

      [Reference]

      DeJesus, M.A. et al. TRANSIT--A Software Tool for Himar1 TnSeq Analysis. PLoS Comput Biol 11, e1004401 (2015).

      Carey, A.F. et al. TnSeq of Mycobacterium tuberculosis clinical isolates reveals strain-specific antibiotic liabilities. PLoS Pathog 14, e1006939 (2018).

      Akusobi, C. et al. Transposon-sequencing across multiple Mycobacterium abscessus isolates reveals significant functional genomic diversity among strains. mBio 6, e0337624 (2025).

      DeJesus, M.A. et al. Comprehensive essentiality analysis of the Mycobacterium tuberculosis genome via saturating transposon mutagenesis. mBio 8, e02133-16 (2017).

      (1d) In Table S4, the authors show strain-specific putative essential genes from the core and accessory gene sets. For the sake of clarity, it is important to have the name of all the strains against each gene in which it is predicted essential or growth defective.

      Thank you for the comment on the hit strains for the genes classified as strain-specific and accessory putative essential or growth-defect-associated. Following the comment, we have added the data of hit strains in the new Supplementary Table 4 in the revised manuscript.

      (1e) Lines 123-126: It is not clear what is the relevance of highlighting genes involved in hypoxic pellicle formation in ATCC13950. These appear to be randomly distributed across different clinical isolates and is not clear whether they correlate with differential susceptibility of the reference strain and clinical isolates to hypoxia.

      Thank you for the comment on the relevance of highlighting genes involved in hypoxic pellicle formation in ATCC13950. The rationale for the importance of hypoxic pellicle genes in clinical MAC-PD strains is that the profiles of genetic requirements in each bacterial strain reflect the adaptation to the environment in which each strain lives. When the strains are placed in a special environment, they can adapt to the situation by altering the profiles of genetic requirements, resulting in the remodeling of metabolic pathways. We indeed found that the genetic requirements of several hypoxic pellicle genes were increased in clinical MAC-PD strains in vitro situations. These data suggest the hypoxic pellicle genes become more important in clinical MAC-PD strains for in vitro growth than in ATCC13950.

      Moreover, hypoxia is known to be one of the characteristic conditions in vivo including clinical lesions (McKeown. Br J Radiol. 2014). We consider it reasonable to expect that the strains derived from MAC-PD patients without predisposing immunological disorders may adapt under hypoxic conditions for maintaining bacterial survival. Therefore, we highlighted the genes involved in hypoxic pellicle formation in ATCC13950.

      We have added the description of the rationale for the importance of hypoxic pellicle genes in clinical MAC-PD strains in the revised manuscript (page 9, lines 148-155).

      [Reference]

      McKeown, S.R. et al. Defining normoxia, physoxia and hypoxia in tumours-implications for treatment response. Br J Radiol 87, 20130676 (2014).

      (2) Result 2 (pages 8-10): Genes with increased gene essentiality in clinical MAC-PD strains are also required for hypoxic pellicle formation in the type strain.

      (2a) As reported by authors (lines 123-126), only a small fraction of genes showing essentiality in clinical MAC-PD strains are required for hypoxic pellicle formation in the reference strain, which might be due to random distribution. Authors should avoid making such a generalised statement that reflects the association of the entire essential gene pool in clinical MAC-PD strains with hypoxic pellicle formation.

      Thank you for the comment on the issue of a small fraction of genes showing increased genetic requirements in clinical MAC-PD strains that is shared with genes required for hypoxic pellicle formation in the type strain ATCC13950. We admit that the section title may mislead that the genes required for hypoxic pellicle formation confer the entire essential gene pool of clinical MAC-PD strains. Following the comment, we have revised the section title as “Partial overlap of the genes showing increased genetic requirements in clinical MAC-PD strains with those required for hypoxic pellicle formation in ATCC13950” (page 9, lines 146-147).

      We consider that it cannot be explained by a mere coincidence that we obtained the data of partial overlap of genes showing essentiality in clinical MAC-PD strains with genes required for hypoxic pellicle formation in ATCC13950, because we demonstrated the supporting data such as the pattern of genetic requirements suggesting gluconeogenic metabolic shift (Fig. 5) and the different pattern of hypoxic growth curves between clinical MAC-PD strains and ATCC13950 (Fig. 7).

      (2b) I fail to understand how the number of Tn insertions determines "more" or "less" essentiality of a gene particularly with 50-60% saturation. To my understanding, essentiality is a qualitative trait. Either a gene will be essential (based on no Tn insertion despite having the permissive sites), critical (poor representation of Tn insertions at the permissive sites due to growth defect of the strain in the pool), non-essential (expected frequency of insertion) or growth-advantageous (higher representation of Tn insertions at the permissive sites due to growth advantage of the strain in the pool). Hence, authors should avoid quantifying the essentiality of a gene.

      Thank you for the comments on the trait of gene essentiality. We realize that essentiality is a qualitative trait, not a quantitative trait. Because the number of Tn insertions indicates whether a gene is "more" or "less" required, we have corrected the manuscript by using the phrase “genetic requirements” instead of “gene essentiality”.

      As mentioned earlier, our method of comparing genetic requirements between strains is the same as that of a previous report that used duplicate Tn mutant libraries of clinical Mtb strains of different genotypes and triplicate Tn mutant libraries of H37Rv for identifying increased genetic requirements of clinical Mtb strains (Carey AF. PLoS Pathog 2018; new Ref#8). Moreover, as described in rebuttal (1b), the saturation of our Tn mutant libraries after combining replicates is 62-79% as follows: ATCC13950: 67.6%, M001: 72.9%, M003: 63.0%, M018: 62.4%, M019: 74.5%, M.i.27: 76.6%, M.i.198: 68.0%, MOTT64: 77.6%, M021: 79.9%. That is, we calculated gene essentiality from the Tn mutant libraries with 62-79% saturation in each strain. The level of saturation of the transposon libraries in our study is similar to that in the recent TnSeq analysis by Akusobi, where 52-80% saturation libraries (“high-density” transposon libraries) were used for HMM and resampling analyses (Supplemental Methods Table 1[merged saturation] in Akusobi C. mBio. 2025; new Ref#9).

      Thus, we consider that our data of the difference of genetic requirements between clinical MAC-PD strains and ATCC13950 are acceptable.

      [Reference]

      Akusobi, C. et al. Transposon-sequencing across multiple Mycobacterium abscessus isolates reveals significant functional genomic diversity among strains. mBio 6, e0337624 (2025).

      (2c) From Figures 3-4, it seems the authors intend to highlight the insertion frequencies of certain genes in the clinical isolates compared to those in the reference strain to conclude whether a gene has become more critical and its disruption results in the growth defective phenotype (poor representation) in the clinical isolates, or a critical/essential gene has become dispensable in these strains.

      Based on these arguments, I suggest that the authors modify the title of the result such as "Tn insertion reveals differential requirement of genes for in vitro growth of clinical MAC-PD strains" or "Identification of genes differentially required for in vitro growth of clinical MAC-PD strains" as this is precisely the information we gain from this section of the study. Also, it is suggested to re-draft the rationale of this section as only 4 genes associated with hypoxic pellicle formation, were found to exhibit reduced insertion frequencies in the clinical isolates out of total of 283 genes. Hypoxia-related genes can be highlighted in the next section (see below).

      Thank you for the suggestion to modify the section title and to re-draft the rationale of the section. Following the comment, we modified the section title as “Partial overlap of the genes showing increased genetic requirements in clinical MAC-PD strains with those required for hypoxic pellicle formation in ATCC13950” (page 9, lines 146-147).

      Following the suggestion, we have revised the rationale of this section as follows: “The sharing of strain-dependent and accessory essential or growth-defect-associated genes with genes required for hypoxic pellicle formation in ATCC13950 prompts us to consider that the profiles of gene essentiality in clinical MAC-PD strains may be associated with the genes required for hypoxic pellicle formation in ATCC13950.” (page 9, lines 151-155)

      The reviewer points out that, out of a total of 283 genes, only 4 genes associated with hypoxic pellicle formation exhibited reduced insertion frequencies in the clinical isolates. However, to discuss what proportion of genes showed increased requirements in clinical MAC-PD strains compared to ATCC13950, we should focus on the 121 genes showing increased requirements in clinical MAC-PD strains compared to ATCC13950, excluding the 162 genes indispensable for clinical MAC-PD strains. We have therefore described in the manuscript that 4 of the 121 genes having significantly fewer Tn insertions than ATCC13950 were associated with hypoxic pellicle formation (Fig. 3).

      (3) Result 3 (Page 10-14): Requirement of genes with increased gene essentiality in the clinical MAC-PD strains for mouse lung infection.

      (3a) The title should be modified to "Identification of genes in the clinical MAC-PD strains required for mouse lung infection".

      Following the comment, we have modified the section title as "Identification of genes in the clinical MAC-PD strains required for mouse lung infection". (page 12, lines 201-202).

      (3b) Further, the rationale of this experiment needs to be modified. As mentioned above, up until now the impact of hypoxic pellicle formation genes in the growth of MAC-PD strains remains unconvincing. The rationale of mouse infection experiments could be straightforward- "to identify genes critical for animal infection of the clinical isolates".

      Thank you for the comment on the rationale of the in vivo TnSeq experiment. Following the comment, we have revised the rationale as “The impact of hypoxia on mycobacteria under various ecological circumstances implies that the genes required for pathogenesis of MAC-PD may be in some degrees, overlapped with the genes with increased requirements in the clinical MAC-PD strains compared to ATCC13950, and also with the genes required for hypoxic pellicle formation in ATCC13950. To identify genes required for in vivo infection of clinical MAC-PD strains,” in the revised manuscript (page 12, lines 204-210).

      (3c) The authors should avoid using the term "genes with increased essentiality" for the reasons explained above in point #2b.

      Following the comment, we have changed the term to “genes with increased requirements” in the revised manuscript (page 12, line 207).

      (3d) From Tables S8 and S9, I can find 93 genes in Mi198Tn and 74 genes in Mi27Tn for which Tn insertion mutants are under-represented in TnSeq at all time points from Day 1 to Wk 16 in comparison to input. Importantly, excluding results from Day 1 when the infection has just settled, I find 172 and 121 genes in Mi198Tn and Mi27Tn, respectively, under-represented in lungs between Wk 4-16. My suggestion is that authors should focus more on such genes and identify the characteristics of these genes and what fraction belongs to those involved in hypoxic pellicle formation in the reference strain. I am perplexed why authors have categorically ignored other genes and only focused on a set of genes that correspond to ~10-12% of entire differentially abundant mutant pool.

      Thank you for the suggestion to analyze whether the genes whose Tn insertion mutants were under-represented in infected mouse lungs from Week 4 to Week 16 overlap with the genes involved in hypoxic pellicle formation in the type strain ATCC13950. At all timepoints from Day 1 to Week 16, 74 genes and 99 genes were under-represented in lungs infected with M.i.27Tn and M.i.198Tn, respectively. Of these, 21 (28.3%) and 12 (12.1%) genes were among those required for hypoxic pellicle formation in the type strain. At timepoints from Week 4 to Week 16, 121 genes and 172 genes were under-represented in lungs infected with M.i.27Tn and M.i.198Tn, respectively. Of these, 21 (23.1%) and 30 (18.0%) genes were among those involved in hypoxic pellicle formation in the type strain. The hypoxic pellicle-associated genes detected in both M.i.27 and M.i.198 at all time points (from Day 1 to Week 16) encoded proteins involved in methionine synthesis, acyl-CoA dehydrogenase, isocitrate lyase, and an MMPL family transporter. From Week 4 to Week 16, they additionally encoded multifunctional oxoglutarate decarboxylase/dehydrogenase, proteasome subunits, ABC transporter ATP-binding protein/permease, and lipase chaperone. We have described these results in the Results section (page 14, lines 236-248) and in new Supplementary Tables 12 and 13.
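
      The overlap fractions reported above have the form "shared genes divided by under-represented genes". A minimal sketch of that calculation follows, assuming each gene list is available as a collection of locus tags. The function name `overlap_summary` and the locus tags are hypothetical and are not taken from the study.

```python
def overlap_summary(under_represented, pellicle_genes):
    """Return (shared gene count, percent of under-represented genes that are shared)."""
    under = set(under_represented)
    shared = under & set(pellicle_genes)
    # Guard against an empty list so the percentage is always defined.
    percent = 100.0 * len(shared) / len(under) if under else 0.0
    return len(shared), round(percent, 1)


# Hypothetical locus tags, for illustration only (not data from the study).
under_represented_in_lung = ["gene_01", "gene_02", "gene_03", "gene_04"]
hypoxic_pellicle_genes = ["gene_02", "gene_04", "gene_09"]

count, percent = overlap_summary(under_represented_in_lung, hypoxic_pellicle_genes)
print(f"{count} shared genes ({percent}%)")
```

      Using sets makes the denominator explicit: the percentage is always relative to the under-represented gene list, not to the pellicle gene list.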

      For M. intracellulare, conditionally essential genes have not been reported except for those required for hypoxic pellicle formation in ATCC13950, which we identified previously (Tateishi Y. Sci Rep. 2020; new Ref#10). This study is the first to focus on the relationship between strain-to-strain differences in genetic requirements and hypoxic adaptation. We found that a certain proportion of the genes required for mouse lung infection overlapped with those required for hypoxic pellicle formation in ATCC13950. We therefore consider it reasonable to focus on the category of genes required for hypoxic pellicle formation when analyzing the TnSeq datasets from mice.

      [Reference]

      Tateishi, Y. et al. Genome-wide identification of essential genes in Mycobacterium intracellulare by transposon sequencing - Implication for metabolic remodeling. Sci Rep 10, 5449 (2020).

      (3e) Page 13, lines 224-227: "Despite the differences in the profiles of the genes required for in vivo infection between strains, these data suggest that increased gene essentiality for hypoxic growth confers advantages for pathogenesis in vivo."

      For the reason described above, I find it a misleading hypothesis that hypoxic growth confers advantages for pathogenesis in vivo. How come only 10-12% of the entire gene sets which include genes of varying functions, can be the sole contributors to bacterial survival in host organelles during infection?

      More importantly, the mouse is not considered a good model for hypoxia, as mouse infection does not lead to the formation of solid granulomas with a hypoxic core. Although I am not convinced by the authors' bias toward hypoxia-related genes, if they aim to investigate the role of such genes by an unbiased enrichment of TnSeq mutants, they should have used C3HeJ mice, which are known to form granulomas (Boute et al., 2017 (doi: 10.1186/s13567-017-0477-7)).

      Thank you for the comments on the contribution of genes required for hypoxic growth and on the difference in hypoxic levels between mouse lineages. We did not intend to claim that the set of genes required for hypoxic growth is the sole contributor to bacterial survival in host organs during infection. As we discussed in the Discussion section, we acknowledge that adaptation to the difference in carbon source between in vitro and in vivo infection (i.e. preferential usage of lipid carbon sources in vivo) is involved in the pathogenesis of mycobacterial diseases (Yang. Front Microbiol 2018; new Ref#33, Gouzy. Proc Natl Acad Sci U S A 2021; new Ref#29, Quinonez. mBio 2022; new Ref#40, Pandey. Proc Natl Acad Sci U S A 2008; new Ref#41). We consider that not only the genes required for hypoxic pellicle formation but also strain-dependent/accessory genes conferring metabolic functions other than hypoxic pellicle formation are likely to be involved in mouse lung infection in vivo.

      We have modified the sentence to clearly express our intention as follows: “These in vivo TnSeq data suggest that, despite the differences in the profiles of the genes required for in vivo infection between strains, increase of genetic requirements for hypoxic growth in part contribute to the pathogenesis in vivo.” (pages 15-16, lines 269-271)

      Performing TnSeq in C3HeJ mice is an interesting idea. The granulomas formed in C3HeJ mice become extremely hypoxic (less than 1%, corresponding to the level of “pathological” hypoxia), which is severe enough to be detected by pimonidazole. In our model, the effect of such pathological levels of hypoxia on granuloma formation might not be detected. However, the lesions formed in C57BL/6 mice reach a “physiological” level of hypoxia (5% O2) (McKeown SR. Br J Radiol. 2014), which is the same O2 level at which M. intracellulare forms pellicles. In principle, oxygen levels inside the human body are physiologically hypoxic, and many biological events are experimentally investigated under this condition. Thus, we consider that we were able to observe the effect of physiological hypoxia on M. intracellulare growth both in vitro (hypoxic pellicles) and in vivo (infected C57BL/6 mice).

      [Reference]

      Yang, T. et al. Pan-genomic study of Mycobacterium tuberculosis reflecting the primary/secondary genes, generality/individuality, and the interconversion through copy number variations. Front Microbiol 9, 1886 (2018).

      Gouzy, A., Healy, C., Black, K.A., Rhee, K.Y. & Ehrt, S. Growth of Mycobacterium tuberculosis at acidic pH depends on lipid assimilation and is accompanied by reduced GAPDH activity. Proc Natl Acad Sci U S A 118, e2024571118 (2021).

      Quinonez, C.G. et al. The role of fatty acid metabolism in drug tolerance of Mycobacterium tuberculosis. mBio 13, e0355921 (2022).

      Pandey, A.K. & Sassetti, C.M. Mycobacterial persistence requires the utilization of host cholesterol. Proc Natl Acad Sci U S A 105, 4376-4380 (2008).

      McKeown, S.R. et al. Defining normoxia, physoxia and hypoxia in tumours-implications for treatment response. Br J Radiol 87, 20130676 (2014).

      (3f) An important set of data with the ATCC13950 reference strain is missing here. It is suggested that authors perform this study with the reference strain to identify whether the enrichment of genes is similar across all strains or is specific to the clinical isolates.

      Thank you for the comment on including ATCC13950 as a control strain in the mouse infection experiment. However, we previously showed that the bacterial burden of ATCC13950 declines continuously from 4 weeks of infection, and that ATCC13950 is almost completely eliminated between 8 and 16 weeks of infection (BMC Microbiol 2023; new Ref#22). Therefore, it is not possible to perform TnSeq to detect the genes required for persistent infection in mice infected with ATCC13950.

      [Reference]

      Tateishi, Y. et al. Virulence of Mycobacterium intracellulare clinical strains in a mouse model of lung infection - role of neutrophilic inflammation in disease severity. BMC Microbiol 23, 94 (2023).

      (3g) Pages 13-14, lines 228-245: "We have performed a statistical enrichment analysis of gene sets by GSEA...".

      The comparison made here is not clear to me. It seems the authors do compare genes required for the growth of M.i.27 and M.i.198 in mouse lungs with the gene sets required for hypoxic pellicle formation in ATCC13950 together with the gene sets showing increased gene essentiality observed in the clinical MAC-PD strains, and claim that a significant % of genes belong to hypoxia-adaptation pathways. It is factually incorrect because a majority of these might overlap with those found critical for the in vitro survival of MAC-PD strains. It is suggested that authors re-analyze their data by comparing genes required for the growth of M.i.27 and M.i.198 in mouse lungs individually with those involved in hypoxic pellicle formation in ATCC13950, and with the gene sets found critical for in vitro growth, and present accordingly.

      Thank you for the suggestion on the re-analysis of gene enrichment analysis of genes required for M.i.27 and M.i.198 in vivo infection, individually with genes involved in hypoxic pellicle formation in ATCC13950 and with those showing genetic requirements in clinical MAC-PD strains compared to ATCC13950.

      About 50% (92 and 94 out of 181 genes from Day 1 to Week 16 and from Week 4 to Week 16 of infection) and 40% (70 and 79 out of 179 genes from Day 1 to Week 16 and from Week 4 to Week 16 of infection) of genes required for hypoxic pellicle formation in ATCC13950 were listed as enriched in genes required for mouse lung infection in M.i.27 and M.i.198, respectively. In addition, about 42% (54 and 56 out of 128 genes from Day 1 to Week 16 and from Week 4 to Week 16 of infection) and 40% (79 and 68 out of 179 genes from Day 1 to Week 16 and from Week 4 to Week 16 of infection) of genes showing increased requirements in clinical MAC-PD strains compared to ATCC13950 were listed as enriched in genes required for mouse lung infection in M.i.27 and M.i.198, respectively.

      The tables and graphs of GSEA results are shown in Supplementary Figs. 5, 6.

      These data indicate that 40-50% of the genes required for in vitro hypoxic pellicle formation and of the strain-dependent/accessory essential genes are each significantly enriched among the genes required for in vivo bacterial growth. We have added the reanalyzed GSEA results to the Results (pages 16-17, lines 287-290). The details of the reanalyzed GSEA data are shown in Supplementary Figs. 5, 6 and Supplementary Tables 15, 16.
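
      GSEA is a rank-based method, and the sketch below does not reproduce it. As a simpler, swapped-in illustration of how an overlap between two gene sets can be tested for over-representation, a one-sided hypergeometric test can be written with the standard library alone. The function name and all numbers are hypothetical, and the background gene count in particular is an assumption that a real analysis would have to justify.

```python
from math import comb


def hypergeom_upper_tail(population, successes, draws, observed):
    """P(X >= observed) when `draws` genes are sampled without replacement
    from `population` genes, of which `successes` belong to the gene set."""
    total = comb(population, draws)
    upper = min(successes, draws)
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return tail / total


# Hypothetical toy numbers: 10 background genes, 4 in the gene set,
# 3 genes required in vivo, 2 of which fall in the gene set.
p_value = hypergeom_upper_tail(population=10, successes=4, draws=3, observed=2)
print(f"one-sided p = {p_value:.4f}")
```

      A small p-value here would only indicate that the overlap is larger than expected by chance under the stated background; it does not replace the rank-based enrichment analysis reported by the authors.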

      (3h) Since authors have used Tnseq of pooled mutants, which often yields misleading information, it is important to validate some of their findings upon mouse infection with individual mutants that yield prominent as well as baseline reduction at different time points. In the absence of validation, it remains a mere speculation of the role of these genes in the infection of these strains to animals.

      Thank you for the suggestion on the validation of the TnSeq hit genes on the in vivo survival. We acknowledge the importance of validating the TnSeq-hit genes by constructing knockout mutants. We have recently succeeded in constructing the vectors for making knockout strains of M. intracellulare (Tateishi Y. Microbiol Immunol. 2024). We will proceed to the infection experiment of knockout mutants by using our system for constructing them.

      [Reference]

      Tateishi Y. et al. Construction of knockout mutants in Mycobacterium intracellulare ATCC13950 strain using a thermosensitive plasmid containing negative selection marker rpsL+. Microbiol Immunol 68, 339-347 (2024).

      (4) Result 4 (Page 14-15): Preferential hypoxic adaptation of clinical MAC-PD strains evaluated with bacterial growth kinetics.

      (4a) "The metabolic remodeling, such as the increased gene essentiality of gluconeogenesis and the type VII secretion system..". As stated above, the essentiality of a gene, being a qualitative trait, should not be presented in quantitative terms. The authors should re-phrase this statement.

      Following the comment, we have corrected the term as “The metabolic remodeling, such as the increased genetic requirements of gluconeogenesis and the type VII secretion system.” (page 17, lines 296-297)

      (4b) "overlap of the genes required for mouse lung infection and those required for hypoxic pellicle formation involved by conferring these metabolic pathways..". There is a syntax error in this statement and needs revision.

      Following the comment, we have corrected the phrase as “overlap of the genes required for mouse lung infection and those required for hypoxic pellicle formation involved by these metabolic pathways”. (page 17, lines 297-299)

      (4c) The altered requirement of genes in different clinical strains for survival provides only circumstantial evidence of metabolic remodeling. Authors are suggested to perform metabolic profiling of representative clinical and reference strains, as it is important to examine whether these bacteria indeed undergo metabolic shift.

      Thank you for the comment on the metabolic profiling of the representative clinical and reference strains. We previously published the TnSeq results of ATCC13950, and we interpreted the current data in light of those findings (Fig. 4 in Tateishi Y. Sci Rep 2020; new Ref#10). The priority of the current study was to elucidate the difference and diversity of genetic requirements between clinical MAC-PD strains and ATCC13950. We consider that even circumstantial evidence of metabolic remodeling from TnSeq is of value, because it provides a strong rationale for proceeding to the next study, including metabolomic analysis.

      [Reference]

      Tateishi, Y. et al. Genome-wide identification of essential genes in Mycobacterium intracellulare by transposon sequencing - Implication for metabolic remodeling. Sci Rep 10, 5449 (2020).

      (5) Result 5 (Page 16-18): Effects of knockdown of universal and accessory/strain-dependent essential or growth-defect-associated genes in clinical MAC-PD strains.

      (5a) Lines 273-277: The rationale of using CRISPRi should be correctly presented to evaluate the effect of individual genes' suppression on the downstream phenotype and not to establish the CRISPRi silencing tool in MAC.

      Thank you for the comment on the rationale of the section of the CRISPR-i experiment. Following the comment, we have modified the sentence as follows: “With an intention to evaluate the effect of suppressing TnSeq-hit genes on bacterial growth.” (page 19, lines 333-334 in the revised manuscript).

      (5b) Line 278: pRH2052/pRH2521 are the plasmids and not the CRISPRi system.

      Following the comment, we have corrected the phrase as “pRH2052/pRH2521 clustered regularly interspaced short palindromic repeats interference (CRISPR-i) plasmids.” (page 19, lines 334-335 in the revised manuscript).

      (5c) Line 280: Other pioneering studies on the use of CRISPRi for gene silencing in mycobacteria (Chaudhary et al., Nat Comm, Rock et al., Nat Microbio) should also be cited.

      Thank you for the comment on adding the reference papers on CRISPR-i in mycobacteria. We have added the two suggested papers in the revised manuscript as new Ref #30 and #31. (page 19, line 336).

      (5d) Lines 282-283: It is not clear why M001 and MOTT64 could not be transformed. Did the authors use any control plasmid to evaluate the transformation efficiency of these strains?

      Thank you for the comment on the failure of transformation in M001 and MOTT64.

      Following the comment, we have evaluated the transformation efficiency of the 9 M. intracellulare strains used in this study. As a control plasmid, we used the E. coli-mycobacteria shuttle vector pSO246KM-Prhsp65-luc, which expresses firefly luciferase (Aoki K. J Biol Chem 2004). To obtain transformed colonies, we used 7H10/OADC agar plates containing the same concentration of kanamycin that we used for preparing Tn mutant libraries and for obtaining CRISPR-i knockdown strains.

      We observed no colonies on agar plates for MOTT64 after electroporation of the pSO246KM-Prhsp65-luc plasmid. In most of the remaining strains, transformed colonies had fully emerged by day 10 of culture after electroporation of the plasmid. However, M001 needed twice as long for transformed colonies to emerge: by day 21, the number of M001 colonies had finally become comparable to that of the other strains. We checked the luciferase activity of 6-12 colonies of each strain except MOTT64, and we confirmed transformation by the higher luciferase activity in colonies that had undergone electroporation of the plasmid than in those that had not.

      The failure to obtain CRISPR-i transformants in MOTT64 may be due to an extremely low efficiency of acquiring foreign DNA. For M001, the failure may reflect a lower tolerance to the stress caused by plasmid transformation compared to other M. intracellulare strains. The pSO246KM-Prhsp65-luc plasmid may impose a tolerable level of stress on M001, resulting in the delayed emergence of transformed colonies. By contrast, the CRISPR-i plasmids may impose greater stress on M001 than the pSO246KM-Prhsp65-luc plasmid, to a degree that prevents transformation.

      Author response table 1.

      Author response image 3.

      Luciferase activities before and after transformation of the pSO246KM-Prhsp65-luc plasmid. Fifty microliters of culture were mixed with 50 µL of assay reagent (Luciferase assay system E1500, Promega), and luciferase activity was measured with a luminometer (FilterMax F5, Molecular Devices). Data are shown as mean ± SD of 6-12 colonies.

      [Reference]

      Aoki K. Extracellular mycobacterial DNA-binding protein 1 participates in Mycobacterium-lung epithelial cell interaction through hyaluronic acid. J Biol Chem 279, 39798–39806 (2004).

      (5e) Lines 283-186: "To confirm the gene essentiality detected with the HMM analysis, we evaluated the consequent growth inhibition in the knockdown strains of representative universal essential or growth-defect-associated genes, including glcB, inhA, gyrB, and embB.." It is not clear what was the level of suppression of these genes in the respective KD strains. Authors should include the level of suppression of these genes also by qRT-PCR.

      Thank you for the comment on the suppression levels of gene expression in knockdown strains of universal essential genes. Following the comment, we have evaluated them by qRT-PCR, and we observed comparable levels of knockdown efficiency between the knockdown strains of universally essential genes and those of strain-specific/accessory essential genes (new Supplementary Fig. 9). Overall, gene expression was suppressed to 20-70% in the knockdown strains compared to the vector control strains that do not express sgRNA.

      We have added the qRT-PCR data for the knockdown strains of universal essential genes such as glcB, inhA, gyrB, and embB (new Supplementary Fig. 9). We have revised the Results and Discussion in the manuscript (page 21, lines 367-376; page 28, lines 490-497).
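
      For readers checking how a statement such as "suppressed to 20-70% of the vector control" can be derived from qRT-PCR data, the sketch below uses the common 2^-ΔΔCt calculation. This is an assumption: the response does not state which quantification method was used. The reference gene is assumed to be sigA, as mentioned in the reviewer's comment, and the function name and Ct values are hypothetical.

```python
def relative_expression(ct_target_kd, ct_ref_kd, ct_target_ctrl, ct_ref_ctrl):
    """Fold expression of a target gene in a knockdown strain relative to the
    vector control, normalized to a reference gene (2^-ddCt method)."""
    delta_kd = ct_target_kd - ct_ref_kd        # dCt in the knockdown strain
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl  # dCt in the vector control
    return 2.0 ** -(delta_kd - delta_ctrl)


# Hypothetical Ct values, for illustration only (not data from the study).
fold = relative_expression(
    ct_target_kd=24.0, ct_ref_kd=18.0,     # knockdown strain: target, reference
    ct_target_ctrl=22.0, ct_ref_ctrl=18.0  # vector control: target, reference
)
print(f"expression relative to vector control: {fold:.2f}")
```

      A value below 1 indicates suppression relative to the vector control; the calculation assumes roughly equal amplification efficiency for the target and reference primers.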

      (5f) Lines 293-: I am unable to establish any correlation between the growth of the knockdown with Tn insertion reads in the respective genes. For instance, pckA exhibits reduced Tn insertion reads in almost all the strains except in M.i.27, but the effect of its KD on growth is seen only in M.i.198 and M003; glpX exhibits reduced Tn insertion reads in M003, M019, M021 but the effect of its KD on growth is seen only in M003; csd exhibits reduced Tn insertion reads in M.i.198, M003, M019 but the effect of its KD on growth is seen only in M.i.198 and M003. The authors' arguments that these contradictory phenotypes are due to difficulties in the effective operation of genetically modified systems using foreign genes from different bacterial species in MAC-PD strains (Lines 310-312), or that the desired effect on growth could not be observed due to the inability of CRISPRi to yield >99% suppression (Line 314), are not valid justifications. Indeed, a close look at the RT-PCR data (Figure S5) reveals that pckA levels are ~0.22, 0.5, 0.2, 0.22, 0.2, 0.5, and 0.3 fold relative to sigA in M.i.198, M.i.27, ATCC13950, M018, M019, M003 and M021, respectively, but the effect of its suppression on growth by CRISPRi is seen only in M.i.198 and M003. Secondly, >99% suppression is not a universal prerequisite for all the genes to show growth defect (as might be the case with glcB, inhA, gyrB, and embB genes in this study). Hence, it remains unclear why contrasting results are obtained for most of the genes by TnSeq and CRISPRi.

      Thank you for the comments on the inconsistent results between TnSeq and CRISPR-i-based knockdown. We acknowledge that some inconsistencies were observed, particularly among strain-dependent/accessory essential or growth-defect-associated genes. By contrast, the TnSeq and CRISPR-i-based knockdown results were consistent for universal essential genes. Having measured the suppression levels of gene expression in the knockdown strains of universal essential genes, we acknowledge that low knockdown efficiency does not explain the discrepancy between the TnSeq and CRISPR-i results, because knockdown efficiency was comparable between strain-dependent/accessory essential genes and universally essential genes.

      Although the mechanism cannot be fully established from the current study alone, we consider that the inconsistent phenotypes between TnSeq and CRISPR-i-based knockdown may be related to the recently described bypass mechanism of gene essentiality, which is characteristically observed in strain-dependent/accessory essential or growth-defect-associated genes. According to Rosconi et al. (Nat Microbiol. 2022; new Ref#14), who reported ‘forced-evolution experiments’ on 36 clinical Streptococcus pneumoniae strains, gene essentiality can be bypassed by several mechanisms, including the composition of the accessory genome and pathway rewiring. They successfully recovered knockout mutants from transformation experiments for strain-specific/accessory essential genes such as cytidine monophosphate kinase, the folate pathway enzyme formate tetrahydrofolate ligase and the undecaprenyl phosphate-biosynthesis pathway enzyme farnesyl-diphosphate synthase. The bypassing of gene essentiality was suggested by the observation of suppressor mutations and synthetic lethality in knockout strains. By contrast, universal essential genes were reported to meet three criteria: high levels of conservation within and often across species, limited genetic diversity, and high and stable expression levels. Consequently, universal essential genes are considered to be rigid, largely immutable key components of an organism’s survival.

      We consider that this also applies to our study on NTM because NTM is pangenomic. The knockdown of universal essential genes resulted in clear growth suppression; however, the knockdown of strain-dependent/accessory essential genes did not show consistent growth suppression. We consider that the bypass mechanism of gene essentiality can explain the inconsistent effect of silencing strain-dependent/accessory genes on bacterial growth suppression.

      We have added the above-mentioned description in the Discussion (pages 28-29, lines 497-519).

      [Reference]

      Rosconi, F. et al. A bacterial pan-genome makes gene essentiality strain-dependent and evolvable. Nat Microbiol 7, 1580–1592 (2022).

      Minor Comments:

      (1) The authors should mention the cut-off of fold-change for all the experiments in the methods section.

      Thank you for the comment on the cut-off of fold-change. We set the cut-off of fold-change as adjusted P-value < 0.05. We added the description in the Methods section. (page 41, lines 724-725)

      (2) Figure 7 legend (Lines 888-889): "Data are shown as the means ± SD of triplicate experiments. Data from one experiment representative of three independent experiments (N = 3) are shown."

      Figure S3 legend: Data on the growth curves are the means of triplicate experiments. Data from one experiment representative of three independent experiments (N = 3) are shown.

      Figure S4 legend: Data are shown as the means ± SD of triplicate experiments. Data from one experiment representative of two independent experiments (N = 2) are shown.

      Figure S5 legend: Gene expression data are the means ± SD of triplicate experiments. Data from one experiment representative of two independent experiments (N = 2) are shown.

      These statements need clarification. Were multiple independent experiments (biological repeats) performed, each with 2-3 technical replicates, and do the data shown represent one of these biological repeats?

      Thank you for the comments on the number of experiments performed and the number of replicates. We have performed two or three independent experiments with 2-3 technical replicates. The data shown represent one of the independent experiments.

      (3) Figure 7b: Statistics are missing in the bar graph for growth rate under aerobic conditions.

      Thank you for the comment on the statistics of the data regarding growth rate under aerobic conditions. We have added the statistics in the new Fig. 7c.

      (4) The authors should check the y-axis in Figure 7b, as it is not clear whether bacteria indeed show a growth rate of 1-3 CFUs/day.

      Thank you for the comment on the y-axis in Figure 7b. We have corrected the label of y-axis as “log10[CFUs]/day” in the new Fig. 7c. Additionally, we have corrected the label of y-axis in new Fig. 7a and added the description as “Data are represented as CFUs in 4 μl sample at each timepoint.” in the Fig. 7a legend.
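
      A growth rate labelled in log10[CFUs]/day can be read as the change in log10-transformed CFU counts per day. The sketch below assumes a simple two-point estimate between two timepoints; the response does not describe the authors' actual calculation, and the function name and CFU values are hypothetical.

```python
from math import log10


def growth_rate_log10_per_day(cfu_start, cfu_end, day_start, day_end):
    """Change in log10(CFU) per day between two timepoints."""
    if cfu_start <= 0 or cfu_end <= 0:
        raise ValueError("CFU counts must be positive to take log10")
    if day_end <= day_start:
        raise ValueError("day_end must be later than day_start")
    return (log10(cfu_end) - log10(cfu_start)) / (day_end - day_start)


# Hypothetical CFU counts per fixed sample volume, for illustration only.
rate = growth_rate_log10_per_day(cfu_start=1e3, cfu_end=1e6, day_start=0, day_end=3)
print(f"growth rate: {rate:.2f} log10[CFUs]/day")
```

      Expressed this way, a value of 1 means a tenfold increase in CFUs per day, which is why a value in the range of 1-3 is implausible as a raw CFU count but plausible on a log10 scale.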

      Reviewer #3 (Recommendations For The Authors):

      (1) It's notable that strains M001 and MOTT64 failed to undergo a transformation, while seven other strains did. Given that M001, MOTT64, and M019 belong to the same phylogenetic clade, it raises questions about why particular strains within this clade showed different transformation outcomes. It might be valuable for them to discuss this discrepancy in their study.

      Thank you for the comment on the difference in transformation capacity between strains belonging to the same genomic subgroup. Although the direct mechanism determining competency for foreign DNA has not been elucidated in M. intracellulare and other pathogenic NTM species, several studies in other bacteria suggest that introducing foreign DNA into clinical strains is more difficult than into laboratory strains. As suggested in Staphylococcus aureus (Corvaglia AR. PNAS. 2010; new Ref#55), some clinical strains develop systems for eliminating foreign nucleic acids, such as a type III-like restriction endonuclease. As suggested in Gram-negative bacteria (Qin J. Sci Rep. 2022; new Ref#56), there may be differences in cell surface structures between strains, making polymyxin B nonapeptide, which targets the cell membrane, necessary for transforming clinical strains. The efficiency of eliminating foreign DNA may be attributed to various strain-specific factors, including restriction endonucleases, natural CRISPR-interference systems and cell wall structures, rather than to a simple genotypic factor.

      We have added the description on the difference of capability in transformation in the Discussion. (page 31, lines 546-558)

      [References]

      Corvaglia, A.R., François, P., Hernandez, D., Perron, K., Linder, P. & Schrenzel, J. A type III-like restriction endonuclease functions as a major barrier to horizontal gene transfer in clinical Staphylococcus aureus strains. Proc Natl Acad Sci U S A 107, 11954-11958 (2010).

      Qin, J., Hong, Y., Pullela, K., Morona, R., Henderson, I.R. & Totsika, M. A method for increasing electroporation competence of Gram-negative clinical isolates by polymyxin B nonapeptide. Sci Rep 12, 11629 (2022).

      (2) The authors should consider specifying M. intracellulare in their title.

      Thank you for the comment on the manuscript title. Following the comments from all Reviewers, we have modified the title as “Functional genomics reveals strain-specific genetic requirements conferring hypoxic growth in Mycobacterium intracellulare”.

    1. eLife Assessment

      This important study provides evidence supporting the idea that postnatal experience plays an instructive role in shaping the patterns of functional connectivity between extrastriate visual cortex and frontal regions during development, by comparing neonates, blind and sighted adults. The evidence supporting the authors' claim is solid. Nevertheless, substantial weaknesses remain in mechanistic interpretation and alignment with relevant developmental frameworks. This study will be of significant interest to neuroscientists and neuroimaging researchers focused on vision, plasticity and development.

    2. Reviewer #1 (Public review):

      Summary:

      The present study evaluates the role of visual experience in shaping functional correlations between human extrastriate visual cortex and frontal regions. The authors used fMRI to assess "resting-state" temporal correlations in three groups: sighted adults, congenitally blind adults, and neonates. Previous research has already demonstrated differences in functional correlations between visual and frontal regions in sighted compared to early blind individuals. The novel contribution of the current study lies in the inclusion of an infant dataset, which allows for an assessment of the developmental origins of these differences.

      The main results of the study reveal that correlations between prefrontal and visual regions are more prominent in the blind and infant groups, with the blind group exhibiting greater lateralization. Conversely, correlations between visual and somato-motor cortices are more prominent in sighted adults. Based on these data, the authors conclude that visual experience plays an instructive role in shaping these cortical networks. This study provides valuable insights into the impact of visual experience on the development of functional connectivity in the brain.

      Strengths:

      The dissociations in functional correlations observed among the sighted adult, congenitally blind, and neonate groups provide strong support for the main conclusion regarding postnatal experience-driven shaping of visual-frontal connectivity.

      The inclusion of neonates offers a unique and valuable developmental anchor for interpreting divergence between blind and sighted adults. This is a major advance over prior studies limited to adult comparisons.

      Convergence with prior findings in the blind and sighted adult groups reinforces the reliability and external validity of the present results.

      The split-half reliability analysis in the infant data increases confidence in the robustness of the reported group differences.

      Weaknesses:

      The manuscript risks overstating a mechanistic distinction between sighted and blind development by framing visual experience as "instructive" and blindness as "reorganizing." Similarly, the binary framing of visual experience and blindness as independent may oversimplify shared plasticity mechanisms.

      The interpretation of changes in temporal correlations as altered neural communication does not adequately consider how shifts in shared variance across networks may influence these measures without reflecting true biological reorganization.

      The discussion does not substantively engage with the longstanding debate over whether sensory experience plays an instructive or permissive role in cortical development.

      The relationship between resting-state and task-based findings in blindness remains unclear.

    3. Reviewer #2 (Public review):

      Summary:

      Tian et al. explore the developmental origins of cortical reorganization in blindness. Previous work has found that a set of regions in the occipital cortex show different functional responses and patterns of functional correlations in blind vs. sighted adults. Here, Tian et al. explore how this organization arises over development. Is the "starting state" more like the blind pattern, or more like the adult pattern? Their analyses reveal that the answer depends on the particular networks investigated. Some functional connections in infants look more like blind than sighted adults; other functional connections look more like sighted than blind adults; and others fall somewhere in the middle, or show an altogether different pattern in infants compared with both sighted and blind adults.

      Strengths:

      The paper addresses very important questions about the starting state in the developing visual cortex, and how cortical networks are shaped by experience. Another clear strength lies in the unequivocal nature of many results. Many results have very large effect sizes, critical interactions between regions and groups are tested and found, and infant analyses are replicated in split halves of the data.

      Weaknesses:

      While potential roles of experience (e.g., visual, cross-modal) are discussed in detail, little consideration is given to the role of experience-independent maturation. The infants scanned are extremely young, only 2 weeks old. It is possible then that the sighted adult pattern may still emerge later in infancy or childhood, regardless of infant visual experience. If so, the blind adult pattern may depend on blindness-related experience only (which may or may not reflect "visual" experience per se). In short, it is not clear that birth, or the first couple weeks of life, are a clear cut "starting point" for development, after which all change can be attributed to experience.

    4. Reviewer #3 (Public review):

      Summary

      This study aimed to investigate whether the differences observed in the organization of visual brain networks between blind and sighted adults result from a reorganization of an early functional architecture due to blindness, or whether the early architecture is immature at birth and requires visual experience to develop functional connections. This question was investigated through the comparison of 3 groups of subjects with resting-state functional MRI (rs-fMRI). Based on convincing analyses, the study suggests that: 1) secondary visual cortices showed higher connectivity to prefrontal cortical regions (PFC) than to non-visual sensory areas (S1/M1 and A1) in infants like in blind adults, in contrast to sighted adults; 2) the V1 connectivity pattern of infants lies between that of sighted adults (showing stronger functional connectivity with non-visual sensory areas than with PFC) and that of blind adults (showing stronger functional connectivity with PFC than with non-visual sensory areas); 3) the laterality of the connectivity patterns of infants resembled those of sighted adults more than those of blind adults, but infants showed a less differentiated fronto-occipital connectivity pattern than adults.

      Strengths

      The question investigated in this article is important for understanding the mechanisms of plasticity during typical and impaired development, and the approach considered, which compares different groups of subjects including neonates/infants and blind adults, is highly original.

      Overall, the presented analyses are solid and well detailed, and the results and discussion are convincing.

      Weaknesses

      While it is informative to compare the "initial" state (close to birth) and the "final" states in blind and sighted adults to study the impact of post-natal and visual experience, this study does not analyze the chronology of this development and when the specialization of functional connections is completed. This would require investigating the evolution of functional connectivity of the visual system as a function of visual experience and thus as a function of age, at least during toddlerhood given the early and intense maturation of the visual system after birth. This could be achieved by analyzing different developmental periods using open databases such as the Baby Connectome Project.

      The rationale for grouping full-term neonates and preterm infants (scanned at term-equivalent age) is not understandable when seeking to perform comparisons with adults. Even if the study results do not show differences between full-terms and preterms in terms of functional connectivity differences between regions and of connectivity patterns, the preterm group had different neurodevelopment and post-natal (including visual) experiences (even a few weeks might have an impact). In fact, they show systematically reduced connectivity strength for all regions compared with full-terms (Sup Fig 7). Considering a more homogeneous group of neonates would have strengthened the study design.

      The rationale for presenting results on the connectivity of secondary visual cortices before that of the primary cortex (V1) could be clarified.

      The authors acknowledge the methodological difficulties for defining regions of interest (ROIs) in infants in a similar way as adults. Since the brain development is not homogeneous and synchronous across brain regions (in particular with the frontal and parietal lobes showing a delayed growth), this poses major problems for registration. This raises the question of whether the study findings could be biased by differences in ROI positioning across groups.

    5. Author response:

      Reviewer #1 (Public Review):

      Summary:

      The present study evaluates the role of visual experience in shaping functional correlations between extrastriate visual cortex and frontal regions. The authors used fMRI to assess "resting-state" temporal correlations in three groups: sighted adults, congenitally blind adults, and neonates. Previous research has already demonstrated differences in functional correlations between visual and frontal regions in sighted compared to early blind individuals. The novel contribution of the current study lies in the inclusion of an infant dataset, which allows for an assessment of the developmental origins of these differences.

      The main results of the study reveal that correlations between prefrontal and visual regions are more prominent in the blind and infant groups, with the blind group exhibiting greater lateralization. Conversely, correlations between visual and somato-motor cortices are more prominent in sighted adults. Based on these data, the authors conclude that visual experience plays an instructive role in shaping these cortical networks. This study provides valuable insights into the impact of visual experience on the development of functional connectivity in the brain.

      Strengths:

      The dissociations in functional correlations observed among the sighted adult, congenitally blind, and neonate groups provide strong support for the study's main conclusion regarding experience-driven changes in functional connectivity profiles between visual and frontal regions.

      In general, the findings in sighted adult and congenitally blind groups replicate previous studies and enhance the confidence in the reliability and robustness of the current results.

      Split-half analysis provides a good measure of robustness in the infant data.

      Weaknesses:

      There is some ambiguity in determining which aspects of these networks are shaped by experience.

      This uncertainty is compounded by notable differences in data acquisition and preprocessing methods, which could result in varying signal quality across groups. Variations in signal quality may, in turn, have an impact on the observed correlation patterns.

      The study's findings could benefit from being situated within a broader debate surrounding the instructive versus permissive roles of experience in the development of visual circuits.

      Reviewer #2 (Public Review):

      Summary:

      Tian et al. explore the developmental origins of cortical reorganization in blindness. Previous work has found that a set of regions in the occipital cortex show different functional responses and patterns of functional correlations in blind vs. sighted adults. In this paper, Tian et al. ask: how does this organization arise over development? Is the "starting state" more like the blind pattern, or more like the adult pattern? Their analyses reveal that the answer depends on the particular networks investigated; some functional connections in infants look more like blind than sighted adults; other functional connections look more like sighted than blind adults; and others fall somewhere in the middle, or show an altogether different pattern in infants compared with both sighted and blind adults.

      Strengths:

      The question raised in this paper is extremely important: what is the starting state in development for visual cortical regions, and how is this organization shaped by experience? This paper is among the first to examine this question, particularly by comparing infants not only with sighted adults but also blind adults, which sheds new light on the role of visual (and cross-modal) experience. Another clear strength lies in the unequivocal nature of many results. Many results have very large effect sizes, critical interactions between regions and groups are tested and found, and infant analyses are replicated in split halves of the data. 

      Weaknesses:

      A central claim is that "infant secondary visual cortices functionally resemble those of blind more than sighted adults" (abstract, last paragraph of intro). I see two potential issues with this claim. First, a minor change: given the approaches used here, no claims should be made about the "function" of these regions, but rather their "functional correlations". Second (and more importantly), the claim that the secondary visual cortex in general resembles blind more than sighted adults is still not fully supported by the data. In fact, this claim is only true for one aspect of secondary visual area functional correlations (i.e., their connectivity to A1/M1/S1 vs. PFC). In other analyses, the infant secondary visual cortex looks more like sighted adults than blind adults (i.e., in within vs. across hemisphere correlations), or shows a different pattern from both sighted and blind adults (i.e., in occipito-frontal subregion functional connectivity). It is not clear from the manuscript why the comparison to PFC vs. non-visual sensory cortex is more theoretically important than hemispheric changes or within-PFC correlations (in fact, if anything, the within-PFC correlations strike me as the most important for understanding the development and reorganization of these secondary visual regions). It seems then that a more accurate conclusion is that the secondary visual cortex shows a mix of instructive effects of vision and reorganizing effects of blindness, albeit to a different extent than the primary visual cortex.

      Relatedly, group differences in overall secondary visual cortex connectivity are particularly striking as visualized in the connectivity matrices shown in Figure S1. In the results (lines 105-112), it is noted that while the infant FC matrix is strongly correlated with both adult groups, the infant group is nonetheless more strongly correlated with the blind than sighted adults. I am concerned that these results might be at least partially explained by distance (i.e., local spread of the BOLD signal), since a huge portion of the variance in these FC matrices is driven by stronger correlations between regions within the same system (e.g., secondary-secondary visual cortex, frontal-frontal cortex), which are inherently closer together, relative to those between different systems (e.g., visual to frontal cortex). How do results change if only comparisons between secondary visual regions and non-visual regions are included (i.e., just the pairs of regions within the bold black rectangle on the figure), which limits the analysis to long-range connections only? Indeed, looking at the off-diagonal comparisons, it seems that in fact there are three altogether different patterns here in the three groups. Even if the correlation between the infant pattern and blind adult pattern survives, it might be more accurate to claim that infants are different from both adult groups, suggesting both instructive effects of vision and reorganizing effects of blindness. It might help to show the correlation between each group and itself (across independent sets of subjects) to better contextualize the relative strength of correlations between the groups.
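      The restriction to between-system region pairs raised in this comment can be illustrated with a minimal sketch. It is not the authors' analysis code: the function names, the toy 4-region matrices, and the index lists are all hypothetical, and group-average FC matrices are assumed to be plain nested lists. The idea is simply to correlate two groups' FC patterns using only visual to non-visual entries, so that within-system (short-range) pairs cannot drive the similarity.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (undefined if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def cross_system_similarity(fc_a, fc_b, visual_idx, nonvisual_idx):
    """Correlate two FC matrices over visual-to-non-visual region pairs only."""
    pairs = [(i, j) for i in visual_idx for j in nonvisual_idx]
    block_a = [fc_a[i][j] for i, j in pairs]
    block_b = [fc_b[i][j] for i, j in pairs]
    return pearson(block_a, block_b)


# Hypothetical toy data: regions 0-1 are "visual", regions 2-3 are "non-visual".
# The two matrices differ within systems but share the same between-system block.
fc_group_a = [
    [1.0, 0.8, 0.1, 0.3],
    [0.8, 1.0, 0.2, 0.4],
    [0.1, 0.2, 1.0, 0.7],
    [0.3, 0.4, 0.7, 1.0],
]
fc_group_b = [
    [1.0, 0.2, 0.1, 0.3],
    [0.2, 1.0, 0.2, 0.4],
    [0.1, 0.2, 1.0, 0.1],
    [0.3, 0.4, 0.1, 1.0],
]
similarity = cross_system_similarity(fc_group_a, fc_group_b, [0, 1], [2, 3])
```

      In practice the correlations would typically be Fisher z-transformed before averaging across subjects; that step is omitted here for brevity.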

      It is not clear that differences between groups should be attributed to visual experience only. For example, despite the title of the paper, the authors note elsewhere that cross-modal experience might also drive changes between groups. Another factor, which I do not see discussed, is possible ongoing experience-independent maturation. The infants scanned are extremely young, only 2 weeks old. Although no effects of age are detected, it is possible that cortex is still undergoing experience-independent maturation at this very early stage of development. For example, consider Figure 2; perhaps V1 connectivity is not established at 2 weeks, but eventually achieves the adult pattern later in infancy or childhood. Further, consider the possibility that this same developmental progression would be found in infants and children born blind. In that case, the blind adult pattern may depend on blindness-related experience only (which may or may not reflect "visual" experience per se). To deal with these issues, the authors should add a discussion of the role of maturation vs. experience and temper claims about the role of visual experience specifically (particularly in the title). 

      The authors measure functional correlations in three very different groups of participants and find three different patterns of functional correlations. Although these three groups differ in critical, theoretically interesting ways (i.e., in age and visual/cross-modal experience), they also differ in many uninteresting ways, including at least the following: sampling rate (TR), scan duration, multi-band acceleration, denoising procedures (CompCor vs. ICA), head motion, ROI registration accuracy, and wakefulness (I assume the infants are asleep).

      Addressing all of these issues is beyond the scope of this paper, but I do feel the authors should acknowledge these confounds and discuss the extent to which they are likely (or not) to explain their results. The authors would strengthen their conclusions with analyses directly comparing data quality between groups (e.g., measures of head motion and split-half reliability would be particularly effective).

      Response #1: We appreciate the reviewer’s comments. In response, we have revised the paper to provide a more balanced summary of the data and clarified in the introduction which signatures the paper focuses on and why. Additionally, we have included several control analyses to account for other plausible explanations for the observed group differences. Specifically, we randomly split the infant dataset into two halves and performed split-half cross-validation. Across all comparisons, the results from the two halves were highly similar, suggesting that the effects are robust (see Supplementary Figures S3 and S4).

      Furthermore, we compared the split-half noise ceiling across the groups (infants, sighted adults, and blind adults) and found no significant differences between them (details in response #6). Finally, we repeated our analysis after excluding infants with a radiology score of 4 or 5, and the results remained consistent, indicating that our findings are not confounded by potential brain anomalies (details in response #2).
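      As a rough illustration of what a split-half reliability estimate involves, the sketch below randomly splits subjects into two halves, averages each half's connectivity vector, and correlates the two averages. This is a generic sketch under stated assumptions, not the procedure used in the manuscript: the function names, the seed, and the toy data are hypothetical, and the Spearman-Brown step is a common convention for projecting half-sample reliability to the full sample that the response does not mention.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (undefined if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mean_vector(vectors):
    """Element-wise mean across subjects."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def split_half_reliability(subject_vectors, seed=0):
    """Correlate the mean FC vectors of two random, non-overlapping halves of subjects."""
    order = list(range(len(subject_vectors)))
    random.Random(seed).shuffle(order)  # seeded so the split is reproducible
    half = len(order) // 2
    first = mean_vector([subject_vectors[i] for i in order[:half]])
    second = mean_vector([subject_vectors[i] for i in order[half:]])
    return pearson(first, second)


def spearman_brown(r_half):
    """Project a half-sample reliability to the full sample (optional convention)."""
    return 2 * r_half / (1 + r_half)


# Hypothetical toy data: four subjects, each with four region-pair correlations.
subjects = [
    [0.10, 0.50, 0.30, 0.90],
    [0.12, 0.48, 0.33, 0.88],
    [0.09, 0.52, 0.28, 0.91],
    [0.11, 0.49, 0.31, 0.87],
]
reliability = split_half_reliability(subjects)
```

      Repeating the split over many random seeds and averaging the resulting correlations gives a more stable estimate than a single split.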

      We hope these control analyses help strengthen our conclusions.

      Reviewer #3 (Public Review):

      Summary:

      This study aimed to investigate whether the differences observed in the organization of visual brain networks between blind and sighted adults result from a reorganization of an early functional architecture due to blindness, or whether the early architecture is immature at birth and requires visual experience to develop functional connections. This question was investigated through the comparison of 3 groups of subjects with resting-state functional MRI (rs-fMRI). Based on convincing analyses, the study suggests that: 1) secondary visual cortices showed higher connectivity to prefrontal cortical regions (PFC) than to non-visual sensory areas (S1/M1 and A1) in sighted infants like in blind adults, in contrast to sighted adults; 2) the V1 connectivity pattern of sighted infants lies between that of sighted adults (stronger functional connectivity with non-visual sensory areas than with PFC) and that of blind adults (stronger functional connectivity with PFC than with non-visual sensory areas); 3) the laterality of the connectivity patterns of sighted infants resembled those of sighted adults more than those of blind adults, but sighted infants showed a less differentiated fronto-occipital connectivity pattern than adults.

      Strengths:

      The question investigated in this article is important for understanding the mechanisms of plasticity during typical and impaired development, and the approach considered, which compares different groups of subjects including neonates/infants and blind adults, is highly original.

      Overall, the analyses considered are solid and well-detailed. The results are quite convincing, even if the interpretation might need to be revised downwards, as factors other than visual experience may play a role in the development of functional connections with the visual system.

      Weaknesses:

      While it is informative to compare the "initial" state (close to birth) and the "final" states in blind and sighted adults to study the impact of post-natal and visual experience, this study does not analyze the chronology of this development and when the specialization of functional connections is completed. This would require investigating when experience-dependent mechanisms are important for the establishment of multiple functional connections within the visual system. This could be achieved by analyzing different developmental periods in the same way, using open databases such as the Baby Connectome Project. Given the early, "condensed" maturation of the visual system after birth, we might expect sighted infants to show connectivity patterns similar to those of adults a few months after birth.

      The rationale for mixing full-term neonates and preterm infants (scanned at term-equivalent age) from the dHCP 3rd release is not understandable since preterms might have a very different development related to prematurity and to post-natal (including visual) experience. Although the authors show that the difference between the connectivity of visual and other sensory regions, and the one of visual and PFC regions, do not depend on age at birth, they do not show that each connectivity pattern is not influenced by prematurity. Simply not considering the preterm infants would have made the analysis much more robust, and the full-term group in itself is already quite large compared with the two adult groups. The current study setting and the analyses performed do not seem to be an adequate and sufficient model to ascertain that "a few weeks of vision after birth is ... insufficient to influence connectivity".

      In a similar way, excluding the few infants with detected brain anomalies (radiological scores higher or equal to 4) would strengthen the group homogeneity by focusing on infants supposed to have a rather typical neurodevelopment. The authors quote all infants as "sighted" but this is not guaranteed as no follow-up is provided.

      Response #2: We appreciate the reviewer’s suggestion. We re-analyzed the infant cohort after excluding all cases with radiological scores ≥4 (n = 39 infants excluded). The revised analysis confirmed that the connectivity patterns reported in the main text remain statistically unchanged (see Supplementary Fig. S11). This demonstrates the robustness of our findings to confounding effects from potential brain anomalies. We have explicitly clarified this in the revised Methods section (page 14, line 391 in the manuscript).

      In our dataset, newborns (average age at scan = 2.79 weeks) have very limited and immature vision. We agree with the reviewer that long-term visual outcomes cannot be guaranteed without follow-up data. The term "sighted infants" was used operationally to distinguish this cohort from congenitally blind populations.

      The post-menstrual age (PMA) at scan of the infants is also not described. The methods indicate that all were scanned at "term-equivalent age" but does this mean that there is some PMA variability between 37 and 41 weeks? Connectivity measures might be influenced by such inter-individual variability in PMA, and this could be evaluated.

      The rationale for presenting results on the connectivity of secondary visual cortices before that of the primary cortex (V1) was not clear. Also, it might be relevant to better justify why only the connectivity of visual regions to non-visual sensory regions (S1-M1, A1) and prefrontal cortex (PFC) was considered in the analyses, and not the connectivity to other brain regions.

      In relation to the question explored, it might be informative to reposition the study in relation to what others have shown about the developmental chronology of structural and functional long-distance and short-distance connections during pregnancy and the first postnatal months.

      The authors acknowledge the methodological difficulties in defining regions of interest (ROIs) in infants in a similar way as adults. The reliability and comparability of ROI positioning in infants is definitely an issue. Given that brain development is not homogeneous and synchronous across brain regions (in particular with the frontal and parietal lobes showing delayed growth), the newborn brain is not homothetic to the adult brain, which poses major problems for registration. The functional specialization of cortical regions is incomplete at birth. This raises the question of whether the findings of this study would be stable/robust if slightly larger or displaced regions had been considered, to cover with greater certainty the same areas as those considered in adults. And have other cortical parcellation approaches been considered to assess ROI robustness (e.g. MCRIB-S for full-terms)?

      Recommendations for the Authors:

      Reviewer #1 (Recommendations for the authors):

      Further consideration should be given to the underlying changes in network architecture that may account for differences in functional correlations across groups. An increase (or decrease) in correlation between two regions could signify an increase (decrease) in connection or communication between those regions. Alternatively, it might reflect an increase in communication or connection with a third region, while the physical connections/interactions between the two original regions remain unchanged. These possibilities lead to distinct mechanistic interpretations. For example, there are substantial changes in connectivity during early visual (e.g. Burkhalter A. 1993, Cerebral Cortex) and visuo-motor development (e.g., Csibra et al. 2000 Neuroreport). It's not clear whether increases in communication within the visual network and improvements in visuo-motor behavior (e.g., Yizhar et al. 2023 Frontiers in Neuroscience) wouldn't produce a qualitatively similar pattern of results.

      Relatedly, the within-network correlation patterns between visual ROIs and frontal ROIs appear markedly different between sighted adults and infants (Supplementary Figure S1). To what extent do the differences in long-range correlations between visual and frontal regions reflect these within-network differences in functional organization?

      Response #3: The reviewer is raising some interesting questions about possible mechanisms and network changes. Resting state studies are indeed always subject to the possibility that some effects are mediated by a third, unobserved region. Prior whole-cortex connectivity analyses have observed primarily changes in occipito-frontal connectivity in blindness, so there is not a clear cortical ‘third region’ candidate (Deen et al., 2015). However, some thalamic effects have also been observed and could contribute to the phenomenon (Bedny et al., 2011). Resting state changes in correlation between two areas do not imply changes in the strength of long-range anatomical connectivity. Indeed, in the current case they may well reflect differential functional coupling, rather than strengthening or weakening of anatomical connections. We now discuss this in the Discussion section on page 12, line 301 as follows:

      “Despite these insights, many questions remain regarding the neurobiological mechanisms underlying experience-based functional connectivity changes and their relationship to anatomical development. Long-range anatomical connections between brain regions are already present in infants—even prenatally—though they remain immature (Huang et al., 2009; Kostović et al., 2019, 2021; Takahashi et al., 2012; Vasung, 2017). Functional connectivity changes may stem from local synaptic modifications within these stable structural pathways, consistent with findings that functional connectivity can vary independently of structural connection strength (Fotiadis et al., 2024). Moreover, functional connectivity has been shown to outperform structural connectivity in predicting individual behavioral differences, suggesting that experience-based functional changes may reflect finer-scale synaptic or network-level modulations not captured by macrostructural measures (Ooi et al., 2022). Prior studies also suggest that, even in adults, coordinated sensory-motor experience can lead to enhancement of functional connectivity across sensory-motor systems, indicating that large-scale changes in functional connectivity do not necessarily require corresponding changes in anatomical connectivity (Guerra-Carrillo et al., 2014; Li et al., 2018).”

      It is not clear how changes in correlation patterns among visual areas would produce the connectivity between visual areas and prefrontal areas reported in the current study. Activity in visual areas drives correlations both among visual areas and between visual and prefrontal areas, and the same is true of prefrontal cortices.

      The findings from this study should be more closely linked to the extensive literature surrounding the debate on whether experience plays an instructive or permissive role in visual development (e.g., Crair 1999 Current Opin Neurobiol; Sur et al. 1999 J Neurobiol; Kiorpes 2016 J Neurosci; Stellwagen & Shatz 2002 Neuron; Roy et al. 2020 Nature Communications).

      Response #4: The instructive role suggests that specific experiences or patterns of neural activity directly shape and organize neural circuitry, while the permissive role indicates that such experiences or activity merely enable other factors, such as molecular signals, to influence neural circuit formation (Crair, 1999; Sur et al., 1999). To distinguish whether experience plays an instructive or permissive role, it is essential to manipulate the pattern or information content of neural activity while maintaining a constant overall activity level (Crair, 1999; Roy et al., 2020; Stellwagen & Shatz, 2002). However, both the sighted and blind adult groups have had extensive experience and neural activity in the visual cortices. For the sighted group, activity in the visual cortex is partly driven by bottom-up input from the external environment, through the retina, LGN, and ultimately to the cortex. In contrast, the blind group’s visual cortex activity is partially driven by top-down input from non-visual networks. The precise role of this activity in shaping the observed connectivity patterns remains unclear. Although our study cannot speak to this issue directly, we now link to the relevant literature on page 12, line 320 of the manuscript in the Discussion section as follows:

      “The current findings reveal both effects of vision and effects of blindness on the functional connectivity patterns of the visual cortex. A further open question is whether visual experience plays an instructive or permissive role in shaping neural connectivity patterns. An instructive role suggests that specific sensory experiences or patterns of neural activity directly shape and organize neural circuitry. In contrast, a permissive role implies that sensory experience or neural activity merely facilitates the influence of other factors—such as molecular signals—on the formation and organization of neural circuits (Crair, 1999; Sur et al., 1999). Studies with animals that manipulate the pattern or informational content of neural activity while keeping overall activity levels constant could distinguish between these hypotheses (Crair, 1999; Roy et al., 2020; Stellwagen & Shatz, 2002).”

      The assertion that a few weeks of vision after birth is insufficient to influence connectivity is provocative. Though supported by the study's results, it would benefit from integration with research in animal models showing considerable malleability of networks from early experience (e.g., Akerman et al. 2002 Neuron; Li et al. 2006 Nature Neuroscience; Stacy et al. 2023 J Neuroscience).

      Response #5: We thank the reviewer for their suggestion. The present study found that several weeks of postnatal visual experience is insufficient to significantly alter the long-term connectivity patterns of the visual cortices. Animal studies, by contrast, have shown that acute visual experience, or even exposure to visual stimuli through unopened eyelids, can robustly influence visual system development (Akerman et al., 2002; Li et al., 2008; Van Hooser et al., 2012). We think this discrepancy may be attributed to the substantial differences in developmental timelines between species. The human lifespan is much longer, and so is the human critical period, making it unclear how to map duration from one species to another. We briefly touch upon the time course issue on page 11, line 289 in the Discussion section as follows:

      “The present results reveal the effects of experience on development of functional connectivity between infancy and adulthood, but do not speak to the precise time course of these effects. Infants in the current sample had between 0 and 20 weeks of visual experience. Comparisons across these infants suggest that several weeks of postnatal visual experience is insufficient to produce a sighted-adult connectivity profile. The time course of development could be anywhere between a few months and years and could be tested by examining data from children of different ages.”

      Substantial differences between the groups are evident in several key aspects of the study, including the number of subjects, brain sizes, imaging parameters, and data preprocessing, all of which are likely to have an impact on the overall signal quality. To clarify how these differences might have impacted correlation differences between groups, it would be essential to include information on the noise ceilings for each correlation analysis within each group.

      Response #6: We thank the reviewer for their suggestion. We now report the split-half noise ceiling for adult and infant groups. For each participant, we first split the rs-fMRI time series into two halves, then calculated the ROI-wise rsFC pattern from the two splits. The split-half noise ceiling was estimated according to Lage-Castellanos et al. (2019). The noise ceilings of the three groups (infants: 0.90 ± 0.056, blind adults: 0.88 ± 0.041, sighted adults: 0.90 ± 0.055) showed no significant difference (one-way ANOVA, F(2,552) = 2.348, p = 0.097). Therefore, we believe that overall signal quality is unlikely to impact our results. We have also added the relevant context to the Methods section on page 16, line 447 as follows:

      “Substantial differences between the groups exist in this study, including the number of subjects, brain sizes, imaging parameters, and data preprocessing, all of which are likely to have an impact on the overall signal quality. To address this concern, we compared the split-half noise ceiling across the groups (infants, sighted adults, and blind adults). For each participant, we first split the rs-fMRI time series into two halves, then calculated the ROI-wise rsFC pattern from the two splits. The split-half noise ceiling was estimated according to Lage-Castellanos et al (Lage-Castellanos et al., 2019). The noise ceilings of the three groups (infants: 0.90 ± 0.056, blind adults: 0.88 ± 0.041, sighted adults: 0.90 ± 0.055) showed no significant difference (One-way ANOVA, F (2,552) = 2.348, p = 0.097). Therefore, overall signal quality is unlikely to impact our results.”
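
      The split-half procedure described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, and the Spearman-Brown step is an assumed correction for halving the data; the exact estimator in Lage-Castellanos et al. (2019) may differ.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fc_pattern(ts):
    """ROI-wise connectivity pattern: correlations for every ROI pair.

    ts is a list of ROI time series (one list of samples per ROI).
    """
    n = len(ts)
    return [pearson(ts[i], ts[j]) for i in range(n) for j in range(i + 1, n)]


def split_half_noise_ceiling(ts):
    """Correlate the connectivity patterns from the two halves of a run.

    Assumption: a Spearman-Brown correction, 2r / (1 + r), is applied to
    estimate reliability at full length. Undefined when r == -1.
    """
    half = len(ts[0]) // 2
    first = [roi[:half] for roi in ts]
    second = [roi[half:2 * half] for roi in ts]
    r = pearson(fc_pattern(first), fc_pattern(second))
    return 2 * r / (1 + r)


if __name__ == "__main__":
    # Synthetic participant: 6 ROIs driven by two latent signals plus noise.
    random.seed(1)
    n_t = 400
    latent = [[random.gauss(0, 1) for _ in range(n_t)] for _ in range(2)]
    weights = [(1.0, 0.0), (0.8, 0.2), (0.5, 0.5), (0.2, 0.8), (0.0, 1.0), (0.6, -0.4)]
    rois = [
        [w0 * latent[0][t] + w1 * latent[1][t] + 0.5 * random.gauss(0, 1) for t in range(n_t)]
        for w0, w1 in weights
    ]
    print(split_half_noise_ceiling(rois))
```

      Per-participant values computed this way could then be compared across groups, as in the one-way ANOVA reported above.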

      In general, it appears that the infant correlations are stronger compared to the other groups. While this could reflect increased coherence or lack of differentiation, it is also possible that it is simply due to the presence of a non-neuronal global signal. Such a signal has the potential to substantially limit the effective range of functional correlations and comparisons with adults. To address this, it is advisable to conduct control analyses aimed at assessing and potentially removing global signals.

      Response #7: We agree with the reviewer that global signal regression (GSR) may help reduce non-neuronal artifacts, such as motion, cardiac, and respiratory signals, which are known to correlate with the global signal. However, the global signal also contains neural signals from gray matter, and removing it can introduce unwanted artifacts, especially for the current study. First, GSR can reduce the physiological accuracy of functional connectivity (FC); second, GSR may have differential effects across groups, potentially introducing additional artifacts in between-group comparisons, as noted by Murphy and Fox (2017). The CompCor method (Behzadi et al., 2007; Whitfield-Gabrieli & Nieto-Castanon, 2012) is capable of estimating global non-neuronal artifacts, as GSR does. Because it estimates these artifacts from signals within the white matter (WM) and cerebrospinal fluid (CSF) masks, and not from the gray matter (GM), CompCor should introduce only minimal unwanted bias into the GM signal.

      Was there a difference in correlations for preterm vs term neonates? Recent research has suggested that preterm births can have an impact on functional networks, particularly in frontal cortices, e.g., Tokariev et al. 2019; Li et al. 2021 eLife; Zhang et al. 2022 Frontiers in Neuroscience.

      Response #8: We have compared preterm and term neonates for all the main results, including the connectivity from the secondary visual cortex/V1 to non-visual sensory cortices versus prefrontal cortices, the laterality of occipito-frontal connectivity, and the specialization across different fronto-occipital networks. This information is reported in Page 6 line 169 and Supplementary Figure S7. The connectivities of full-term infants are generally higher than those of preterm infants. However, the connectivity patterns of term and preterm infants are very similar.

      The consistency between the current results and prior work (e.g., Burton et al. 2014) is notable, particularly in the observed greater correlations in prefrontal regions and weaker correlations in somato-motor regions for early blind individuals compared to sighted. However, almost all visual-frontal correlations in both groups were negative in that prior study. Some discussion on why positive correlations were found in the current study could help to clarify.

      Response #9: Many other papers have reported positive correlations similar to those found in our study (e.g., Deen et al., 2015; Kanjlia et al., 2021). In contrast, Burton's study identified predominantly negative visual-frontal correlations. We think this is likely because the global signal was regressed out during preprocessing. This methodological choice can lead to an increase in negative connections (Murphy & Fox, 2017).
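
      Why regressing out the global signal pushes correlations toward negative values can be seen in a small sketch. It assumes the global signal is the plain mean across ROIs and uses hypothetical function names; it is not the preprocessing of either study. After each ROI is regressed on that mean, the residuals sum to zero at every time point, so at least one pairwise correlation must become negative.

```python
import math
import random


def mean(x):
    return sum(x) / len(x)


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def regress_out(y, g):
    """Residuals of y after ordinary least squares on g with an intercept."""
    my, mg = mean(y), mean(g)
    beta = sum((a - my) * (b - mg) for a, b in zip(y, g)) / sum((b - mg) ** 2 for b in g)
    return [a - my - beta * (b - mg) for a, b in zip(y, g)]


def global_signal_regression(ts):
    """Regress the across-ROI mean time series out of every ROI."""
    n_t = len(ts[0])
    g = [mean([roi[t] for roi in ts]) for t in range(n_t)]
    return [regress_out(roi, g) for roi in ts]


def pairwise(ts):
    """Correlations for every ROI pair."""
    n = len(ts)
    return [pearson(ts[i], ts[j]) for i in range(n) for j in range(i + 1, n)]


if __name__ == "__main__":
    # Three synthetic ROIs that share a common component.
    random.seed(2)
    shared = [random.gauss(0, 1) for _ in range(300)]
    rois = [[s + 0.5 * random.gauss(0, 1) for s in shared] for _ in range(3)]
    print("before GSR:", pairwise(rois))
    print("after GSR: ", pairwise(global_signal_regression(rois)))
```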

      The term "secondary visual areas" used throughout the paper lacks specificity, and its usage in terms of underlying anatomical and functional areas has been inconsistent in the literature. It would be advisable to adopt a more precise characterization based on functional and/or anatomical criteria.

      Response #10: We specified in the article that the occipital ROIs used in the current study are functional areas identified in prior studies of people born blind: regions that respond to three non-visual tasks (language, math, and executive function) in the congenitally blind population and show functional connectivity changes in blind adults (Kanjlia et al., 2016, 2021; Lane et al., 2015; see Figure 1). They are referred to collectively as ‘secondary visual areas’ to distinguish them from V1. Anatomically, these three regions cover the majority of the lateral occipital cortex and part of the ventral occipital cortex, providing a good sample of the connectivity profile of higher-order visual areas. Thus, we are using the term "secondary visual areas" to refer to these regions. In blind individuals, although these regions respond to non-visual tasks, their exact functions are unknown.

      The inclusion of the ventral temporal cortex in the visual ROIs is currently only depicted in Supplementary Figure S7. To enhance the clarity of the areas of interest analyzed, it would be advisable to illustrate the ventral temporal areas in the main text. Were there notable differences in the frontal correlations between the lateral occipital visual areas and ventral temporal areas?

      Response #11: We thank the reviewer for pointing out this issue. We added a statement about the ventral visual cortex when describing the location of the ROIs and added a ventral view of the ROIs to Figure 1. The language-responsive and math-responsive ROIs cover both the lateral and ventral visual cortex, whereas executive function (response-conflict) regions cover only the lateral visual cortex. We compared the connectivity patterns of these three regions and found no differences (see Supplementary Figure S2).

      The blind group results are characterized as reflecting a reorganization in comparison to sighted adults while the results for sighted adults compared to infants are discussed more as a maturation ("adult pattern isn't default but requires experience to establish"). Both the sighted and blind adult groups showed differences from the infant group, and these differences are attributed to the role of experience. Why use "reorganization" for one result and maturation for another?

      Response #12: We agree with the reviewer that both of the adult groups should be thought of as equal in relation to the infants. In other words, the brain develops under one set of experiential conditions or another. We do not think that the adult sighted pattern reflects maturation. Rather, the sighted adult pattern reflects the combined influence of maturation and visual experience. The adult blind pattern reflects the combined influence of maturation and blindness. We use the term ‘reorganization’ to label differences in the blind adults relative to sighted infants. We do so for the purpose of clarity and to remain consistent with terminology in prior literature. However, we agree with the reviewer that the blind group does not reflect ‘reorganization’ intrinsically any more than the sighted adult group.

      The statement that "visual experience is required to set up long-range functional connectivity" is unclear, especially since the infant and blind groups showed stronger long-range functional correlations with PFC.

      Response #13: We revised this sentence to read more specifically: “visual experience establishes elements of the sighted-adult long-range connectivity” (Abstract, line 17).

      The statement that the visual ROIs roughly correspond to "the anatomical location of areas such as V5/MT+, LO, V3a, and V4v" appears imprecise. From Supplementary Figure S7, these areas cover anterior portions of ventral temporal cortex (do these span the anatomical location of putative category-selective areas?) and into the intraparietal sulcus.

      Response #14: Thanks to the reviewer for the clarification. The ventral ROIs cover the middle and part of the anterior portion of the ventral temporal lobe, including the putative category-selective areas. Additionally, the dorsal ROIs extend beyond the occipital lobe to the intraparietal sulcus and superior parietal lobule. We have added a more detailed description of the anatomical location of the ROI in the Methods section Page 17 line 489 as follows:

      “Each functional ROI spans multiple anatomical regions and together the secondary visual ROIs tile large portions of lateral occipital, occipito-temporal, dorsal occipital and occipito-parietal cortices. In sighted people, the secondary visual occipital ROIs include the anatomical locations of functional regions such as motion area V5/MT+, the lateral occipital complex (LO), category specific ventral occipitotemporal cortices and dorsally, V3a and V4v.  The occipital ROI also covers the middle of the ventral temporal lobe. Dorsally, it extended to the intraparietal sulcus and superior parietal lobule.”

      The motivation for assessing correlations with motor and frontal regions was briefly discussed in the introduction. It would be helpful to reiterate this motivation when first introducing the analyses in the results.

      Response #15: Thank you for the thoughtful suggestion. Upon reflection, we chose to substantially revise the Introduction to more clearly and comprehensively explain the rationale for examining the couplings with motor and frontal regions, rather than reiterating it in the Results section. We believe this revised framing provides a stronger foundation for the analyses that follow, while avoiding redundancy across sections. We hope this addresses the reviewer’s concern.

      Reviewer #2 (Recommendations for the authors):

      Congratulations on a well-written paper and an interesting set of results.

      Reviewer #3 (Recommendations for the authors):

      Abstract:

      Mentioning "sighted infants" does not seem adequate.

      Response #16: In our dataset, newborns (average age at scan = 2.79 weeks) have very limited and immature vision. We agree with the reviewer that long-term visual outcomes cannot be guaranteed without follow-up data. The term "sighted infants" was used operationally to distinguish this cohort from congenitally blind populations.

      In sentences after "Specifically...", it was not clear whether the authors referred to V1 connectivity.

      Response #17: We thank the reviewer for this comment. In the revised abstract, we have removed the original "Specifically..." phrasing and clarified the results.

      Introduction

      Talking about the "instructive effects" of vision might be confusing or misleading. Visual experiences like exposure to oral language are part of the normal/spontaneous environment that allows the infant behavioral acquisitions (contrarily with learnings that occur later during development with instruction like for reading).

      Response #18: We appreciate the reviewer’s concern and would like to clarify that the term “instructive effect” as used here is derived from neurodevelopmental studies (Crair, 1999; Sur et al., 1999). In this context, “instructive” refers to activity-dependent mechanisms where patterns of neural activity actively guide the organization of synaptic connectivity, emphasizing that spontaneous or sensory-driven activity (e.g., retinal waves, visual experience) can directly shape circuit refinement, as seen in ocular dominance column formation. In the context of our study, we emphasize that vision plays an instructive role in setting up the balance of connectivity between occipital cortex and non-visual networks.

      For references on the development of connectivity, I would advise citing MRI studies but also studies based on histological approaches (see for example the detailed review by Kostovic et al, NeuroImage 2019).

      Response #19: We thank the reviewer for this suggestion. We have incorporated a discussion on the long-range anatomical connections that emerge as early as infancy, referencing studies that employed diffusion MR imaging and histological methods, as detailed below.

      “Many long-range anatomical connections between brain regions are already established in infants, even before birth, although they are not yet mature (Huang et al., 2009; Kostović et al., 2019, 2021; Takahashi et al., 2012; Vasung, 2017).” (Page 12, line 303 in the manuscript)

      Results

      P7 l170: It might be helpful to be precise that this is "compared with inter-hemispheric connectivity".

      Response #20: We thank the reviewer for this suggestion. To align with our established terminology, we have revised the statement to explicitly contrast within-hemisphere connectivity with between-hemisphere connectivity. The modified text now reads (page 7, line 183 in the manuscript):

      “Compared to sighted adults, blind adults exhibited a stronger dominance of within-hemisphere connectivity over between-hemisphere connectivity. That is, in people born blind, left visual networks are more strongly connected to left PFC, whereas right visual networks are more strongly connected to right PFC.”

      L176-181: It was not clear to me what was the difference between "across" and "between hemisphere connectivity". Would it be informative to test the difference between blind and sighted adults?

      Response #21: We clarify that there is no distinction between the terms “across” and “between hemisphere connectivity”—they refer to the same concept. To ensure consistency, we have revised the text to exclusively use “between hemisphere connectivity” throughout the manuscript. Regarding the comparison between blind and sighted adults, we conducted statistical comparisons between these groups in our analysis, and the results have been incorporated into the revised version (Page 7, line 187 in the manuscript).

      Adding statistics on Figure 3, but also on Figures 1 and 2 might help the reading.

      Response #22: We have added the statistics to Figures 1-4.

      Adding the third comparison in Figure 4 would be possible in my view.

      Response #23: We explored integrating the response-conflict region into Figure 4, but this would require a 3x3 bar chart with pairwise statistical significance markers, which introduced excessive visual complexity that hindered readers’ ability to grasp our intended message. To ensure clarity, we retained the original Figure 4 while providing the complete three-region analysis (including all statistical comparisons) in Supplementary Figure S8 to ensure completeness.

      Methods

      The authors might have to specify ages at birth, and ages at scan (median + range?).

      Response #24: We have added that information in the Methods section as follows:

      “The average age from birth at scan = 2.79 weeks (SD = 3.77, median = 1.57, range = 0 – 19.71); average gestational age at scan = 41.23 weeks (SD = 1.77, median = 41.29, range = 37 – 45.14); average gestational age at birth = 38.43 weeks (SD = 3.73, median = 39.71, range = 23 – 42.71).” (Page 14, line 379 in the manuscript)

      It might be relevant to comment on the range of available fMRI volumes, and the fact that connectivity measures might then be less robust in infants.

      Response #25: We report the range of fMRI volumes in the Methods section (Page 16, Line 449). Adult participants (blind and sighted) underwent 1–4 scanning sessions, each containing 240 volumes (mean scan duration: 710.4 seconds per participant). For infants, all subjects had 2300 fMRI volumes, and we retained a subset of 1600 continuous volumes per subject with the minimum number of motion outliers. While infant connectivity measures may inherently exhibit lower robustness due to developmental and motion-related factors, our infant cohort’s large sample size (n=475) and stringent motion censoring criteria enhance the reliability of group-level inferences. We have integrated this clarification into the Methods section (Page 16, Line 444) as follows:

      "While infant connectivity estimates may be less robust at the individual level compared to adults due to shorter scan durations and higher motion, our cohort’s large sample size (n=475) and rigorous motion censoring mitigate these limitations for group-level analyses. "

      The mention of dHCP 2nd release should be removed from the paragraph on data availability.

      Response #26: We have removed it.

    1. eLife Assessment

      This important study highlights the novel role of RSPO mimetic SZN-043 in the activation of hepatic WNT signaling and promoting hepatocyte regeneration. The authors provide convincing evidence of SZN-043 increasing hepatocyte proliferation in various mouse models, including a humanized mouse liver model, ALD model and CCl4 fibrosis model. This study will be of interest to researchers in liver regeneration and repair mechanisms.

    2. Reviewer #1 (Public review):

      Summary:

      The work by Fisher et al describes the role of novel RSPO mimetics in the activation of WNT signaling and hepatocyte regeneration. However, the results of the experiments and weaknesses of the methods used do not support the conclusions of the authors that the new therapy can promote liver regeneration in alcohol-induced liver cirrhosis.

      Strengths:

      Similarly to its precursor, αASGR1-RSPO2-RA-IgG, SZN-043 can upregulate Wnt target genes and promote hepatocyte proliferation in the liver.

      Comments on revisions:

      The authors responded to all my comments and concerns.

    3. Reviewer #2 (Public review):

      Summary:

      The study by Fisher et al investigates the therapeutic potential of SZN-043, a hepatocyte-targeted R-spondin mimetic, for restoring Wnt signaling and promoting liver regeneration in alcohol-associated liver disease (ALD). Using multiple preclinical models, the compound was shown to promote hepatocyte proliferation and reduce fibrosis. This study highlights the efficacy in promoting liver regeneration while maintaining controlled signaling. Limitations include a need for further exploration of off-target effects and fibrosis mechanisms. The findings support SZN-043 as a promising candidate for ALD therapy, warranting further clinical evaluation. This is a well-designed study with thorough investigation using multiple disease models.

      Strengths:

      (1) Well-written manuscript with clear design, robust methods, and discussion.

      (2) Using multiple models strengthens the findings and expands beyond ALD.

      (3) Identification of SZN-043 as a novel potent drug for liver regeneration.

    4. Author response:

      Response to Comments from reviewer #1

      Many thanks for appreciating that SZN-043 can promote hepatocyte proliferation via the Wnt-signaling pathway.

      (1) The reviewer is concerned with using only CYP1A2 expression as an endpoint to make a conclusion about the effect of SZN-043 on Wnt activity in human ALD samples. The reviewer raises a good point, as the more commonly used Wnt target gene, AXIN2, is not consistently changed in both cohorts. We were at first also surprised by this finding. However, upon closer analysis we found that hepatocyte-specific target genes such as CYP1A2 (Figure 2), CYP2E1, OAT, LGR5, GLUL (Table 1) and ZNRF3, which are mostly expressed in hepatocytes and ductal cells, were all down-regulated in ALD samples. Other Wnt target genes expressed in epithelial and mesenchymal liver cell populations, such as AXIN2, CCND1 and NOTUM, are indeed not consistently and significantly changed. Given that SZN-043 is not active on mesenchymal cells, this discrepancy could be best explained by the large increase in mesenchymal cells in ALD tissue samples, thereby confounding the results. We have now clarified this in the discussion. Another method to assess Wnt activity is to measure β-catenin phosphorylation and nuclear transfer. In our hands, this method was found to be better suited for tissue culture than histological sections from in vivo studies. We have also amended the manuscript title to refer to expression of Wnt target genes, rather than Wnt activity.

      (2) We have now added a supplemental figure to show the lack of Ki-67+ human hepatocytes in the cirrhotic tissue samples to confirm the absence of hepatocyte proliferation (Figure S1).

      (3) The differences in amino acid sequence between SZN-043 and its precursor, αASGR1-RSPO2-RA-IgG, can be found in the Materials and Methods section. These changes in amino acid sequence improved the biophysical properties of the final clinical candidate, such as oxidation and nonspecific binding. The biochemical analysis of those differences exceeds the scope of the current manuscript. We present here the pharmacokinetic properties of SZN-043 only, as this was the only molecule advanced to clinical trial and used in the studies presented here.

      (4) The reviewer suggests assessing the effect of SZN-043 in Ctnnb1-KO mice to confirm that SZN-043 acts via a canonical Wnt pathway. Indeed, there were several reports on the ability of R-spondin to act on other pathways besides the Wnt signaling pathway (for a recent review, see Niehrs et al, 2024, Bioessays). However, while an interesting suggestion, this line of investigation belongs to MOA studies and exceeds the scope of the current manuscript. An additional manuscript presenting MOA studies for SZN-043 was recently submitted elsewhere. Still, we have added this possibility in the discussion section.

      (5) The reviewer is asking how SZN-043 is affecting liver functions in general. Indeed, we have observed a consistent reduction in the international normalized ratio (INR) of prothrombin time using the thioacetamide (TAA)-induced fibrosis model and previously published those findings (Zhang, 2020). In our hands, TAA is the only liver injury model that significantly increases INR. This increase is modest compared to that observed in clinical patients. Therefore, we do not report INR findings for other models. We have not seen any effects of SZN-043 on hepatocyte differentiation markers such as HNF4A (data not shown) and the hepatocyte-specific ASGR1/2, as shown in Figure 5. Rather, we focused on proliferation as the main potentially beneficial endpoint, to restore the parenchymal mass in injured livers. Finally, consistent with what was reported in the literature, we have observed a transient and reciprocal effect on albumin and alpha-fetoprotein expression during the proliferative phase of liver regeneration. These results are detailed in an additional manuscript presenting MOA studies for SZN-043, which was recently submitted elsewhere.

      (6) We have used females only in the ethanol-induced injury models because there are numerous reports in the literature stating that males are not as susceptible to those injuries.  

      (7) The reviewer questions the relevance of the ethanol-induced injury model used to evaluate SZN-043 efficacy. Indeed, none of the disease models developed to date reproduces the severity and complexity of alcohol-associated liver diseases, although some, such as the ethanol-supplemented Lieber-DeCarli diet, are more commonly used than others, which is the reason why this model was selected.

      (8) The reviewer questions the relevance of the fibrosis model used to evaluate SZN-043 efficacy. Indeed, none of the fibrosis models developed to date reproduce the severity and complexity of cirrhosis in human livers. While combining ethanol with CCl4 would lead to more severe fibrotic livers, CCl4 itself is not involved in ALD in humans. Both models are likely to result in similar pericentral fibrosis with central-to-central bridging. In this study, we were mostly interested in addressing the effects of SZN-043 in a tissue affected by fibrotic scars.  

      (9) The sex of CCl4-treated mice is male. We added this information in the methods section.

      (10) A summary of histology and fibrosis assessment data for alcohol-fed mice was added in supplemental Table S3. In our hands, the use of aging mice did not induce the presence of fibrosis, in contrast to published results.  

      (11) The rationale for using 13.5-month-old mice in the alcohol studies and scid mice in the CCl4 studies has been clarified in the results and discussion sections. 

      a. Briefly, aging mice were reported to be more susceptible to ethanol-induced injury than young mice and to include induction of fibrosis. However, we were unable to reproduce the presence of fibrosis reported in the literature.  

      b. Scid mice were used in the CCl4 studies to test whether a stronger response could be observed in the absence of a potential anti-drug antibody response. While a modest reduction in fibrosis was observed in both B6 and scid mice following the SZN-043 treatment, the effect size did not seem affected by the mouse strain.

      Response to Comments from reviewer #2

      Many thanks for appreciating the use of multiple disease models to identify SZN-043 as a potential novel drug for liver regeneration.

      (1) The importance of restoring liver regeneration capacity to reduce the need for liver transplantation has been emphasized in the introduction.

      (2) There is continuous damage to the mouse hepatocytes in the FRG mice, due to the Fah mutation. They undergo repair mechanisms favoring the proliferation of human hepatocytes during the production period. Injury models that affect the human hepatocyte population have been developed in these mice. However, the primary goal of this study was to confirm that SZN-043 was efficacious in inducing human hepatocyte proliferation, a feature difficult to reproduce in primary hepatocyte cultures. Given the artefactual nature of the chimeric liver in FRG mice and the high cost of these mice, further studies were not judged to be necessary.

      (3) Corrected

      (4) A figure including DAPI staining has now been included in supplemental Figure S2.

      (5) We have clarified that the 8-week alcohol feeding used in our study design is a modification of the NIAAA model. While some ASGR1 has been reported on the surface of macrophages, additional data from MOA studies strongly suggest that the effect of SZN-043 is mediated via a hepatocyte-specific mechanism (submitted manuscript).

      (6) The reviewer inquired about the potential role of macrophages in promoting an anti-inflammatory state in response to SZN-043. While a direct effect is unlikely, a potential effect of macrophages in response to SZN-043 is plausible. Wnt activation is known to induce the secretion of hepatokines, such as LECT2, which in turn can influence macrophage activity. This possibility is discussed in the discussion section.

      (7) The potential off-target effects of SZN-043, such as stellate cell activation, are discussed in the discussion section.

      (8) The discussion of the limitations of current models has been included in the discussion section of the manuscript.

      (9) We have now included a discussion of prior RSPO-based therapies, such as OMP-131R10. We explain why the hepatocyte-targeting of RSPO activity minimizes undesired effects.

    1. eLife Assessment

      This study presents a valuable finding that the blood-brain barrier (BBB) may be modulated through specific modes of electroacupuncture stimulation. The data were collected and analyzed using a solid and validated methodology, and can be used as a starting point for functional studies of the BBB for drug delivery across healthy and diseased states. The work will be of broad interest to scientists working in the field of drug delivery and drug development.

    2. Joint Public Review:

      This study employs single-cell RNA sequencing to investigate how electroacupuncture (EA) stimulation alters the transcriptional profiles of central nervous system cell types following blood-brain barrier (BBB) opening. The authors seek to characterize changes in gene expression and pathway activities across diverse neural cells in response to electroacupuncture (EA) stimulation using high-resolution transcriptomics. This approach has the potential to elucidate the cellular mechanisms underlying EA stimulation and their implications for therapeutic intervention. The work engages with a timely and biologically significant question regarding noninvasive stimulation methods to manipulate BBB permeability. However, no in vivo/in vitro functional assays are provided to validate the changes in BBB permeability or cytokine release in the tested models. The experimental rationale remains inadequately explained, and key details regarding the magnitude, duration, and spatial distribution of BBB opening in this system are still lacking.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The work from this paper successfully mapped the transcriptional landscape and identified EA-responsive cell types (endothelial, microglia). Data suggest EA modulates the BBB via immune pathways and cell communication. However, claims of "BBB opening" are not directly proven (no permeability data).

      (1) No in vivo/in vitro assays confirm BBB permeability changes (e.g., Evans blue leakage, TEER).

      (2) Only male rats were used, ignoring sex-specific BBB differences.

      (3) Pericytes and neurons, critical for the BBB, were not captured, likely due to dissociation artifacts.

      (4) Protein-level validation (Western blot, IHC) absent for key genes (e.g., LY6E, HSP90).

      (5) Fixed stimulation protocol (2/100 Hz, 40 min); no dose-response or temporal analysis.

      We sincerely apologize for the oversight regarding the description of changes in blood-brain barrier permeability. In fact, our team conducted a series of preliminary studies that verified this aspect, and we have provided a more detailed description in the introduction section, in lines 60-71 of the manuscript.

      We are very grateful to the reviewers for pointing out the important and meaningful issue of "gender-specific BBB differences." We will make this a focal point in our future research.

      As for pericytes and neurons, we acknowledge their importance in the function of the blood-brain barrier. However, neurons are absent because our sample processing method involves dissociation. During the dissociation procedure, neuronal axons, which are relatively long, are filtered out during the frequent cell suspension steps and cannot enter the downstream microfluidic system for analysis, so they are not present in our data. Since this experiment is primarily focused on non-neuronal cells, we did not choose to use nucleus extraction for sample processing. As for pericytes, we believe they were not captured because their proportion in our samples is extremely low. Further research may require single-nucleus transcriptomics or the separate isolation of these two cell types for study. Of course, in our current mechanistic studies, we are also fully considering the important roles these two cell types play in BBB function.

      In addition, to validate the results at the protein level, we have recently conducted some experiments. However, as several proteins are currently at a critical stage of further experimental validation, it is not appropriate to present them in the manuscript at this time. Instead, we have uploaded the relevant data as an appendix for your review. This includes a figure of several protein markers we examined, as well as a table of the antibodies used.

      This section is also further elaborated in the introduction and its references.

      Reviewer #2 (Public review):

      Summary:

      This study uses single-cell RNA sequencing to explore how electroacupuncture (EA) stimulation alters the brain's cellular and molecular landscape after blood-brain barrier (BBB) opening. The authors aim to identify changes in gene expression and signaling pathways across brain cell types in response to EA stimulation using single-cell RNA sequencing. This direction holds promise for understanding the consequences of noninvasive methods of BBB opening for therapeutic drug delivery across the BBB.

      (1) The work falls short in its current form. The experimental design lacks a clear justification, and readers are not provided with sufficient background information on the extent, timing, or regional specificity of BBB opening in this EA model. These details, established in prior work, are critical to understanding the rationale behind the current transcriptomic analyses.

      (2) Further, the results are often presented with minimal context or interpretation. There is no model of intercellular or molecular coordination to explain the BBB-opening process, despite the stated goal of identifying such mechanisms. The statement that EA induces a "unique frontal cortex-specific transcriptome signature" is not supported, as no data from other brain regions are presented. Biological interpretation is at times unclear or inaccurate - for instance, attributing astrocyte migration effects to endothelial cell clusters or suggesting microglial tight junction changes without connecting them meaningfully to endothelial function.

      (3) The study does include analyses of receptor-ligand signaling and cell-cell communication, which could be among its most biologically rich outputs. However, these are relegated to supplementary material and not shown in the leading figures. This choice limits the utility of the manuscript as a hypothesis-generating resource.

      (4) Overall, while the dataset may be of interest to BBB researchers and those developing technologies for drug delivery across the BBB, the manuscript in its current form does not yet fulfill its interpretive goals. A more integrated and biologically grounded analysis would be beneficial.

      This section is also further elaborated in the introduction and its references.

      Our current study is actually based on previous findings that electroacupuncture can open the BBB, with a more pronounced effect observed in the frontal lobe (this aspect should be further described in the research background). Building on this foundation, our aim is to delineate the potential biological mechanisms involved. Therefore, we selected frontal lobe tissue as our primary choice for sequencing and have not yet investigated differences across other brain regions, although this may become a focus of future research. Additionally, we recognize that the mechanism underlying BBB opening is complex, and at present, we cannot determine whether it is driven by a single direct factor or by coordinated actions between cells or molecules. As such, our results are presented only briefly for now, and we will carefully consider whether to supplement our findings by incorporating insights from other studies.

      Considering the overall data layout and the length of the article, we ultimately decided not to make any changes to the presentation of the article's data. The images included in the supplementary materials are also thoroughly described and referenced in the manuscript, allowing readers to selectively view any data they are interested in.

      Indeed, our current dataset and analysis tend to present objective data results. We are also conducting a series of validations that may be related to the biology of the blood-brain barrier, and we look forward to sharing and discussing any future research findings with you and everyone.

      Reviewer #1 (Recommendations for the authors):

      (1) Figures 3-7: Label treatment groups (CON vs. EA) consistently in legends.

      (2) Methods: Specify rat strain (Sprague-Dawley) in the abstract.

      (3) Clarify Limitations: Explicitly state that BBB opening is inferred, not proven.

      This section has been revised at lines 743-733, 748, 949, 754-755, and 759-760 of the manuscript.

      Revised at line 31 of the manuscript.

      Thank you for your feedback. Background information on the evidence for BBB opening has been added to the introduction.

      Reviewer #2 (Recommendations for the authors):

      (1) Abstract and Introduction

      • Include specific key findings in the abstract to improve clarity and reader engagement.

      • Expand the introduction to situate this work in the context of other BBB-opening methods (e.g., ultrasound) and the known consequences of BBB disruption.

      • Clarify the rationale for choosing electroacupuncture.

      • Include information (perhaps summarized from previous studies) about the extent, timeline, and functional assessment of BBB opening in this model to help justify the single-cell RNA-seq design.

      (2) Experimental Rationale and Context

      • Reiterate experimental design and rationale in each results section, rather than relying exclusively on the Methods section.

      • Specify the time point of tissue collection relative to the EA intervention.

      • Describe the anatomical sites of acupuncture stimulation and their physiological relevance.

      (3) Data Presentation

      • Replace the human brain cartoon in Figure 1 with an anatomically appropriate rat brain schematic.

      • Reevaluate which data are presented in the main versus supplementary figures. Highlight biologically meaningful results, such as cell-cell communication and ligand-receptor interactions, in the main figures rather than supplementary data.

      (4) Interpretation and Modeling

      • More carefully link transcriptional changes (e.g., Wnt signaling in microglia) to biologically plausible mechanisms of BBB regulation, e.g., microglial signaling to endothelial cells.

      • Clarify whether the presence of granulocytes and T cells might result from a lack of perfusion prior to brain dissection.

      • Consider proposing a model (even speculative) of how EA leads to BBB opening based on observed transcriptional changes.

      First, for the sake of brevity in the abstract, we did not present specific results in this section. Second, since BBB opening via EA is a unique strategy, our previous studies have examined the opening time window and the recovery of the BBB after EA intervention (as mentioned in the introduction). We believe its characteristics differ from those of ultrasound-induced BBB opening and BBB disruption, so we did not conduct comparative discussions, but objectively presented our research findings. In further functional validation experiments, we may consider integrating other opening strategies in our studies. Additionally, the choice of electroacupuncture was based on our previous series of studies, which have already been outlined in the research background. Finally, we did indeed determine the experimental design of this study based on prior research, as described in the background section of the introduction.

      We decided not to make changes to this section in the manuscript after careful consideration. The setup of electroacupuncture intervention and controls has been thoroughly discussed in our previous studies (as referenced in the introduction), so we have not repeated it in this manuscript. Overall, building on all our previous findings, this study focuses primarily on the potential mechanisms of EA intervention. The anatomical sites of acupuncture stimulation and their physiological relevance are another key area of our research, and we are currently conducting a series of related studies. We look forward to sharing these findings with you in the future.

      We have already changed the human brain diagram in Figure 1 to a rat brain diagram, and have replaced Figure 1 in the files with the revised version. However, considering the overall data layout and the length of the article, we ultimately decided not to make changes to the data presentation in the manuscript. The images in the supplementary materials are also thoroughly described and referenced in the manuscript, allowing readers to selectively view the data they are interested in.

      This section has provided us with excellent suggestions for further exploration, although no changes have been made to the manuscript at this time. In the future, we may conduct more detailed transcriptomic studies focusing on sex differences and different brain regions, which will allow for a more comprehensive analysis of the biological mechanisms involved in BBB regulation.

    1. eLife Assessment

      This valuable study explores the role of the chromatin regulator ATAD2 in mouse spermatogenesis. It convincingly demonstrates that ATAD2 is essential for proper chromatin remodeling in haploid spermatids, influencing gene accessibility, H3.3-mediated transcription, and histone eviction. Using Atad2 knockout (KO) mice, the authors link ATAD2 to the DNA-replication-independent incorporation of sperm-specific proteins like protamines and histone H3.3. Although the findings highlight chromatin abnormalities and impaired in vitro fertilization in KO mice, natural fertility remains unaffected, suggesting possible in vivo compensatory mechanisms. However, in its current form, the study lacks mechanistic insight and provides only partial evidence for ATAD2's molecular role, limiting its functional conclusions.

    2. Reviewer #1 (Public review):

      Summary:

      The authors analyzed the expression of ATAD2 protein in post-meiotic stages and characterized the localization of various testis-specific proteins in the testis of the Atad2 knockout (KO). By cytological analysis as well as ATAC sequencing, the study showed increased levels of the HIRA histone chaperone, accumulation of histone H3.3 on post-meiotic nuclei, defective chromatin accessibility, and delayed deposition of protamines. Sperm from the Atad2 KO mice show reduced success in in vitro fertilization. The work was performed well, and most of the results are convincing. However, this manuscript does not suggest a molecular mechanism for how ATAD2 promotes the formation of testis-specific chromatin.

      Strengths:

      The paper describes the role of ATAD2 AAA+ ATPase in the proper localization of sperm-specific chromatin proteins such as protamine, suggesting the importance of the DNA replication-independent histone exchanges with the HIRA-histone H3.3 axis.

      Weaknesses:

      (1) Some results lack quantification.

      (2) The work was performed well, and most of the results are convincing. However, this manuscript does not suggest a molecular mechanism for how ATAD2 promotes the formation of testis-specific chromatin.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript by Liakopoulou et al. presents a comprehensive investigation into the role of ATAD2 in regulating chromatin dynamics during spermatogenesis. The authors elegantly demonstrate that ATAD2, via its control of histone chaperone HIRA turnover, ensures proper H3.3 localization, chromatin accessibility, and histone-to-protamine transition in post-meiotic male germ cells. Using a new well-characterized Atad2 KO mouse model, they show that ATAD2 deficiency disrupts HIRA dynamics, leading to aberrant H3.3 deposition, impaired transcriptional regulation, delayed protamine assembly, and defective sperm genome compaction. The study bridges ATAD2's conserved functions in embryonic stem cells and cancer to spermatogenesis, revealing a novel layer of epigenetic regulation critical for male fertility.

      Strengths:

      The MS provides the first demonstration of ATAD2's essential role in spermatogenesis, linking its expression in haploid spermatids to histone chaperone regulation and connecting ATAD2-dependent chromatin dynamics to gene accessibility (ATAC-seq), H3.3-mediated transcription, and histone eviction. Interestingly and surprisingly, sperm chromatin defects in Atad2 KO mice impair only in vitro fertilization but not natural fertility, suggesting unknown compensatory mechanisms in vivo.

      Weaknesses: The MS is robust and there are no major weaknesses.

    4. Reviewer #3 (Public review):

      Summary:

      The authors generated knockout mice for Atad2, a conserved bromodomain-containing factor expressed during spermatogenesis. In Atad2 KO mice, HIRA, a chaperone for histone variant H3.3, was upregulated in round spermatids, accompanied by an apparent increase in H3.3 levels. Furthermore, the sequential incorporation and removal of TH2B and PRM1 during spermiogenesis were partially disrupted in the absence of ATAD2, possibly due to delayed histone removal. Despite these abnormalities, Atad2 KO male mice were able to produce offspring normally.

      Strengths:

      The manuscript addresses the biological role of ATAD2 in spermatogenesis using a knockout mouse model, providing a valuable in vivo framework to study chromatin regulation during male germ cell development. The observed redistribution of H3.3 in round spermatids is clearly presented and suggests a previously unappreciated role of ATAD2 in histone variant dynamics. The authors also document defects in the sequential incorporation and removal of TH2B and PRM1 during spermiogenesis, providing phenotypic insight into chromatin transitions in late spermatogenic stages. Overall, the study presents a solid foundation for further mechanistic investigation into ATAD2 function.

      Weaknesses:

      While the manuscript reports the gross phenotype of Atad2 KO mice, the findings remain largely superficial and do not convincingly demonstrate how ATAD2 deficiency affects chromatin dynamics. Moreover, the phenotype appears too mild to elucidate the functional significance of ATAD2 during spermatogenesis.

      (1) Figures 4-5: The analyses of differential gene expression and chromatin organization should be more comprehensive. First, Venn diagrams comparing the sets of significantly differentially expressed genes between this study and previous work should be shown for each developmental stage. Second, given the established role of H3.3 in MSCI, the effect of Atad2 knockout on sex chromosome gene expression should be analyzed. Third, integrated analysis of RNA-seq and ATAC-seq data is needed to evaluate how ATAD2 loss affects gene expression. Finally, H3.3 ChIP-seq should be performed to directly assess changes in H3.3 distribution following Atad2 knockout.

      (2) Figure 3: The altered distribution of H3.3 is compelling. This raises the possibility that histone marks associated with H3.3 may also be affected, although this has not been investigated. It would therefore be important to examine the distribution of histone modifications typically associated with H3.3. If any alterations are observed, ChIP-seq analyses should be performed to explore them further.

      (3) Figure 7: While the authors suggest that pre-PRM2 processing is impaired in Atad2 KO, no direct evidence is provided. It is essential to conduct acid-urea polyacrylamide gel electrophoresis (AU-PAGE) followed by western blotting, or a comparable experiment, to substantiate this claim.

      (4) HIRA and ATAD2: Does the upregulation of HIRA fully account for the phenotypes observed in Atad2 KO? If so, would overexpression of HIRA alone be sufficient to phenocopy the Atad2 KO phenotype? Alternatively, would partial reduction of HIRA (e.g., through heterozygous deletion) in the Atad2 KO background be sufficient to rescue the phenotype?

      (5) The mechanism by which ATAD2 regulates HIRA turnover on chromatin and the deposition of H3.3 remains unclear from the manuscript and warrants further investigation.

    5. Author response:

      Reviewer #1 (Public review): 

      Summary: 

      The authors analyzed the expression of ATAD2 protein in post-meiotic stages and characterized the localization of various testis-specific proteins in the testis of the Atad2 knockout (KO). By cytological analysis as well as ATAC sequencing, the study showed increased levels of the HIRA histone chaperone, accumulation of histone H3.3 on post-meiotic nuclei, defective chromatin accessibility, and delayed deposition of protamines. Sperm from the Atad2 KO mice show reduced success in in vitro fertilization. The work was performed well, and most of the results are convincing. However, this manuscript does not suggest a molecular mechanism for how ATAD2 promotes the formation of testis-specific chromatin.

      We would like to take this opportunity to highlight that the present study builds on our previously published work, which examined the function of ATAD2 in both yeast S. pombe and mouse embryonic stem (ES) cells (Wang et al., 2021). In yeast, using genetic analysis we showed that inactivation of HIRA rescues defective cell growth caused by the absence of ATAD2. This rescue could also be achieved by reducing histone dosage, indicating that the toxicity depends on histone over-dosage, and that HIRA toxicity, in the absence of ATAD2, is linked to this imbalance.

      Furthermore, HIRA ChIP-seq performed in mouse ES cells revealed increased nucleosome-bound HIRA, particularly around transcription start sites (TSS) of active genes, along with the appearance of HIRA-bound nucleosomes within normally nucleosome-free regions (NFRs). These findings pointed to ATAD2 as a major factor responsible for unloading HIRA from nucleosomes. This unloading function may also apply to other histone chaperones, such as FACT (see Wang et al., 2021, Fig. 4C).

      In the present study, our investigations converge on the same ATAD2 function in the context of a physiologically integrated mammalian system: spermatogenesis. Indeed, in the absence of ATAD2, we observed H3.3 accumulation and enhanced H3.3-mediated gene expression. Consistent with this functional model of ATAD2 (unloading chaperones from histone- and non-histone-bound chromatin), we also observed defects in histone-to-protamine replacement.

      Together, the results presented here and in Wang et al. (2021) reveal an underappreciated regulatory layer of histone chaperone activity. Previously, histone chaperones were primarily understood as factors that load histones. Our findings demonstrate that we must also consider a previously unrecognized regulatory mechanism that controls assembled histone-bound chaperones. This key point was clearly captured and emphasized by Reviewer #2 (see below).

      Strengths: 

      The paper describes the role of ATAD2 AAA+ ATPase in the proper localization of sperm-specific chromatin proteins such as protamine, suggesting the importance of the DNA replication-independent histone exchanges with the HIRA-histone H3.3 axis. 

      Weaknesses: 

      (1) Some results lack quantification. 

      We will consider all the data and add appropriate quantifications where necessary.

      (2) The work was performed well, and most of the results are convincing. However, this manuscript does not suggest a molecular mechanism for how ATAD2 promotes the formation of testis-specific chromatin. 

      Please see our comments above.

      Reviewer #2 (Public review): 

      Summary: 

      This manuscript by Liakopoulou et al. presents a comprehensive investigation into the role of ATAD2 in regulating chromatin dynamics during spermatogenesis. The authors elegantly demonstrate that ATAD2, via its control of histone chaperone HIRA turnover, ensures proper H3.3 localization, chromatin accessibility, and histone-to-protamine transition in post-meiotic male germ cells. Using a new well-characterized Atad2 KO mouse model, they show that ATAD2 deficiency disrupts HIRA dynamics, leading to aberrant H3.3 deposition, impaired transcriptional regulation, delayed protamine assembly, and defective sperm genome compaction. The study bridges ATAD2's conserved functions in embryonic stem cells and cancer to spermatogenesis, revealing a novel layer of epigenetic regulation critical for male fertility.

      Strengths: 

      The MS provides the first demonstration of ATAD2's essential role in spermatogenesis, linking its expression in haploid spermatids to histone chaperone regulation and connecting ATAD2-dependent chromatin dynamics to gene accessibility (ATAC-seq), H3.3-mediated transcription, and histone eviction. Interestingly and surprisingly, sperm chromatin defects in Atad2 KO mice impair only in vitro fertilization but not natural fertility, suggesting unknown compensatory mechanisms in vivo.

      Weaknesses:

      The MS is robust and there are no major weaknesses.

      Reviewer #3 (Public review): 

      Summary: 

      The authors generated knockout mice for Atad2, a conserved bromodomain-containing factor expressed during spermatogenesis. In Atad2 KO mice, HIRA, a chaperone for histone variant H3.3, was upregulated in round spermatids, accompanied by an apparent increase in H3.3 levels. Furthermore, the sequential incorporation and removal of TH2B and PRM1 during spermiogenesis were partially disrupted in the absence of ATAD2, possibly due to delayed histone removal. Despite these abnormalities, Atad2 KO male mice were able to produce offspring normally. 

      Strengths: 

      The manuscript addresses the biological role of ATAD2 in spermatogenesis using a knockout mouse model, providing a valuable in vivo framework to study chromatin regulation during male germ cell development. The observed redistribution of H3.3 in round spermatids is clearly presented and suggests a previously unappreciated role of ATAD2 in histone variant dynamics. The authors also document defects in the sequential incorporation and removal of TH2B and PRM1 during spermiogenesis, providing phenotypic insight into chromatin transitions in late spermatogenic stages. Overall, the study presents a solid foundation for further mechanistic investigation into ATAD2 function. 

      Weaknesses:

      While the manuscript reports the gross phenotype of Atad2 KO mice, the findings remain largely superficial and do not convincingly demonstrate how ATAD2 deficiency affects chromatin dynamics. Moreover, the phenotype appears too mild to elucidate the functional significance of ATAD2 during spermatogenesis. 

      We respectfully disagree with the statement that our findings are largely superficial. Based on our investigations of this factor over the years, it has become evident that ATAD2 functions as an auxiliary factor that facilitates mechanisms controlling chromatin dynamics (see, for example, Morozumi et al., 2015). These mechanisms can still occur in the absence of ATAD2, but with reduced efficiency, which explains the mild phenotype we observed.

      This function, while not essential, is nonetheless an integral part of the cell’s molecular biology and should be studied and brought to the attention of the broader biological community, just as we study essential factors. Unfortunately, the field has tended to focus primarily on core functional actors, often overlooking auxiliary factors. As a result, our decade-long investigations into the subtle yet important roles of ATAD2 have repeatedly been met with skepticism regarding its functional significance, which has in turn influenced editorial decisions.

      We chose eLife as the venue for this work specifically to avoid such editorial barriers and to emphasize that facilitators of essential functions do exist. They deserve to be investigated, and the underlying molecular regulatory mechanisms must be understood.

      (1) Figures 4-5: The analyses of differential gene expression and chromatin organization should be more comprehensive. First, Venn diagrams comparing the sets of significantly differentially expressed genes between this study and previous work should be shown for each developmental stage. Second, given the established role of H3.3 in MSCI, the effect of Atad2 knockout on sex chromosome gene expression should be analyzed. Third, integrated analysis of RNA-seq and ATAC-seq data is needed to evaluate how ATAD2 loss affects gene expression. Finally, H3.3 ChIP-seq should be performed to directly assess changes in H3.3 distribution following Atad2 knockout.  

      (1) In the revised version, we will include Venn diagrams to illustrate the overlap in significantly differentially expressed genes between this study and previous work. However, we believe that the GSEAs presented here provide stronger evidence, as they indicate the statistical significance of this overlap (p-values). In our case, we observed p-value < 0.01 (**) and p < 0.001 (***).

      (2) Sex chromosome gene expression was analyzed and is presented in Fig. 5C.

      (3) The effect of ATAD2 loss on gene expression is shown in Fig. 4A, B, and C as histograms, with statistical significance indicated in the middle panels.

      (4) Although mapping H3.3 incorporation across the genome in wild-type and Atad2 KO cells would have been informative, the available anti-H3.3 antibody did not work for ChIP-seq, at least in our hands. The authors of Fontaine et al., 2022, who studied H3.3 during spermatogenesis in mice, must have encountered the same problem, since they tagged the endogenous H3.3 gene to perform their ChIP experiments.

      (2) Figure 3: The altered distribution of H3.3 is compelling. This raises the possibility that histone marks associated with H3.3 may also be affected, although this has not been investigated. It would therefore be important to examine the distribution of histone modifications typically associated with H3.3. If any alterations are observed, ChIP-seq analyses should be performed to explore them further.  

      Based on our understanding of ATAD2’s function—specifically its role in releasing chromatin-bound HIRA—in the absence of ATAD2 the residence time of both HIRA and H3.3 on chromatin increases. This results in the detection of H3.3 not only on sex chromosomes but across the genome. Our data provide clear evidence of this phenomenon. The reviewer is correct in suggesting that the accumulated H3.3 would carry H3.3-associated histone PTMs; however, we are unsure what additional insights could be gained by further demonstrating this point.

      (3) Figure 7: While the authors suggest that pre-PRM2 processing is impaired in Atad2 KO, no direct evidence is provided. It is essential to conduct acid-urea polyacrylamide gel electrophoresis (AU-PAGE) followed by western blotting, or a comparable experiment, to substantiate this claim. 

      Figure 7 does not suggest that pre-PRM2 processing is affected in Atad2 KO; rather, this figure—particularly Fig. 7B—specifically demonstrates that pre-PRM2 processing is impaired, as shown using an antibody that recognizes the processed portion of pre-PRM2. ELISA was used to provide a more quantitative assessment; however, in the revised manuscript we will also include a western blot image.

      (4) HIRA and ATAD2: Does the upregulation of HIRA fully account for the phenotypes observed in Atad2 KO? If so, would overexpression of HIRA alone be sufficient to phenocopy the Atad2 KO phenotype? Alternatively, would partial reduction of HIRA (e.g., through heterozygous deletion) in the Atad2 KO background be sufficient to rescue the phenotype? 

      These are interesting experiments that require the creation of appropriate mouse models, which are not currently available.

      (5) The mechanism by which ATAD2 regulates HIRA turnover on chromatin and the deposition of H3.3 remains unclear from the manuscript and warrants further investigation.

      The Reviewer is absolutely correct. In addition to the points addressed in response to Reviewer #1’s general comments (see above), it would indeed have been very interesting to test the segregase activity of ATAD2 (likely driven by its AAA ATPase activity) through in vitro experiments using the Xenopus egg extract system described by Tagami et al., 2004. This system can be applied both in the presence and absence (via immunodepletion) of ATAD2 and would also allow the use of ATAD2 mutants, particularly those with inactive AAA ATPase or bromodomains. However, such experiments go well beyond the scope of this study, which focuses on the role of ATAD2 in chromatin dynamics during spermatogenesis.

      References

      Wang T, Perazza D, Boussouar F, Cattaneo M, Bougdour A, Chuffart F, Barral S, Vargas A, Liakopoulou A, Puthier D, Bargier L, Morozumi Y, Jamshidikia M, Garcia-Saez I, Petosa C, Rousseaux S, Verdel A, Khochbin S. ATAD2 controls chromatin-bound HIRA turnover. Life Sci Alliance. 2021 Sep 27;4(12):e202101151. doi: 10.26508/lsa.202101151. PMID: 34580178; PMCID: PMC8500222.

      Morozumi Y, Boussouar F, Tan M, Chaikuad A, Jamshidikia M, Colak G, He H, Nie L, Petosa C, de Dieuleveult M, Curtet S, Vitte AL, Rabatel C, Debernardi A, Cosset FL, Verhoeyen E, Emadali A, Schweifer N, Gianni D, Gut M, Guardiola P, Rousseaux S, Gérard M, Knapp S, Zhao Y, Khochbin S. Atad2 is a generalist facilitator of chromatin dynamics in embryonic stem cells. J Mol Cell Biol. 2016 Aug;8(4):349-62. doi: 10.1093/jmcb/mjv060. Epub 2015 Oct 12. PMID: 26459632; PMCID: PMC4991664.

      Fontaine E, Papin C, Martinez G, Le Gras S, Nahed RA, Héry P, Buchou T, Ouararhni K, Favier B, Gautier T, Sabir JSM, Gerard M, Bednar J, Arnoult C, Dimitrov S, Hamiche A. Dual role of histone variant H3.3B in spermatogenesis: positive regulation of piRNA transcription and implication in X-chromosome inactivation. Nucleic Acids Res. 2022 Jul 22;50(13):7350-7366. doi: 10.1093/nar/gkac541. PMID: 35766398; PMCID: PMC9303386.

      Tagami H, Ray-Gallet D, Almouzni G, Nakatani Y. Histone H3.1 and H3.3 complexes mediate nucleosome assembly pathways dependent or independent of DNA synthesis. Cell. 2004 Jan 9;116(1):51-61. doi:10.1016/s0092-8674(03)01064-x. PMID: 14718166.

    1. eLife Assessment

      This useful work identifies new monoclonal antibodies produced by cystic fibrosis patients against the Pseudomonas aeruginosa type three secretion system. The evidence supporting the authors' claim is solid. Nonetheless, the manuscript may benefit from a more in-depth description of what the authors learned from their structure-based analyses of antibodies targeting PcrV.

    2. Reviewer #1 (Public review):

      Summary:

      Desveaux et al. describe human mAbs targeting proteins from the Pseudomonas aeruginosa T3SS, discovered by employing single-cell B cell sorting from cystic fibrosis patients. The mAbs were directed at the proteins PscF and PcrV. They particularly focused on two mAbs binding the T3SS with the potential of blocking activity. The biochemical analysis was supplemented with a crystal structure of the P3D6 Fab complex. They also compared the blocking activity with mAbs that were described in previous studies, using an assay that evaluated the toxin injection. They conducted mechanistic structure analysis and found that these mAbs might act through different mechanisms by preventing PcrV oligomerization and disrupting PcrV's scaffolding function.

      The antibiotic resistance crisis requires the development of new solutions to treat infections caused by MDR bacteria. The development of antibacterial mAbs holds great potential. In that context, this report is important as it paves the way for the development of additional mAbs targeting various pathogens that harbor the T3SS. In this report the authors present a comparative study of their discovered mAbs vs. a commercial mAb currently in clinical testing, resulting in valuable data with applicative implications. The authors investigated the mechanism of action of the mAbs using advanced methods and assays for characterization of antibody and antigen interaction, underlining the effort to determine the discovered mAbs' suitability for downstream application.

    3. Reviewer #2 (Public review):

      Summary:

      Desveaux et al. performed ELISA and translocation assays to identify which of 34 cystic fibrosis patients produced antibodies against the P. aeruginosa type three secretion system (T3SS). The authors were especially interested in antibodies against PcrV and PscF, two key components of the T3SS. The authors leveraged their binding assays and flow cytometry to isolate individual B cells from the two most promising sera, and then obtained monoclonal antibodies for the proteins of interest. Among the tested monoclonal antibodies, P3D6 and P5B3 emerged as the best candidates due to their inhibitory effect on the ExoS-Bla translocation marker (with 24% and 94% inhibition, respectively). The authors then showed that P5B3 binds to the five most common variants of PcrV, while P3D6 seems to recognize only one variant. Furthermore, the authors showed that P3D6 inhibits translocon formation, measured as cell death of J774 macrophages. To get insights into the P3D6-PcrV interaction, the authors defined the crystal structure of the P3D6-PcrV complex. Finally, the authors compared their new antibodies with two previous ones (i.e., MEDI3902 and 30-B8).

      Strengths:

      • Article is well written.

      • Authors used complementary assays to evaluate protective effect of candidate monoclonal antibodies.

      • Authors offered crystal structure with insights into the P3D6 antibody-T3SS interaction (e.g., interactions with monomer vs pentamers).

      • Authors put their results in context by comparing their antibodies with respect to previous ones.

      Weaknesses:

      • Results shown in Fig. 6 should be initially described in the Results section and not in the Discussion section.

      • The authors should describe, in the Discussion (and also in L146-147), in more detail the insights gained into how anti-PcrV antibodies work. This is especially important given previous reports of more potent antibodies (e.g., Simonis et al.) that significantly reduce the novelty of their work. Hence, the authors could explicitly highlight how their study differs from previous work, and what unique insights were gained (in the current version this is not completely obvious).

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      Desveaux et al. describe human mAbs targeting proteins from the Pseudomonas aeruginosa T3SS, discovered by employing single-cell B cell sorting from cystic fibrosis patients. The mAbs were directed at the proteins PscF and PcrV. They particularly focused on two mAbs binding the T3SS with the potential of blocking activity. The biochemical analysis was supplemented with a crystal structure of the P3D6 Fab complex. They also compared the blocking activity with mAbs that were described in previous studies, using an assay that evaluated the toxin injection. They conducted mechanistic structure analysis and found that these mAbs might act through different mechanisms by preventing PcrV oligomerization and disrupting PcrV's scaffolding function.

      Strengths:

      The antibiotic resistance crisis requires the development of new solutions to treat infections caused by MDR bacteria. The development of antibacterial mAbs holds great potential. In that context, this report is important as it paves the way for the development of additional mAbs targeting various pathogens that harbor the T3SS. In this report, the authors present a comparative study of their discovered mAbs vs. a commercial mAb currently in clinical testing resulting in valuable data with applicative implications. The authors investigated the mechanism of action of the mAbs using advanced methods and assays for the characterization of antibody and antigen interaction, underlining the effort to determine the discovered mAbs suitability for downstream application.

      Weaknesses:

      Although the information presented in this manuscript is important, previous reports regarding other T3SS structures complexed with antibodies, reduce the novelty of this report. Nevertheless, we provide several comments that may help to improve the report. The structural analysis of the presented mAbs is incomplete and unfortunately, the authors did not address any developability assessment. With such vital information missing, it is unclear if the proposed antibodies are suited for diagnostic or therapeutic usage. This vastly reduces the importance of the possibly great potential of the authors' findings. Moreover, the structural information does not include the interacting regions on the mAb which may impede the optimization of the mAb if it is required to improve its affinity.

      As described in the manuscript (Fig. 6), our mAbs are markedly less effective in every in vitro T3SS inhibition assay than the mAbs recently described by Simonis et al. They are therefore very unlikely to outperform these mAbs in in vivo animal models of P. aeruginosa infection. Considering the high cost of animal experiments and ethical concerns, and in accordance with the Reduction principle of the 3Rs guidelines, we chose not to pursue in vivo experiments. Instead, we focused on leveraging the newly isolated mAbs to investigate the mechanisms of action and structural features of anti-PcrV mAbs.

      Following the reviewer's suggestion, we have now added mAb interaction features into the structural data presented in the manuscript. However, based on the efficiency data, the structural analysis and the mechanistic insights presented, we do not consider further therapeutic use and optimization of our mAbs to be warranted.

      Reviewer #2 (Public review):

      Summary:

      Desveaux et al. performed ELISA and translocation assays to identify which of 34 cystic fibrosis patients produced antibodies against the P. aeruginosa type three secretion system (T3SS). The authors were especially interested in antibodies against PcrV and PscF, two key components of the T3SS. The authors leveraged their binding assays and flow cytometry to isolate individual B cells from the two most promising sera, and then obtained monoclonal antibodies for the proteins of interest. Among the tested monoclonal antibodies, P3D6 and P5B3 emerged as the best candidates due to their inhibitory effect on the ExoS-Bla translocation marker (with 24% and 94% inhibition, respectively). The authors then showed that P5B3 binds to the five most common variants of PcrV, while P3D6 seems to recognize only one variant. Furthermore, the authors showed that P3D6 inhibits translocon formation, measured as cell death of J774 macrophages. To get insights into the P3D6-PcrV interaction, the authors defined the crystal structure of the P3D6-PcrV complex. Finally, the authors compared their new antibodies with two previous ones (i.e., MEDI3902 and 30-B8).

      Strengths:

      (1) The article is well written.

      (2) The authors used complementary assays to evaluate the protective effect of candidate monoclonal antibodies.

      (3) The authors offered crystal structure with insights into the P3D6 antibody-T3SS interaction (e.g., interactions with monomer vs pentamers).

      (4) The authors put their results in context by comparing their antibodies with respect to previous ones.

      Weaknesses:

      (1) The authors used a similar workflow to the one previously reported in Simonis et al. 2023 (antibodies from cystic fibrosis patients that included B cell isolation, antibody-PcrV interaction modeling, etc.) but the authors do not clearly explain how their work and findings differ from previous work.

      We employed a similar mAb isolation pipeline to that used by Simonis et al., beginning with the screening of a cohort of cystic fibrosis patients chronically infected with P. aeruginosa. As in Simonis et al., we isolated specific B cells using a recombinant PcrV bait, followed by single-cell PCR amplification of immunoglobulin genes. The main differences in methodology between the two studies are as follows: i) the use of individuals from different cohorts, who therefore have different Ab repertoires; ii) the nature of the screening assays, although in both cases the screening was focused on the inhibition of T3SS function; and iii) the PcrV labeling strategy, with Simonis et al. employing direct labeling, whereas we used a biotinylated tag combined with streptavidin.

      The number of specific mAbs obtained and produced was higher in Simonis et al. (47 versus 9 in our study). They sorted B cells from three individuals compared to two in our work and possibly started with a larger amount of PBMCs per donor, which may account for the higher number of specific B cells and mAbs isolated. Considering that the strategies were overall very similar, the greater number of mAbs isolated in Simonis et al. likely explains, to a large extent, why they identified mAbs targeting different epitopes compared to ours, including highly potent mAbs that we did not recover. 

      Our modeling study, unlike that of Simonis et al., which relied on an AlphaFold prediction of the multimeric structure of P. aeruginosa PcrV, was based on the experimentally determined structure of the homologous Salmonella SipD pentamer, as described in the manuscript. Furthermore, we compared our mAb P3D6 not only with 30-B8 from Simonis et al., but also with MEDI3902. Finally, in contrast to the approach of Simonis et al., we used functional assays to investigate the differences in mechanisms of action among these mAbs, which target three distinct epitopes.

      (2) Although new antibodies against P. aeruginosa T3SS expand the potential space of antibody-based therapies, it is unclear if P3D6 or P5B3 are better than previous antibodies. In fact, in the discussion section the authors suggested that the 30-B8 antibody seems to be the most effective of the tested antibodies.

      As explained above and shown in the Results section (Figure 6), the 30-B8 mAb is markedly more effective at inhibiting T3SS activity in both in vitro assays used.

      (3) The authors should explain better which of the two antibodies they have discovered would be better suited for follow-up studies. It is confusing that the authors focused the last sections of the manuscript on P3D6 despite P3D6 having a much lower ExoS-Bla inhibition effect than P5B3 and the limitation in the PcrV variant that P3D6 seems to recognize. A better description of this comparison and the criteria to select among candidate antibodies would help readers identify the main messages of the paper. 

      The P3D6 mAb shows stronger inhibitory activity than P5B3 in the two assays used, as shown in Supplementary Figure 1. An error in the table in Figure 2B was corrected and this table now reflects the results presented in Supplementary Figure 1. 

      The final sections of the manuscript focus on P3D6, which is more potent than P5B3, and for which we successfully determined a co-crystal structure with PcrV*. All parallel attempts to obtain a structure of P5B3 in complex with PcrV* failed. The P3D6-PcrV* structure was used to analyze epitope recognition and mechanisms of action in comparison to previously described mAbs. As previously mentioned, we do not consider further studies aimed at therapeutic development and optimization of our mAbs to be justified given the current data. Therefore, we believe that the main message of the paper is adequately captured in the title.

      (4) This work could strongly benefit from two additional experiments:

      (a) In vivo experiments: experiments in animal models could offer a more comprehensive picture of the potential of the identified monoclonal antibodies. Additionally, this could help to answer a naïve question: why do the patients that have the antibodies still have chronic P. aeruginosa infections? 

      As explained above, the mAbs we isolated are significantly less potent than those described by Simonis et al., and are therefore unlikely to outperform the best anti-PcrV candidates in vivo. In light of the data, and considering ethical concerns related to animal use in research and budgetary constraints, we decided not to proceed with in vivo experiments.

      There are a number of reasons that may explain why patients with anti-PcrV Abs blocking the T3SS can still be chronically infected with Pa. First, these Abs may be at limiting concentration, particularly in sites where Pa replicates, and thus unable to clear infection. In addition, it has been described that the T3SS is downregulated in chronic infection in cystic fibrosis patients. This suggests that a therapeutic intervention with T3SS-inhibiting Abs may be more efficient if done early in cystic fibrosis patients to prevent colonization when Pa possesses an active T3SS. Finally, T3SS is not the only virulence mechanism employed by P. aeruginosa during infection. Indeed, multiple protein adhesins and polysaccharides are important factors facilitating the formation of bacterial biofilms that are crucial for establishing chronic persistent infection. In this regard, a combination of Abs targeting different factors on the P. aeruginosa surface may be needed to treat chronic infections.

      (b) Multi-antibody T3SS assays (i.e., a combination of two or more monoclonal antibodies evaluated with the same assays used for characterization of single ones). This could explore the synergistic effects of combinatorial therapies that could address some of the limitations of individual antibodies. 

      Given the high potency of the Simonis mAbs and the mechanisms of action highlighted by our analysis, it is unlikely that our mAbs would synergize with those described by Simonis. Additionally, since our two mAbs cross-compete for binding, synergy between them is also improbable.

      Reviewer #1 (Recommendations for the authors):

      (1) Line 166: How was the serum-IgG purified? (e.g., protein A, protein G).

      Protein A purification was used, as now mentioned in the manuscript. Purified Igs were thus predominantly IgG1, IgG2 and IgG4, as indicated.

      (2) Line 196: When mentioning affinities, it is preferable to present in molar units. 

      To facilitate comparisons, Ab concentrations were presented in µg/mL as in Simonis et al.

      (3) Line 206: The author states that P3D6 displays significantly reduced ExoS-Bla injection (Figure 2B), but according to the presented table, ExoS-Bla inhibition was higher for P5B3. Additionally, when using "significantly", what was the statistical test that was used to evaluate the significance? Please clarify.

      We thank the reviewer for pointing out this inconsistency. Indeed, the names of P3D6 and P5B3 were exchanged when building the table related to Figure 2B. The corrected version of this figure is now presented in the new version of the manuscript. An ANOVA was performed to evaluate the significance of the observed difference (adjusted p-values < 0.001) and it is now mentioned in the figure caption.  

      (4) Line 215: "P3B3" typo.

      This was corrected.

      (5) Figure 3B: Could the author explain the higher level of ExoS-Bla injection when using VRCO1 antibody compared to no antibody.  

      A slightly higher level of the median is observed in the case of three variants out of five. However, this difference is not statistically significant (p-value > 0.05).

      (6) Supplement Figure 1: the presented grey area is not clear (is it the 95%CI?) and how was the IC50 calculated? With what model was it projected? Are the values for IC50 beyond the 100µg/mL mark a projection? It seems that projecting such greater values (such as the IC50 of over 400µg/mL for variant 5) is prone to high error probability.

      The grey area represents the 95% confidence interval (95% CI), as now mentioned in the figure caption. The IC50 and 95% CI were both inferred with the dose-response drc R package using a three-parameter log-logistic model, as now explained in the Materials & Methods section. The p-values for IC50 values beyond 100 µg/mL were below 0.05, but we agree that such extrapolation should be considered with caution (see our response to comment number 7 below).
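      To make the extrapolation concern concrete, the following is a minimal sketch of a three-parameter log-logistic curve with the lower limit fixed at 0. The analysis described above was done in R with drc; this Python function, its name `ll3`, and all numeric values are illustrative assumptions, not the authors' code or data. It shows that when the IC50 lies above the highest tested concentration, the half-maximal response is never actually observed.

```python
import math


def ll3(dose, slope, upper, ic50):
    """Three-parameter log-logistic response (lower limit fixed at 0).

    Illustrative parameterization (an assumption, not the authors' code):
        response = upper / (1 + exp(slope * (log(dose) - log(ic50))))
    At dose == ic50 the response is exactly upper / 2.
    """
    if dose <= 0 or ic50 <= 0:
        raise ValueError("dose and ic50 must be positive")
    return upper / (1.0 + math.exp(slope * (math.log(dose) - math.log(ic50))))


# Hypothetical values: 100% injection without antibody, slope 1,
# an estimated IC50 of 400 ug/mL, and a highest tested dose of 100 ug/mL.
UPPER, SLOPE, IC50, MAX_TESTED = 100.0, 1.0, 400.0, 100.0

at_ic50 = ll3(IC50, SLOPE, UPPER, IC50)
at_max_tested = ll3(MAX_TESTED, SLOPE, UPPER, IC50)

print(f"response at IC50: {at_ic50:.1f}%")
print(f"response at highest tested dose: {at_max_tested:.1f}%")
```

      With these hypothetical numbers the curve is still at roughly 80% of the upper limit at the highest tested dose, so the IC50 rests entirely on the fitted shape of the curve rather than on measured points.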

      (7) Line 227: The author describes that P5B3 has similar IC50 values towards variants 1-4, but the IC50 towards variant 5 is substantially higher at 400µg/mL, although the only difference between variants 4 and 5 is the substitution at position 225 (Arg -> Lys), two residues with very similar properties. Please provide an explanation.

      As explained in our response to comment number 6, we agree that comparing IC50 values estimated to be close to or higher than the highest experimental concentration is somewhat speculative. Indeed, we performed further statistical analysis that showed no significant difference between the IC50 values of mAb P5B3 toward the five PcrV variants. In contrast, the difference between the IC50 values of mAbs P5B3 and P3D6 toward variant 1 is statistically significant. This is now explained in the manuscript.

      (8) Line 233: Pore assembly: It is not clear how the data were normalized. The authors mention in the methods a normalization against the wild-type strain in the absence of antibodies, but did not state clearly whether the mutant strain has the same baseline cytotoxicity as the wild type. It would be helpful to show the level of cytotoxicity of the wild type compared to the mutant in the absence of antibodies to understand the baseline cytotoxicity of both strains.

      In these experiments we did not use the wild-type strain. As explained, the only strain that allows the measurement of pore formation by translocators PopB/PopD is the one lacking all effectors. All the experiments were done with this strain, and all the measurements were normalized accordingly. 

      (9) Figure 4: The explanation is redundant as it is clearly stated in the results. It would be better for the caption to describe the figure and leave interpretation to the results section. Overall, this comment is relevant to all figure captions, as it will reduce redundancy. My suggestion is to keep the figure caption as a road map to understand what is shown in the figure. For example, the Figure 4 caption should include that the concentration is presented in logarithmic scale, what is the dashed line, what is the grey area (what interval does it represent?), what each circle represents, and what is the regression model used? 

      Figure captions have been improved as suggested. 

      (10) Line 432: The authors apparently misquoted the original article describing the chimeric form PcrV* by describing the fusion of amino acids 1-17 and 136-249. I quote the original article by Tabor et al. "[...] we generated a truncated PcrV fragment (PcrVfrag) comprising PcrV amino acids 1-17 fused to amino acids 149-236 [...]". Additionally, how does the absence of amino acid 21 in the variant affect the conclusion? 

      Our construct was inspired by the one described in Tabor et al. but was not identical. We have therefore replaced "was constructed based on a construct by Tabor et al." with "whose design was inspired by the construct described in Tabor et al."

      Amino acid 21 is only absent in the construct used for crystallization experiments; all other experiments looking at Ab activity were performed with bacteria bearing full-length PcrV. According to the structural data, the difference in P3D6 activity between variants V1 and V2 appears to be explained by the nature of the residue at position 225, as both variants have the same residue at position 21. This is now explained in more detail in the manuscript. However, while the nature of the residue at position 225 appears to explain the absence of efficiency of the Ab for the variants studied, an impact of residue 21 could not be totally ruled out in putative variants with a Ser at 225 but different amino acids at 21.

      (11) Line 569: Missing word - ESRF stands for European Synchrotron Radiation Facility. 

      This has been corrected.

      (12) Line 268-269 (Figure 5A): The description of the alpha helices in relation to the figure is incomplete. Helices 2,3 and 5 are not indicated. 

      Indeed, since the structure is well-known and in the interest of visibility and simplicity, we only included the most relevant secondary structure features.

      (13) Line 271-272: It would be good to elaborate on the exact binding platform between LC and HC of the Fab and the residues on the PcrV side. For example, the author could apply the structure to PDBePISA (EMBL-EBI) which will provide details about the interface between the PcrV and the antibody. It is very interesting to learn what regions of the antibody are in charge of the binding, such as: is the H-CDR3 the major contributor of the binding or are other CDRs more involved? Additionally, in line 275 they state that the substitution of Ser 225 with Arg or Lys is consistent with the P3D6 insufficient binding. What contributed to this result on the antibodies side? 

      In order to address this question, we are now providing a LigPlot figure (supplementary Figure 3) in which specific interactions between PcrV* and the Fab are shown.

      (14) Line 291: It is unclear from what data the authors concluded that anti-PscF targets 3 distinct regions of PscF. 

      The data are shown in Supplementary Table 2, as mentioned in the manuscript. We have now modified the order of the anti-PcrV mAbs in the table to better illustrate the three identified epitope clusters (Supplementary Table 2). Similarly, the anti-PscF mAbs appear to group into three clusters as P3G9 and P5E10 only compete with themselves, while mAbs P3D6 and P5B3 compete with themselves and each other.

      (15) Line 315: It is preferable to introduce results in the results section instead of the discussion. 

      While preparing the manuscript, we initially included these results as a separate paragraph in the Results section, but ultimately chose the current format to improve flow and avoid redundancy.

      (16) Supplement Figure 2: What was the regression model used to evaluate IC50, and what is presented in the graph? What is the dashed line (see comment for Figure 4 above)? 

      The regression is based on a three-parameter log-logistic model, and the light-colored areas correspond to the 95% CI. The dashed lines represent 100% of ExoS-Bla injection. This information is now mentioned in the figure caption.

      (17) Figure 6B: It would be better to show an additional rotation of the PcrV bound by Fab 30-B8 that corresponds to the one shown for Fab MEDI3902. This would clear up the differences in binding regions. Same for Fab P3D6.

      Figure 6 already depicts two orientations. Although we agree that additional orientations could be of interest, we believe that this would add unnecessary complexity to the figure, and would prefer to maintain the figure as is, if possible.

      (18) Line 356-358: The author proposes an experiment to support the suggested mechanism of P3D6; it would be useful to follow up with a biochemical analysis showing the prevention of PcrV oligomerization in its presence.

      We understand the reviewer’s comment regarding the potential use of biochemical approaches to test our hypothesis. However, this is not currently feasible as we have been unable to achieve in vitro oligomerization of PcrV alone, possibly due to the absence of other T3SS components, such as the polymerized PscF needle.

      (19) Line 456: Missing details about how the ELISA was conducted including temperature, how the antigen was absorbed, plate type, etc. 

      Experimental details have been added.

      (20) Line 460: Missing substrate used for alkaline phosphatase. 

      The nature of the substrate was added to the methods.

    1. eLife Assessment

      This study makes the valuable claim that people track, specifically, the elasticity of control (that is, the degree to which outcome depends on how many resources - such as money - are invested), and that control elasticity is impaired in certain types of psychopathologies. A novel task is introduced that provides solid evidence that this learning process occurs and that human behavior is sensitive to changes in the elasticity of control. Evidence that elasticity inference is distinct from more general learning mechanisms and is related to psychopathology remains incomplete.

    2. Reviewer #1 (Public review):

      Summary:

      The authors investigated the elasticity of controllability by developing a task that manipulates the probability of achieving a goal with a baseline investment (which they refer to as inelastic controllability) and the probability that additional investment would increase the probability of achieving a goal (which they refer to as elastic controllability). They found that a computational model representing the controllability and elasticity of the environment accounted better for the data than a model representing only the controllability. They also found that prior biases about the controllability and elasticity of the environment were associated with a composite psychopathology score. The authors conclude that elasticity inference and bias guide resource allocation.

      Strengths:

      This research takes a novel theoretical and methodological approach to understanding how people estimate the level of control they have over their environment, and how they adjust their actions accordingly. The task is innovative and both it and the findings are well-described (with excellent visuals). They also offer thorough validation for the particular model they develop. The research has the potential to theoretically inform understanding of control across domains, which is a topic of great importance.

      Weaknesses:

      In its revised form, the manuscript addresses most of my previous concerns. The main remaining weakness pertains to the analyses aimed at addressing my suggestion of Bayesian updating as an alternative to the model proposed by the authors. My suggestion was to assume that people perform a form of function approximation to relate resource expenditure to success probability. The authors performed a version of this where people were weighing evidence for a few canonical functions (flat, step, linear), and found that this model underperforms theirs. However, this Bayesian model is quite constrained in its ability to estimate the function relating resources to success. A more robust test would be to assume a more flexible form of updating that is able to capture a wide range of distributions (e.g., using basis functions, Gaussian processes, or nonparametric estimators; see, e.g., work by Griffiths on human function learning). The benefit of testing this type of model is that it would make contact with a known form of inference that individuals engage in across various settings, and therefore could offer a more parsimonious and generalizable account of function learning, whereby learning of resource elasticity is a special case. I defer to the authors as to whether they'd like to pursue this direction, but if not I think it's still important that they acknowledge that they are unable to rule out a more general process like this as an alternative to their model. This also pertains to inferences about individual differences, which currently hinge on their preferred model being the most parsimonious.
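      For readers unfamiliar with the constrained alternative discussed here, the sketch below shows one way a learner could weigh evidence for a small set of candidate functions relating resources to success probability. It is only an illustration of the general idea: the three functions, their probabilities, the uniform prior, and the observations are all hypothetical and are not taken from the manuscript or the authors' model.

```python
def posterior_over_functions(observations, hypotheses, prior=None):
    """Bayesian update over candidate resource-to-success functions.

    observations: list of (resources_invested, succeeded) pairs.
    hypotheses: dict mapping a name to a function n -> P(success | n).
    prior: optional dict of prior probabilities (uniform if omitted).
    Returns a dict of posterior probabilities that sums to 1.
    """
    names = list(hypotheses)
    if prior is None:
        prior = {name: 1.0 / len(names) for name in names}

    unnormalized = {}
    for name in names:
        likelihood = 1.0
        for resources, succeeded in observations:
            p = hypotheses[name](resources)
            # Bernoulli likelihood of each observed outcome.
            likelihood *= p if succeeded else (1.0 - p)
        unnormalized[name] = prior[name] * likelihood

    total = sum(unnormalized.values())
    return {name: value / total for name, value in unnormalized.items()}


# Hypothetical candidate functions over 1-3 units of invested resources.
candidate_functions = {
    "flat": lambda n: 0.5,                     # investment does not matter
    "step": lambda n: 0.2 if n < 2 else 0.8,   # one extra unit is enough
    "linear": lambda n: 0.2 + 0.3 * (n - 1),   # each unit helps equally
}

# Hypothetical observations: failures at 1 unit, successes at 2 and 3 units.
observed = [(1, False), (1, False), (2, True), (2, True), (3, True)]

posterior = posterior_over_functions(observed, candidate_functions)
best = max(posterior, key=posterior.get)
print(best, {name: round(p, 3) for name, p in posterior.items()})
```

      Because the hypothesis space is fixed to three shapes, such a learner cannot represent any other relationship between resources and success, which is the limitation the more flexible approaches mentioned above (basis functions, Gaussian processes, nonparametric estimators) are meant to address.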

    3. Reviewer #2 (Public review):

      Summary:

      In this paper, the authors test whether controllability beliefs and associated actions/resource allocation are modulated by things like time, effort, and monetary costs (what they call "elastic" as opposed to "inelastic" controllability). Using a novel behavioral task and computational modeling, they find that participants do indeed modulate their resources depending on whether they are in an "elastic," "inelastic," or "low controllability" environment. The authors also find evidence that psychopathology is related to specific biases in controllability.

      Strengths:

      This research investigates how people might value different factors that contribute to controllability in a creative and thorough way. The authors use computational modeling to try to dissociate "elasticity" from "overall controllability," and find some differential associations with psychopathology. This was a convincing justification for using modeling above and beyond behavioral output, and yielded interesting results. Notably, the authors conclude that these findings suggest that biased elasticity could distort agency beliefs via maladaptive resource allocation. Overall, this paper reveals important findings about how people consider components of controllability.

      Weaknesses:

      The authors have gone to great lengths to revise the manuscript to clarify their definitions of "elastic" and "inelastic" and bolster evidence for their computational model, resulting in an overall strong manuscript that is valuable for elucidating controllability dynamics and preferences. One minor weakness is that the justification for the analysis technique for the relationships between the model parameters and the psychopathology measures remains lacking given the fact that simple correlational analyses did not reveal any significant associations nor were there results of any regression analyses. That said, the authors did preregister the CCA analysis, so while perhaps not the best method, it was justified to complete it. Regardless of method, the psychopathology results are not particularly convincing, but provide an interesting jumping-off point for further exploration in future work.

    4. Reviewer #3 (Public review):

      A bias in how people infer the amount of control they have over their environment is widely believed to be a key component of several mental illnesses including depression, anxiety, and addiction. Accordingly, this bias has been a major focus in computational models of those disorders. However, all of these models treat control as a unidimensional property, roughly, how strongly outcomes depend on action. This paper proposes---correctly, I think---that the intuitive notion of "control" captures multiple dimensions in the relationship between action and outcome. In particular, the authors identify one key dimension: the degree to which outcome depends on how much *effort* we exert, calling this dimension the "elasticity of control". They additionally argue that this dimension (rather than the more holistic notion of controllability) may be specifically impaired in certain types of psychopathology. This idea has the potential to change how we think about several major mental disorders in a substantial way, and can additionally help us better understand how healthy people navigate challenging decision-making problems. More concisely, it is a *very good idea*.

      The more concrete contributions, however, are not as strong. In particular, evidence for the paper's most striking claims is weak. Quoting the abstract, these claims are (1) "the elasticity of control [is] a distinct cognitive construct guiding adaptive behavior" and (2) "overestimation of elasticity is associated with elevated psychopathology involving an impaired sense of control."

      Main issues

      I'll highlight the key points.

      - The task cannot distinguish elasticity inference from general learning processes

      - Participants were explicitly instructed about elasticity, with labeled examples

      - The psychopathology claims rely on an invalid interpretation of CCA, and are contradicted by simple correlations (the correlation between elasticity bias and the Sense of Agency scale is r=0.03)

      Distinct construct

      Starting with claim 1, there are three subclaims here. (1A) People's behavior is sensitive to differences in elasticity; (1B) there are mental processes specific to elasticity inference, i.e., not falling out of general learning mechanisms; and, implicitly, (1C) people infer elasticity naturally as they go about their daily lives. The results clearly support 1A. However, 1B and 1C are not well supported.

      (1B) The data cannot support the "distinct cognitive construct" claim because the task is too simple to dissociate elasticity inference from more general learning processes (also raised by Reviewer 1). The key behavioral signature for elasticity inference (vs. generic controllability inference) is the transfer across ticket numbers, illustrated in Fig 4. However, this pattern is also predicted by a standard Bayesian learner equipped with an intuitive causal model of the task. Each ticket gives you another chance to board and the agent infers the probability that each attempt succeeds. Crucially, this logic is not at all specific to elasticity or even control. An identical model could be applied to inferring the bias of a coin from observations of whether any of N tosses were heads, a task that is formally identical to this one (at least, the intuitive model of the task; see first minor comment).
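
      To make the formal equivalence concrete, here is a minimal sketch of such a learner. It assumes, as in the intuitive causal model, that each ticket (or coin toss) is an independent attempt that succeeds with probability p. The function names, the grid approximation, the uniform prior, and the example observations are all illustrative assumptions; this is not the authors' model.

```python
def posterior_over_p(observations, grid_size=101):
    """Grid posterior over the per-attempt success probability p.

    observations: list of (n_attempts, succeeded) pairs.
    Assumes a uniform prior over p and independent attempts, so that
    P(success | p, n) = 1 - (1 - p) ** n.
    """
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    weights = [1.0] * grid_size  # uniform prior (an assumption)
    for n, succeeded in observations:
        for i, p in enumerate(grid):
            p_success = 1.0 - (1.0 - p) ** n
            weights[i] *= p_success if succeeded else 1.0 - p_success
    total = sum(weights)
    return grid, [w / total for w in weights]


def predicted_success(grid, posterior, n):
    """Posterior-predictive chance that at least one of n attempts succeeds."""
    return sum(w * (1.0 - (1.0 - p) ** n) for p, w in zip(grid, posterior))


# Hypothetical outcomes, all observed with three attempts only.
grid, post = posterior_over_p([(3, True), (3, True), (3, False), (3, True)])

# The same posterior yields predictions for one and two attempts,
# which were never tried: transfer across ticket numbers.
transfer = [predicted_success(grid, post, n) for n in (1, 2, 3)]
```

      Nothing in this sketch refers to control or elasticity; relabeling "tickets" as "tosses" leaves the computation unchanged.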

      Importantly, this point cannot be addressed by showing that the authors' model fits data better than this or any other specific Bayesian model. It is not a question of whether one particular updating rule explains data better than another. Rather, it is a question of whether the task can distinguish between biases in *elasticity* inference versus biases in probabilistic inference more generally. The present task cannot make this distinction because it does not make separate measurements of the two types of inference. To provide compelling evidence that elasticity inference is a "distinct cognitive construct", one would need to show that there are reliable individual differences in elasticity inference that generalize across contexts but do not generalize to computationally similar types of probabilistic inference (e.g. the coin flipping example).

      (1C) The implicit claim that people infer elasticity outside of the experimental task is undermined by the experimental design. The authors explicitly tell people about the two notions of control as part of the training phase: "To reinforce participants' understanding of how elasticity and controllability were manifested in each planet, [participants] were informed of the planet type they had visited after every 15 trips."

      In the revisions, the authors seem to go back and forth on whether they are claiming that people infer elasticity without instruction (I won't quote it here). I'll just note that the examples they provide in the most recent rebuttal are all cases in which one never receives explicit labels about elasticity. If people only infer elasticity when it is explicitly labeled, I struggle to see its relevance for understanding human cognition and behavior.

      Psychopathology

      Finally, I turn to claim 2, that "overestimation of elasticity is associated with elevated psychopathology involving an impaired sense of control." The CCA analysis is in principle unable to support this claim. As the authors correctly note in their latest rebuttal, the CCA does show that "there is a relationship between psychopathology traits and task parameters". The lesion analysis further shows that "elasticity bias specifically contributes to this relationship" (and similarly for the Sense of Agency scale). Crucially, however, this does *not* imply that there is a relationship between those two variables. The most direct test of that relationship is the simple correlation, which the authors report only in a supplemental figure: there is no relationship (r=0.03). Although it is of course possible that there is a relationship that is obscured by confounding variables, the paper provides no evidence, statistical or otherwise, that such a relationship exists.

      Minor comments

      The statistical structure of the task is inconsistent with the framing. In the framing, participants can make either one or two second boarding attempts (jumps) by purchasing extra tickets. The additional attempt(s) will thus succeed with probability p for one ticket and 2p - p^2 for two tickets; the p^2 captures the fact that you only take the second attempt if you fail on the first. A consequence of this is that buying more tickets has diminishing returns. In contrast, in the task, participants always jumped twice after purchasing two tickets, and the probability of success with two tickets was exactly double that with one ticket. Thus, if participants are applying an intuitive causal model to the task, the researcher could infer "biases" in elasticity inference that are probably better characterized as effective use of prior information (encoded in the causal model).
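
      As a quick numerical check of the two rules, the sketch below compares the success probability implied by the framing (attempts are independent, so two attempts succeed with probability 2p - p^2) with the doubling rule described for the task. The value p = 0.3 and the function names are illustrative assumptions, not parameters taken from the study.

```python
def p_success_sequential(p, n_attempts):
    """Chance that at least one of n independent attempts succeeds."""
    return 1.0 - (1.0 - p) ** n_attempts


def p_success_linear(p, n_attempts):
    """Chance of success if it scales with the number of attempts (capped at 1)."""
    return min(1.0, n_attempts * p)


p = 0.3  # illustrative per-attempt success probability (an assumption)
framing = p_success_sequential(p, 2)  # equals 2p - p^2
task = p_success_linear(p, 2)         # equals 2p
gap = task - framing                  # equals p^2, the diminishing-returns term
```

      The gap between the two rules is p^2, so it is small when single attempts rarely succeed and grows as p increases.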

      The model is heuristically defined and does not reflect Bayesian updating. For example, it over-estimates maximum control by not using losses with fewer than 3 tickets (intuitively, the inference here depends on your beliefs about elasticity). Including forced three-ticket trials at the beginning of each round makes this less of an issue; but if you want to remove those trials, you might need to adjust the model. The need to introduce the modified model with kappa is likely another symptom of the heuristic nature of the model updating equations.

    5. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      This research takes a novel theoretical and methodological approach to understanding how people estimate the level of control they have over their environment and how they adjust their actions accordingly. The task is innovative and both it and the findings are well-described (with excellent visuals). They also offer thorough validation for the particular model they develop. The research has the potential to theoretically inform understanding of control across domains, which is a topic of great importance.

      We thank the Reviewer for their favorable appraisal and valuable suggestions, which have helped clarify and strengthen the study’s conclusion. 

      In its revised form, the manuscript addresses most of my previous concerns. The main remaining weakness pertains to the analyses aimed at addressing my suggestion of Bayesian updating as an alternative to the model proposed by the authors. My suggestion was to assume that people perform a form of function approximation to relate resource expenditure to success probability. The authors performed a version of this where people were weighing evidence for a few canonical functions (flat, step, linear), and found that this model underperformed theirs. However, this Bayesian model is quite constrained in its ability to estimate the function relating resources. A more robust test would be to assume a more flexible form of updating that is able to capture a wide range of distributions (e.g., using basis functions, Gaussian processes, or nonparametric estimators; see, e.g., work by Griffiths on human function learning). The benefit of testing this type of model is that it would make contact with a known form of inference that individuals engage in across various settings and therefore could offer a more parsimonious and generalizable account of function learning, whereby learning of resource elasticity is a special case. I defer to the authors as to whether they'd like to pursue this direction, but if not I think it's still important that they acknowledge that they are unable to rule out a more general process like this as an alternative to their model. This pertains also to inferences about individual differences, which currently hinge on their preferred model being the most parsimonious.

      We thank the Reviewer for this thoughtful suggestion. We acknowledge that more flexible function learning approaches could provide a stronger test in favor of a more general account. Our Bayesian model implemented a basis function approach where the weights of three archetypal functions (flat, step, linear) are learned from experience. Testing models with more flexible basis functions would likely require a task with more than three levels of resource investment (1, 2, or 3 tickets). This would make an interesting direction for future work expanding on our current findings. We now incorporate this suggestion in more detail in our updated manuscript (335-341):

      “Second, future models could enable generalization to levels of resource investment not previously experienced. For example, controllability and its elasticity could be jointly estimated via function approximation that considers control as a function of invested resources. Although our implementation of this model did not fit participants’ choices well (see Methods), other modeling assumptions drawn from human function learning [30] or experimental designs with continuous action spaces may offer a better test of this idea.”

      Reviewer #2 (Public review):

      This research investigates how people might value different factors that contribute to controllability in a creative and thorough way. The authors use computational modeling to try to dissociate "elasticity" from "overall controllability," and find some differential associations with psychopathology. This was a convincing justification for using modeling above and beyond behavioral output and yielded interesting results. Notably, the authors conclude that these findings suggest that biased elasticity could distort agency beliefs via maladaptive resource allocation. Overall, this paper reveals important findings about how people consider components of controllability. The authors have gone to great lengths to revise the manuscript to clarify their definitions of "elastic" and "inelastic" and bolster evidence for their computational model, resulting in an overall strong manuscript that is valuable for elucidating controllability dynamics and preferences. 

      We thank the Reviewer for their constructive feedback throughout the review process, which has substantially strengthened our manuscript and clarified our theoretical framework.

      One minor weakness is that the justification for the analysis technique for the relationships between the model parameters and the psychopathology measures remains lacking given the fact that simple correlational analyses did not reveal any significant associations.

      We note that the existence of bivariate relationships is not a prerequisite for the existence of multivariate relationships. Conditioning the latter on the former, therefore, would risk missing out on important relationships existing in the data. Ultimately, correlations between pairs of variables do not offer a sensitive test for the general hypothesis that there is a relationship between two sets of variables. As an illustration, consider that elasticity bias correlated in our data (r = .17, p<.001) with the difference between SOA (sense of agency) and SDS (self-rating depression). Notably, SOA and SDS were positively correlated (r = .47, p<.001), and neither of them was correlated with elasticity bias (SOA: r=.04, p=.43; SDS: r=-.06, p=.16). It was a dimension that ran between them that mapped onto elasticity bias. This specific finding is incidental and uncorrected for multiple comparisons, hence we do not report it in the manuscript, but it illustrates the kinds of relationships that cannot be accounted for by looking at bivariate relationships alone.
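
      The general statistical point can be sketched with a toy latent-variable model. Everything here is hypothetical: the latent factors g (shared) and d (contrast), the loadings a and b, and the variable names are chosen for illustration and are not estimated from the study's data. In this toy model the bivariate correlations are attenuated rather than exactly zero, because the large shared factor g dilutes them, whereas the difference score removes g entirely.

```python
import math


def implied_correlations(a, b):
    """Correlations implied by a toy model with independent, unit-variance latents.

    soa = g + a * d            (g: shared factor, d: contrast factor)
    sds = g - a * d
    x   = b * d + sqrt(1 - b**2) * e   (x has unit variance)
    """
    var_scale = 1.0 + a ** 2       # variance of soa and of sds
    var_diff = (2.0 * a) ** 2      # soa - sds = 2 * a * d, so g cancels
    return {
        "soa_sds": (1.0 - a ** 2) / var_scale,
        "soa_x": (a * b) / math.sqrt(var_scale),
        "sds_x": (-a * b) / math.sqrt(var_scale),
        "diff_x": (2.0 * a * b) / math.sqrt(var_diff),
    }


# Hypothetical loadings, chosen only for illustration.
r = implied_correlations(a=0.6, b=0.17)
```

      With these loadings the two scales correlate positively with each other, each correlates only weakly (and in opposite directions) with x, and their difference correlates with x at the full value of b.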

      Reviewer #3 (Public review):

      A bias in how people infer the amount of control they have over their environment is widely believed to be a key component of several mental illnesses including depression, anxiety, and addiction. Accordingly, this bias has been a major focus in computational models of those disorders. However, all of these models treat control as a unidimensional property, roughly, how strongly outcomes depend on action. This paper proposes---correctly, I think---that the intuitive notion of "control" captures multiple dimensions in the relationship between action and outcome.

      In particular, the authors identify one key dimension: the degree to which outcome depends on how much *effort* we exert, calling this dimension the "elasticity of control". They additionally argue that this dimension (rather than the more holistic notion of controllability) may be specifically impaired in certain types of psychopathology. This idea has the potential to change how we think about several major mental disorders in a substantial way and can additionally help us better understand how healthy people navigate challenging decision-making problems. More concisely, it is a very good idea.

      We thank the Reviewer for their thoughtful engagement with our manuscript. We appreciate their recognition of elasticity as a key dimension of control that has the potential to advance our understanding of psychopathology and healthy decision-making.

      Starting with theory, the authors do not provide a strong formal characterization of the proposed notion of elasticity. There are existing, highly general models of controllability (e.g., Huys & Dayan, 2009; Ligneul, 2021) and the elasticity idea could naturally be embedded within one of these frameworks. The authors gesture at this in the introduction; however, this formalization is not reflected in the implemented model, which is highly task-specific.

      Our formal definition of elasticity, detailed in Supplementary Note 1, naturally extends the reward-based and information-theoretic definitions of controllability by Huys & Dayan (2009) and Ligneul (2021). We now further clarify how the model implements this formalized definition (lines 156-159).

      “Conversely, in the ‘elastic controllability model’, the beta distributions represent a belief about the maximum achievable level of control (𝑎<sub>Control</sub>, 𝑏<sub>Control</sub>) coupled with two elasticity estimates that specify the degree to which successful boarding requires purchasing at least one (𝑎<sub>elastic≥1</sub>, 𝑏<sub>elastic≥1</sub>) or specifically two (𝑎<sub>elastic2</sub>, 𝑏<sub>elastic2</sub>) extra tickets. As such, these elasticity estimates quantify how resource investment affects control. The higher they are, the more controllability estimates can be made more precise by knowing how much resources the agent is willing and able to invest (Supplementary Note 1).”

      Moreover, the authors present elasticity as if it is somehow "outside of" the more general notion of controllability. However, effort and investment are just specific dimensions of action; and resources like money, strength, and skill (the "highly trained birke") are just specific dimensions of state. Accordingly, the notion of elasticity is necessarily implicitly captured by the standard model. Personally, I am compelled by the idea that effort and resource (and therefore elasticity) are particularly important dimensions, ones that people are uniquely tuned to. However, by framing elasticity as a property that is different in kind from controllability (rather than just a dimension of controllability), the authors only make it more difficult to integrate this exciting idea into generalizable models.

      We respectfully disagree that we present elasticity as outside of, or different in kind from, controllability. Throughout the manuscript, we explicitly describe elasticity as a dimension of controllability (e.g., lines 70-72, among many other examples). This is also expressed in our formal definition of elasticity (Supplementary Note 1).

      The argument that vehicle/destination choice is not trivial because people occasionally didn't choose the instructed location is not compelling to me; if anything, the exclusion rate is unusually low for online studies. The finding that people learn more from non-random outcomes is helpful, but this could easily be cast as standard model-based learning very much like what one measures with the Daw two-step task (nothing specific to control here). Their final argument is the strongest, that to explain behavior the model must assume "a priori that increased effort could enhance control." However, more literally, the necessary assumption is that each attempt increases the probability of success (e.g., you're more likely to get a heads in two flips than one). I suppose you can call that "elasticity inference", but I would call it basic probabilistic reasoning.

      We appreciate the Reviewer’s concerns but feel that some of the more subjective comments might not benefit from further discussion. We only note that controllability and its elasticity are features of environmental structure, so in principle any controllability-related inference is a form of model-based learning. The interesting question is whether people account in their world model for that particular feature of the environment.   

      The authors try to retreat, saying "our research question was whether people can distinguish between elastic and inelastic controllability." I struggle to reconcile this with the claim in the abstract "These findings establish the elasticity of control as a distinct cognitive construct guiding adaptive behavior". That claim is the interesting one, and the one I am evaluating the evidence in light of.

      In real-world contexts, it is often trivial that sometimes further investment enhances control and sometimes it does not. For example, students know that if they prepare more extensively for their exams they will likely be able to achieve better grades, but they also know that there is uncertainty in this regard – their grades could improve significantly, modestly, or in some cases, they might not improve at all, depending on the type of exams their study program administers and the knowledge or skills being tested. Our research question was whether in such contexts people learn from experience the degree to which controllability is elastic to invested resources and adapt their resource investment accordingly. Our findings show that they do. 

      The authors argue for CCA by appeal to the need to "account for the substantial variance that is typically shared among different forms of psychopathology". I agree. A simple correlation would indeed be fairly weak evidence. Strong evidence would show a significant correlation after *controlling for* other factors (e.g. a regression predicting elasticity bias from all subscales simultaneously). CCA effectively does the opposite, asking whether, with the help of all the parameters and all the surveys, one can find any correlation between the two sets of variables. The results are certainly suggestive, but they provide very little statistical evidence that the elasticity parameter is meaningfully related to any particular dimension of psychopathology.

      We agree with the Reviewer on the relationship between elasticity and any particular dimension of psychopathology. The CCA asks a different question, namely, whether there is a relationship between psychopathology traits and task parameters, and whether elasticity bias specifically contributes to this relationship. 

      I am very concerned to see that the authors removed the discussion of this limitation in response to my first review. I quote the original explanation here:

      - In interpreting the present findings, it needs to be noted that we designed our task to be especially sensitive to overestimation of elasticity. We did so by giving participants free 3 tickets at their initial visits to each planet, which meant that upon success with 3 tickets, people who overestimate elasticity were more likely to continue purchasing extra tickets unnecessarily. Following the same logic, had we first had participants experience 1 ticket trips, this could have increased the sensitivity of our task to underestimation of elasticity in elastic environments. Such underestimation could potentially relate to a distinct psychopathological profile that more heavily loads on depressive symptoms. Thus, by altering the initial exposure, future studies could disambiguate the dissociable contributions of overestimating versus underestimating elasticity to different forms of psychopathology.

      The logic of this paragraph makes perfect sense to me. If you assume low elasticity, you will infer that you could catch the train with just one ticket. However, when elasticity is in fact high, you would find that you don't catch the train, leading you to quickly infer high elasticity, eliminating the bias. In contrast, if you assume high elasticity, you will continue purchasing three tickets and will never have the opportunity to learn that you could be purchasing only one, so the bias remains.

      The authors attempt to argue that this isn't happening using parameter recovery. However, they only report the *correlation* in the parameter, whereas the critical measure is the *bias*. Furthermore, in parameter recovery, the data-generating and data-fitting models are identical; this will yield the best possible recovery results. Although finding no bias in this setting would support the claims, it cannot outweigh the logical argument for the bias that they originally laid out. Finally, parameter recovery should be performed across the full range of plausible parameter values; using fitted parameters (a detail I could only determine by reading the code) yields biased results because the fitted parameters are themselves subject to the bias (if present). That is, if true low elasticity is inferred as high elasticity, then you will not have any examples of low elasticity in the fitted parameters and will not detect the inability to recover them.
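
      The distinction between correlation and bias in parameter recovery can be shown with a small sketch. The simulated and recovered values below are hypothetical and constructed by hand (no model is actually fitted); they are chosen so that low true values are inflated while the rank order is fully preserved.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def mean_bias(simulated, recovered):
    """Average signed error (recovered minus simulated)."""
    return sum(r - s for s, r in zip(simulated, recovered)) / len(simulated)


# Hypothetical simulated values spanning the full parameter range.
simulated = [0.1, 0.3, 0.5, 0.7, 0.9]
# Hypothetical recovery that compresses every value toward the high end.
recovered = [0.5 + 0.5 * s for s in simulated]

recovery_r = pearson_r(simulated, recovered)
recovery_bias = mean_bias(simulated, recovered)
```

      Here the recovery correlation is perfect even though every parameter is overestimated (by 0.25 on average), which is why a high correlation alone cannot rule out a systematic bias.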

      The logic the Reviewer describes breaks down when one considers the dynamics of participants’ resource investment choices. A low elasticity bias in a participant’s prior belief would make them persist for longer in purchasing a single ticket despite failure, as compared to a person without such a bias. Indeed, the ability of the experimental design to demonstrate low elasticity biases is evidenced by the fact that the majority of participants were fitted with a low elasticity bias (μ = .16 ± .14, where .5 is unbiased). 

      Originally, the Reviewer was concerned that elasticity bias was being confounded with a general deficit in learning. The weak inter-parameter correlations in the parameter recovery test resolved this concern, especially given that, as we now noted, the simulated parameter space encompassed both low and high elasticity biases (range=[.02,.76]). Furthermore, regarding the Reviewer's concern about bias in the parameter recovery, we found no such significant bias with respect to the elasticity bias parameter (Δ(Simulated, Recovered)= -.03, p=.25), showing that our experiment could accurately identify low and high elasticity biases.

      The statistical structure of the task is inconsistent with the framing. In the framing, participants can make either one or two second boarding attempts (jumps) by purchasing extra tickets. The additional attempt(s) will thus succeed with probability p for one ticket and 2p - p^2 for two tickets; the p^2 captures the fact that you only take the second attempt if you fail on the first. A consequence of this is that buying more tickets has diminishing returns. In contrast, in the task, participants always jumped twice after purchasing two tickets, and the probability of success with two tickets was exactly double that with one ticket. Thus, if participants are applying an intuitive causal model to the task, they will appear to "underestimate" the elasticity of control. I don't think this seriously jeopardizes the key results, but any follow-up work should ensure that the task's structure is consistent with the intuitive causal model.

      We thank the Reviewer for this comment, and agree that participants may have employed the intuitive understanding the Reviewer describes. This is consistent with our model comparison results, which showed that participants did not assume that control increases linearly with resource investment (lines 677-692). Consequently, this is also not assumed by our model, except perhaps by how the prior is implemented (a property that was supported by model comparison). In the text, we acknowledge that this aspect of the model and participants’ behavior deviates from the task's true structure, and it would be worthwhile to address this deviation in future studies.

      That said, there is no reason that this will make participants appear to be generally underestimating elasticity. Following exposure to outcomes for one and three tickets, any nonlinear understanding of probabilities would only affect the controllability estimate for two tickets. This would have contrasting effects on the elasticity estimated for the second and third tickets, but on average, it would not change the overall elasticity estimated. On the other hand, if such a participant is only exposed to outcomes for two and three tickets, they would come to judge the difference between the first and second tickets too highly, thereby overestimating elasticity.

      The model is heuristically defined and does not reflect Bayesian updating. For example, it overestimates maximum control by not using losses with fewer than 3 tickets (intuitively, the inference here depends on your beliefs about elasticity). Including forced three-ticket trials at the beginning of each round makes this less of an issue; but if you want to remove those trials, you might need to adjust the model. The need to introduce the modified model with kappa is likely another symptom of the heuristic nature of the model updating equations.

      Note that we have tested a fully Bayesian model (lines 676-691), but found that this model fitted participants’ choices worse. 

      You're right; saying these analyses provide "no information" was unfair. I agree that this is a useful way to link model parameters with behavior, and they should remain in the paper. However, my key objection still holds: these analyses do not tell us anything about how *people's* prior assumptions influence behavior. Instead, they tell us about how *fitted model parameters* depend on observed behavior. You can easily avoid this misreading by adding a small parenthetical, e.g.

      Thus, a prior assumption that control is likely available **(operationalized by \gamma_controllability)** was reflected in a futile investment of resources in uncontrollable environments.

      We thank the Reviewer for the suggestion and have added this parenthetical (lines 219, 225).

    1. eLife Assessment

      This study provides valuable insights with solid evidence into altered tactile perception in a mouse model of ASD (Fmr1 mice), paralleling sensory abnormalities in Fragile X and autism. Its main strength lies in the use of a novel tactile categorization task and the careful dissection of behavioral performance across training and difficulty levels, suggesting that deficits may stem from an interaction between sensory and cognitive processes. However, while the experiments are well executed, the reported effects are subtle and sometimes non-significant. The interpretation of results may be over-extended given the nature of the data (solely behavioral) and the absence of mechanistic, causal, or computational approaches limits the strength of the broader conclusions. The work will be relevant to those interested in autism, cognition, and/or sensory processing.

    2. Reviewer #1 (Public review):

      Summary:

      This study addresses the important question of how top-down cognitive processes affect tactile perception in autism - specifically, in the Fmr1-/y genetic mouse model of autism. Using a 2AFC tactile task in behaving mice, the study investigated multiple aspects of perceptual processing, including perceptual learning, stimulus categorization and discrimination, as well as the influence of prior experience and attention.

      Strengths:

      The experiments seem well performed, with interesting results. Thus, this study can/will advance our understanding of atypical tactile perception and its relation to cognitive factors in autism.

      Weaknesses:

      Certain aspects of the analyses (and therefore the results) are unclear, which makes the manuscript difficult to understand. Clearer presentation, with the addition of more standard psychometric analyses, and/or other useful models (like logistic regression) would improve this aspect. The use of d' needs better explanation, both in terms of how and why these analyses are appropriate (and perhaps it should be applied for more specific needs rather than as a ubiquitous measure).

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript presents a tactile categorization task in head-fixed mice to test whether Fmr1 knockout mice display differences in vibrotactile discrimination using the forepaw. Tactile discrimination differences have been previously observed in humans with Fragile X Syndrome, autistic individuals, as well as mice with loss of Fmr1 across multiple studies. The authors show that during training, Fmr1 mutant mice display subtle deficits in perceptual learning of "low salience" stimuli, but not "high salience" stimuli, during the task. Following training, Fmr1 mutant mice displayed an enhanced tactile sensitivity under low-salience conditions but not high-salience stimulus conditions. The authors suggest that, under 'high cognitive load' conditions, Fmr1 mutant mouse performance during the lowest indentation stimuli presentations was affected, proposing an interplay of sensory and cognitive system disruptions that dynamically affect behavioral performance during the task.

      Strengths:

      The study employs a well-controlled vibrotactile discrimination task for head-fixed mice, which could serve as a platform for future mechanistic investigations. By examining performance across both training stages and stimulus "salience/difficulty" levels, the study provides a more nuanced view of how tactile processing deficits may emerge under different cognitive and sensory demands.

      Weaknesses:

      The study is primarily descriptive. The authors collect behavioral data and fit simple psychometric functions, but provide no neural recordings, causal manipulations, or computational modeling. Without mechanistic evidence, the conclusions remain speculative. Second, the authors repeatedly make strong claims about "categorical priors," "attention deficits," and "choice biases," but these constructs are inferred indirectly from secondary behavioral measures. Many of the effects are based on non-significant trends, and alternative explanations (such as differences in motivation, fatigue, satiety, stereotyped licking, and/or reward valuation) are not considered. Third, the mapping of the behavioral results onto high-level cognitive constructs is tenuous and overstated. The authors' interpretations suggest that they directly tested cognitive theories such as Load Theory, Adaptive Resonance Theory, or Weak Central Coherence. However, the experiments do not manipulate or measure variables that would allow such theories to be tested. More specific comments are included below.

      (1) The authors employ a two-choice behavioral task to assess forepaw tactile sensitivity in Fmr1 knockout mice. The data provide an interesting behavioral observation, but it is a descriptive study. Without mechanistic experiments, it is difficult to draw any conclusions, especially regarding top-down or bottom-up pathway dysfunctions. While the task design is elegant, the data remain correlational and do not advance our mechanistic understanding of Fmr1-related sensory and/or cognitive alterations.

      (2) The conclusions hinge on speculative inferences about "reduced top-down categorization influence" or "choice consistency bias," but no neural, circuit-level, or causal manipulations (e.g., optogenetics, pharmacology, targeted lesions, modeling) are used to support these claims. Without mechanistic data, the translational impact is limited.

      (3) Statistical analysis:

      (a) Several central claims are based on "trends" rather than statistically significant effects (e.g., reduced task sensitivity, reduced across-category facilitation). Building major interpretive arguments on non-significant findings undermines confidence in the conclusions.

      (b) The n number for both genotypes should be increased. In several experiments (e.g., Figure 1D, 2E), one animal appears to be an outlier. Considering the subtle differences between genotypes, such an outlier could affect the statistical results and subsequent interpretations.

      (c) The large number of comparisons across salience levels, categories, and trial histories raises concern for false positives. The manuscript does not clearly state how multiple comparisons were controlled.

      (d) The data in Figure 5, shown as separate panels per indentation value, are analyzed separately as t-tests or Mann-Whitney tests. However, individual comparisons are inappropriate for this type of data, as these are repeated stimulus applications across a given session. The data should be analyzed together and post-hoc comparisons reported. Given the very subtle difference in miss rates across control and mutant mice for 'low-salience' stimulus trials, this is unlikely to be a statistically meaningful difference when analyzed using a more appropriate test.

      (4) Emphasis on theoretical models:

      The paper leans heavily on theories such as Adaptive Resonance Theory, Load Theory of Attention, and Weak Central Coherence, but the data do not actually test these frameworks in a rigorous way. The discussion should be reframed to highlight the potential relevance of these frameworks while acknowledging that the current data do not allow them to be assessed.

    4. Reviewer #3 (Public review):

      Summary:

      Developing consistent and reliable biomarkers is critically important for developing new pharmacological therapies in autism spectrum disorders (ASDs). Altered sensory perception is one of the hallmarks of autism and has been recently added to DSM-5 as one of the core symptoms of autism. Touch is one of the fundamental sensory modalities, yet it is currently understudied. Furthermore, there seems to be a discrepancy between different studies from different groups focusing on tactile discrimination. It is not clear if this discrepancy can be explained by different experimental setups, inconsistent terminology, or the heterogeneity of sensory processing alterations in ASDs. The authors aim to investigate the interplay between tactile discrimination and cognitive processes during perceptual decisions. They have developed a forepaw-based 2-alternative choice task for mice and investigated tactile perception and learning in Fmr1-/y mice.

      Strengths:

      There are several strengths of this task: translational relevance to human psychophysical protocols, including controlled vibrotactile stimulation. In addition to the experimental setup, there are also several interesting findings: Fmr1-/y mice demonstrated choice consistency bias, which may result in impaired perceptual learning, and enhanced tactile discrimination in low-salience conditions, as well as attentional deficits with increased cognitive load. The increase in the error rates for low salience stimuli is interesting. These observations, together with the behavioral design, may have a promising translational potential and, if confirmed in humans, may be potentially used as biomarkers in ASD.

      Weaknesses:

      Some weaknesses are related to the lack of the original raster plots and density plots of licks under different conditions, learning rate vs time, and evaluation of the learning rate at different stages of learning. Overall, these data would help to answer the question of whether there are differences in learning strategies or neural circuit compensation in Fmr1-/y mice. It is also not clear if reversal learning is impaired in Fmr1-/y mice.

    5. Author response:

      Reviewer #1 (Public review): 

      Summary: 

      This study addresses the important question of how top-down cognitive processes affect tactile perception in autism - specifically, in the Fmr1-/y genetic mouse model of autism. Using a 2AFC tactile task in behaving mice, the study investigated multiple aspects of perceptual processing, including perceptual learning, stimulus categorization and discrimination, as well as the influence of prior experience and attention.  

      We appreciate the reviewer’s statement highlighting the importance of our study. 

      Strengths: 

      The experiments seem well performed, with interesting results. Thus, this study can/will advance our understanding of atypical tactile perception and its relation to cognitive factors in autism. 

      We thank the reviewer for recognizing the quality of our experiments and the relevance of our findings for understanding tactile perception and cognition in autism.

      Weaknesses: 

      Certain aspects of the analyses (and therefore the results) are unclear, which makes the manuscript difficult to understand. Clearer presentation, with the addition of more standard psychometric analyses, and/or other useful models (like logistic regression) would improve this aspect. The use of d' needs better explanation, both in terms of how and why these analyses are appropriate (and perhaps it should be applied for more specific needs rather than as a ubiquitous measure). 

      We thank the reviewer for the helpful comments. We understand that the analyses were difficult to follow, and we will work on the clarity of the Results section. However, we would like to emphasize that every d′ measure is accompanied by analyses of response rates (i.e., correct and incorrect choice rates). In addition, we applied standard psychometric analyses whenever possible. Specifically, psychometric functions were fitted to the data using logistic regression. We will rework the text to clarify these points.

      During training, only two stimulus amplitudes were presented, which precluded the construction of psychometric curves. For the categorization task, however, psychometric analyses were feasible and conducted (Figure 2). These analyses revealed no evidence of categorization bias (as measured by threshold) or accuracy (as measured by the slope) across stimulus strengths.
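      As an illustration of the psychometric analysis described above (not the authors' code), the following minimal sketch fits a two-parameter logistic function to per-amplitude choice counts and reads off a threshold and a slope. The function names `fit_logistic` and `threshold`, the Newton-Raphson fitting routine, and the example amplitudes are assumptions made for this sketch; the fitting procedure actually used in the manuscript is not specified here.

```python
import math

def fit_logistic(x, k, n, iters=50):
    """Fit P(choice) = 1 / (1 + exp(-(b0 + b1 * x))) to binomial counts.

    x: stimulus amplitudes, k: number of "high" choices, n: trials per amplitude.
    Uses Newton-Raphson on the binomial log-likelihood (illustrative choice).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, ki, ni in zip(x, k, n):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            r = ki - ni * p            # residual: observed minus expected count
            w = ni * p * (1.0 - p)     # binomial variance weight
            g0 += r
            g1 += r * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det <= 0.0:                 # degenerate data; stop rather than divide by zero
            break
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < 1e-10:  # converged
            break
    return b0, b1

def threshold(b0, b1):
    """Amplitude at which the fitted choice probability equals 0.5."""
    return -b0 / b1

# Hypothetical example: five amplitudes, 100 trials each.
amplitudes = [0.0, 1.0, 2.0, 3.0, 4.0]
high_choices = [5, 18, 50, 82, 95]
trials = [100] * 5
b0, b1 = fit_logistic(amplitudes, high_choices, trials)
print("threshold:", threshold(b0, b1), "slope:", b1)
```

      In this parameterization, a shift of the threshold away from the category boundary would indicate a categorization bias, and a shallower slope would indicate lower accuracy across stimulus strengths.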

      The calculation of d’ is included in the Methods, but we will also report and explain its use in each part of the Results section where it has been included.
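      For readers unfamiliar with the measure, the standard signal-detection definition of d′ can be sketched as follows. This is a generic illustration, not the calculation from the manuscript's Methods, which may differ. The function name `d_prime` and the log-linear correction (adding 0.5 to each count) are assumptions chosen to keep the z-transform finite when a rate is 0 or 1.

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index: d' = z(hit rate) - z(false-alarm rate)."""
    # Log-linear correction (assumed here): avoids rates of exactly 0 or 1,
    # for which the inverse normal CDF is infinite.
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)

# Hypothetical counts for one animal in one session.
print(d_prime(hits=90, misses=10, false_alarms=10, correct_rejections=90))
```

      Because d′ combines two response rates into one number, the same d′ can arise from different mixtures of correct and incorrect choices, which is why reporting the underlying response rates alongside it is informative.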

      Reviewer #2 (Public review): 

      Summary: 

      This manuscript presents a tactile categorization task in head-fixed mice to test whether Fmr1 knockout mice display differences in vibrotactile discrimination using the forepaw. Tactile discrimination differences have been previously observed in humans with Fragile X Syndrome, autistic individuals, as well as mice with loss of Fmr1 across multiple studies. The authors show that during training, Fmr1 mutant mice display subtle deficits in perceptual learning of "low salience" stimuli, but not "high salience" stimuli, during the task. Following training, Fmr1 mutant mice displayed an enhanced tactile sensitivity under low-salience conditions but not high-salience stimulus conditions. The authors suggest that, under 'high cognitive load' conditions, Fmr1 mutant mouse performance during the lowest indentation stimuli presentations was affected, proposing an interplay of sensory and cognitive system disruptions that dynamically affect behavioral performance during the task. 

      Strengths: 

      The study employs a well-controlled vibrotactile discrimination task for head-fixed mice, which could serve as a platform for future mechanistic investigations. By examining performance across both training stages and stimulus "salience/difficulty" levels, the study provides a more nuanced view of how tactile processing deficits may emerge under different cognitive and sensory demands. 

      We thank the reviewer for emphasizing the strengths of our task design and analysis approach, and we appreciate that the potential of this platform for future mechanistic investigations is recognized.

      Weaknesses: 

      The study is primarily descriptive. The authors collect behavioral data and fit simple psychometric functions, but provide no neural recordings, causal manipulations, or computational modeling. Without mechanistic evidence, the conclusions remain speculative. 

      We thank the reviewer for the careful reading of our manuscript and for the constructive feedback. The reviewer raises a valid point. We agree that our study is primarily descriptive and focused on behavioral data, and we appreciate the opportunity to clarify the scope and interpretation of our findings. Our primary goal was to characterize behavioral patterns during tactile discrimination and categorization, and the psychometric analyses were intended to provide a detailed description of these patterns. We do not claim to provide direct neural, causal, or computational evidence. 

      Second, the authors repeatedly make strong claims about "categorical priors," "attention deficits," and "choice biases," but these constructs are inferred indirectly from secondary behavioral measures. Many of the effects are based on non-significant trends, and alternative explanations (such as differences in motivation, fatigue, satiety, stereotyped licking, and/or reward valuation) are not considered. 

      Alternative explanations of our findings, such as differences in motivation, fatigue, satiety, stereotyped licking, and reward valuation have indeed been considered. We will revise the manuscript to present these points more clearly. 

      Third, the mapping of the behavioral results onto high-level cognitive constructs is tenuous and overstated. The authors' interpretations suggest that they directly tested cognitive theories such as Load Theory, Adaptive Resonance Theory, or Weak Central Coherence. However, the experiments do not manipulate or measure variables that would allow such theories to be tested. More specific comments are included below.

      This was not done intentionally. We do not claim to have tested the Load Theory; rather, inspired by it, we assessed behavioral patterns in our tactile categorization task. We agree that referring to the Adaptive Resonance Theory, which is based on artificial neural network models, might be misleading since we focus on behavioral results, and we will revise the text accordingly. However, our task allowed us to examine the impact of categorization on discrimination, confirming that categorization can amplify perceptual differences between stimuli belonging to different categories and reduce perceived differences within a category in WT mice but not in Fmr1<sup>-/y</sup> mice when low-salience stimuli were experienced. Finally, we do not claim to have tested the Weak Central Coherence theory, although our results suggest reduced use of categories in low-salience tactile discrimination.

      (1) The authors employ a two-choice behavioral task to assess forepaw tactile sensitivity in Fmr1 knockout mice. The data provide an interesting behavioral observation, but it is a descriptive study. Without mechanistic experiments, it is difficult to draw any conclusions, especially regarding top-down or bottom-up pathway dysfunctions. While the task design is elegant, the data remain correlational and do not advance our mechanistic understanding of Fmr1-related sensory and/or cognitive alterations. 

      We agree with the reviewer that our current experiments are behavioral in nature and do not provide direct mechanistic evidence for top-down pathway dysfunction. Our goal was to carefully characterize tactile responses and behavioral patterns in Fmr1<sup>-/y</sup> mice. The notion of “top-down” is used at the behavioral level, referring to the influence of higher-level cognitive processes (e.g., categorization, attention) on perception, rather than to underlying neural circuits. We will revise the manuscript to more clearly emphasize that our conclusions are based on behavioral observations, and we will frame mechanistic inferences as hypotheses rather than established findings. We will also explicitly note that future work using neural recordings or causal manipulations will be required to directly test these hypotheses.

      We also note that identifying the precise top-down circuits involved will require extensive additional experimentation. For example, one would first need to pinpoint the specific top-down pathway that modulates the influence of categorization on discrimination without directly altering categorization itself. After such a circuit is identified, further work would then be needed to rescue or manipulate this pathway in the Fmr1<sup>-/y</sup> model. These steps represent a substantial program of mechanistic research that, while important, goes well beyond the scope of the present study.

      (2) The conclusions hinge on speculative inferences about "reduced top-down categorization influence" or "choice consistency bias," but no neural, circuit-level, or causal manipulations (e.g., optogenetics, pharmacology, targeted lesions, modeling) are used to support these claims. Without mechanistic data, the translational impact is limited. 

      We recognize that “reduced top-down categorization influence” and “choice consistency bias” are based on behavioral observations. However, we respectfully disagree that this makes these constructs inherently speculative. Similar behavioral inferences have been applied in previous clinical studies to characterize cognitive tendencies (Soulières et al., 2007; Feigin et al., 2021). The translational impact of our work lies in the translational platform we have developed and in highlighting the complexity of tactile measures and the additional analyses that can be conducted in clinical studies.

      We agree with the reviewer that the neural-based experiments would indeed provide valuable mechanistic insight into our observed behavioral alterations, and we believe future studies should therefore focus on their underlying neurobiological substrate.

      We will revise the language throughout the manuscript to clarify that all conclusions are based on behavioral measures.  

      (3) Statistical analysis: 

      (a) Several central claims are based on "trends" rather than statistically significant effects (e.g., reduced task sensitivity, reduced across-category facilitation). Building major interpretive arguments on non-significant findings undermines confidence in the conclusions.

      Several trends are evident in complex measures, such as d’ analyses on task sensitivity or responses pooled across different amplitudes. Additional analyses revealed which component of these measures showed a statistically significant difference across genotypes, namely the low-salience incorrect choices accounting for low task sensitivity. We chose to present all analyses to be transparent and to highlight that commonly used complex measures (like d’ analyses) may mask important findings. In the text, we described p-values between 0.05 and 0.1 as observed trends without over-interpreting their significance. 

      (b) The n number for both genotypes should be increased. In several experiments (e.g., Figure 1D, 2E), one animal appears to be an outlier. Considering the subtle differences between genotypes, such an outlier could affect the statistical results and subsequent interpretations. 

      The number of mice used in each genotype group is consistent with standard practices in behavioral studies using mice and sensory tasks. We have computed effect sizes (e.g., Cohen’s d) alongside some of the statistical comparisons, which indicate a medium effect size (d > 0.5).
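      As a generic illustration of the effect-size measure mentioned above (not the authors' analysis code), Cohen's d for two independent groups is the difference in means divided by the pooled standard deviation. The function name `cohens_d` and the example values are hypothetical. By Cohen's conventional benchmarks, d of about 0.2 is small, 0.5 medium, and 0.8 large.

```python
import math
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    # statistics.variance returns the sample variance (n - 1 denominator).
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)

# Hypothetical per-animal scores for two genotype groups.
print(cohens_d([2.0, 4.0, 6.0], [1.0, 3.0, 5.0]))
```

      With small groups, a single extreme animal can change both the mean difference and the pooled variance, so d is best reported together with the sensitivity analyses described in the next paragraph.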

      As the reviewer correctly noted, no mice were excluded based on outlier analyses, since the observed variability reflects true biological differences rather than experimental or technical errors. We will reexamine our dataset for potential outliers. If any are identified, we will perform analyses both with and without the outlier and report any effects that are sensitive to single animals. These procedures and results will be explicitly described in the Methods and Results sections.

      (c) The large number of comparisons across salience levels, categories, and trial histories raises concern for false positives. The manuscript does not clearly state how multiple comparisons were controlled.  

      We thank the reviewer for raising this important point and we will include a clear statement on multiple comparisons in the Methods section. 
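      The response does not say which correction will be used. As one common option, shown purely for illustration and under that assumption, the Holm-Bonferroni step-down procedure controls the family-wise error rate across a set of comparisons. The function name `holm_adjust` and the example p-values are hypothetical.

```python
def holm_adjust(pvals):
    """Return Holm-Bonferroni adjusted p-values, in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices from smallest to largest p
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p is multiplied by m, the next by m - 1, and so on.
        adj = min(1.0, (m - rank) * pvals[i])
        # Enforce monotonicity: an adjusted p never falls below an earlier one.
        running_max = max(running_max, adj)
        adjusted[i] = running_max
    return adjusted

# Hypothetical p-values from three genotype comparisons.
print(holm_adjust([0.01, 0.04, 0.03]))
```

      A comparison is then declared significant when its adjusted p-value is below the chosen alpha level.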

      (d) The data in Figure 5, shown as separate panels per indentation value, are analyzed separately as t-tests or Mann-Whitney tests. However, individual comparisons are inappropriate for this type of data, as these are repeated stimulus applications across a given session. The data should be analyzed together and post-hoc comparisons reported. Given the very subtle difference in miss rates across control and mutant mice for 'low-salience' stimulus trials, this is unlikely to be a statistically meaningful difference when analyzed using a more appropriate test.

      We thank the reviewer for raising this point. This was not done intentionally. A repeated-measures ANOVA on miss rates for low-salience stimuli during categorization confirmed that there are statistically significant differences both across stimulus amplitudes and between genotypes. Additional correction for multiple comparisons will be performed and explained in the Methods section.  

      (4) Emphasis on theoretical models: The paper leans heavily on theories such as Adaptive Resonance Theory, Load Theory of Attention, and Weak Central Coherence, but the data do not actually test these frameworks in a rigorous way. The discussion should be reframed to highlight the potential relevance of these frameworks while acknowledging that the current data do not allow them to be assessed. 

      As mentioned above, our goal was not to directly test these theories but rather to apply them within our translational framework. The Discussion section will be reframed to highlight that our findings are consistent with predictions from certain cognitive theories rather than implying that these frameworks were directly tested.

      Reviewer #3 (Public review): 

      Summary: 

      Developing consistent and reliable biomarkers is critically important for developing new pharmacological therapies in autism spectrum disorders (ASDs). Altered sensory perception is one of the hallmarks of autism and has been recently added to DSM-5 as one of the core symptoms of autism. Touch is one of the fundamental sensory modalities, yet it is currently understudied. Furthermore, there seems to be a discrepancy between different studies from different groups focusing on tactile discrimination. It is not clear if this discrepancy can be explained by different experimental setups, inconsistent terminology, or the heterogeneity of sensory processing alterations in ASDs. The authors aim to investigate the interplay between tactile discrimination and cognitive processes during perceptual decisions. They have developed a forepaw-based 2-alternative choice task for mice and investigated tactile perception and learning in Fmr1-/y mice.

      Strengths: 

      There are several strengths of this task: translational relevance to human psychophysical protocols, including controlled vibrotactile stimulation. In addition to the experimental setup, there are also several interesting findings: Fmr1-/y mice demonstrated choice consistency bias, which may result in impaired perceptual learning, and enhanced tactile discrimination in low-salience conditions, as well as attentional deficits with increased cognitive load. The increase in the error rates for low salience stimuli is interesting. These observations, together with the behavioral design, may have a promising translational potential and, if confirmed in humans, may be potentially used as biomarkers in ASD. 

      We appreciate the reviewer’s positive assessment of our study’s translational value and the importance of our behavioral findings.

      Weaknesses: 

      Some weaknesses are related to the lack of the original raster plots and density plots of licks under different conditions, learning rate vs time, and evaluation of the learning rate at different stages of learning. Overall, these data would help to answer the question of whether there are differences in learning strategies or neural circuit compensation in Fmr1-/y mice. It is also not clear if reversal learning is impaired in Fmr1-/y mice.  

      We thank the reviewer for these helpful suggestions. We agree that visualizing behavioral patterns, such as raster and density plots of licks, as well as learning rate over time, could provide additional insights into learning dynamics. This analysis will be conducted and added into the revised manuscript.

      There was no assessment of reversal learning in Fmr1<sup>-/y</sup> mice in this study. While it is an interesting and important question based on previous findings in preclinical and clinical studies, it falls outside the scope of the current manuscript.    

      Feigin H, Shalom-Sperber S, Zachor DA, Zaidel A (2021) Increased influence of prior choices on perceptual decisions in autism. eLife 10.

      Soulières I, Mottron L, Saumier D, Larochelle S (2007) Atypical categorical perception in autism: Autonomy of discrimination? J Autism Dev Disord 37:481–490.

    1. eLife Assessment

      This important study reports an endometrial organoid culture system mimicking the window of implantation. The evidence supporting the conclusion drawn is convincing. The data will be of interest to embryologists and investigators working on reproductive biology and medicine.

    2. Reviewer #1 (Public review):

      Summary:

      This study generated 3D cell constructs from endometrial cell mixtures that were seeded in the Matrigel scaffold. The cell assemblies were treated with hormones to induce a "window of implantation" (WOI) state. Although many bioinformatic analyses point in this direction, there are major concerns that must be addressed.

      Strengths:

      The addition of 3 hormones to enhance the WOI state (although not clearly supported in comparison to the secretory state).

      Comments on revisions:

      The authors did their best to revise their study according to the Reviewers' comments. However, the study remains unconvincing, incomplete and at the same time still too dense and not focused enough.

    3. Reviewer #2 (Public review):

      Zhang et al. have developed an advanced three-dimensional culture system of human endometrial cells, termed a receptive endometrial assembloid, that models the uterine lining during the crucial window of implantation (WOI). During this mid-secretory phase of the menstrual cycle, the endometrium becomes receptive to an embryo, undergoing distinctive changes. In this work, endometrial cells (epithelial glands, stromal cells, and immune cells from patient samples) were grown into spheroid assembloids and treated with a sequence of hormones to mimic the natural cycle. Notably, the authors added pregnancy-related factors (such as hCG and placental lactogen) on top of estrogen and progesterone, pushing the tissue construct into a highly differentiated, receptive state. The resulting WOI assembloid closely resembles a natural receptive endometrium in both structure and function. The cultures form characteristic surface structures like pinopodes and exhibit abundant motile cilia on the epithelial cells, both known hallmarks of the mid-secretory phase. The assembloids also show signs of stromal cell decidualization and an epithelial-mesenchymal transition-like process at the implantation interface, reflecting how real endometrial cells prepare for possible embryo invasion.

      Although the WOI assembloid represents an important step forward, it still has limitations: the supportive stromal and immune cell populations decrease over time in culture, so only early-passage assembloids retain full complexity. Additionally, the differences between the WOI assembloid and a conventional secretory-phase organoid are more quantitative than absolute; both respond to hormones and develop secretory features, but the WOI assembloid achieves a higher degree of differentiation due to the addition of "pregnancy" signals. Overall, while it's a reinforced model (not an exact replica of the natural endometrium), it provides a valuable in vitro system for implantation studies and testing potential interventions, with opportunities to improve its long-term stability and biological fidelity in the future.

    4. Author response:

      The following is the authors’ response to the previous reviews

      We have thoroughly addressed all the reviewers’ comments and meticulously revised the manuscript. Key modifications include the following:

      (a) Organizing the Logic and Highlighting Key Findings: We have revised the manuscript to emphasize key findings (especially the distinctions between the SEC and WOI groups) according to the following logic: constructing a receptive endometrial organoid, comparing its molecular characteristics with those of the receptive endometrium, highlighting its main features (hormone response, enhanced energy metabolism, ciliary assembly and motility, epithelial-mesenchymal transition), and exploring the function involved in embryo interaction.

      (b) Clarity and Better Description of Bioinformatic Analyses: We have revised the sections involving bioinformatic analyses to provide a more streamlined and comprehensible explanation. Instead of overwhelming the reader with excessive details, we focused on the most important findings, and performed additional experimental validation.

      (c) Rationale for Gene Selection: We have clarified the rationale for selecting certain genes and pathways for inclusion in the analysis and manuscript. The associated gene expression data for all figures have been provided in the attached Dataset.

      (d) In the response letter, we have provided a detailed presentation of the methodological optimization for constructing these endometrial assembloids, along with the optimization and comparison of endometrial organoid culture media. Furthermore, in the Limitations section, we have explicitly stated that stromal cells and immune cells gradually diminish with increasing passage numbers. Therefore, this study primarily utilized endometrial assembloids within the first three passages for all investigations.

      Below, we provide a point-by-point response to each comment, with all modifications highlighted in the revised manuscript. We respectfully hope that these revisions effectively address the concerns raised by the reviewers.

      Public Reviews:

      Reviewer #1 (Public Review):

      This study generated 3D cell constructs from endometrial cell mixtures that were seeded in the Matrigel scaffold. The cell assemblies were treated with hormones to induce a "window of implantation" (WOI) state. The authors did their best to revise their study according to the reviewers' comments. However, the study remains unconvincing and at the same time too dense and not focused enough.

      (1) The use of the term organoids is still confusing and should be avoided. Organoids are epithelial tissue-resembling structures. Hence, the multiple-cell aggregates developed here are rather "coculture models" (or "assembloids"). It is still unexpected (unlikely) that these structures containing epithelial, stromal and immune cells can be robustly passaged in the epithelial growth conditions used. All other research groups developing real organoids from endometrium have shown that only the epithelial compartment remains in culture at passaging (while the stromal compartment is lost). If authors keep to their idea, they should perform scRNA-seq on both early and late (passage 6-10) "organoids". And they should provide details of culturing/passaging/plating etc that are different with other groups and might explain why they keep stromal and immune cells in their culture for such a long time. In other words, they should then in detail compare their method to the standard method of all other researchers in the field, and show the differences in survival and growth of the stromal and immune cells.

      (1) We appreciate your feedback and have revised the term 'organoids' to 'assembloids'.

      (2)

      I. Due to budget constraints, this study did not perform scRNA-seq on both early and late passages (P6-P10). Instead, immunofluorescence staining confirmed the persistence of stromal cells at passage 6 (as shown below).

      Author response image 1.

      Whole-mount immunofluorescence showed that Vimentin+ F-actin+ cells (stromal cells) were arranged around the glandular spheres that were only F-actin+(passage 6).

      II. Improvements in this study include the following.

      a. Optimization of endometrial tissue processing: The procedures for tissue collection, pretreatment, digestion, and culture were refined to maximize the retention of endometrial epithelial cells, stromal cells, and immune cells (detailed optimizations are provided in Response Table 1).

      b. Enhanced culture medium formulation: Based on previous protocols, WNT3A was added to promote organoid development and differentiation (PMID: 27315476), while FGF2 was supplemented to improve stromal cell survival (PMID: 35224622) (see Response Table 2 for medium comparisons). Representative culture outcomes are shown in the figure below.

      We acknowledge that the stromal and immune cells in this system still exhibit differences compared to their in vivo counterparts. In this study, we utilized the first three passages, which offer optimal cell diversity and viability, to meet experimental needs. However, replicating and maintaining the full complexity of endometrial cell types in vitro remains a major challenge in the field—one that we are actively working to address.

      Author response table 1.

      Methodological Optimization of Endometrial Organoids (Construction, Passaging, and Cryopreservation)

      Author response table 2.

      Optimization and comparison of endometrial organoid culture media

      Author response image 2.

      Bright-field microscopy captures the expansion of glands and surrounding stromal cells across passages 0 to 2 (scale bar = 200 μm) (Yellow arrows: stromal cells; White arrows: glands).

      (2) The paper is still much too dense, touching upon all kinds of conclusions from the manifold bioinformatic analyses. The latter should be much clearer and better described, and then some interesting findings (pathways/genes) should be highlighted without mentioning every single aspect that is observed. The paper needs a lot of editing to better focus and extract take-home messages, not bombing the reader with a mass of pathways, genes etc. which makes the manuscript just not readable or 'digest-able'. There is no explanation whatever and no clear rationale why certain genes are included in a list while others are not. There is the impression that mass bioinformatics is applied without enough focus.

      Thanks for your suggestions. We have made improvements and revisions in the following areas:

      (a) Clarity and Better Description of Bioinformatic Analyses: We have revised the sections involving bioinformatic analyses to provide a more streamlined and comprehensible explanation. Instead of overwhelming the reader with excessive details, we focused on the most important findings.

      (b) Organizing the Logic and Highlighting Key Findings: We have revised the manuscript to emphasize key findings according to the following logic: constructing a receptive endometrial organoid, comparing its molecular characteristics with those of the receptive endometrium, highlighting its main features (hormone response, enhanced energy metabolism, ciliary assembly and motility, epithelial-mesenchymal transition), and exploring the function involved in embryo interaction.

      (c) Rationale for Gene Selection: We have clarified the rationale for selecting certain genes and pathways for inclusion in the analysis and manuscript.

      We hope these revisions address your concerns and improve the overall quality and clarity of the manuscript. Thank you once again for your valuable input.

      (3) The study is much too descriptive and does not show functional validation or exploration (except glycogen production). Some interesting findings extracted from the bioinformatics must be functionally tested.

      Thanks for your suggestions. We have restructured the logic and revised the manuscript, incorporating functional validation. The focus is on the following points: highlighting its main features (hormone response, enhanced energy metabolism, ciliary assembly and motility, epithelial-mesenchymal transition), and exploring the functions involved in embryo interaction.

      (4) In contrast to what was found in vivo (Wang et al. 2020), no abrupt change in gene expression pattern is mentioned here from the (early-) secretory to the WoI phase. Should be discussed. Although the bioinformatic analyses point into this direction, there are major concerns which must be solved before the study can provide the needed reliability and credibility for revision.

      To further investigate the abrupt change, the Mfuzz algorithm was used to analyze gene expression across the three groups, focusing on gene clusters that were progressively upregulated or downregulated. Mitochondrial and cilia-related genes exhibited the highest expression levels in WOI endometrial organoids, whereas genes related to cell junctions and negative regulation of cell differentiation were downregulated (Figure 4A).
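
      As a rough illustration of the soft-clustering idea behind this step, the sketch below computes fuzzy c-means membership scores for one gene profile against two fixed trend centroids. It is not the authors' pipeline and not the Mfuzz implementation (Mfuzz is an R package that learns its centroids iteratively); the centroids, the expression values, the group order CTRL, SEC, WOI, and the function names are all assumptions made for illustration.

```python
import math
from statistics import mean, pstdev


def standardize(profile):
    """Scale one gene's expression profile to mean 0 and SD 1."""
    mu, sd = mean(profile), pstdev(profile)
    if sd == 0:
        raise ValueError("flat profile cannot be standardized")
    return [(x - mu) / sd for x in profile]


def memberships(profile, centroids, m=2.0):
    """Fuzzy c-means membership of one profile in each cluster centroid."""
    dists = {name: math.dist(profile, c) for name, c in centroids.items()}
    # A profile that coincides with a centroid belongs to it fully.
    for name, d in dists.items():
        if d == 0:
            return {n: float(n == name) for n in dists}
    exponent = 2.0 / (m - 1.0)
    return {
        name: 1.0 / sum((d / other) ** exponent for other in dists.values())
        for name, d in dists.items()
    }


# Hypothetical trend centroids over the ordered groups CTRL, SEC, WOI.
CENTROIDS = {
    "progressively_up": standardize([0.0, 1.0, 2.0]),
    "progressively_down": standardize([2.0, 1.0, 0.0]),
}

# Hypothetical expression values for one gene (CTRL, SEC, WOI).
gene_profile = standardize([1.0, 2.0, 4.0])
scores = memberships(gene_profile, CENTROIDS)
best_cluster = max(scores, key=scores.get)
```

      In this toy case the gene rises from CTRL to WOI, so nearly all of its membership falls in the progressively upregulated cluster. Memberships across clusters sum to 1, which is what distinguishes soft clustering from a hard assignment.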

      (5) All data should be benchmarked to the Wang et al 2020 and Garcia-Alonso et al. 2021 papers reporting very detailed scRNA-seq data, and not only the Stephen R. Quake 2020 paper.

      We appreciate your suggestion. By integrating data from Garcia-Alonso et al. (2021) (shown in the figure below), we observed that both WOI organoids and SEC organoids exhibit increased glandular secretory epithelium and developed ciliated epithelium, mirroring features of mid-secretory endometrium. The findings exhibit parallels when contrasting these two papers.

      Author response image 3.

      UMAP visualization of integrated scRNA-seq data (our dataset and Garcia-Alonso et al. 2021) showing: (A) cell types, (B) WOI-org, (C) CTRL-org, and (D) SEC-org versus published mid-secretory samples.

      (6) Fig. 2B: Vimentin staining is not at all clear. F-actin could be used to show the typical morphology of the stromal cells?

      We appreciate your suggestion. We performed F-actin staining in addition to Vimentin staining and found that Vimentin+ F-actin+ cells (stromal cells) were arranged around the glandular spheres, which were only F-actin+.

      (7) Where does the term "EMT-derived stromal cells" come from? On what basis has this term been coined?

      Within endometrial biology, stromal cells in the transition from epithelial to mesenchymal phenotype are specifically referred to as 'stromal EMT transition cells' (PMID: 39775038, PMID: 39968688).

      In certain cancers or fibrotic diseases, epithelial cells can transition into a mesenchymal phenotype, contributing to the stromal environment that supports tumor growth or tissue remodeling (PMID: 20572012).

      (8) CD44 is shown in Fig. 2D but the text mentions CD45 (line 159)?

      In Fig 2D, T cells are defined as a cluster of CD45+CD3+ cells, further subdivided into CD4+ and CD8+ T cells based on their expression of CD4 and CD8. This figure does not include data on CD44.

      (9) All quantification experiments (of stainings etc) should be in detail described how this was done. It looks very difficult (almost not feasible) when looking at the provided pictures to count the stained cells.

      a. Manual Measurement:

      For TEM-observed pinopodes, glycogen particles, microvilli, and cilia, manual region-of-interest (ROI) selection was performed using ImageJ software for quantitative analysis of counts, area, and length. Twenty randomly selected images per experimental group were analyzed for each morphological parameter.

      b. Automated Measurement:

      We quantified the fluorescence images using ImageJ. First, the images were preprocessed by adjusting brightness and contrast and by removing background noise with the “Subtract Background” feature.

      Second, a threshold was set to highlight the cells, and regions of interest (ROIs) were selected using the selection tools. Third, cells were counted via Analyze > Analyze Particles. To measure fluorescence intensity and area, the “Measurement” options were set to mean gray value. Parameters were adjusted as needed, and results were viewed in the “Results” window. The data were saved for further analysis, and settings were kept consistent across all measurements to ensure reliable results.

      For 3D fluorescence quantification, ZEN software (Carl Zeiss) was exclusively used, with 11 images analyzed per experimental group. This part has been incorporated into “Supporting Information” Line 94-100.

      c. Normalization Method:

      For fluorescence quantification, DAPI was used as an internal reference for normalization, where both DAPI and target fluorescence channel intensities were quantified simultaneously. The normalized target signal intensity (target/DAPI ratio) was then compared across experimental groups. A minimum of 15 images were analyzed for each parameter per group. This part has been incorporated into “Supporting Information” Line 101-104.
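
      The normalization described above amounts to a ratio of mean gray values. A minimal sketch follows, assuming each channel is available as a 2D grid of pixel intensities; the function names and the toy pixel values are hypothetical and do not come from the manuscript or from ImageJ.

```python
from statistics import mean


def mean_gray_value(image):
    """Mean pixel intensity of a 2D image given as a list of rows."""
    return mean(pixel for row in image for pixel in row)


def normalized_intensity(target_image, dapi_image):
    """Target-channel mean gray value divided by the DAPI mean gray value."""
    dapi = mean_gray_value(dapi_image)
    if dapi == 0:
        raise ValueError("DAPI channel is empty; cannot normalize")
    return mean_gray_value(target_image) / dapi


def group_mean_ratio(image_pairs):
    """Average target/DAPI ratio over all (target, DAPI) pairs of a group."""
    return mean(normalized_intensity(t, d) for t, d in image_pairs)


# Toy 2x2 images standing in for one field of view per group (assumed data).
ctrl_pairs = [([[10, 20], [30, 40]], [[50, 50], [50, 50]])]
woi_pairs = [([[40, 60], [80, 100]], [[100, 100], [100, 100]])]

ctrl_ratio = group_mean_ratio(ctrl_pairs)
woi_ratio = group_mean_ratio(woi_pairs)
```

      Dividing by the DAPI signal in the same field of view is intended to reduce the influence of cell density and acquisition differences before groups are compared.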

      (10) Fig. 3C: it is unclear how quantification can be reliably done. Moreover, OLFM4 looks positive in all cells of Ctrl, but authors still see an increase?

      (a) Fluorescence images were quantitatively analyzed using ImageJ by measuring the mean gray values. For normalization, DAPI staining served as an internal reference, with simultaneous measurement of mean gray values in both the target fluorescence channel and the DAPI channel. The relative fluorescence intensity was then calculated as the ratio of target channel to DAPI signal for inter-group quantitative comparisons.

      (b) OLFM4 is an E2-responsive gene. Its expression in endometrial organoids of the CTRL group is physiologically normal (PMID: 31666317). However, its fluorescence intensity (quantified as mean gray value) was significantly stronger in both the SEC and WOI groups compared to the CTRL group (quantitative method as described above).

      (11) Fig. 3F: Met is downregulated which is not in accordance with the mentioned activation of the PI3K-AKT pathway.

      We appreciate your careful review. Our initial description was imprecise. In the revised manuscript, this statement has been removed entirely.

      (12) Lines 222-226: transcriptome and proteome differences are not significant; so, how meaningful are the results then? Then, it is very hard to conclude an evolution from secretory phase to WoI.

      We appreciate your feedback. The manuscript has been comprehensively revised, and the aforementioned content has been removed.

      (13) WoI organoids show an increased number of cilia. However, some literature shows the opposite, i.e. less ciliated cells in the endometrial lining at WoI (to keep the embryo in place). How to reconcile?

      Thank you for raising this question. We conducted a statistical analysis of the proportion of ciliated cells across endometrial phases.

      (a) Based on the 2020 study by Stephen R. Quake and Carlos Simon’s team published in Nature Medicine (PMID: 32929266), the mid-secretory phase (Days 19–23) exhibited a higher proportion of ciliated cells compared to the early-secretory (Days 15–18) and late-secretory phases (Days 24–28) (Fig. R13 A).

      (b) According to the 2021 study by Roser Vento-Tormo’s team in Nature Genetics, ciliated cell abundance peaked in the early-to-mid-secretory endometrium across all phases (Fig. R13 B-C).

      Data were sourced from the Reproductive Cell Atlas.

      (14) How are pinopodes distinguished from microvilli? Moreover, Fig. 3 does not show the typical EM structure of cilia.

      Thank you for this insightful question.

      (a) Pinopodes are large, bulbous protrusions with a smooth apical membrane. Under transmission electron microscopy (TEM), it can be observed that the pinopodes contain various small particles, which are typically extracellular fluid and dissolved substances.

      Microvilli are elongated, finger-like projections that typically exhibit a uniform and orderly arrangement, forming a "brush border" structure. Under transmission electron microscopy, dense components of the cytoskeleton, such as microfilaments and microtubules, can be seen at the base of the microvilli.

      (b) You may refer to the ciliated TEM structure shown in the current manuscript's Fig. 2E (originally labeled as Fig. 2H in the draft). The cilium is composed of microtubules. The cross-section shows that the periphery of the cilium is surrounded by nine pairs of microtubules arranged in a ring. The longitudinal section shows that the cilium has a long cylindrical structure, with the two central microtubules being quite prominent and located at the center of the cilium.

      (15) There is a recently published paper demonstrating another model for implantation. This paper should be referenced as well (Shibata et al. Science Advances, 2024).

      Thanks for your valuable comments. We have cited this reference in the manuscript at Line 77-78.

      (16) Line 78: two groups were the first here (Turco and Boretto) and should both be mentioned.

      Thanks for your valuable comments. We have cited this reference in the manuscript at Line 74-76.

      (17) Line 554: "as an alternative platform" - alternative to what? Authors answer reviewers' comments by just changing one word, but this makes the text odd.

      Thank you for your review. Here, we propose that this WOI organoid serves as an alternative research platform for studying endometrial receptivity and maternal-fetal interactions, compared to current secretory-phase organoids. In the revised manuscript, we have supplemented the data by co-culturing this WOI organoid with blastoid, demonstrating its robust embryo implantation potential.

      Reviewer #2 (Public Review):

      In this research, Zhang et al. have pioneered the creation of an advanced organoid culture designed to emulate the intricate characteristics of endometrial tissue during the crucial Window of Implantation (WOI) phase. Their method involves the incorporation of three distinct hormones into the organoid culture, coupled with additives that replicate the dynamics of the menstrual cycle. Through a series of assays, they underscore the striking parallels between the endometrial tissue present during the WOI and their crafted organoids. Through a comparative analysis involving historical endometrial tissue data and control organoids, they establish a system that exhibits a capacity to simulate the intricate nuances of the WOI. The authors made a commendable effort to address the majority of the statements. Developing an endometrial organoid culture methodology that mimics the window of implantation is a game-changer for studying the implantation process. However, the authors should strive to enhance the results to demonstrate how different WOI organoids are from SEC organoids, ensuring whether they are worth using in implantation studies, or a proper demonstration using implantation experiments.

      Thank you for your valuable suggestions. The WOI organoids differ from SEC organoids in the following aspects.

      (1) Structurally, WOI endometrial organoids exhibit subcellular features characteristic of the implantation window: densely packed pinopodes on the luminal side of epithelial cells, abundant glycogen granules, elongated and tightly arranged microvilli, and increased cilia (Figure 2F).

      (2) At the molecular level, WOI organoids show enlarged and functionally active mitochondria, enhanced ciliary assembly and motility, and single-cell signatures resembling mid-secretory endometrium.

      Specifically, mitochondrial- and cilia-related genes/proteins are most highly expressed in WOI organoids (Figure 4A,B). TEM analysis revealed that WOI organoids have the largest average mitochondrial area (Figure 4C). Mitochondrial-related genes display an increasing trend across the three organoid groups, and WOI organoids produce more ATP and IL-8 (Figure 4D,E).

      For cilia, WOI organoids upregulate genes/proteins involved in ciliary assembly, basal bodies, and motile cilia, while downregulating non-motile cilia markers (Figure 5A-C).

      Single-cell analysis further confirms that WOI organoids recapitulate mid-secretory endometrial traits in mitochondrial metabolism and cell adhesion (Figure 2G).

      (3) Functionally, WOI organoids demonstrate superior embryo implantation potential. Given the scarcity and ethical constraints of human embryos, we used blastoids for implantation assays (Figure 6A). These blastoids successfully grew within endometrial organoids, established interactions (Figure 6B), and exhibited normal trilineage differentiation (epiblast: OCT4; hypoblast: GATA6; trophoblast: KRT18) (Figure 6C). WOI organoids achieved significantly higher blastoid survival (66% vs. 19% in CTRL and 28% in SEC) and interaction rates (90% vs. 47% in CTRL and 53% in SEC), confirming their robust embryo-receptive capacity (Figure 6D,E).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      In conclusion, it is needed to first meet all the concerns of the reviewers and then submit an appropriately adapted and comprehensive paper (also showing the robustness of the "organoids" and functionality of the findings) instead of this still fully descriptive paper. Further comments are included in the rebuttal document of the authors and will be provided by the editor as PDF.

      Reviewer #2 (Recommendations For The Authors):

      The authors made a good effort to reply to all the statements. However, there are some points that the authors need to address.

      • There is an inconsistency in the manuscript regarding the number of passages in which the organoids are used; in the response to the reviewers, it mentions 5 passages, while in the Materials and Methods section, it states 3 passages.

      We sincerely appreciate your thorough review. In this study, organoids within the first three passages were used. To address the reviewer's question comprehensively, we have now provided a detailed account of the organoid passage history in our response.

      • We agree that the difference between SEC and WOI organoids may be subtle, but in response to this, the authors should explain what they mean by "the most notable differences lie in the more comprehensive differentiation and varied cellular functions exhibited by WOI organoids..."

      In the original manuscript, this statement indicated that, at the single-cell level, WOI endometrial organoids exhibited more functionally mature and thoroughly differentiated characteristics compared to SEC endometrial organoids (See details below).

      In the revised version, we have restructured this section to focus on the following aspects: hormone response, energy metabolism, ciliary assembly and motility, epithelial-mesenchymal transition, and embryo implantation potential. Consequently, the statement "the most notable differences lie in the more comprehensive differentiation and varied cellular functions exhibited by WOI organoids..." has been removed.

      (1) Varied cellular functions:

      a. Secretory Epithelium: Compared to SEC organoids, WOI organoids exhibit enhanced peptide metabolism and mitochondrial energy metabolism in their secretory epithelium, supporting endometrial decidualization and embryo implantation (Figure 3F).

      b. Proliferative Epithelium: Compared to SEC organoids, WOI organoids demonstrate enhanced GTPase activity, angiogenesis, cytoskeletal assembly, cell differentiation, and RAS protein signaling in their proliferative epithelium (Figure S2G).

      c. Ciliated Epithelium: The ciliated epithelium of WOI endometrial organoids is associated with the regulation of vascular development and exhibits higher transcriptional activity compared to SEC organoids (Figure 5E).

      d. Stromal Cells: Compared to SEC organoids, WOI organoids exhibit enhanced cell junctions, cell migration, and cytoskeletal regulation in EMT-derived stromal cells (Figure S4A right panel). Similarly, cell junctions are also strengthened in stromal cells (Figure S4A left panel).

      (2) Comprehensive differentiation:

      a. Compared to SEC organoids, WOI organoids exhibit more complete differentiation from proliferative epithelium to secretory epithelium (Figure 3G).

      b. The WOI organoids demonstrate more robust ciliary differentiation compared to SEC organoids (Figure 5F).

      c. The proliferative epithelium progressively differentiates into EMT-derived cells. Compared to SEC organoids, WOI organoids are predominantly localized at the terminal end of the differentiation trajectory, indicating more complete differentiation (Figure S4B).

      • What do the authors mean by "average intensity" when referring to the extra reagents added to the WOI? The results that the authors show in response to Reviewer 2's Q1 must be included as part of the results and explain how it was done in the materials and methods section.

      This parameter indicates the growth status of organoids. It measures the gray value of organoids through long-term live-cell tracking. When organoids undergo apoptosis, they progressively condense into denser solid spheres, leading to an increase in gray value (average intensity). This content has been incorporated into the Results section (Line 129) and is further explained in the Supporting Information "Materials and Methods" (Lines 70-77).

      • In panel 1C, it is not possible to see the stromal cells around because they are brightfield images.

      You are partly right. Bright-field images alone indeed make it difficult to distinguish stromal cells. However, by combining whole-mount immunofluorescence staining with the characteristic elongated spindle-shaped morphology of stromal cells, we were able to roughly determine their distribution in the bright-field images.

      • Responding to Reviewer 2's question Q7, the authors indicate how they establish the cluster. However, they do not specify whether they extrapolate the data from a database or create the cluster themselves based on the literature. It should be stated from which classification list (or classification database) the extrapolation has been made.

      Within endometrial biology, stromal cells in the transition from epithelial to mesenchymal phenotype are specifically referred to as 'stromal EMT transition cells' (PMID: 39775038, PMID: 39968688).

      In certain cancers or fibrotic diseases, epithelial cells can transition into a mesenchymal phenotype, contributing to the stromal environment that supports tumor growth or tissue remodeling (PMID: 20572012).

      • Regarding Reviewer 2's question Q8, if the authors have not been able to make comparisons with, at least, SEC organoids, unfortunately, the ERT loses much of its strength and should not serve as support.

      We agree with you at this point. These results have been moved to the supplementary figures.

      • If the differences in the transcriptome and proteome between SEC and WOI organoids are not significant, the result does not support the authors' model. If there are barely any differences at the proteome and transcriptome level between SEC and WOI organoids, why would anyone choose to use their model over SEC organoids?

      We sincerely appreciate your valuable feedback. In this revised manuscript, we have further integrated transcriptomic and proteomic analyses, revealing that WOI organoids exhibit enlarged and functionally active mitochondria, along with enhanced cilia assembly and motility compared to SEC organoids. Additionally, using a blastoid model, we demonstrated that WOI organoids possess superior embryo implantation potential, significantly outperforming SEC organoids. Our research group aims to develop an embryo co-culture model. Through systematic comparisons of structural, molecular, and co-culture characteristics between SEC and WOI organoids, we ultimately confirmed the superior performance of WOI organoids.

      • SEC and WOI organoids must be different enough to establish a new model, and the authors do not demonstrate that they are.

      Thank you for your valuable feedback. In the revised manuscript, we have emphasized the distinctions between SEC and WOI organoids in terms of structure, molecular characteristics, and functionality (co-culture with blastoid), as detailed below.

      (1) Structurally, WOI endometrial organoids exhibit subcellular features characteristic of the implantation window: densely packed pinopodes on the luminal side of epithelial cells, abundant glycogen granules, elongated and tightly arranged microvilli, and increased cilia (Figure 2F).

      (2) At the molecular level, WOI organoids show enlarged and functionally active mitochondria, enhanced ciliary assembly and motility, and single-cell signatures resembling mid-secretory endometrium.

      Specifically, mitochondrial- and cilia-related genes/proteins are most highly expressed in WOI organoids (Figure 4A,B). TEM analysis revealed that WOI organoids have the largest average mitochondrial area (Figure 4C). Mitochondrial-related genes display an increasing trend across the three organoid groups, and WOI organoids produce more ATP and IL-8 (Figure 4D,E).

      For cilia, WOI organoids upregulate genes/proteins involved in ciliary assembly, basal bodies, and motile cilia, while downregulating non-motile cilia markers (Figure 5A-C).

      Single-cell analysis further confirms that WOI organoids recapitulate mid-secretory endometrial traits in mitochondrial metabolism and cell adhesion (Figure 2G).

      (3) Functionally, WOI organoids demonstrate superior embryo implantation potential. Given the scarcity and ethical constraints of human embryos, we used blastoids for implantation assays (Figure 6A). These blastoids successfully grew within endometrial organoids, established interactions (Figure 6B), and exhibited normal trilineage differentiation (epiblast: OCT4; hypoblast: GATA6; trophoblast: KRT18) (Figure 6C). WOI organoids achieved significantly higher blastoid survival (66% vs. 19% in CTRL and 28% in SEC) and interaction rates (90% vs. 47% in CTRL and 53% in SEC), confirming their robust embryo-receptive capacity (Figure 6D,E).

      • Regarding Q16, Boretto et al. 2017 and Turco et al. 2017 also manage to isolate stromal cells, but they lose them between passages. It's not a matter of isolating them from the tissue or not, but rather how they justify their maintenance in culture. In the images added by the authors, it can be seen that the majority of stromal cells are lost from P0 to P1 after thawing. I still believe that the epithelial part can be passed and maintained, but the rest cannot, and that should be mentioned in the paper as a limitation. However, the authors can demonstrate the maintenance of stromal cells by performing immunostaining with vimentin from passages 4, 5, and 6.

      Thank you for your valuable comments. We have added the statement 'Stromal cells and immune cells are difficult to pass down stably and their proportion is lower than that in the in vivo endometrium' to the Limitations section (Line 364-365). Additionally, we performed immunostaining with vimentin starting from passage 6 and confirmed the presence of Vimentin+ F-actin+ stromal cells (as shown in Author response image 1).

    1. eLife Assessment

      This study presents a valuable finding on whether executive resources mediate the impact of language predictability in reading in the context of aging. The presentation of evidence is incomplete; further conceptual clarifications, methodological details, and addressing potential confounds would strengthen the study. The work will be of interest to cognitive neuroscientists working on reading, language comprehension, and executive control.

    2. Reviewer #1 (Public review):

      This manuscript reports a dual-task experiment intended to test whether language prediction relies on executive resources, using surprisal-based measures of predictability and an n-back task to manipulate cognitive load. While the study addresses a question under debate, the current design and modeling framework fall short of supporting the central claims. Key components of cognitive load, such as task switching and word prediction vs integration, are not adequately modeled. Moreover, the weak consistency in replication undermines the robustness of the reported findings. Each point is unpacked below.

      Cognitive load is a broad term. In the present study, it can be at least decomposed into the following components:

      (1) Working memory (WM) load: news, color, and rank.

      (2) Task switching load: domain of attention (color vs semantics), sensorimotor rules (c/m vs space).

      (3) Word comprehension load (hypothesized against): prediction, integration.

      The components of task switching load should be directly included in the statistical models. Switching of sensorimotor rules may be captured by the "n-back reaction" (binary) predictor. However, the switching of attended domains and the interaction between domain switching and rule complexity (1-back or 2-back) were not included. The attention control experiment (1) avoided useful statistical variation from the Read Only task, and (2) did not address interactions. More fundamentally, task-switching components should be directly modeled in both performance and full RT models to minimize selection bias. This principle also applies to other confounding factors, such as education level. While missing these important predictors, the current models have an abundance of predictors that are not so well motivated (see later comments). In sum, with the current models, one cannot determine whether the reduced performance or prolonged RT was due to affecting word prediction load (if it exists) or merely affecting the task switching load.

      Entropy and surprisal need to be more clearly interpreted and modeled in the context of the word comprehension process. Entropy concerns the "prediction" part of word comprehension (before seeing the next word), whereas surprisal concerns the "integration" part as a posterior. This interpretation is similar to the authors writing in the Introduction that "Graded language predictions necessitate the active generation of hypotheses on upcoming words as well as the integration of prediction errors to inform future predictions [1,5]." However, the Results of this study largely ignored entropy (treating it as a fixed effect) and focused only on surprisal without clear justification.
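
      The distinction can be made concrete with the standard information-theoretic definitions: entropy is computed over the whole next-word distribution before the word is seen, whereas surprisal is computed for the one word that actually appears. The sketch below uses a made-up three-word distribution; the probabilities and names are illustrative assumptions, not values from the study or from its language model.

```python
import math


def entropy(distribution):
    """Uncertainty (in bits) about the upcoming word, before it is seen."""
    return -sum(p * math.log2(p) for p in distribution.values() if p > 0)


def surprisal(distribution, word):
    """Unexpectedness (in bits) of the word that actually occurred."""
    return -math.log2(distribution[word])


# Hypothetical next-word probabilities given some preceding context.
next_word = {"coffee": 0.5, "tea": 0.25, "water": 0.25}

context_entropy = entropy(next_word)
tea_surprisal = surprisal(next_word, "tea")
coffee_surprisal = surprisal(next_word, "coffee")
```

      Two contexts can share the same entropy while the observed word carries very different surprisal, which is why the two measures can in principle be entered as separate predictors.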

      In Table S3, with original and replicated model fitting results, the only consistent interaction is surprisal x age x cognitive load [2-back vs. Reading Only]. None of the two-way interactions can be replicated. This is puzzling and undermines the robustness of the main claims of this paper.

    3. Reviewer #2 (Public review):

      Summary:

      This paper considers the effects of cognitive load (using an n-back task related to font color), predictability, and age on reading times in two experiments. There were main effects of all predictors, but more interesting effects of load and age on predictability. The effect of load is very interesting, but the manipulation of age is problematic, because we don't know what is predictable for different participants (in relation to their age). There are some theoretical concerns about prediction and predictability, and a need to address literature (reading time, visual world, ERP studies).

      Strengths/weaknesses

      It is important to be clear that predictability is not the same as prediction. A predictable word is processed faster than an unpredictable word (something that has been known since the 1970/80s), e.g., Rayner, Schwanenfluegel, etc. But this could be due to ease of integration. I think this issue can probably be dealt with by careful writing (see point on line 18 below). To be clear, I do not believe that the effects reported here are due to integration alone (i.e., that nothing happens before the target word), but the evidence for this claim must come from actual demonstrations of prediction.

      The effect of load on the effects of predictability is very interesting (and also, I note that the fairly novel way of assessing load is itself valuable). Assuming that the experiments do measure prediction, it suggests that predictions are not cost-free, as is sometimes assumed. I think the researchers need to look closely at the visual world literature, most particularly the work of Huettig. (There is an isolated reference to Ito et al., but this is one of a large and highly relevant set of papers.)

      There is a major concern about the effects of age. See the Results (161-5): this depends on what is meant by word predictability. It's correct if it means the predictability in the corpus. But it may or may not be correct if it refers to how predictable a word is to an individual participant. The texts are unlikely to be equally predictable to different participants, and in particular to younger vs. older participants, because of their different experiences. To put it informally, the newspaper articles may be more geared to the expectations of younger people. But there is also another problem: the LLM may have learned on the basis of language that has largely been produced by young people, and so its predictions are based on what young people are likely to say. Both of these possibilities strike me as extremely likely. So it may be that older adults are affected more by words that they find surprising, but it is also possible that the texts are not what they expect, or the LLM predictions from the text are not the ones that they would make. In sum, I am not convinced that the authors can say anything about the effects of age unless they can determine what is predictable for different ages of participants. I suspect that this failure to control is an endemic problem in the literature on aging and language processing and needs to be systematically addressed.

      Overall, I think the paper makes enough of a contribution with respect to load to be useful to the literature. But for discussion of age, we would need something like evidence of how younger and older adults would complete these texts (on a word-by-word basis) and that they were equally predictable for different ages. I assume there are ways to get LLMs to emulate different participant groups, but I doubt that we could be confident about their accuracy without a lot of testing. But without something like this, I think making claims about age would be quite misleading.

    4. Author response:

      Reviewer #1 (Public review):

      Cognitive Load and Task-Switching Components:

      We agree that cognitive load is multi-faceted and encompasses dimensions not fully captured in our present models, including domain and rule switching. For the revision, we will explicitly model these components in the statistical analyses by incorporating predictors reflecting attended domain switching and rule complexity, as suggested. We will also explain our inclusion of n-back reaction predictors and justify their relationship with theoretical constructs of executive function. Full details of coding schemes will be provided.

      Modeling Entropy and Surprisal:

      We appreciate the reviewer’s suggestion to further explain the distinction between entropy (predictive uncertainty) and surprisal (integration difficulty), and acknowledge that our treatment of entropy warrants extension. In the revision, we will expand the results and discussion on entropy, providing clearer theoretical motivation for its inclusion and conducting supplementary analyses to examine its role alongside surprisal.

      Replicability of Findings:

      We note the concern regarding two-way vs. three-way interactions in model replication. In the revised manuscript, we will report robustness analyses on subsets of our data (e.g., matched age and education groups), clarify degrees of freedom and group sizes, and transparently report any discrepancies.

      Predictors and Statistical Modeling:

      We will add clarifications on predictor selection, data structure, and rationale for model hierarchy. The functions of d-prime, comprehension accuracy, and performance modeling will be described in more detail, including discussion of block-level vs. participant-level effects.

      Reviewer #2 (Public review):

      Distinction Between Prediction and Predictability:

      We recognize the importance of clearly communicating the difference between prediction and predictability, as well as integration-based vs. prediction-based effects. We will clarify these distinctions throughout the introduction, methods, and discussion sections, citing the relevant theoretical literature (e.g., Pickering & Gambi 2018; Federmeier 2007; Staub 2015; Frisson 2017).

      Aging, Corpus Predictability, and Individual Differences:

      We appreciate the critical point regarding age, corpus-based predictability, and potential cohort effects in language model estimates. In the revision, we will provide conceptual clarifications on how surprisal and entropy might differ for different age groups and discuss limitations in extrapolating these metrics to participant-specific predictions. The limitations inherent in relying on LLM-derived estimates and text materials will be more directly addressed.

      Coverage of Literature and Paradigms:

      We will broaden the literature review as requested, particularly on the N400 effects and behavioral traditions in prediction research. These additions should help contextualize the present work within both neuroscience and psycholinguistics.

      Experimental Context and Predictability Metrics:

      We will address concerns regarding the context window for prediction estimation, describing more precisely how context was defined and whether broader textual cues may improve predictability metrics.

      References

      Pickering, M.J. & Gambi, C. (2018). Predicting while comprehending language: A theory and review. Psychol. Bull., 144(10), 1002–1044.

      Federmeier, K.D. (2007). Thinking ahead: The role and roots of prediction in language comprehension. Psychophysiology, 44(4), 491–505.

      Frisson, S. (2017). Can prediction explain the lexical processing advantage for short words? J. Mem. Lang., 95, 121–138.

      Staub, A. (2015). The effect of lexical predictability on eye movements in reading: Critical review and theoretical interpretation. Lang. Linguist. Compass, 9(8), 311–327.

      Huettig, F. & Mani, N. (2016). Is prediction necessary to understand language? Probably not. Trends Cogn. Sci., 20(10), 484–492.

      We appreciate the reviewers’ constructive comments and believe their suggestions will meaningfully strengthen the paper. Our planned revisions will address each of the above points with additional analyses, clarifications, and expanded discussion.

    1. eLife Assessment

      This study used a conditional knockout mouse line to remove Ptbp1 in retinal progenitors and showed that its deletion has no effect on retinal neurogenesis or cell fate specification, thereby challenging the prevailing view of Ptbp1 as a master regulator of neuronal fate. The findings are convincing, supported by transcriptome analysis, histology, and proliferation assays. This study is important, though the genetic tools employed may not fully capture Ptbp1's potential role during the earliest stages of retinal development.

    2. Reviewer #1 (Public review):

      Summary:

      The researchers sought to determine whether Ptbp1, an RNA-binding protein formerly thought to be a master regulator of neuronal differentiation, is required for retinal neurogenesis and cell fate specification. They used a conditional knockout mouse line to remove Ptbp1 in retinal progenitors and analyzed the results using bulk RNA-seq, single-cell RNA-seq, immunohistochemistry, and EdU labeling. Their findings show that Ptbp1 deletion has no effect on retinal development, since no defects were found in retinal lamination, progenitor proliferation, or cell type composition. Although bulk RNA-seq indicated changes in RNA splicing and increased expression of late-stage progenitor and photoreceptor genes in the mutants, and single-cell RNA-seq detected relatively minor transcriptional shifts in Müller glia, the overall phenotypic impact was low. As a result, the authors conclude that Ptbp1 is not required for retinal neurogenesis and development, thus contradicting prior statements about its important role as a master regulator of neurogenesis. They argue for a reassessment of this stated role. While the findings are strong in the setting of the retina, the larger implications for other areas of the CNS require more investigation. Furthermore, questions about potential compensation by Ptbp2 warrant further research.

      Strengths:

      This study calls into question the commonly held belief that Ptbp1 is a critical regulator of neurogenesis in the CNS, particularly in retinal development. The adoption of a conditional knockout mouse model provides a reliable way of eliminating Ptbp1 in retinal progenitors while avoiding the off-target effects often reported in RNAi experiments. The combination of bulk RNA-seq, scRNA-seq, and immunohistochemistry enables a thorough examination of molecular and cellular alterations at both embryonic and postnatal stages, which strengthens the study's findings. Furthermore, using publicly available RNA-Seq datasets for comparison improves the investigation of splicing and expression across tissues and cell types. The work is well-organized, with informative figure legends and supplemental data that clearly show no substantial phenotypic changes in retinal lamination, proliferation, or cell fate, despite identified transcriptional and splicing modifications.

      Weaknesses:

      The retina-specific method raises questions regarding whether Ptbp1 is required in other CNS locations where its neurogenic roles were first proposed. The claim that Ptbp1 is "fully dispensable" for retinal development may be toned down, given the transcriptional and splicing modifications identified. The possibility of subtle or transitory impacts, such as ectopic neuron development followed by cell death, is postulated, but not completely investigated. Furthermore, as the authors point out, the compensating potential of increased Ptbp2 warrants additional exploration. Although the study performs well in transcriptome and histological analyses, it lacks functional assessments (such as electrophysiological or behavioral testing) to determine if small changes in splicing or gene expression affect retinal function. While 864 splicing events have been found, the functional significance of these alterations, notably the 7% that are neuronal-enriched and the 35% that are rod-specific, has not been thoroughly investigated. The manuscript might be improved by describing how these splicing changes affect retinal development or function.

    3. Reviewer #2 (Public review):

      Summary:

      Ptbp1 has been proposed as a key regulator of neuronal fate through its role in repressing neurogenesis. In this study, the authors conditionally inactivated Ptbp1 in mouse retinal progenitor cells using the Chx10-Cre line. While RNA-seq analysis at E16 revealed some changes in gene expression, there were no significant alterations in retinal cell type composition, and only modest transcriptional changes in the mature retina, as assessed by immunofluorescence and scRNAseq. Based on these findings, the authors conclude that Ptbp1 is not essential for cell fate determination during retinal development.

      Strengths:

      Despite some effects of Ptbp1 inactivation (initiated around E11.5 with the onset of Chx10-Cre activity) on gene expression and splicing, the data convincingly demonstrate that retinal cell type composition remains largely unaffected. This study is highly significant since it challenges the prevailing view of Ptbp1 as a central repressor of neurogenesis and highlights the need to further investigate, or re-evaluate, its role in other model systems and regions of the CNS.

      Weaknesses:

      A limitation of the study is the use of the Chx10-Cre driver, which initiates recombination around E11. This timing does not permit assessment of Ptbp1 function during the earliest phases of retinal development, if expressed at that time.

    1. eLife Assessment

      This valuable study presents a mechanistic model of predictive coding by medial entorhinal cortex grid cells, implemented with biologically detailed conductance-based neurons. The evidence supporting the emergence of this coding scheme from specific membrane currents and the anatomical connectivity among inhibitory neurons is solid. However, the justification for the choice of connectivity patterns and other network parameters remains somewhat incomplete. This work will be of interest to neuroscientists working on spatial navigation, circuit dynamics, and neuronal coding.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors aim to elucidate the mechanisms by which grid cells in the medial entorhinal cortex generate predictive representations of spatial location. To address this, they built a computational model integrating intrinsic neuronal dynamics with structured network connectivity. Specifically, they combine a conductance-based single-cell model incorporating biologically realistic HCN channels with a continuous attractor network that reflects known properties of grid cell circuitry. Their simulations show that HCN conductance can shift grid fields forward by approximately 5% of their diameter, consistent with experimental observations in layer II grid cells. Additionally, by introducing asymmetry in the connectivity of interneurons, the model produces larger forward shifts, which parallel properties observed in layer III grid cells. Together, these two mechanisms provide a unified framework for explaining layer-specific predictive coding in the entorhinal cortex.

      Strengths:

      A major strength of the study lies in its conceptual contribution. The authors propose two distinct mechanisms to generate forward-shifted grid fields for predictive coding. One mechanism is intrinsic and depends on the time constants associated with HCN channels. The other is network-based and results from asymmetries in interneuron connectivity. These two mechanisms correspond to different observed properties of grid cells in layer II and layer III, respectively. The modeling is based on previously validated frameworks of continuous attractor network models (e.g., Burak & Fiete; Kang & DeWeese), but it adds several novel features, including biophysically realistic HCN channels, a network architecture that excludes stellate-stellate connections and relies on interneurons, and asymmetric interneuron connectivity.

      Weaknesses:

      One of the proposed mechanisms for predictive coding, namely asymmetric interneuron connectivity, is a novel idea. However, this type of connectivity has not yet been demonstrated experimentally in the medial entorhinal cortex. Therefore, the biological plausibility of this mechanism remains uncertain and will need to be evaluated in future empirical studies.

    3. Reviewer #2 (Public review):

      Summary:

      This study proposes that predictive spatial representations in medial entorhinal cortex (MEC) grid cells arise through two distinct biophysical mechanisms: (1) HCN conductance-dependent temporal dynamics, which generate modest forward shifts (~5% of grid field diameter) in Layer II cells, and (2) network asymmetry, enabling larger predictive shifts (~25% of grid field diameter) in Layer III cells. The model further predicts a dorsoventral gradient in predictive coding magnitude, correlating with observed HCN conductance variations. These results provide a mechanistic framework for understanding how intrinsic cellular properties and circuit architecture collectively enable prospective spatial coding in the MEC. This is an important study.

      Strengths:

      These findings reveal how cellular properties and circuit design enable prospective spatial coding. This novel, impactful study will be of interest to the field.

      Weaknesses:

      Some of the models are too mathematical and do not fit with the biological observation.

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript by Shaikh and Assisi addresses a timely and important question related to the neural circuit mechanisms underlying spatial representations during navigation. Concretely, they present a model of the medial entorhinal cortex (MEC) with biophysically detailed conductance-based stellate cells that can perform path integration and reveal two potential mechanisms underlying two forms of predictive coding by grid cells in the MEC. One mechanism uses HCN channels to explain predictive coding in MEC layer II grid cells equivalent to ~5% of the diameter of a grid field, and the other uses asymmetric connections between interneurons and stellate cells, resulting in a ~25% predictive bias of layer III grid cells. The methods and model are technically sound, and the model is expected to be useful for computational neuroscientists studying the neural mechanisms of spatial navigation.

      Strengths:

      One strength of the model is its use of conductance-based neuron models of stellate cells and interneurons, adding important biophysical constraints and details to existing continuous attractor network models of grid cells. The model fills a gap in the literature by providing mechanisms for predictive coding constrained by biophysical properties of stellate cells and simplified network topology.

      Weaknesses:

      A weakness of the model is that the neural network is relatively small (five sheets with 71 × 71 neurons each), and the 2-D toroidal topology is further simplified to a 1-D ring attractor consisting of three rings with 192 neurons each. The model incorporates biophysical detail at the single-neuron level, but not at the network level. For example, it includes only stellate cells and a generic interneuron type, and does not implement data-driven connectivity patterns.

      The restricted network size and the limited experimental knowledge about connectivity among stellate cells, principal cells, and different interneuron types in the MEC could be addressed in more detail. Moreover, the manuscript lacks a thorough discussion of assumptions common to most continuous attractor network (CAN) models of grid cells, such as the use of "hand-crafted" connections between direction-sensitive conjunctive grid cells and network cells to drive attractor shifts. Including such a discussion would strengthen the manuscript. This is especially relevant given the authors' explicit claim that they have revealed two mechanisms underlying the emergence of a predictive code in the MEC. In this reviewer's view, the work demonstrates a potential mechanism, but one that requires experimental verification. The significance of the model would thus be increased by providing more experimentally testable predictions of the model.

    1. eLife Assessment

      This fundamental study shows how past experiences shape perception across short, medium, and long time scales, using a single behavioural paradigm and reanalysed EEG data. It provides convincing evidence for two processes across all scales: an attention-dependent mechanism that speeds responses to expected events, and an attention-independent mechanism where expected events are encoded less precisely, consistent with feedforward dampening. The work offers a unifying account of temporal context effects, though stronger brain-behaviour links, integration with serial dependence attraction and repulsion models, and extension to other timescale definitions would further strengthen the contribution.

    2. Reviewer #1 (Public review):

      Summary:

      This paper addresses an important and topical issue: how temporal context - at various time scales - affects various psychophysical measures, including reaction times, accuracy and localization. It offers interesting insights, with separate mechanisms for different phenomena, which are well discussed.

      Strengths:

      The paradigm used is original and effective. The analyses are rigorous.

      Comments on revised version:

      I think the authors have dealt adequately with my issues, none of which were fundamental.

    3. Reviewer #2 (Public review):

      Summary:

      This study investigates the influence of prior stimuli over multiple time scales in a position discrimination task, using pupillometry data and a reanalysis of EEG data from an existing dataset. The authors report consistent history-dependent effects across task-related, task-unrelated, and stimulus-related dimensions, observed across different time scales. These effects are interpreted as reflecting a unified mechanism operating at multiple temporal levels, framed within predictive coding theory.

      Strengths:

      The authors have done a good job in their revision, clarifying important points and stating the limitations of the study clearly.

      I also think they made a valid effort to address and correct issues arising from the temporal dependency confound, although I still wonder whether the best approach would have been to design an experiment in a way that avoided this confound in the first place.

      Overall, this is a substantially improved version, and I particularly appreciate the clarification and correction regarding the direction of the bias in the EEG data (repulsive rather than attractive).

      Weaknesses:

      These are now relatively minor points.

      I believe this latter aspect, the repulsive bias, may deserve further discussion, especially in relation to their behavioral findings and, in particular, to earlier work proposing multi-stage frameworks of serial dependence, where low-level repulsion interacts with attractive biases at higher-level stages (Fritsche et al., 2020; Pascucci et al., 2019; Sheehan & Serences, 2022). The authors may also consider citing some key reviews on serial dependence that discuss both repulsion and attraction in forced-choice and reproduction tasks (Manassi et al., 2023; Pascucci et al., 2023).

      Related to this, after finding the opposite pattern, are the sentence in lines 472-473 ("Further, we found an attractive...") and the related argument still valid?

      Regarding my earlier point about former line 197 and Figure 3b,c: what I noticed, similar to the patterns reported in the studies I referenced, is that the data cannot be simply described as showing faster and more accurate responses for small deltas. Responses also appear faster and more accurate for very large deltas, with performance being worse in between. Indeed, as the authors state: "The peak in precision for large Deltas locations is consistent with alternate events being encoded more precisely, while the peak for small offsets may be explained by the attractive bias towards the previous target." I wonder whether it is necessary, or unequivocally supported by the data, to hypothesize two separate mechanisms here. An alternative could be interference effects between consecutive stimuli that are neither identical nor completely different, making the previous one more likely to interfere with the current stimulus representation.

      Finally, this is definitely a minor point, but I still find the reply to my comment about the prediction of stable retinal input rather speculative. Such a prediction would seem more plausible in world-centered coordinates.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) The manuscript is quite dense, with some concepts that may prove difficult for the non-specialist. I recommend spending a few more words (and maybe some pictures) describing the difference between task-relevant and task-irrelevant planes. Nice technique, but not instantly obvious. Then we are hit with "stimulus-related", which definitely needs some words (also because it is orthogonal to neither of the above). 

      We agree that the original description of the planes was too terse and have expanded on this in the revised manuscript.

      Line 85 - To test the influence of attention, trials were sorted according to two spatial reference planes, based on the location of the stimulus: task-related and task-unrelated (Fig. 1b). The task-related plane corresponded to participants’ binary judgement (Fig. 1b, light cyan vertical dashed line) and the task-unrelated plane was orthogonal to this (Fig. 1b, dark cyan horizontal dashed line). For example, if a participant was tasked with performing a left-or-right of fixation judgement, then their task-related plane was the vertical boundary between the left and right side of fixation, while their task-unrelated plane was the horizontal boundary. The former (left-right) axis is relevant to their task while the latter (top-bottom) axis is orthogonal and task irrelevant. This orthogonality can be leveraged to analyze the same data twice (once according to the task-related plane and again according to the task-unrelated plane) in order to compare performance when the relative location of an event is either task relevant or irrelevant.

      Line 183 - whereas task planes were constant, the stimulus-related plane was defined by the location of the stimulus on the previous trial, and thus varied from trial to trial. That is, on each trial, the target is considered a repeat if it changes location by <|90°| relative to its location on the previous trial, and an alternate if it moves by >|90°|.
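
      The repeat/alternate rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: target locations are taken to be polar angles in degrees, the function names are hypothetical, and the handling of an offset of exactly 90° (which the text leaves undefined) is an arbitrary choice made here.

```python
def angular_offset_deg(current_deg, previous_deg):
    """Signed circular difference between two angles, wrapped to [-180, 180)."""
    return (current_deg - previous_deg + 180.0) % 360.0 - 180.0

def classify_trial(current_deg, previous_deg):
    """Label a trial relative to the stimulus-related plane.

    'repeat'    : target moved by less than 90 degrees from the previous target
    'alternate' : target moved by more than 90 degrees
    'boundary'  : exactly 90 degrees (not specified in the text; assumption)
    """
    delta = abs(angular_offset_deg(current_deg, previous_deg))
    if delta < 90.0:
        return "repeat"
    if delta > 90.0:
        return "alternate"
    return "boundary"

# Wrap-around example: 350 deg -> 10 deg is a 20 deg shift, not a 340 deg shift.
label_wrap = classify_trial(10.0, 350.0)
label_far = classify_trial(200.0, 10.0)
```

      Wrapping the difference before taking its absolute value matters because locations on either side of 0°/360° are close in space even though their raw angular values are far apart.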

      (2) While I understand that the authors want the three classical separations, I actually found it misleading. Firstly, for a perceptual scientist to call intervals in the order of seconds (rather than milliseconds), "micro" is technically coming from the raw prawn. Secondly, the divisions are not actually time, but events: micro means one-back paradigm, one event previously, rather than defined by duration. Thirdly, meso isn't really a category, just a few micros stacked up (and there's not much data on this). And macro is basically patterns, or statistical regularities, rather than being a fixed time. I think it would be better either to talk about short-term and long-term, which do not have the connotations I mentioned. Or simply talk about "serial dependence" and "statistical regularities". Or both. 

      We agree that the temporal scales defined in the current study are not the only way one could categorize perceptual time. We also agree that by using events to define scales, we ignore the influence of duration. In terms of the categories, we selected these for two reasons: 1) they conveniently group previous phenomena, and 2) they loosely correspond to iconic-, short- and long-term memory. We agree that one could also potentially split it up into two categories (e.g., short- and long-term), but in general, we think any form of discretization will have limitations. For example, Reviewer 1 suggests that the meso category is simply a few micros stacked together. However, there is a rich literature on phenomena associated with sequences of an intermediate length that do not appear to be entirely explained by stacking micro effects (e.g., sequence learning and sequential dependency). We also find that when controlling for micro level effects, there are clear meso level effects. Also, by the logic that meso level effects are just stacked micro effects, one could also argue the same for macro effects. We don’t think this argument is incorrect, rather we think it exemplifies the challenge of discretising temporal scales. Ultimately, the current study aimed to test whether seemingly disparate phenomena identified in previous work could be captured by unifying principles. To this end we found that these categories were the most useful. However, we have included a “Limitations and future directions” section in the Discussion of the revised manuscript that acknowledges both the alternative scheme proposed by Reviewer 1, and the value of extending this work to consider the influence of duration (as well as events).

      Line 488 - Limitations and future directions. One potential limitation of the current study is the categorization of temporal scales according to events, independent of the influence of event duration. While this simplification of time supports comparison between different phenomena associated with each scale (e.g., serial dependence, sequential dependencies, statistical learning), future work could investigate the role of duration to provide a more comprehensive understanding of the mechanisms identified in the current study.

      Related to this, while the temporal scales applied here conveniently categorized known sensory phenomena, and partially correspond to iconic-, short-, and long-term memory, they are but one of multiple ways to delineate time. For example, temporal scales could alternatively be defined simply as short- and long-term (e.g., by combining micro and meso scale phenomena). However, this could obscure meaningful differences between phenomena associated with sensory persistence and short-term memory, or qualitative differences in the way that short sequences of events are processed.

      (3) More serious is the issue of precision. Again, this is partially a language problem. When people use the engineering terms "precision" and "accuracy" together, they usually use the same units, such as degrees. Accuracy refers to the distance from the real position (so average accuracy gives bias), and precision is the clustering around the average bias, usually measured as standard deviation. Yet here accuracy is percent correct: also a convention in psychology, but not when contrasting accuracy with precision, in the engineering sense. I suggest you change "accuracy" to "percent correct". On the other hand, I have no idea how precision was defined. All I could find was: "mixture modelling was used to estimate the precision and guess rate of reproduction responses, based on the concentration (k) and height of von Mises and uniform distributions, respectively". I do not know what that means.

      In the case of a binary decision, it seems reasonable to use the term “accuracy” to refer to the correspondence between the target state and the response on a task. However, we agree that while our (main) task is binary, the target is not and nor is the secondary task. We thank the reviewer for bringing this to our attention, as we agree that this will be a likely cause of confusion. To avoid confusion we have specifically referred to “task accuracy” throughout the revised manuscript.

      With regards to precision, our measure of precision is consistent with what Reviewer 1 describes as such, i.e., the clustering of responses. In particular, the von Mises distribution is essentially a Gaussian distribution in circular space, and the kappa parameter defines the width of the distribution, regardless of the mean, with larger values of kappa indicating narrower (more precise) distributions. We could have used standard deviation to assess precision; however, this would incorrectly combine responses on which participants failed to encode the target (e.g., because of a blink) and were simply guessing. To account for these trials, we applied mixture modelling of guess and genuine responses to isolate the precision of genuine responses, as is standard in the visual working memory literature. However, we agree that this was not sufficiently described in the original manuscript and have elaborated on this method in the revised version.

      Line 598 - From the reproduction task, we sought to estimate participants’ recall precision. It is likely that on some trials participants failed to encode the target and were forced to make a response guess. To isolate the recall precision from guess responses, we used mixture modelling to estimate the precision and guess rate of reproduction responses, based on the concentration (k) and height of von Mises and uniform distributions, respectively (Bays et al., 2009). The k parameter of the von Mises distribution reflects its width, which indicates the clustering of responses around a common location.
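
      As a minimal sketch of the kind of mixture fit described above, the following treats each response error as drawn either from a von Mises distribution (genuine responses, concentration kappa) or from a uniform distribution on the circle (guesses). The function names, the grid-search fit, the simulated data, and the use of radians are all illustrative assumptions; the manuscript does not specify its fitting routine, and published analyses typically use a proper optimiser rather than a coarse grid.

```python
import math
import random


def bessel_i0(x, terms=60):
    """Modified Bessel function of the first kind (order 0) via its power series."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= (x / 2.0) ** 2 / (k * k)
        total += term
    return total


def mixture_loglik(errors, mu, kappa, guess_rate):
    """Log-likelihood of errors (radians) under a von Mises + uniform mixture."""
    norm = 2.0 * math.pi * bessel_i0(kappa)
    uniform = 1.0 / (2.0 * math.pi)
    ll = 0.0
    for e in errors:
        von_mises = math.exp(kappa * math.cos(e - mu)) / norm
        ll += math.log((1.0 - guess_rate) * von_mises + guess_rate * uniform)
    return ll


def fit_mixture(errors, mus, kappas, guess_rates):
    """Coarse grid search for the (mu, kappa, guess_rate) maximising the likelihood."""
    best = None
    for mu in mus:
        for kappa in kappas:
            for g in guess_rates:
                ll = mixture_loglik(errors, mu, kappa, g)
                if best is None or ll > best[0]:
                    best = (ll, mu, kappa, g)
    return best[1:]


def simulate_errors(n, kappa, guess_rate, rng):
    """Hypothetical data: guesses are uniform, genuine responses are von Mises."""
    errors = []
    for _ in range(n):
        if rng.random() < guess_rate:
            errors.append(rng.uniform(-math.pi, math.pi))
        else:
            errors.append(rng.vonmisesvariate(0.0, kappa))
    return errors


rng = random.Random(1)
errors = simulate_errors(500, kappa=8.0, guess_rate=0.2, rng=rng)
mu_hat, kappa_hat, guess_hat = fit_mixture(
    errors,
    mus=[-0.2, -0.1, 0.0, 0.1, 0.2],
    kappas=[2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 16.0],
    guess_rates=[0.0, 0.1, 0.2, 0.3, 0.4],
)
print(mu_hat, kappa_hat, guess_hat)
```

      Here a larger fitted kappa corresponds to a narrower distribution of genuine responses, while the guess rate absorbs trials that a plain standard deviation would otherwise count as imprecision.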

      (4) Previous studies show serial dependence can increase bias but decrease scatter (inverse precision) around the biased estimate. The current study claims to be at odds with that. But are the two measures of precision relatable? Was the real (random) position of the target subtracted from each response, leaving residuals from which the inverse precision was calculated? (If so, the authors should say so.) But if serial dependence biases responses in essentially random directions (depending on the previous position), it will increase the average scatter, decreasing the apparent precision.

      Previous studies have shown that when serial dependence is attractive there is a corresponding increase in precision around small offsets from the previous item (citations). Indeed, attractive biases will lead to reduced scattering (increased precision) around a central attractor. Consistent with previous studies, and this rationale, we also found an attractive bias coupled with increased precision. To clarify, for the serial dependency analysis, we calculated bias and precision by binning reproduction responses according to the offset between the current and previous target and then performing the same mixture modelling described above to estimate the mean (bias) and kappa (precision) parameters of the von Mises distribution fit to the angular errors. This was not explained in the original manuscript, so we thank Reviewer 1 for bringing this to our attention and have clarified the analysis in the revised version.

      Line 604 - For the serial dependency analysis, we calculated bias and precision by binning reproduction responses according to the angular offset between the current and previous target and then performing mixture modelling to estimate the mean (bias) and k (precision) parameters of the von Mises distribution.
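
      The binning step can be sketched as follows. This is an illustration only: the function names, the 45° bin width, the sign convention (offset = previous target minus current target) and the use of `None` for trials without a reproduction response are assumptions, and the circular mean is shown merely as a simpler stand-in for the mixture-model mean used in the manuscript.

```python
import math


def wrap_deg(angle):
    """Wrap an angle in degrees to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def bin_by_previous_offset(targets, responses, bin_width=45.0):
    """Group response errors by the angular offset of the previous target.

    targets   : target angle (degrees) on every trial
    responses : reproduced angle (degrees), or None when no reproduction
                response was collected on that trial
    Returns a dict mapping bin centre (degrees) to a list of angular errors.
    """
    bins = {}
    for i in range(1, len(targets)):
        if responses[i] is None:
            continue
        offset = wrap_deg(targets[i - 1] - targets[i])  # previous relative to current
        error = wrap_deg(responses[i] - targets[i])     # response relative to current
        centre = math.floor(offset / bin_width) * bin_width + bin_width / 2.0
        bins.setdefault(centre, []).append(error)
    return bins


def circular_mean_deg(angles):
    """Circular mean of angles in degrees (a simple summary of bias)."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(s, c))


example = bin_by_previous_offset([0, 10, 350, 90], [None, 12, None, 80])
print(example)
```

      Under this sign convention, an error with the same sign as the offset indicates attraction towards the previous target, and an error with the opposite sign indicates repulsion.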

      (5) I suspect they are not actually measuring precision, but location accuracy. So the authors could use "percent correct" and "localization accuracy". Or be very clear what they are actually doing. 

      As explained in our response to Reviewer 1’s previous comment, we are indeed measuring precision.

      Reviewer #2 (Public review):

      (1) The abstract should more explicitly mention that conclusions about feedforward mechanisms were derived from a reanalysis of an existing EEG dataset. As it is, it seems to present behavioral data only.

      It is not clear what relevance the fact that the data has been analyzed previously has to the results of the current study. However, we do think that it is important to be clear that the EEG recordings were collected separately from the behavioural and eyetracking data, so we have clarified this in the revised abstract.

      Line 7 - By integrating behavioural and pupillometry recordings with electroencephalographical recordings from a previous study, we identify two distinct mechanisms that operate across all scales.

      (2) The EEG task seems quite different from the others, with location and color changes, if I understand correctly, on streaks of consecutive stimuli shown every 100 ms, with the task involving counting the number of target events. There might be different mechanisms and functions involved, compared to the behavioral experiments reported. 

      As stated above, we agree that it is important that readers are aware that the EEG recordings were collected separately to the behavioural and eyetracking data. We were forthright about this in the original manuscript and have now clarified this in the revised abstract. We agree that collecting both sets of data in the same experiment would be a useful validation of the current results and have acknowledged this in a new Limitations and future directions section of the Discussion of the revised manuscript.

      Line 501 - Another limitation of the current study is that the EEG recordings were collected in a separate experiment from the behavioural and pupillometry data. The stimuli and task were similar between experiments, but not identical. For example, the EEG experiment employed coloured arc stimuli presented at a constant rate of ~3.3 Hz and participants were tasked with counting the number of stimuli presented at a target location. By contrast, in the behavioural experiment, participants viewed white blobs presented at an average rate of ~2.8 Hz and performed a binary spatial task coupled with an infrequent reproduction task. An advantage of this was that the sensory responses to stimuli in the EEG recordings were not conflated with motor responses; however, future work combining these measures in the same experiment would serve as a validation for the current results.

      (3) How is the arbitrary choice of restricting EEG decoding to a small subset of parieto-occipital electrodes justified? Blinks and other artifacts could have been corrected with proper algorithms (e.g., ICA) (Zhang & Luck, 2025) or even left in, as decoders are not necessarily affected by noise. Moreover, trials with blinks occurring at the stimulus time should be better removed, and the arbitrary selection of a subset of electrodes, while reducing the information in input to the decoder, does not account for trials in which a stimulus was missed (e.g., due to blinks).

      Electrode selection was based on several factors: 1) reduction of eye movement/blink artifacts (as noted in the original manuscript), 2) consistency with the previous EEG study (Rideaux, 2024) and other similar decoding studies (Buhmann et al., 2024; Harrison et al., 2023; Rideaux et al., 2023), 3) improved signal-to-noise by including only sensors that carry the most position information (as shown in Supplementary Figure 1a and the previous EEG study). We agree that this was insufficiently explained in the original manuscript and have clarified our sensor selection in the revised version.

      Line 631 - We only included the parietal, parietal-occipital, and occipital sensors in the analyses to i) reduce the influence of signals produced by eye movements, blinks, and non-sensory cortices, ii) for consistency with similar previous decoding studies (Buhmann et al., 2024; Rideaux, 2024; Rideaux et al., 2025), and iii) to improve decoding accuracy by restricting sensors to those that carried spatial position information (Supplementary Fig. 1a).

      (4) The artifact that appears in many of the decoding results is puzzling, and I'm not fully convinced by the speculative explanation involving slow fluctuations. I wonder if a different high-pass filter (e.g., 1 Hz) might have helped. In general, the nature of this artifact requires better clarification and disambiguation.

      We agree that the nature of this artifact requires more clarification and disambiguation. Due to relatively slow changes in the neural signal, which are not stimulus-related, there is a degree of temporal autocorrelation in the recordings. This can be filtered out, for example, by using a stricter high-pass filter; however, we tried a range of filters and found that a cut-off of at least 0.7 Hz is required to remove the artifact, and even a filter of 0.2 Hz introduces other (stimulus-related) artifacts, such as above-chance decoding prior to stimulus onset. These stimulus-related artifacts are due to the temporal smearing of data, introduced by the filtering, and have a more pronounced and complex influence on the results and are more difficult to remove through other means, such as the baseline correction applied in the original manuscript.

      The temporal autocorrelation is detected by the decoder during training and biases it to classify/decode targets that are presented nearby in time as similar. That is, it learns the neural pattern for a particular stimulus location based on the activity produced by the stimulus and the temporal autocorrelation (determined by slow stimulus unrelated fluctuations). The latter only accounts for a relatively smaller proportion of the variance in the neural recordings under normal circumstances and would typically go undetected when simply plotting decoding accuracy as a function of position. However, it becomes weakly visible when decoding accuracy is plotted as a function of distance from the previous target, as now the bias (towards temporally adjacent targets) aligns with the abscissa. Further, it becomes highly visible when the stimulus labels are shuffled, as now the decoder can only learn from the variance associated with the temporal autocorrelation (and not from the activity produced by the stimulus).

      In the linear discriminant analysis, this led to temporally proximal items being more likely to be classified as on the same side. This is why there is above-chance performance for repeat trials (Supplementary Figure 2b), and below-chance performance for alternate trials, even when the labels are shuffled – the temporal autocorrelation produces a general bias towards classifying temporally proximate stimuli as on the same side, which selectively improves the classification accuracy of repeat trials. Fortunately, the bias is relatively constant as a function of time within the epoch and is straightforward to estimate by shuffling the labels, which means that it can be removed through a baseline correction. However, to further demonstrate that the autocorrelation confound cannot account for the differences observed between repeat and alternate trials in the micro classification analysis, we now additionally show the results from a more strictly filtered version of the data (0.7 Hz). These results show a similar pattern as the original, with the additional stimulus-related artifacts introduced by the strict filter, e.g., above chance decoding prior to stimulus onset.
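
      The logic of this confound can be illustrated with a toy simulation. Note the assumptions: the signal here is a pure random-walk drift with no stimulus information at all, the labels are random (playing the role of shuffled labels), and the decoder is a leave-one-out 1-nearest-neighbour classifier, chosen for brevity, rather than the linear discriminant analysis used in the study. All names and parameter values are hypothetical.

```python
import random


def simulate_drift(n_trials, n_channels, rng):
    """Slow random-walk signal on each channel, unrelated to any stimulus."""
    state = [0.0] * n_channels
    trials = []
    for _ in range(n_trials):
        state = [s + rng.gauss(0.0, 1.0) for s in state]
        trials.append(state)
    return trials


def nearest_neighbour_label(trials, labels, test_idx):
    """Predict the held-out trial's label from its nearest other trial."""
    best_dist, best_label = None, None
    for j, other in enumerate(trials):
        if j == test_idx:
            continue
        dist = sum((a - b) ** 2 for a, b in zip(trials[test_idx], other))
        if best_dist is None or dist < best_dist:
            best_dist, best_label = dist, labels[j]
    return best_label


def repeat_alternate_accuracy(trials, labels):
    """Leave-one-out accuracy, split by whether the label repeats the previous one."""
    hits = {"repeat": [], "alternate": []}
    for i in range(1, len(trials)):
        kind = "repeat" if labels[i] == labels[i - 1] else "alternate"
        hits[kind].append(nearest_neighbour_label(trials, labels, i) == labels[i])
    return {kind: sum(h) / len(h) for kind, h in hits.items()}


rng = random.Random(0)
trials = simulate_drift(300, 8, rng)
labels = [rng.choice("LR") for _ in trials]  # labels carry no signal, as when shuffled
accuracy = repeat_alternate_accuracy(trials, labels)
print(accuracy)
```

      Because temporally adjacent trials are the most similar, the decoder tends to copy a neighbouring trial's label, so repeat trials score above chance and alternate trials below chance even though the signal contains no stimulus information. Subtracting such a shuffled-label accuracy from the unshuffled accuracy is the baseline correction described above.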

      In the inverted encoding analysis, the same temporal autocorrelation manifests as temporally proximal trials being decoded as more similar locations. This is why there is increased decoding accuracy for targets with small angular offsets from the previous target, even when the labels are shuffled (Supplementary Figure 3c), because it is on these trials that the bias happens to align with the correct position. This leads to an attractive bias towards the previous item, which is most prominent when the labels are shuffled.

      To demonstrate the phenomenon, we simulated neural recordings from a population of tuning curves and performed the inverted encoding analysis on a clean version of the data and a version in which we introduced temporal autocorrelation. We then repeated this after shuffling the labels. The simulation produced very similar results to those we observed in the empirical data, with a single exception: while precision in the simulated shuffled data was unaffected by autocorrelation, precision in the unshuffled data was clearly affected by this manipulation. This may explain why we did not find a correlation between the shuffled and unshuffled precision in the original manuscript. 

      These results echo those from the classification analysis, albeit in a more continuous space. However, whereas in the classification analysis it was straightforward to perform a baseline correction to remove the influence of general temporal dependency, the more complex nature of the accuracy, precision, and bias parameters over the range of time and delta location makes this approach less appropriate. For example, the bias in the shuffled condition ranged from -180 to 180 degrees, which when subtracted from the bias in the unshuffled condition would produce an equally spurious outcome, i.e., the equal opposite of this extreme bias. Instead, for the inverted encoding analysis, we used the data high-pass filtered at 0.7 Hz. As with the classification analysis, this removed the influence of general temporal dependencies, as indicated by the results of the shuffled data analysis (Supplementary Figure 3f), but it also temporally smeared the stimulus-related signal, resulting in above chance decoding accuracy prior to stimulus onset (Supplementary Figure 3d). However, we were primarily interested in the pattern of accuracy, precision, and bias as a function of delta location, and less concerned with the precise temporal dynamics of these changes, which appeared relatively stable in the filtered data. Thus, this was the more suitable approach to removing the general temporal dependencies in the inverted encoding analysis and the one that is presented in Figure 3.

      We have updated the revised manuscript in light of these changes, including a fuller description of the artifact and the results from the abovementioned control analyses.

      Figure 3 updated.

      Figure 3 caption - e) Decoding accuracy for stimulus location, from reanalysis of previously published EEG data (17). Inset shows the EEG sensors included in the analysis (blue dots), and black rectangles indicate the timing of stimulus presentations (solid: target stimulus, dashed: previous and subsequent stimuli). f) Decoding accuracy for location, as a function of time and Δ location. Bright colours indicate higher decoding accuracy; absolute accuracy values can be inferred from (e). g-i) Average location decoding (g) accuracy, (h) precision, and (i) bias from 50 – 500 ms following stimulus onset. Horizontal bar in (e) indicates cluster corrected periods of significance; note, all time points were significantly above chance due to temporal smear introduced by strict high-pass filtering (see Supplementary Figure 3 for full details). Note, the temporal abscissa is aligned across (e & f). Shaded regions indicate ±SEM.

      Line 218 - To further investigate the influence of serial dependence, we applied inverted encoding modelling to the EEG recordings to decode the angular location of stimuli. We found that decoding accuracy of stimulus location sharply increased from ~60 ms following stimulus onset (Fig. 3e). Note, to reduce the influence of general temporal dependencies, we applied a 0.7 Hz high-pass filter to the data, which temporally smeared the stimulus-related information, resulting in above chance decoding accuracy prior to stimulus presentation (for full details, see Supplementary Figure 3). To understand how serial dependence influences the representation of these features, we inspected decoding accuracy for location as a function of both time and Δ location (Fig. 3f). We found that decoding accuracy varied not only as a function of time, but also as a function of Δ location. To characterise this relationship, we calculated the average decoding accuracy from 50 ms until the end of the epoch (500 ms), as a function of Δ location (Fig. 3g). This revealed higher accuracy for targets with larger Δ location. We found a similar pattern of results for decoding precision (Fig. 3h). These results are consistent with the micro temporal context (behavioural) results, showing that targets that alternated were recalled more precisely. Lastly, we calculated the decoding bias as a function of Δ location and found a clear repulsive bias away from the previous item (Fig. 3i). While this result is inconsistent with the attractive behavioural bias, it is consistent with recent studies of serial dependence suggesting an initial pattern of repulsion followed by an attractive bias during the response period (20–22).

      Line 726 - As shown in Supplementary Figure 3, we found the same general temporal dependencies in the decoding accuracy computed using inverted encoding that were found using linear discriminant classification. However, as a baseline correction would not have been appropriate or effective for the parameters decoded with this approach, we instead used a high-pass filter of 0.7 Hz to remove the confound, while being cautious about interpreting the timing of effects produced by this analysis due to the temporal smear introduced by the filter.

      Supplementary Figure 2 updated.

      Supplementary Figure 2 caption - Removal of general micro temporal dependencies in EEG responses. We found that there were differences in classification accuracy for repeat and alternate stimuli in the EEG data, even when stimulus labels were shuffled. This is likely due to temporal autocorrelation within the EEG data due to low frequency signal changes that are unrelated to the decoded stimulus dimension. This signal trains the decoder to classify temporally proximal stimuli as the same class, leading to a bias towards repeat classification. For example, in general, the EEG signal during trial one is likely to be more similar to that during trial two than during trial ten, because of low frequency trends in the recordings. If the decoder has been trained to classify the signal associated with trial one as a leftward stimulus, then it will be more likely to classify trial two as a leftward stimulus too. These autocorrelations are unrelated to stimulus features; thus, to isolate the influence of stimulus-specific temporal context, we subtracted the classification accuracy produced by shuffling the stimulus labels from the unshuffled accuracy (as presented in Figure 2e, f). We confirmed that using a stricter high-pass filter (0.7 Hz) removes this artifact, as indicated by the equal decoding accuracy between the two shuffled conditions. However, the stricter high-pass filter temporally smears the stimulus-related signal, which introduces other (stimulus-related) artifacts, e.g., above-chance decoding accuracy prior to stimulus presentation, that are larger and more complex, i.e., changing over time. Thus, we opted to use the original high pass filter (0.1 Hz) and apply a baseline correction. a) The uncorrected classification  accuracy along task related and unrelated planes. Note that these results are the same as the corrected version shown in Figure 2e, because the confound is only apparent when accuracy is grouped according to temporal context.

      b) Same as (a), but split into repeat and alternate stimuli, along (left) task-related and (right) unrelated planes. Classification  accuracy when labels are shuffled is also shown. Inset in (a) shows the EEG sensors included in the analysis (blue dots). (c, d) Same as (a, b), but on data filtered using a 0.7 Hz high-pass filter. Black rectangles indicate the timing of stimulus presentations (solid: target stimulus, dashed: previous and subsequent stimuli). Shaded regions indicate ±SEM.

      Supplementary Figure 3 updated.

      Supplementary Figure 3 caption - Removal of general temporal dependencies in EEG responses for inverted encoding analyses. As described in Methods - Neural Decoding, we used inverted encoding modelling of EEG recordings to estimate the decoding accuracy, precision, and bias of stimulus location. Just as in the linear discriminant classification analysis, we also found the influence of general temporal dependencies in the results produced by the inverted encoding analysis. In particular, there was increased decoding accuracy for targets with low Δ location. This was weakly evident in the period prior to stimulus presentation, but clearly visible when the labels were shuffled. These results mirror those from the classification analysis, albeit in a more continuous space. However, whereas in the classification analysis it was straightforward to perform a baseline correction to remove the influence of general temporal dependency, the more complex nature of the accuracy, precision, and bias parameters over the range of time and Δ location makes this approach less appropriate. For example, the bias in the shuffled condition ranged from -180° to 180°, which when subtracted from the bias in the unshuffled condition would produce an equally spurious outcome, i.e., the equal opposite of this extreme bias. Instead, for the inverted encoding analysis, we used the data high-pass filtered at 0.7 Hz. As with the classification analysis, this significantly reduced the influence of general temporal dependencies, as indicated by the results of the shuffled data analysis, but it also temporally smeared the stimulus-related signal, resulting in above chance decoding accuracy prior to stimulus onset. However, we were primarily interested in the pattern of accuracy, precision, and bias as a function of Δ location, and less concerned with the precise temporal dynamics of these changes. Thus, this was the more suitable approach to removing the general temporal dependencies in the inverted encoding analysis and the one that is presented in Figure 3.

      (a) Decoding accuracy as a function of time for the EEG data filtered using a 0.1 Hz high-pass filter. Inset shows the EEG sensors included in the analysis (blue dots), and black rectangles indicate the timing of stimulus presentations (solid: target stimulus, dashed: previous and subsequent stimuli). (b, c) The same as (a), but as a function of time and Δ location for (b) the original data and (c) data with shuffled labels. (d-f) Same as (a-c), but for data filtered using a 0.7 Hz high-pass filter. Shaded regions in (a, d) indicate ±SEM. Horizontal bars in (a, d) indicate cluster corrected periods of significance; note, all time points in (d) were significantly above chance. Note, the temporal abscissa is vertically aligned across plots (a-c & d-f).

      In the process of performing these additional analyses and simulations, we became aware that the sign of the decoding bias in the inverted encoding analyses had been interpreted in the wrong direction. That is, where we previously reported an initial attractive bias followed by a repulsive bias relative to the previous target, we have in fact found the opposite, an initial repulsive bias followed by an attractive bias relative to the previous target. Based on the new control analyses and simulations, we think that the latter attractive bias was due to general temporal dependencies. That is, in the filtered data, we only observe a repulsive bias. While the bias associated with serial dependence was not a primary feature of the study, this (somewhat embarrassing) discovery has led to reinterpretation of some results relating to serial dependence. However, it is encouraging to see that our results now align with those of recent studies (Fischer et al., 2024; Luo et al., 2025; Sheehan et al. 2024).

      Line 385 - Our corresponding EEG analyses revealed better decoding accuracy and precision for stimuli preceded by those that were different and a bias away from the previous stimulus. These results are consistent with the finding that alternating stimuli are recalled more precisely. Further, while the repulsive pattern of biases is inconsistent with the observed behavioural attractive biases, it is consistent with recent work on serial dependence indicating an initial period of repulsion, followed by an attractive bias during the response period (20–22). These findings indicate that serial dependence and first-order sequential dependencies can be explained by the same underlying principle.

      (5) Given the relatively early decoding results and surprisingly early differences in decoding peaks, it would be useful to visualize ERPs across conditions to better understand the latencies and ERP components involved in the task.

      A rapid presentation design was used in the EEG experiment, and while this is well suited to decoding analyses, unfortunately we cannot resolve ERPs because the univariate signal is dominated by an oscillation at the stimulus presentation frequency (~3 Hz). We agree that this could be useful to examine in future work.

      (6) It is unclear why the precision derived from IEM results is considered reliable while the accuracy is dismissed due to the artifact, given that both seem to be computed from the same set of decoding error angles (equations 8-9).

      This point has been addressed in our response to point (4).

      (7) What is the rationale for selecting five past events as the meso-scale? Prior history effects have been shown to extend much further back in time (Fritsche et al., 2020). 

      We used five previous items in the meso analyses to be consistent with previous research on sequential dependencies (Bertelson, 1961; Gao et al., 2009; Jentzsch & Sommer, 2002; Kirby, 1976; Remington, 1969). However, we agree that these effects likely extend further and have acknowledged this in the revised version of the manuscript.

      Line 240 - Higher-order sequential dependences are an example of how stimuli (at least) as far back as five events in the past can shape the speed and task accuracy of responses to the current stimulus (9, 10); however, note that these effects have been observed for more than five events (20).

      (8) The decoding bias results, particularly the sequence of attraction and repulsion, appear to run counter to the temporal dynamics reported in recent studies (Fischer et al., 2024; Luo et al., 2025; Sheehan & Serences, 2022). 

      This point has been addressed in our response to point (4).

      (9) The repulsive component in the decoding results (e.g., Figure 3h) seems implausibly large, with orientation differences exceeding what is typically observed in behavior. 

      As noted in our response to point (4), this bias was likely due to the general temporal dependency confound and has been removed in the revised version of the manuscript.

      (10) The pattern of accuracy, response times, and precision reported in Figure 3 (also line 188) resembles results reported in earlier work (Stewart, 2007) and in recent studies suggesting that integration may lead to interference at intermediate stimulus differences rather than improvement for similar stimuli (Ozkirli et al., 2025).

      Thank you for bringing this to our attention, we have acknowledged this in the revised manuscript.

      Line 197 - Consistent with our previous binary analysis, and with previous work (19), we also found that responses were faster and more accurate when Δ location was small (Fig. 3b, c).

      (11) Some figures show larger group-level variability in specific conditions but not others (e.g., Figures 2b-c and 5b-c). I suggest reporting effect sizes for all statistical tests to provide a clearer sense of the strength of the observed effects. 

      Yes, as noted in the original manuscript, we find significant differences between the variance of the task-related and -unrelated conditions. We think this is due to opposing forces in the task-related condition:

      “The increased variability of response time differences across the task-related plane likely reflects individual differences in attention and prioritization of responding either quickly or accurately. On each trial, the correct response (e.g., left or right) was equally probable. So, to perform the task accurately, participants were motivated to respond without bias, i.e., without being influenced by the previous stimulus. We would expect this to reduce the difference in response time for repeat and alternate stimuli across the task-related plane, but not the task-unrelated plane. However, attention may amplify the bias towards making faster responses for repeat stimuli, by increasing awareness of the identity of stimuli as either repeats or alternations (17). These two opposing forces vary with task engagement and strategy and thus would be expected to produce increased variability across the task-related plane.” We agree that providing effect sizes may provide a clearer sense of the observed effects and have done so in the revised version of the manuscript.

      Line 739 - For Wilcoxon signed rank tests, the rank-biserial correlation (r) was calculated as an estimate of effect size, where 0.1, 0.3, and 0.5 indicate small, medium, and large effects, respectively (54). For Friedman’s ANOVA tests, Kendall’s W was calculated as an estimate of effect size, where 0.1, 0.3, and 0.5 indicate small, medium, and large effects, respectively (55).
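
      For reference, both effect sizes can be computed directly from ranks. The sketch below assumes one common definition of each: the matched-pairs rank-biserial correlation, r = (W+ − W−) / (W+ + W−), with zero differences dropped, and Kendall's W without a tie correction. The manuscript does not state which variants were used, and the function names are illustrative.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_biserial(x, y):
    """Matched-pairs rank-biserial correlation for paired samples x and y."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    ranks = average_ranks([abs(d) for d in diffs])
    w_pos = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_neg = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return (w_pos - w_neg) / (w_pos + w_neg)


def kendalls_w(data):
    """Kendall's W for n participants (rows) by k conditions (columns).

    Assumes no tied values within a participant's row (no tie correction).
    """
    n, k = len(data), len(data[0])
    rank_sums = [0.0] * k
    for row in data:
        for col, rank in enumerate(average_ranks(row)):
            rank_sums[col] += rank
    mean_sum = n * (k + 1) / 2.0
    s = sum((rs - mean_sum) ** 2 for rs in rank_sums)
    return 12.0 * s / (n ** 2 * (k ** 3 - k))
```

      A value of W = 1 means every participant ranked the conditions identically, and W = 0 means the rank sums did not differ between conditions.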

      (12) The statement that "serial dependence is associated with sensory stimuli being perceived as more similar" appears inconsistent with much of the literature suggesting that these effects occur at post-perceptual stages (Barbosa et al., 2020; Bliss et al., 2017; Ceylan et al., 2021; Fischer et al., 2024; Fritsche et al., 2017; Sheehan & Serences, 2022). 

      In light of the revised analyses, this statement has been removed from the manuscript.

      (13) If I understand correctly, the reproduction bias (i.e., serial dependence) is estimated on a small subset of the data (10%). Were the data analyzed by pooling across subjects?

      The dual reproduction task only occurred on 10% of trials. There were approximately 2000 trials, so ~200 reproduction responses. For the micro and macro analyses, this was sufficient to estimate precision within each of the experimental conditions (repeat/alternate, expected/unexpected). However, it is likely that we were not able to reproduce the effect of precision at the meso level across both experiments because we lacked sufficient responses to reliably estimate precision when split across the eight sequence conditions. Despite this, the data was always analysed within subjects.

      (14) I'm also not convinced that biases observed in forced-choice and reproduction tasks should be interpreted as arising from the same process or mechanism. Some of the effects described here could instead be consistent with classic priming. 

      We agree that the results associated with the forced-choice task (response time, task accuracy) were likely due to motor priming, but that a separate (predictive) mechanism may explain the (precision) results associated with the reproduction task. These are two mechanisms we think are operating across the three temporal scales investigated in the current study.

      Reviewing Editor Comments:

      (1) Clarify task design and measurement: The dense presentation makes it difficult to understand key design elements and their implications. Please provide clearer descriptions of all task elements, and how they relate to each other (EEG vs. behaviour, stimulus plane vs. TR and TU plane, reproduction vs. discrimination and role of priming), and clearly explain how key measures were computed for each of these (e.g., precision, accuracy, reproduction bias).

      In the revised manuscript, we have expanded on descriptions of the source and nature of the data (behavioural and EEG), the different planes analyzed in the behavioural task, and how key metrics (e.g., precision) were computed.

      (2) Offer more insight into underlying data, including original ERP waveforms to aid interpretation of decoding results and the timing of effects. In particular, unpack the decoding temporal confound further.

      In the revised manuscript, we have offered considerably more insight into the decoding results, in particular, the nature of the temporal confound. We were unable to assess ERPs due to the rapid presentation design employed in the EEG experiment.

      (3) Justify arbitrary choices such as electrode selection for EEG decoding (e.g., limiting to parieto-occipital sensors), number of trials in meso scale, and the time terminology itself.

      In the revised manuscript, we have clarified the reasons for electrode selection.

      (4) Discuss deviations from literature: Several findings appear to contradict or diverge from previous literature (e.g., effects of serial dependence). These discrepancies could be discussed in more depth.

      Upon re-analysis of the serial dependence bias and removal of the temporal confound, the results of the revised manuscript now align with those from previous literature, which has been acknowledged.

      Reviewer #1 (Recommendations for the authors):

      (1) I would like to use my reviewer's prerogative to mention a couple of relevant publications.

      Galluzzi et al (Journal of Vision, 2022) "Visual priming and serial dependence are mediated by separate mechanisms" suggests exactly that, which is relevant to this study.

      Xie et al. (Communications Psychology, 2025) "Recent, but not long-term, priors induce behavioral oscillations in peri-saccadic vision" also seems relevant to the issue of different mechanisms. 

      Thank you for bringing these studies to our attention. We agree that they are both relevant and have referenced both appropriately in the revised version of the manuscript.

      Reviewer #2 (Recommendations for the authors): 

      (1) I find the discussion on attention and awareness (from line 127 onward) somewhat vague and requiring clarification.

      We agree that this statement was vague and referred to “awareness” without operationalization. We have revised this statement to improve clarity.

      Line 135 - However, task-relatedness may amplify the bias towards making faster responses for repeat stimuli, by increasing attention to the identity of stimuli as either repeats or alternations (17).

      (2) Line 140: It's hard to argue that there are expectations that the image of an object on the retina is likely to stay the same, since retinal input is always changing. 

      We agree that retinal input is often changing, e.g., due to saccades, self-motion, and world motion. However, for a prediction to be useful, e.g., to reduce metabolic expenditure or speed up responses, it must be somewhat precise, so a prediction that retinal input will change is not necessarily useful, unless it can specify what it will change to. Given retinal input of x at time t, the range of possible values of x at time t+1 (predicting change) is infinite. By contrast, if we predict that x=x at time t+1 (no change), then we can make a precise prediction. There is, of course, other information that could be used to reduce the parameter space of predicted change from x at time t, e.g., the value of x at time t-1, and we think this drives predictions too. However, across the infinite distribution of changes from x, zero change will occur more frequently than any other value, so we think it’s reasonable to assert that the brain may be sensitive to this pattern.

      (3) Line 564: The gambler's fallacy usually involves sequences longer than just one event.

      Yes, we agree that this phenomenon is associated with longer sequences. This section of the manuscript concerned previous findings that were not directly relevant to the current study and has been removed in the revised version.

      (4) In the shared PDF, the light and dark cyan colors used do not appear clearly distinguishable. 

      I expect this is due to poor document processing or low-quality image embeddings. I will check that they are distinguishable in the final version.

      References: 

      Barbosa, J., Stein, H., Martinez, R. L., Galan-Gadea, A., Li, S., Dalmau, J., Adam, K. C. S., Valls-Solé, J., Constantinidis, C., & Compte, A. (2020). Interplay between persistent activity and activity-silent dynamics in the prefrontal cortex underlies serial biases in working memory. Nature Neuroscience, 23(8), Article 8. https://doi.org/10.1038/s41593-020-0644-4

      Bliss, D. P., Sun, J. J., & D'Esposito, M. (2017). Serial dependence is absent at the time of perception but increases in visual working memory. Scientific Reports, 7(1), 14739.

      Ceylan, G., Herzog, M. H., & Pascucci, D. (2021). Serial dependence does not originate from low-level visual processing. Cognition, 212, 104709. https://doi.org/10.1016/j.cognition.2021.104709

      Fischer, C., Kaiser, J., & Bledowski, C. (2024). A direct neural signature of serial dependence in working memory. eLife, 13. https://doi.org/10.7554/eLife.99478.1

      Fritsche, M., Mostert, P., & de Lange, F. P. (2017). Opposite effects of recent history on perception and decision. Current Biology, 27(4), 590-595. 

      Fritsche, M., Spaak, E., & de Lange, F. P. (2020). A Bayesian and efficient observer model explains concurrent attractive and repulsive history biases in visual perception. eLife, 9, e55389. https://doi.org/10.7554/eLife.55389

      Gekas, N., McDermott, K. C., & Mamassian, P. (2019). Disambiguating serial effects of multiple timescales. Journal of Vision, 19(6), 24.

      Luo, M., Zhang, H., Fang, F., & Luo, H. (2025). Reactivation of previous decisions repulsively biases sensory encoding but attractively biases decision-making. PLOS Biology, 23(4), e3003150. https://doi.org/10.1371/journal.pbio.3003150

      Ozkirli, A., Pascucci, D., & Herzog, M. H. (2025). Failure to replicate a superiority effect in crowding. Nature Communications, 16(1), 1637. https://doi.org/10.1038/s41467-025-56762-5

      Sheehan, T. C., & Serences, J. T. (2022). Attractive serial dependence overcomes repulsive neuronal adaptation. PLOS Biology, 20(9), e3001711.

      Stewart, N. (2007). Absolute identification is relative: A reply to Brown, Marley, and Lacouture (2007). Psychological Review, 114, 533-538. https://doi.org/10.1037/0033-295X.114.2.533

      Treisman, M., & Williams, T. C. (1984). A theory of criterion setting with an application to sequential dependencies. Psychological Review, 91(1), 68.

      Zhang, G., & Luck, S. J. (2025). Assessing the impact of artifact correction and artifact rejection on the performance of SVM- and LDA-based decoding of EEG signals. NeuroImage, 316, 121304. https://doi.org/10.1016/j.neuroimage.2025.121304

    1. eLife Assessment

      Complementing previous work (Namiki et al, 2018), this study provides an important resource for the Drosophila community as it reports 500 lines targeting descending neurons (DN), in addition to compiling 306 existing DN lines from the literature. The compelling work characterizes 146 DNs and makes a critical link with the DNs identified in electron microscopy (EM). The lines in this paper will be of interest to Drosophila neuroscientists, who will be able to use the reported genetic drivers for further functional characterization of DNs and circuit mapping in conjunction with existing EM datasets.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript by Zung et al. describes a curated library of genetic lines labeling an important class of neurons, called Descending Neurons, in the fruit fly, Drosophila melanogaster. These neurons play a critical role in relaying information from the brain to motor circuits within the ventral nerve cord, the insect analogue of the vertebrate spinal cord. The authors screened through a vast resource of Gal4 lines to generate 500 new genetic lines that allow for the precise labeling of 190 (40%) of all Descending Neurons. The tools introduced here will allow researchers to perform precise circuit dissection of the exact roles these neurons play in linking the brain to the ventral nerve cord.

      Strengths:

      This manuscript represents an important follow-up to the authors' 2018 paper, extending the genetic toolkit from 178 genetic lines that target 65 Descending Neuron (DN) classes to 806 lines that target 190 DN classes. The presentation of this toolkit is comprehensive, with confocal images, informative classifications of lines based on specificity/consistency, and identification of the neuron types, when possible, in the EM dataset.

      Weaknesses:

      No weaknesses were identified by this reviewer.

    3. Reviewer #2 (Public review):

      Summary:

      Descending neurons (DNs) are critical nodes in the neural computation underlying sensorimotor transformation. Building on their earlier work, the authors have substantially expanded the genetic resources for labeling these cell types in D. melanogaster, offering a valuable public resource.

      Strengths:

      The authors identified 146 additional DN types and generated 500 new DN driver lines, expanding the genetic reagents from labeling 98 cell types to 244, representing approximately 50% of all DN types estimated by EM connectomes. While the EM connectomes offer unprecedented resolution of neuronal cell types and their connectivity, genetic access to these cell types remains essential for studying their functions and testing hypotheses. Given the broad interest in DNs, the reagents generated in this study will be of great value for addressing a wide range of questions in sensorimotor transformation.

      The organization of the dataset is overall intuitive and comprehensive. The authors also provided clear information and guidance on accessing the relevant resources, such as stack images and fly lines. In addition, the authors have thoughtfully handled the information updated from the earlier collection they generated (Namiki et al. 2018) and incorporated previously published DN lines, providing a consolidated and up-to-date resource for the DN community.

      Weaknesses:

      No weaknesses were identified by this reviewer.

    4. Reviewer #3 (Public review):

      Summary:

      This study provides the Drosophila community with a large collection of new split-Gal4 descending neuron genetic lines. They extend previous efforts to characterize and identify genetic lines for this important class of neurons by providing images of descending neurons and a metric for genetic lines based on specificity and consistency. Their discussion highlights several applications of this collection, for example, to understand the function of new descending neurons through optogenetic and/or physiological characterization. They also helpfully discuss caveats, encouraging users of this collection to validate expression patterns and to be careful when interpreting optogenetic experimental results, considering potential off-target labeling in the lines. Overall, members of the Drosophila community interested in understanding the function of descending neurons and their role in behavior will find this a helpful resource.

      Strengths:

      (1) The authors extend the previous genetic access of descending neurons in Drosophila to over 800 split-Gal4 lines and 190 cell types (nearly half of the known population of descending neurons). They also update, and at times correct, the identification of descending neurons from a previous large-scale analysis.

      (2) Clear images of descending neurons labeled by new genetic lines are presented in the main figures for reference.

      (3) This study classifies lines labeling descending neurons using a quality score to indicate specificity and consistency. They provide this for the entire set of genetic lines, a valuable assessment for researchers interested in targeting these neurons for optogenetic or physiological characterization.

      Weaknesses:

      Although this paper represents a substantial effort and useful contribution to the Drosophila community, a few weaknesses, primarily regarding the specificity and reliability of genetic lines, remain:

      (1) The authors state that optogenetic activation of DN types using the new split-GAL4 lines is expected to reliably activate the target neurons with virtually no off-target effects in the rest of the central nervous system. More data supporting this conclusion, including both qualitative and quantitative anatomical evidence, would strengthen this claim.

      (2) The authors do recommend that researchers using these lines examine expression patterns themselves to evaluate line cleanliness and consistency, but some analysis by the authors would be useful, for example, providing guidelines for best practices to perform this evaluation.

      (3) Changes in expression patterns after several generations are noted by the authors, weakening confidence somewhat in the long-term usefulness of this collection of genetic lines.

    1. eLife Assessment

      This important study presents the development of a novel inhibitor for SARS-CoV-2 Mac1 that has potential utility both as an antiviral therapeutic and as a tool for probing the molecular mechanisms by which infection-induced ADP-ribosylation triggers robust host antiviral responses. Though minor gaps in understanding the compound's precise molecular mechanism of action and its ability to target Mac1 from other coronaviruses remain, the evidence for its effects on SARS-CoV-2 in relevant biological models is compelling.

    2. Reviewer #1 (Public review):

      SARS-CoV-2 encodes a macrodomain (Mac1) within the nsp3 protein that removes ADP-ribose groups from proteins. However, its role during infection is not well understood. Evidence suggests that Mac1 antagonizes the host interferon response by counteracting the wave of ADP ribosylation that occurs during infection. Indeed, several PARPs are interferon-stimulated genes. While multiple targets have been proposed, the mechanistic links between ADP ribosylation and a robust antiviral response remain unclear.

      Genetic inactivation of Mac1 abrogates viral replication in vivo, suggesting that small-molecule inhibitors of Mac1 could be developed into antivirals to treat COVID-19 and other emerging coronaviruses. The authors report a potent and selective small molecule inhibitor targeting Mac1 (AVI-4206) that demonstrates efficacy in human airway organoids and animal models of SARS-CoV-2 infection. While these results are compelling and provide proof of concept for the therapeutic targeting of Mac1, I am particularly intrigued by the potential of this compound as a probe to elucidate the mechanistic connections between infection-induced ADP ribosylation and the host antiviral response.

      The precise function of Mac1 remains unclear. Given its presence in multiple viruses, it likely acts on a fundamental host immune pathway(s). AVI-4206, while promising as a lead compound for the development of antivirals targeting coronaviruses, could also be a valuable tool for uncovering the function of the Mac1 domain. This may lead to fundamental insights into the host immune response to viral infection.

    3. Reviewer #2 (Public review):

      Summary:

      The authors describe the development of a novel inhibitor (AVI-4206) for the first macrodomain of the nsp3 protein of SARS-CoV-2 (Mac1). This involves medicinal chemistry synthesis and structural work as well as biochemical characterisation. Subsequently, the authors present their findings on the efficacy of the inhibitor in both cell culture and animal models of SARS-CoV-2 infection. They find that, despite high affinity for Mac1 and the known replicatory defects of catalytically inactive Mac1, only moderate beneficial effects can be observed in their chosen models.

      Strengths:

      The authors employ a variety of different assays to study the affinity, selectivity and potency of the novel inhibitor, and thus the in vitro data are very compelling.

      Similarly, the authors use several cell culture and in vivo models to strengthen their findings. In addition, the authors address several aspects of the health impact of coronaviral infections, from animal survival and viral load to histological assessment of lung damage.

      Weaknesses:

      The selection of TARG1 and MacroD2 as off-target human macrodomains is sub-optimal, as several studies have shown that the first macrodomains of PARP9 and PARP14 are much more closely related to coronaviral macrodomains and both macrodomains are implicated in antiviral defence and immunity. However, the authors address this issue by providing modeling data that show clashes with AVI-4206 similar to those in their models with MacroD2 and TARG1.

      Comments on revisions:

      While the authors have not addressed all my suggestions experimentally, I would like to nevertheless congratulate them on a significantly strengthened manuscript that will provide a valuable contribution to the field.

    4. Reviewer #3 (Public review):

      Summary:

      The authors were trying to validate SARS-CoV-2 Mac1 as a drug discovery target and by extension other viral macrodomains.

      Strengths:

      The medicinal chemistry and structure-based optimization is exemplary. Macrodomains and ADP-ribosyl hydrolases have the reputation for being undruggable, yet the authors managed to optimize hits from a fragment screen using structure-based approaches and fragment linking to make a 20 nM inhibitor as a tool compound to validate the target.

      In addition, the in vivo work is also a strength. The ability to reduce the viral count at a rate comparable to nirmatrelvir is impressive. Tracking the cytokine expression levels also supports much of the genetic data and mechanism of action for macrodomains.

      Weaknesses:

      The main compound, AVI-4206, while very potent and selective, is not appreciably orally bioavailable. The fact that high doses of the compound must be given IP to see in vivo effects may lead to questions regarding off-target effects. The authors acknowledge this and point it out as a potential avenue for further optimization.

      The cellular models are not as predictive of antiviral activity as one would expect. However, the authors had enough chutzpah to test the compound in vivo, knowing that cellular models might not accurately represent a living system with a fully functional immune system, which is most likely needed for an antiviral response and therefore for testing the importance of Mac1 as a target.

      Comments on revisions:

      All previous suggestions were addressed. I am satisfied with the authors' modifications.

    5. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Recommendations for the authors):

      Although this study is rigorous and the paper is well-written, I have a few concerns that the authors should address before publication.

      (1) Cellular levels of protein ADP-ribosylation should be analyzed using anti-ADPR antibodies following infection, both with and without Mac1 and AVI-4206 treatment. While the authors have provided impressive in vivo data, these experiments could ideally be conducted in mice. However, I would be amenable to these analyses being performed in human airway organoids, as they demonstrate clear phenotypes following AVI-4206 treatment post-infection. For a more in-depth exploration, the authors could consider affinity purifying ADP-ribosylated proteins and identifying them via mass spectrometry. I would find it particularly compelling if this approach revealed components of the NF-kB signaling pathway, given the intriguing results presented in Fig. 5. I am also curious if there are differences in ADP-ribosylated proteins when comparing Mac1 KO SARS-CoV-2 to AVI-4206 treatment.

      We note that despite the recent flurry of activity around Mac1, there is a surprising lack of public data on overall ADPr levels or targets. While we address the literature precedent for PARP14 signals by immunofluorescence specifically below (Reviewer 2 point (h)), we note that overall levels have not previously been characterized biochemically. Recent PARP14 papers and the ASAP AViDD preprint show changes by immunofluorescence only, and the evidence in that preprint is quite modest - see Figure 7B - https://pmc.ncbi.nlm.nih.gov/articles/PMC11370477/.

      We suspect the difficulty in tracking changes biochemically is due to multiple factors that influence the overall detectability and reproducibility. First, with regard to detectability, it is quite possible that only a small change in the ADPr status of a small number of targets is responsible for the phenotypes in vivo. Virus levels are very low in the organoid system and the variability in ADPr levels from tissue samples from in vivo experiments is high. Given the difficulty in translating back to cellular models, this problem is therefore magnified further. Second, with regard to reproducibility, we observe a great deal of reagent dependence in ADPr signals by Western blot with or without Mac1 expression in both cellular and tissue lysates (including when stimulated with H2O2, interferon, or during viral infection). Similarly, we do not observe proteins that reproducibly pull down with Mac1 when assayed by mass spectrometry. It is quite likely that these issues result from loss of the ADPr modification during tissue/sample preparation (especially for acidic residue modifications). This also explains the reliance on IF assays in the PARP14 literature. A very good discussion of these issues is also contained in this paper: https://doi.org/10.1042/BSR20240986.

      Nonetheless, we have attempted one final experiment. Here, we measured ADPr modification of cellular lysates under uninfected conditions as well as upon infection with either WT or N40D mutant virus. For all conditions, this was done with or without treatment of cells with 100 μM of AVI-4206. Measurement of ADPr modifications by western blot using a pan-ADPr antibody revealed a single prominent band with a molecular weight of ~130 kDa that showed a uniform increase in signal upon treatment of cells with AVI-4206, regardless of infection status. While this general trend was also observed with the mono-ADPr antibody, the change upon AVI-4206 treatment was not statistically significant. We suspect that the major band observed in these western blots is PARP1, as upon enrichment of ADPr proteins from these lysates by Af1521 immunoprecipitation, we find PARP1 to be among the most abundant proteins detected within this molecular weight range. We note that there is a baseline increase in polyADPr detection upon infection with virus carrying WT Mac1 (relative to uninfected cells and virus with N40D) and a further increase when treated with AVI-4206. This compound-dependent increase is paralleled in the uninfected and N40D conditions. The counterintuitive increase upon WT Mac1 virus infection, which should erase ADPr marks, and the compound-dependent increase in the uninfected condition suggest that there are many indirect effects on ADPr signalling dynamics in this experiment. These results are difficult to reconcile with the specificity profiling of AVI-4206 (Supplementary Figure 5: Thermal proteome profiling in A549 cellular lysates). As mentioned above, the lack of consistent signal across reagents for ADPr detection and the timing of monitoring ADPr levels are additional complicating factors.

      We added to the results:

      “However, we observed no strong consistent signals of global pan-ADP-ribose (panADPr) or mono-ADP-ribose (monoADPr) accumulation in infected cells treated with AVI-4206 in immunoblot analyses (Supplementary Figure 8).”

      Methods for experiment:

      Calu3 cells were obtained from ATCC and cultured in Advanced DMEM (Gibco) supplemented with 2.5% FBS, 1x GlutaMax, and 1x Penicillin-Streptomycin at 37°C and 5% CO₂. A total of 5 × 10⁶ cells were plated in 15-cm dishes, and the media was changed every 2-3 days until the cells were 80% confluent. The cells were treated with IFN-γ (50 ng/mL; R&D Systems) with or without 100 μM AVI-4206. After 6 hours, the cells were infected with WA1 or WA1 NSP3 Mac1 N40D at a multiplicity of infection (MOI) of 1 for 36 hours. The cells were washed three times with PBS and scraped in Pierce IP Lysis Buffer (ThermoFisher) containing 1x HALT protease and phosphatase inhibitor mix (ThermoFisher) on ice. The lysate was stored at -80°C until further processing.

      The cell lysate was incubated for 5 minutes at room temperature with recombinant benzonase. Following incubation, the lysate was centrifuged at 13,000 rpm at 4°C for 20 minutes, and the supernatant was collected. The samples were then boiled for 5 minutes at 95°C in 1x NuPAGE LDS sample buffer (Invitrogen) with a final concentration of 1x NuPAGE sample reducing agent (Invitrogen). For the detection of ADPr levels in whole-cell lysates, the samples were subjected to SDS-PAGE and immunoblotting. All primary and secondary antibodies (pan-ADP-ribose antibody (MABE1016, Millipore), mono-ADP-ribose antibody (AbD33204, Bio-Rad), and HRP-conjugated secondary antibody (Cell Signaling)) were used at a 1:1000 dilution in 5% non-fat dry milk in TBST. Signals were detected by chemiluminescence (Thermo) and visualized using the ChemiDoc XRS+ System (Bio-Rad). Densitometric analysis was performed using Image Lab (Bio-Rad). Quantification was normalized to Actin. The data are expressed as mean ± SD. Statistical differences were determined using an unpaired t-test in GraphPad Prism 10.3.1.
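      As an illustration only, the quantification described in these methods (band intensity normalized to a loading control, mean ± SD, unpaired t-test) can be sketched in a few lines of Python. This is not the authors' analysis, which was run in Image Lab and GraphPad Prism; the function names and all band intensities below are hypothetical, and the sketch assumes the equal-variance (Student's) form of the unpaired t-test.

```python
from math import sqrt
from statistics import mean, stdev


def normalize_to_loading_control(target, loading):
    """Divide each target band intensity by its lane's loading-control intensity."""
    if len(target) != len(loading):
        raise ValueError("each target band needs a matching loading-control band")
    return [t / l for t, l in zip(target, loading)]


def unpaired_t(a, b):
    """Student's unpaired t statistic (pooled variance) and degrees of freedom."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical densitometry readings (arbitrary units), three replicates each.
vehicle = normalize_to_loading_control([100, 110, 90], [200, 200, 200])
treated = normalize_to_loading_control([150, 160, 140], [200, 200, 200])

t_stat, dof = unpaired_t(treated, vehicle)
print(f"vehicle: {mean(vehicle):.2f} ± {stdev(vehicle):.2f}")
print(f"treated: {mean(treated):.2f} ± {stdev(treated):.2f}")
print(f"t = {t_stat:.2f}, df = {dof}")
```

      The p-value is left out because it requires the t-distribution, which the standard library does not provide; a statistics package would normally supply it.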

      (2) SARS-CoV-2 escape mutants for AVI-4206 should be generated, sequenced, and evaluated for both ADP-ribosyl hydrolase activity and their susceptibility to inhibition by AVI-4206.

      We thank the reviewer for this suggestion. These are indeed key experiments, which are currently hampered by the lack of a cell line that is fully responsive to drug treatment. Although infected organoids and macrophages show an effect in response to AVI-4206, viral levels are ~3 logs lower than in cell lines and difficult to sequence. In the absence of a system that would allow meaningful screening for outgrowth of resistant viruses, we have conducted mass spectrometry studies that showed that Mac1 is the only significant hit for AVI-4206 (Supplementary Figure 5). The suggested outgrowth experiments will be conducted once a responsive cell line model has been established.

      (3) Given that Mac1 is found in several coronaviruses, it would be insightful for the authors to test a selection of Mac1 homologs from divergent coronaviruses to assess whether AVI-4206 can inhibit their activity in vitro.

      As mentioned above, inconsistencies in ADPr staining limit our ability to directly measure cellular activity. As an alternative approach to measure AVI-4206 selectivity in cells, we have adapted our CETSA assay for SARS-1 and MERS macrodomain proteins and find evidence that AVI-4206 can shift the melting temperature of both proteins, albeit to a lesser degree than that seen for Mac1. In line with MERS being more structurally divergent than SARS-1 from SARS-CoV-2, the ΔTagg values for SARS-1 and MERS are 4℃ and 1℃, respectively, compared to 9℃ for Mac1. These data have been added as Supplementary Fig S3C. Development of broader-spectrum pan-inhibitors is on our radar for future work, which will more thoroughly assess homologs from divergent coronaviruses.
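      For readers unfamiliar with the ΔTagg readout, a minimal sketch follows. It assumes Tagg is taken as the temperature at which the soluble fraction falls to 50%, estimated here by linear interpolation between measured temperatures rather than by the sigmoidal curve fit typically used in CETSA analysis. The function name and all melting-curve values are invented for illustration and are not data from this study.

```python
def t_agg(temps, soluble_fraction, threshold=0.5):
    """Temperature at which the soluble fraction first drops below `threshold`,
    estimated by linear interpolation between adjacent measurements."""
    points = list(zip(temps, soluble_fraction))
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if f0 >= threshold > f1:
            return t0 + (f0 - threshold) * (t1 - t0) / (f0 - f1)
    raise ValueError("melting curve never crosses the threshold")


# Hypothetical melting curves: soluble fraction remaining at each temperature (°C).
temps = [40, 45, 50, 55, 60, 65]
vehicle = [1.0, 0.9, 0.6, 0.2, 0.05, 0.0]
with_compound = [1.0, 1.0, 0.95, 0.8, 0.4, 0.1]

# A positive shift indicates that the compound stabilizes the protein.
delta_t_agg = t_agg(temps, with_compound) - t_agg(temps, vehicle)
print(f"ΔTagg = {delta_t_agg:.2f} °C")
```

      A larger positive ΔTagg is read as stronger stabilization of the protein by the ligand, which is how the 9℃, 4℃ and 1℃ shifts reported above are compared.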

      We added the following sentence to the main results:

      “Encouragingly, we were also able to adapt our CETSA assay for SARS-1 and MERS macrodomain proteins and find that AVI-4206 can shift the melting temperature of both proteins, albeit to a lesser degree than that seen for Mac1 (Supplementary Figure 3C).”

      We also added this supplementary figure 3:

      Minor

      (1) Line 88, "respectively.heir potency"

      Fixed, thank you!

      (2) Line 149 add a period after proteome

      Fixed, thank you!

      Reviewer #2 (Recommendations for the authors):

      (a) The authors assess inhibition of MacroD2 and Targ1 as off-targets for AVI-4206. However, Mac1 belongs to the MacroD-type class of macrodomains, of which MacroD1, MacroD2 and the MOD1s of PARP9 and PARP14 are the human members. In contrast, Targ1 belongs to the ALC1-like class, which is only very distantly related to Mac1. Furthermore, recent studies have shown that the first macrodomains of PARP9 and PARP14 (MOD1 of PARP9/14) are much more closely related to Mac1, and PARP9/14 were implicated in antiviral immunity. As such, the authors should include assays showing the activity of their compounds against MacroD1 and the MOD1s of PARP9/14.

      We emphasize that we detect no significant shift for any protein other than Mac1 in A549 cells by CETSA-MS (Supplementary Figure 6). For Mac1 CETSA, we see an average of 6 PARP14 spectral counts across conditions and did not detect PARP9. In addition, for separate work on MPro, we ran similar CETSA experiments in which we observed an average of 2 PARP9 and 15 PARP14 spectral counts across conditions. Although PARP9 and PARP14 expression increases massively upon IFN treatment in A549 cells, both proteins have previously been detected at baseline by Western blot in A549 cells.

      Nonetheless, we have included modeling of more diverse macrodomains as a supplemental figure and added to the text:

      Modeling of other diverse macrodomains, including those within human PARP9 and PARP14, further suggests that AVI-4206 is selective for Mac1 (Supplementary Figure 4).

      (b) In the context of SARS-CoV-2, superinfections are a known major complication of infection. These superinfections are associated with lung damage, and it would therefore be good if the authors could assess lung damage, e.g. by histology, to see if their treatment has a positive impact on lung damage and thus may help to suppress complications.

      We performed histology; the results are inconclusive but suggest that AVI-4206 treatment could lower apoptosis. There is no difference in pathology between the N40D cohort and vehicle with these markers. This could suggest that AVI-4206 provides an additional mechanism that results in protection. We added to the results:

      Caspase 3 staining shows that AVI-4206 treatment reduces apoptosis in the lungs compared to vehicle controls. Additionally, Masson's Trichrome staining reveals a significant reduction in collagen deposition, a surrogate for lung pathology, in the lungs of AVI-4206 treated animals (Supplementary Figure 9).

      Histology:

      Mouse lung tissues were fixed in 4% PFA (Sigma Aldrich, Cat #47608) for 24 hours, washed three times with PBS and stored in 70% ethanol. All staining was performed at Histo-Tec Laboratory (Hayward, CA). Samples were processed, embedded in paraffin, and sectioned at 4 μm. The slides were dewaxed using xylene and alcohol-based dewaxing solutions. Heat-induced epitope retrieval (HIER) of the formalin-fixed, paraffin-embedded tissue was performed using a citrate-based pH 6 solution (Leica Microsystems, AR9961) for 20 min at 95°C. The tissues were stained for H&E, caspase-3 (Biocare #CP229c, 1:100), and trichrome, then dried, coverslipped (TissueTek-Prisma Coverslipper), and visualized using an Axioscan 7 slide scanner (ZEISS) at 40X. Image quantification was performed with ImageJ software and GraphPad Prism.

      (c) Fig. 1D labelling is wrong

      Thank you - fortunately the data were plotted correctly and it was just the inset table of values that was incorrect. This is now fixed!

      (d) Line 88: "T" missing at start of sentence

      Fixed, thank you!

      (e) Line 118: NudT5/AMP-Glo assay was developed in https://doi.org/10.1021/acs.orglett.8b01742

      We have added this foundational reference, thank you!

      (f) Line 147ff: It would be good if the authors could highlight that the TPP methodology has known limitations (e.g. detection of low abundance proteins and low thermal shift of some binders) and thus is not an absolute proof that AVI-4206 "engage with high specificity for Mac1"

      We added this important context to the concluding sentence of this paragraph:

      “While this assay may not be sensitive enough to detect low-abundance proteins or proteins with a low thermal shift upon ligand binding, collectively, these results indicate that AVI-4206 can cross cellular membranes and engage with high specificity for Mac1.”

      (g) The authors use their well-established in vitro Mac1 model as well as the SARS-CoV-2 WA strain. Given the ongoing diversification of SARS-CoV-2 and the current prevalence of the Omicron VOC, it would be good if the authors could investigate whether alterations in Mac1 have occurred or been detected that could influence the efficacy of their inhibitor. Similarly, it would be interesting to know how effective their drug is on other clinically relevant beta-CoV Mac1, e.g. from MERS or SARS1.

      We thank the reviewer for the suggestion. Mac1 is one of the more conserved areas of the SARS-CoV-2 genome as there has only been one nonsynonymous mutation V34L (Orf1a:V1056L) that recently emerged in the BA.2.86 lineage and is now in all of the JN.1 derivatives. Currently, the mutation is only ~80% penetrant in circulating SARS-CoV-2 sequences suggesting that it might revert to wild-type and is not associated with a fitness benefit. Based on our structural analysis (shown in Supplementary Figure 4D above), we do not believe this mutation affects AVI-4206 binding, but we are including this variant in our future in vitro and in vivo studies as well as other beta-CoV. For SARS and MERS, see response to Reviewer 1 using CETSA to show that these targets are engaged by AVI-4206.

      (h) As methods to detect PARP14-derived ADP-ribosylation are available and it was shown that Mac1 can reverse this modification in cells, it would be good if the authors could investigate the impact of AVI-4206 on ADP-ribosylation in vivo.

      To test this idea we adapted the IF assay used by others in the field and show an effect of AVI-4206. We have added to the text:

      Although the IFN response was not sufficient to control viral replication, it is possible that the changes in ADP-ribosylation, in particular marks catalyzed by PARP14, downstream of IFN treatment could serve as a marker for Mac1 efficacy  (Ribeiro et al. 2025). To investigate whether downstream signals from PARP14 were specifically erased by Mac1, we used an immunofluorescence assay that showed that Mac1 could remove IFN-γ-induced ADP-ribosylation that is mediated by PARP14 (Kar et al. 2024).  We stably expressed wild-type Mac1 and the N40D mutant Mac1 in A549 cells. The data showed that Mac1 expression decreased IFN-γ-induced ADP-ribosylation, whereas the Mac1-N40D mutant did not (Figure 3E, F), indicating that Mac1 mediates the hydrolysis of IFN-γ-induced ADP-ribosylation. The PARP14 inhibitor RBN012759 completely blocked IFN-γ-induced ADP-ribosylation (Figure 3E, F), further confirming that IFN-γ-induced ADP-ribosylation is mediated by PARP14. AVI-4206 reversed the Mac1-induced hydrolysis of ADP-ribosylation and enhanced the ADP-ribosylation signal in Mac1-overexpressing cells (Figure 3E, F), further demonstrating its ability to inhibit the hydrolase activity of Mac1. We further validated this result using different ADP-ribosylation antibodies for immunofluorescence (Supplementary Figure 7). However, we observed no strong consistent signals of global pan-ADP-ribose (panADPr) or mono-ADP-ribose (monoADPr) accumulation in infected cells treated with AVI-4206 in immunoblot analyses (Supplementary Figure 8). Collectively, these results provide further evidence that simple cellular models are insufficient to explore the effects of Mac1 inhibition and that monitoring specific PARP14-mediated ADP-ribosylation patterns can provide an accessible biomarker for the efficacy of Mac1 inhibition.

      A549 Mac1 expression cell construction

      Mac1 wild-type (Mac1) and N1062D mutant (Mac1 N1062D) gene fragments were loaded into pLVX-EF1α-IRES-Puro (empty vector, EV) using Gibson cloning kit (NEB E5510). Lentivirus was prepared as previously described (PMID: 30449619; DOI: 10.1016/j.cell.2018.10.024). Briefly, 15 million HEK293T cells were grown overnight on 15 cm poly-L-Lysine coated dishes and then transfected with 6 ug pMD2.G (Addgene plasmid # 12259 ; http://n2t.net/addgene:12259 ; RRID:Addgene_12259), 18 ug dR8.91 (since replaced by second generation compatible pCMV-dR8.2, Addgene plasmid #8455) and 24 ug pLVX-EF1α-IRES-Puro (EV, Mac1, Mac1-N1062D) plasmids using the lipofectamine 3000 transfection reagent per the manufacturer’s protocol (Thermo Fisher Scientific, Cat #L3000001). pMD2.G and dR8.91 were a gift from Didier Trono. The following day, media was refreshed with the addition of viral boost reagent at 500x as per the manufacturer’s protocol (Alstem, Cat #VB100). Viral supernatant was collected 48 hours post transfection and spun down at 300 g for 10 minutes, to remove cell debris. To concentrate the lentiviral particles, Alstem precipitation solution (Alstem, Cat #VC100) was added, mixed, and refrigerated at 4°C overnight. The virus was then concentrated by centrifugation at 1500 g for 30 minutes, at 4°C. Finally, each lentiviral pellet was resuspended at 100x of original volume in cold DMEM+10%FBS+1% penicillin-streptomycin and stored until use at -80°C. To generate Mac1 overexpressing cells, 2 million A549 cells were seeded in 10 cm dishes and transduced with lentivirus in the presence of 8 μg/mL polybrene (Sigma, TR-1003-G). The media was changed after 24h and, after 48 hours, media containing 2μg/ml puromycin was added. Cells were selected for 72 hours and then expanded without selection. The expression of Mac1 was confirmed by Western Blot.

      Immunofluorescence assay:

      To assess the effect of Mac1 on IFN-induced ADP-ribosylation, A549-pLVX-EV, A549-pLVX-Mac1 and A549-pLVX-Mac1-N1062D cells were seeded in a 96-well plate (10,000 cells/well). Cells were pre-treated with medium or 100 units/mL IFN-γ (Sigma, SRP3058) for 24 hours to induce ADP-ribosylation. These 3 cell lines were then treated the next day with the indicated concentrations of AVI-4206 or RBN012759 (Medchemexpress, HY-136979). After 24 hours of exposure to drugs, treated cells were fixed in pre-cooled methanol at -20°C for 20 min, blocked in 3% bovine serum albumin for 15 min, incubated with Poly/Mono-ADP Ribose (E6F6A) Rabbit mAb (CST, 83732S) or Poly/Mono-ADP Ribose (D9P7Z) Rabbit mAb (CST, 89190S) antibodies for 1 h, and then incubated with Goat anti-Rabbit IgG Secondary Antibody, Alexa Fluor 488 (ThermoFisher, A-11008) for 30 min and stained with DAPI for 10 minutes. Fluorescent cells were imaged with an IN Cell Analyzer 6500 System (Cytiva) and analyzed using IN Carta software (Cytiva).

      Reviewer #3 (Recommendations for the authors):

      Just a couple of observations/details that might help strengthen the article:

      (1) The Caco-2 data for AVI4206 would suggest that there is some sort of efflux going on, yet there is no mention of it in the paper. This might be useful in the optimization paradigm moving forward.

      We thank the reviewer for this observation and suggestion.  Indeed, we believe that efflux is behind the low oral bioavailability of AVI-4206.  We are working specifically to remove this liability in next-generation analogs, using the caco2 assay to guide this ongoing effort. Keep an eye out for a preprint on this soon!  We have added to the discussion:

      “In addition to dissecting such molecular mechanisms of macrodomain function and inhibition, future efforts will focus on improving pharmacokinetic properties, including a cellular efflux liability that results in low oral bioavailability of AVI-4206. ”

      (2) There are some spectroscopic anomalies/mistakes in the NMR data. The carbon NMR for 1-((8-amino-9H-pyrimido[4,5-b]indol-4-yl)amino)pyrrolidin-2-one should only have 14 unique carbons, but the authors report 15. The HNMR for AVI1500 should only have 19 H's, but the authors list 20. The HNMR data for AVI3762/3763 should have 16 H's, but the authors only report 13. The CNMR for AVI4206 should only have 19 unique carbons, but the authors report 20.

      Thank you for noting these inconsistencies regarding the reported NMR spectra. We have rectified them by more closely examining the spectra and in some cases acquiring new data. We identified one peak (47.9) in the 13C NMR of 1-((8-amino-9H-pyrimido[4,5-b]indol-4-yl)amino)pyrrolidin-2-one that is apparently an artifact of the automated peak picking in the data analysis software.  In the 1H NMR of AVI-1500, the triplet peak at 7.20 integrates to 1H, but was erroneously reported as 2H in the original manuscript.  This error has been corrected.  Spectra were re-acquired for AVI-3762, AVI-3763, and AVI-4206 with longer acquisition times, and/or on a 600 MHz spectrometer to afford the complete line lists now reported in the revised manuscript. Please note AVI-4206 has 18 distinct 13C resonances due to the equivalence of the gem-dimethyl methyl groups.

    1. eLife Assessment

      This study reanalyzed previously published scRNA-seq and TCR-seq data to examine the proportion and characteristics of dual-TCR-expressing Treg cells in mice, presenting some useful insights into TCR diversity and immune regulation. However, the evidence is incomplete, particularly with respect to data interpretation, statistical rigor, and the functionality of dual-TCR Treg cells. The study is potentially of interest to immunologists studying T-cell biology.

    2. Reviewer #2 (Public review):

      Summary:

      The manuscript, by Xu and Peng, et al. investigates whether co-expression of 2 T cell receptor (TCR) clonotypes can be detected in FoxP3+ regulatory CD4+ T cells (Tregs) and if it is associated with identifiable phenotypic effects. This paper presents data reanalyzing publicly available single-cell TCR sequencing and transcriptional analysis, convincingly demonstrating that dual TCR co-expression can be detected in Tregs, both in peripheral circulation as well as among Tregs in tissues. They then compare metrics of TCR diversity between single-TCR and dual TCR Tregs, as well as between Tregs in different anatomic compartments, finding the TCR repertoires to be generally similar though with dual TCR Tregs exhibiting a less diverse repertoire and some moderate differences in clonal expansion in different anatomic compartments. Finally, they examine the transcriptional profile of dual TCR Tregs in these datasets, finding some potential differences in expression of key Treg genes such as Foxp3, CTLA4, Foxo3, Foxo1, CD27, IL2RA, and Ikzf2 associated with dual TCR-expressing Tregs, which the authors postulate implies a potential functional benefit for dual TCR expression in Tregs.

      Strengths:

      This report examines an interesting and potentially biologically significant question, given recent demonstrations that dual TCR co-expression is a much more common phenomenon than previously appreciated (approximately 15-20% of T cells) and that dual TCR co-expression has been associated with significant effects on the thymic development and antigenic reactivity of T cells. This investigation leverages large existing datasets of single-cell TCRseq/RNAseq to address dual TCR expression in Tregs. The identification and characterization of dual TCR Tregs is rigorously demonstrated and presented, providing convincing new evidence of their existence.

      Weaknesses:

      The existence of dual TCR expression by Tregs has previously been demonstrated in mice and humans, limiting the novelty of the reported findings. The presented results should be considered in the context of these prior important findings. The focus on self-citation of their previous work, using the same approach to measure dual TCR expression in other datasets, limits the discussion of other more relevant and impactful published research in this area. Also, Reference #7 continues to list incorrect authors. The authors do not present a balanced or representative description of the available knowledge about either dual TCR expression by T cells or TCR repertoires of Tregs.

      The approach used follows a template used previously by this group for re-analysis of existing datasets generated by other research groups. The descriptions and interpretations of the data as presented are still shallow, lacking innovative or thoughtful approaches that could provide new insight.

      This demonstration of dual TCR Tregs is notable, though the authors do not compare the frequency of dual TCR co-expression by Tregs with non-Tregs. This limits interpreting the findings in the context of what is known about dual TCR co-expression in T cells. The response to this criticism in a previous review is considered non-responsive and does not improve the data or findings.

      Comparison of gene expression by single- and dual TCR Tregs is of interest, but as presented is difficult to interpret. The interpretations of the gene expression analyses are somewhat simplistic, focusing on single-gene expression of some genes known to have function in Tregs. However, the investigators continue to miss an opportunity to examine larger patterns of coordinated gene expression associated with developmental pathways and differential function in Tregs (Yang. 2015. Science. 348:589; Li. 2016. Nat Rev Immunol. 16:220; Wyss. 2016. Nat Immunol. 17:1093; Zemmour. 2018. Nat Immunol. 19:291). No attempt to define clusters is made. No comparison is made of the proportions of dual TCR cells in transcriptionally-defined clusters. The broad assessment of key genes by single- and dual TCR cells is conceptually interesting, but likely to be confounded by the heterogeneity of the Treg populations. This would need to be addressed and considered to make any analyses meaningful.

      The study design, re-analysis of existing datasets generated by other scientific groups, precludes confirmation of any findings by orthogonal analyses.

    3. Reviewer #3 (Public review):

      Summary:

      This study addressed the TCR pairing types and CDR3 characteristics of Treg cells. By analyzing scRNA and TCR-seq data, it claims that 10-20% of dual TCR Treg cells exist in mouse lymphoid and non-lymphoid tissues and suggests that dual TCR Treg cells in different tissues may play complex biological functions.

      Strengths:

      The study addresses an interesting question of how dual-TCR-expressing Treg cells play roles in tissues.

      Weaknesses:

      This study is inadequate, particularly regarding data interpretation, statistical rigor, and the discussion of the functional significance of Dual TCR Tregs.

      Comments on revisions:

      Although the authors have provided brief explanations in response to the reviewers' comments, they do not present any additional analyses that would address the fundamental concerns in a convincing manner.

      Moreover, the in silico analyses presented in the manuscript alone are insufficient to support the conclusions, and the functional experiments requested by the reviewers have not been conducted.

      In the current rebuttal, while some textual additions have been made to the manuscript, the only substantial revision to the figures appears to be the inclusion of statistical significance annotations (e.g., Fig. 1G, Fig. 3G). These changes do not adequately strengthen the overall data or address the core issues raised.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      (1) The use of single-cell RNA and TCR sequencing is appropriate for addressing potential relationships between gene expression and dual TCR.

      Thank you for your detailed review and suggestions. The main advantages of scRNA+TCR-seq are as follows: (1) It enables comparative analysis of features such as the ratio of single TCR paired T cells to dual TCR paired T cells at the level of a large number of individual T cells, through mRNA expression of the α and β chains. In the past, this analysis was limited to a small number of T cells, requiring isolation of single T cells, PCR amplification of the α and β chains, and Sanger sequencing; (2) While analyzing TCR paired T cell characteristics, it also allows examination of mRNA expression levels of transcription factors in corresponding T cells through scRNA-seq.

      (2) The data confirm the presence of dual TCR Tregs in various tissues, with proportions ranging from 10.1% to 21.4%, aligning with earlier observations in αβ T cells.

      Thank you very much for your detailed review and suggestions. Early studies on dual TCR αβ T cells have been very limited in number, with reported proportions of dual TCR T cells ranging widely from 0.1% to over 30%. In contrast, scRNA+TCR-seq can monitor over 5,000 single and paired TCRs, including dual paired TCRs, in each sample, enabling more precise examination of the overall proportion of dual TCR αβ T cells. It is important to note that our analysis focuses on T cells paired with functional α and β chains, while T cells with non-functional chain pairings and those with a single functional chain without pairing were excluded from the total cell proportion analysis. Previous studies generally lacked the ability to determine expression levels of specific chains in T cells without dual TCR pairings.
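The inclusion rule described in this response (count only cells with at least one functional α and one functional β chain, and call a cell dual TCR when it carries more than one functional chain of either type) can be illustrated with a short Python sketch. This is an illustration under stated assumptions, not the authors' pipeline: the record fields `barcode`, `chain`, `productive` and `cdr3` are hypothetical names, and "productive" is used here as a stand-in for "functional".

```python
from collections import defaultdict


def classify_tcr_pairing(contigs):
    """Label each cell as 'single', 'dual' or 'excluded'.

    `contigs` is an iterable of dicts with hypothetical keys:
    'barcode' (cell ID), 'chain' ('TRA' or 'TRB'),
    'productive' (bool) and 'cdr3' (amino acid sequence).
    """
    chains = defaultdict(lambda: {"TRA": set(), "TRB": set()})
    for contig in contigs:
        entry = chains[contig["barcode"]]  # register the cell even if nothing is productive
        if contig["productive"] and contig["chain"] in entry:
            entry[contig["chain"]].add(contig["cdr3"])

    labels = {}
    for barcode, found in chains.items():
        n_alpha, n_beta = len(found["TRA"]), len(found["TRB"])
        if n_alpha == 0 or n_beta == 0:
            labels[barcode] = "excluded"  # no functional alpha-beta pair
        elif n_alpha == 1 and n_beta == 1:
            labels[barcode] = "single"
        else:
            labels[barcode] = "dual"  # two or more chains of at least one type
    return labels


def dual_fraction(labels):
    """Fraction of dual TCR cells among cells with a functional pair."""
    paired = [label for label in labels.values() if label != "excluded"]
    if not paired:
        return float("nan")
    return sum(label == "dual" for label in paired) / len(paired)


# Toy records with made-up CDR3 strings, for illustration only.
example_contigs = [
    {"barcode": "c1", "chain": "TRA", "productive": True, "cdr3": "CAAS"},
    {"barcode": "c1", "chain": "TRB", "productive": True, "cdr3": "CASS"},
    {"barcode": "c2", "chain": "TRA", "productive": True, "cdr3": "CAVR"},
    {"barcode": "c2", "chain": "TRA", "productive": True, "cdr3": "CALG"},
    {"barcode": "c2", "chain": "TRB", "productive": True, "cdr3": "CASR"},
    {"barcode": "c3", "chain": "TRA", "productive": True, "cdr3": "CAMR"},
    {"barcode": "c4", "chain": "TRA", "productive": True, "cdr3": "CAAN"},
    {"barcode": "c4", "chain": "TRB", "productive": False, "cdr3": "CAWS"},
]
example_labels = classify_tcr_pairing(example_contigs)
```

Because excluded cells are dropped from the denominator, the reported proportion depends directly on how "functional" is defined, which is one reason proportions from different studies are hard to compare.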

      (3) Tissue-specific patterns of TCR gene usage are reported, which could be of interest to researchers studying T cell adaptation, although these were more rigorously analyzed in the original works.

      Thank you very much for your detailed review and suggestions. T cell subpopulations exhibit tissue specificity; thus, we conducted a thorough investigation into Treg cells from different tissue sites. This study builds upon the original by innovatively analyzing the differences in VDJ rearrangement and CDR3 characteristics of dual TCR Treg cells across various tissues. This provides new insights and directions for the potential existence of “new Treg cell subpopulations” in different tissue locations. The results of this analysis suggest the necessity of conducting functional experiments on dual TCR Treg cells at both the TCR protein level and the level of effector functional molecules.

      (4) Lack of Novelty: The primary findings do not substantially advance our understanding of dual TCR expression, as similar results have been reported previously in other contexts.

      Thank you for your detailed review and suggestions. Early research on dual TCR T cells primarily relied on transgenic mouse models and in vitro experiments, using limited TCR alpha chain or TCR beta chain antibody pairings. Flow cytometry was used to analyze a small number of T cells to estimate dual TCR T cell proportion. No studies have yet analyzed dual TCR Treg cell proportion, V(D)J recombination, and CDR3 characteristics at high throughput in physiological conditions. The scRNA+TCR-seq approach offers an opportunity to conduct extensive studies from an mRNA perspective. With high-throughput advantages of single-cell sequencing technology, researchers can analyze transcriptomic and TCR sequence characteristics of all dual TCR Treg cells within a study sample, providing new ideas and technical means for investigating dual TCR T cell proportions, characteristics, and origins under different physiological and pathological states.

      (5) Incomplete Evidence: The claims about tissue-specific differences lack sufficient controls (e.g., comparison with conventional T cells) and functional validation (e.g., cell surface expression of dual TCRs).

      Thank you for your detailed review and suggestions. This study indeed only analyzed dual TCR Treg cells from different tissue locations based on the original manuscript, without a comparative analysis of other dual TCR T cell subsets corresponding to these tissue locations. The main reason for this is that, in current scRNA+TCR-seq studies of different tissue locations, unless specific T cell subsets are sorted and enriched, the number of T cells obtained from each subset is very low, making a detailed comparative analysis impossible. In the results of the original manuscript, we observed a relatively high proportion of dual TCR Treg cell populations in various tissues, with differences in TCR composition and transcription factor expression. Following the suggestions, we have included additional descriptions in R1, citing the study by Tuovinen et al., which indicates that the proportion of dual TCR Tregs in lymphoid tissues is higher than other T cell types. This will help understand the distribution characteristics of dual TCR Treg cells in different tissues and provide a basis for mRNA expression levels to conduct functional experiments on dual TCR Treg cells in different tissue locations.

      (6) Methodological Weaknesses: The diversity analysis does not account for sample size differences, and the clonal analysis conflates counts and clonotypes, leading to potential misinterpretation.

      We thank you for your review and suggestions. In response to your question about whether the diversity analysis considered the sample size issue, we conducted a detailed review and analysis. This study utilized the inverse Simpson index to evaluate TCR diversity of Treg cells. A preliminary analysis compared the richness and evenness of single TCR Treg cell and dual TCR Treg cell repertoires. The two datasets analyzed were from four mouse samples with consistent processing and sequencing conditions. However, when analyzing single TCR Tregs and dual TCR Tregs from various tissues, differences in detected T cell numbers by sequencing cannot be excluded from the diversity analysis. Following recommendations, we provided additional explanations in R1: CDR3 diversity analysis indicates TCR composition of dual TCR Treg cells exhibits diversity, similar to single TCR Treg cells; however, diversity indices of single TCR Tregs and dual TCR Tregs are not suitable for statistical comparison. Regarding the "clonal analysis" you mentioned, we define clonality based on unique TCR sequences; cells with identical TCR sequences are part of the same clone, with ≥2 counts defined as expansion. For example, in Blood, there are 958 clonal types and 1,228 cells, of which 449 are expansion cells. In R1, we systematically verified and revised clonal expansion cells across all tissue samples according to a unified standard.
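The quantities discussed in this exchange can be made concrete with a small Python sketch. It assumes the frequency form of the inverse Simpson index, 1/Σpᵢ², which may differ from the exact estimator used in the manuscript, and it adds a simple subsampling (rarefaction) step as one common way to compare repertoires of unequal size. The subsampling step is a suggestion for illustration and is not described in the response. All function names are hypothetical.

```python
import random
from collections import Counter


def inverse_simpson(counts):
    """Inverse Simpson index 1 / sum(p_i^2) over clonotype frequencies."""
    counts = list(counts)
    total = sum(counts)
    if total == 0:
        raise ValueError("at least one cell is required")
    return 1.0 / sum((n / total) ** 2 for n in counts)


def expansion_summary(clonotype_per_cell):
    """Count cells, clonotypes, and clones with >= 2 cells (expanded)."""
    counts = Counter(clonotype_per_cell)
    expanded = {clone: n for clone, n in counts.items() if n >= 2}
    return {
        "cells": len(clonotype_per_cell),
        "clonotypes": len(counts),
        "expanded_clonotypes": len(expanded),
        "expanded_cells": sum(expanded.values()),
    }


def rarefied_inverse_simpson(clonotype_per_cell, depth, n_iter=200, seed=0):
    """Mean inverse Simpson index over random subsamples of `depth` cells."""
    cells = list(clonotype_per_cell)
    if depth > len(cells):
        raise ValueError("depth exceeds the number of cells")
    rng = random.Random(seed)
    values = []
    for _ in range(n_iter):
        subsample = rng.sample(cells, depth)  # sampling without replacement
        values.append(inverse_simpson(Counter(subsample).values()))
    return sum(values) / len(values)


# Toy repertoire: one clonotype label per cell.
toy_cells = ["a", "a", "b", "c", "c", "c"]
toy_summary = expansion_summary(toy_cells)
```

Subsampling every repertoire to the size of the smallest one before computing the index is what allows the single TCR and dual TCR values to be compared on an equal footing; without it, the larger repertoire tends to look more diverse simply because more cells were sequenced.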

      (7) Insufficient Transparency: The sequence analysis pipeline is inadequately described, and the study lacks reproducibility features such as shared code and data.

      Thank you for your review and suggestions. Based on the original manuscript, we have made corresponding detailed additions in R1, providing further elaboration on the analysis process of shared data, screening methods, research codes, and tools. This aims to offer readers a comprehensive understanding of the analytical procedures and results.

      (8) Weak Gene Expression Analysis: No statistical validation is provided for differential gene expression, and the UMAP plots fail to reveal meaningful clustering patterns.

      Thank you very much for your review and suggestions. Based on your recommendations, we conducted an initial differential expression analysis of the top 10 mRNA molecules in single TCR Treg and dual TCR Treg cells using the DESeq2 R package in R1, with statistical significance determined by Padj < 0.05. Regarding the clustering patterns in the UMAP plots, since the analyzed samples consisted of isolated Treg cell subpopulations that highly express immune suppression-related genes, we did not perform a more detailed analysis of subtypes and expression gene differences. This study primarily aims to explore the proportions of single TCR and dual TCR Treg cells from different tissue sources, as well as the characteristics of CDR3 composition, with a focus on showcasing the clustering patterns of samples from different tissue origins and various TCR pairing types.
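For readers unfamiliar with the Padj threshold mentioned here: DESeq2 reports Benjamini-Hochberg adjusted p-values by default. The Python sketch below shows only that adjustment step on made-up p-values. It does not reproduce DESeq2's model fitting or its independent filtering, and it is not the authors' code.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up raw p-values for five genes.
raw_p = [0.001, 0.008, 0.039, 0.041, 0.6]
padj = benjamini_hochberg(raw_p)
significant = [p < 0.05 for p in padj]
```

In this toy case the third and fourth genes have raw p-values below 0.05 but adjusted values above it, which is why the reviewers' request for adjusted statistics matters when many genes are tested at once.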

      (9) A quick online search reveals that the same authors have repeated their approach of reanalysing other scientists' publicly available scRNA-VDJ-seq data in six other publications. In other words, the approach used here seems to be focused on quick re-analyses of publicly available data without further validation and/or exploration.

      Thank you for your review and suggestions. Most current studies utilizing scRNA+TCR-seq overlook analysis of TCR pairing types and related research on single TCR and dual TCR T cell characteristics. Through in-depth analysis of shared scRNA+TCR-seq data from multiple laboratories, we discovered a significant presence of dual TCR T cells in high-throughput T cell research results that cannot be ignored. In this study, we highlight the higher proportion of dual TCR Tregs in different tissue locations, which exhibits a certain degree of tissue specificity, suggesting these cells may participate in complex functional regulation of Tregs. This finding provides new ideas and a foundation for further research into dual TCR Treg functions. However, as reviewers pointed out, findings from scRNA+TCR-seq at the mRNA level require additional functional experiments on dual TCR T cells at the protein level. We have supplemented our discussion in R1 based on these suggestions.

      Reviewer #2 (Public review):

      (1) The existence of dual TCR expression by Tregs has previously been demonstrated in mice and humans (Reference #18 and Tuovinen. 2006. Blood. 108:4063; Schuldt. 2017. J Immunol. 199:33, both omitted from references). The presented results should be considered in the context of these prior important findings.

      Thank you very much for your review and suggestions. Based on the original manuscript, we have supplemented our reading, understanding, and citation of closely related literature (Tuovinen, 2006, Blood, 108:4063 (line 44, line 175 in R1); Schuldt, 2017, J Immunol, 199:33 (line 44, line 178 in R1)). We once again appreciate the valuable comments from the reviewers, and we will refer to these in our subsequent dual TCR T cell research.

      (2) This demonstration of dual TCR Tregs is notable, though the authors do not compare the frequency of dual TCR co-expression by Tregs with non-Tregs. This limits interpreting the findings in the context of what is known about dual TCR co-expression in T cells.

      Thank you very much for your review and suggestions. This analysis is primarily based on the scRNA+TCR-seq study of sorted Treg cells, where we found the proportions and distinguishing features of dual TCR Treg cells in different tissue sites. Given the diversity and complexity of Treg function, conducting a comparative analysis of the origins of dual TCR Treg cells and non-T cells with dual TCRs will be a meaningful direction. Currently, peripheral induced Treg cells can originate from the conversion of non-Treg cells; however, little is known about the sources and functions of dual TCR Treg cell subsets in both central and peripheral sites. In R1, we have supplemented the discussion regarding the possible origins and potential applications of the "novel dual TCR Treg" subsets.

      (3) Comparison of gene expression by single- and dual TCR Tregs is of interest, but as presented is difficult to interpret. Statistical analyses need to be performed to provide statistical confidence that the observed differences are true.

      Thank you very much for your review and suggestions. Based on your recommendations, we performed an initial differential expression analysis of the top 10 mRNA molecules in single TCR Treg and dual TCR Treg cells using the DESeq2 R package in R1, with a statistical significance threshold of Padj<0.05 for comparisons.

      (4) The interpretations of the gene expression analyses are somewhat simplistic, focusing on the single-gene expression of some genes known to have a function in Tregs. However, the investigators miss an opportunity to examine larger patterns of coordinated gene expression associated with developmental pathways and differential function in Tregs (Yang. 2015. Science. 348:589; Li. 2016. Nat Rev Immunol. 16:220; Wyss. 2016. Nat Immunol. 17:1093; Zemmour. 2018. Nat Immunol. 19:291).

      Thank you for your review and suggestions. This study is based on publicly available scRNA+TCR-seq data from different organ sites generated by the original authors, focusing on sorted and enriched Treg cells within each tissue sample. However, there was no corresponding research on other cell types in each tissue sample, preventing analysis of other cells and factors involved in development and differentiation of single TCR Treg and dual TCR Treg. The literature suggested by the reviewer indicates that development, differentiation, and function of Treg cells have been extensively studied, resulting in significant advances. It also highlights the complexity and diversity of Treg origins and functions. This research aims to investigate "novel dual TCR Treg cell subpopulations" that may exhibit tissue-specific differences found in the original authors' studies of Treg cells across different organ sites. This suggests further experimental research into their development, differentiation, origin, and functional gene expression as an important direction, which we have supplemented in the discussion section of R1.

      Reviewer #3 (Public review):

      (1) Definition of Dual TCR and Validity of Doublet Removal: This study analyzes Treg cells with Dual TCR, but it is not clearly stated how the possibility of doublet cells was eliminated. The authors mention using DoubletFinder for detecting doublets in scRNA-seq data, but is this method alone sufficient? We strongly recommend reporting the details of doublet removal and data quality assessment in the Supplementary Data.

      Thank you very much for your review and suggestions. In the analysis of the shared scRNA+TCR-seq data across multiple laboratories, as you mentioned, this study employed the DoubletFinder R package to exclude suspected doublets. Additionally, we used the nCount values of individual cells (i.e., the total sequencing reads or UMI counts for each cell) as auxiliary parameters to further optimize the assessment of cell quality. Generally, due to the possibility that doublet cells may contain gene expression information from two or more cells, their nCount values are often abnormally high. In this study, all cells included in the analysis had nCount values not exceeding 20,000. Among the five tissue sample datasets, we further utilized hashtag oligonucleotide (HTO) labeling to eliminate doublets and negative cells; HTO labeling tags each cell with a barcode that distinguishes cells from different tissue sources, so that doublets and negative cells can be identified from the HTO labels. After the removal of chimeric cells, all samples exhibited T cells that possessed two or more TCR clones. This phenomenon validates the reliability of the methodological approach employed in this study and indicates that the analytical results accurately reflect the proportion of dual TCR T cells. Based on the recommendations of the reviewers, we have supplemented and clarified the methods and discussion sections in the manuscript. It is particularly noteworthy that in our analysis, the discussed dual TCR Treg cells and single TCR Treg cells specifically refer to those T cells that possess both functional α and β chains, which are capable of forming TCR. We have excluded from this analysis any Treg cells that possess only a single functional α or β chain and do not form TCR pairs, as well as those Treg cells in which the α or β chains involved in TCR pairing are non-functional.
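The three filters named in this response (a DoubletFinder call, an nCount ceiling of 20,000, and HTO-based removal of doublets and negative cells where hashtags are available) could be combined as in the Python sketch below. The field names and the label strings 'Singlet', 'Doublet' and 'Negative' are assumptions chosen for illustration; the actual analysis was run with R packages, and this is not the authors' code.

```python
MAX_NCOUNT = 20_000  # upper bound on nCount stated in the response


def passes_qc(cell, max_ncount=MAX_NCOUNT):
    """Return True if a cell survives all doublet and quality filters.

    `cell` is a dict with hypothetical keys:
    'ncount'         total UMI or read count for the cell
    'doubletfinder'  'Singlet' or 'Doublet' (expression-based call)
    'hto_class'      'Singlet', 'Doublet', 'Negative', or None when the
                     dataset carries no hashtag labels
    """
    if cell["ncount"] > max_ncount:
        return False
    if cell["doubletfinder"] != "Singlet":
        return False
    hto = cell.get("hto_class")
    return hto is None or hto == "Singlet"


# Toy cells, for illustration only.
toy_cells = [
    {"id": "c1", "ncount": 8_500, "doubletfinder": "Singlet", "hto_class": "Singlet"},
    {"id": "c2", "ncount": 26_000, "doubletfinder": "Singlet", "hto_class": "Singlet"},
    {"id": "c3", "ncount": 9_100, "doubletfinder": "Doublet", "hto_class": "Singlet"},
    {"id": "c4", "ncount": 7_300, "doubletfinder": "Singlet", "hto_class": "Negative"},
    {"id": "c5", "ncount": 20_000, "doubletfinder": "Singlet", "hto_class": None},
]
kept_ids = [cell["id"] for cell in toy_cells if passes_qc(cell)]
```

Reporting how many cells each filter removes, per tissue, would be one way to give the data quality assessment the reviewer asks for.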

      (2) In Figure 3D, the proportion of Dual TCR T cells (A1+A2+B1+B2) in the skin is reported to be very high compared to other tissues. However, in Figure 4C, the proportion appears lower than in other tissues, which may be due to contamination by non-Tregs. The authors should clarify why it was necessary to include non-Tregs as a target for analysis in this study. Additionally, the sensitivity of scRNA-seq and TCR-seq may vary between tissues and may also be affected by RNA quality and sequencing depth in skin samples, so the impact of measurement bias should be assessed.

      We deeply appreciate your review and constructive comments. Based on the original manuscript, we have further supplemented and elaborated on the uniqueness and relative proportions of double TCR T cell pairs in skin tissue samples in Section R1. Due to the scarcity of T cells in skin samples, we included some non-Treg cells during single-cell RNA sequencing and TCR sequencing to obtain a sufficient number of cells for effective analysis. The presence of non-regulatory T cells may indeed impact the statistical representation of double TCR T cells as well as the related comparative analyses, as noted by the reviewer. T cells with A1+A2+B1+B2 type double TCR pairings are primarily found within the non-regulatory T cell population in the skin. In response to this point, we have provided a detailed explanation of this analytical result in the revised manuscript R1. Furthermore, concerning the two datasets included in the study, we conducted a comparative analysis in R1, exploring how factors such as sequencing depth at different tissue sites might introduce biases in our findings, which we have thoroughly elaborated upon in the discussion section. We thank you once again for your valuable suggestions. 

      (3) Issue of Cell Contamination: In Figure 2A, the data suggest a high overlap between blood, kidney, and liver samples, likely due to contamination. Can the authors effectively remove this effect? If the dataset allows, distinguishing between blood-derived and tissue-resident Tregs would significantly enhance the reliability of the findings. Otherwise, it would be difficult to separate biological signals from contamination noise, making interpretation challenging.

      We thank you for your review and suggestions. We have carefully verified data sources for tissues such as blood, kidneys, and liver. In the study by Oliver T et al., various techniques were employed to differentiate between leukocytes from blood and those from tissues, ensuring accurate identification of leukocytes from tissue samples. First, anti-CD45 antibody was injected intravenously to label cells in the vasculature, verifying that analyzed cells were indeed resident in the tissue. Second, prior to dissection and cell collection, authors performed perfusion on anesthetized mice to reduce contamination of tissue samples by leukocytes from the vasculature. Additionally, during single-cell sequencing, authors utilized HTO technology to avoid overlap between cells from different tissues.

      Analysis of the scRNA+TCR-seq data shared by the original authors revealed highly overlapping TCR sequences in blood, kidney, and liver, despite distinct cell labels associated with each tissue. While these techniques minimize overlap of cells from different sources, they cannot completely rule out the potential impact of this technical issue. As suggested, we have provided additional clarification in R1 of the manuscript regarding this phenomenon of high overlap in the kidney, liver, and blood, indicating that the possibility of Treg migration from blood to kidney and liver cannot be entirely excluded.

      (4) Inconsistency Between CDR3 Overlap and TCR Diversity: The manuscript states that Single TCR Tregs have a higher CDR3 overlap, but this contradicts the reported data that Dual TCR Tregs exhibit lower TCR diversity (higher 1/DS score). Typically, when TCR diversity is low (i.e., specific clones are concentrated), CDR3 overlap is expected to increase. The authors should carefully address this discrepancy and discuss possible explanations.

      Thank you for your review and suggestions. Regarding the potential relationship between CDR3 overlap and TCR diversity, in samples with consistent sequencing depth, lower diversity indeed corresponds to a higher proportion of CDR3 overlap. In our analysis of scRNA+TCR-seq data, we found that single TCR Tregs exhibit both higher diversity and CDR3 overlap, seemingly presenting contradictory analytical results (i.e., dual TCR Tregs show lower TCR diversity and CDR3 overlap). In R1, we supplemented the analysis of possible reasons: the presence of multiple TCR chains in dual TCR Treg cells may lead to a higher uniqueness of CDR3 due to multiple rearrangements and selections, resulting in lower CDR3 overlap; the lower diversity of dual TCR Tregs may be related to the number of T cells sequenced in each sample. The CDR3 diversity analysis in this study merely suggests that the TCR composition of dual TCR Treg cells is diverse, similar to that of single TCR Tregs. However, the diversity indices of single TCR Tregs and dual TCR Tregs are not suitable for statistical comparative analysis. A more in-depth and specific analysis of the diversity and overlap of the VDJ recombination mechanisms and CDR3 composition in dual TCR Tregs during development will be an important technical means to elucidate the function of dual TCR Treg cells.

      (5) Functional Evaluation of Dual TCR Tregs: This study indicates gene expression differences among tissue-resident Dual TCR T cells, but there is no experimental validation of their functional significance. Including functional assays, such as suppression assays or cytokine secretion analysis, would greatly enhance the study's impact.

      We sincerely appreciate your review and suggestions: In this analysis of scRNA+TCR-seq data, we innovatively discovered a higher proportion of dual TCR Treg cells in different tissue sites, which exhibited differences in tissue characteristics. Furthermore, we conducted a comparative analysis of the homogeneity and heterogeneity between single TCR Treg and dual TCR Treg cells. This result provides a foundation for further research on the origin and characteristics of dual TCR Treg cells in different tissue sites, offering new insights for understanding the complexity and functional diversity of Treg cells. Based on your suggestions, we have supplemented R1 with the feasibility of further exploring the functions of tissue-resident dual TCR T cells and the necessity for potential application research.

      (6) Appropriateness of Statistical Analysis: When discussing increases or decreases in gene expression and cell proportions (e.g., Figure 2D), the statistical methods used (e.g., t-test, Wilcoxon, FDR correction) should be explicitly described. They should provide detailed information on the statistical tests applied to each analysis.

      Thank you for your review and suggestions: Based on the original manuscript, we have supplemented the specific statistical methods for the differences in cell proportions and gene expression in R1.

    1. eLife Assessment

      This study proposes an important new approach to analyzing cell-count data, which are often undersampled and cannot be accurately assessed using traditional statistical methods. The case studies presented in the article provide compelling evidence of the superiority of the proposed methodology over existing approaches, which could promote the use of Bayesian statistics among neuroscientists. The authors have taken steps to make the methodology accessible, although some implementation difficulties are likely to remain.

    2. Reviewer #1 (Public review):

      Summary:

      This work proposes a new approach to analyse cell-count data from multiple brain regions. Collecting such data can be expensive and time-intensive, so, more often than not, the dimensionality of the data is larger than the number of samples. The authors argue that Bayesian methods are much better suited to correctly analyse such data compared to classical (frequentist) statistical methods. They define a hierarchical structure, partial pooling, in which each observation contributes to the population estimate to more accurately explain the variance in the data. They present two case studies in which their method proves more sensitive in identifying regions where there are significant differences between conditions, which otherwise would be hidden.
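      The partial pooling idea summarised above can be illustrated with a deliberately simplified sketch. It assumes a normal-normal model with known variances, which is not the count model used in the paper; the function name and parameters are hypothetical. It shows only how a group estimate is pulled toward the population mean, more strongly when the group has few or noisy observations.

```python
def partial_pool(group_mean, n, sigma2, pop_mean, tau2):
    """Posterior mean of a group effect under a normal-normal model.

    group_mean: sample mean of the group's n observations
    sigma2:     within-group observation variance (assumed known)
    pop_mean:   population-level mean (assumed known)
    tau2:       between-group variance (assumed known)
    """
    precision_data = n / sigma2    # information carried by the group's own data
    precision_prior = 1.0 / tau2   # information carried by the population level
    w = precision_data / (precision_data + precision_prior)
    return w * group_mean + (1.0 - w) * pop_mean
```

      A very large `tau2` recovers the unpooled group mean, and a very small `tau2` collapses every group onto the population mean; partial pooling sits between these two extremes.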

      Strengths:

      The model is presented clearly, and the advantages of the hierarchical structure are strongly justified. Two alternative ways are presented to account for the presence of zero counts. The first involves the use of a horseshoe prior, which is the more flexible option, while the second involves a modified Poisson likelihood, which is better suited to datasets with a large number of zero counts, perhaps due to experimental artifacts. The results show a clear advantage of the Bayesian method for both case studies.

      The code is freely available, and it does not require a high-performance cluster to execute for smaller datasets. As Bayesian statistical methods become more accessible in various scientific fields, the whole scientific community will benefit from the transition away from p-values. Hierarchical Bayesian models are an especially useful tool that can be applied to many different experimental designs. However, while conceptually intuitive, their implementation can be difficult. The authors provide a good framework with room for improvement.
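      The second zero-count option mentioned under Strengths, the modified Poisson likelihood, is referred to elsewhere in this exchange as a zero-inflated Poisson (ZIP). A minimal sketch of the textbook ZIP probability mass function follows. The parameter names `lam` and `pi` are assumptions, and the authors' exact parameterisation may differ.

```python
import math


def poisson_pmf(k, lam):
    """Probability of observing count k under Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson: with probability pi the count is a structural
    zero (e.g. a failed preparation); otherwise it is drawn from Poisson(lam)."""
    base = (1.0 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base
```

      With `pi = 0` the function reduces to the ordinary Poisson, so the extra parameter only adds probability mass at zero.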

      Weaknesses:

      As with any Bayesian model, the choice of prior can significantly influence the results. The authors explain how the methodology can be adapted to different data properties, though selecting an appropriate prior or likelihood may not always be straightforward. They propose a 'standard workflow' as an alternative to traditional approaches, which could and should be used alongside established methods while Bayesian techniques continue to evolve and improve.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      “Alternative possibilities are discussed regarding the prior and likelihood of the model. Given that the second case study inspired the introduction of the zero-inflation likelihood, it is not clear how applicable the general methodology is to various datasets. If every unique dataset requires a tailored prior or likelihood to produce the best results, the methodology will not easily replace more traditional statistical analyses that can be applied in a straightforward manner. Furthermore, the differences between the results produced by the two Bayesian models in case study 2 are not discussed. In specific regions, the models provide conflicting results (e.g., regions MH, VPMpc, RCH, SCH, etc.), which are not addressed by the authors. A third case study would have provided further evidence for the generalizability of the methodology.”

      We hope in this paper to propose a ‘standard workflow’ for these data; this standard workflow uses the horseshoe prior and we propose that this is the approach used to describe cell count data instead of the better established, but to our thinking, inefficient, t-testing approach.

      The horseshoe prior is robust and allows a partially-pooled model to be used while weighing up the contribution of different data points. This is an analogue of excluding outliers and, in any analysis, it is normal to investigate further if there are points being excluded as outliers. Often this reveals a particular challenge with the data; in the case of the data here, there are a lot of zeros, indicating that some samples should be excluded because the preparation failed to tag cells rather than because there were no cells to tag. The idea behind the ZIP example is to show that the Bayesian method can allow for this sort of further investigation and, indeed, as the reviewer notes, this sort of extended analysis is often bespoke, tailored to the data.

      We have clearly failed to explain that the ‘standard workflow’ we propose to replace the more traditional methods is the first one we describe, with the horseshoe prior; this produces better results on both datasets than the traditional approach. However, we also feel it is useful to show how a more tailored follow-on can be useful; we need to make it clear that this is intended as an illustration of an ‘optional extra’ rather than a part of the more straightforward ‘standard workflow’.

      To make this clearer we have altered the text in several locations:

      • end of Introduction: added clarifying sentence “Here, our aim is to introduce a ‘standard’ Bayesian model for cell count data. We illustrate the application of this model to two datasets, one related to neural activation and the other to developmental lineage. For the second dataset, we also demonstrate a second example extension Bayesian model.”

      • Section Hierarchical modeling: “Our goal in both cases is to quantify group differences in the data. We present a ‘standard’ hierarchical model. This model reflects the experimental features common to cell count experiments and reflects the hierarchical structure of cell count data; the standard model is designed to deal robustly and efficiently with noise. On some occasions, to reflect a specific hypotheses, the structure of a particular experiment or an observed source of noise, this model can be further refined or changed to target the analysis. We will give an example of this for our second dataset.”

      • Section Horseshoe prior: “The alternative is via a flexible prior such as the horseshoe Carvalho et al., 2010; Piironen and Vehtari, 2017. This more generic option may be suitable as a default ‘standard’ approach in the typical case where outliers are poorly understood.”

      • Discussion: word ‘standard’ added to sentence: “Our standard workflow uses a horseshoe prior, along with the partial pooling, this allows our model to deal effectively with outliers.”

      • Discussion: modified sentence “The horseshoe prior model workflow we have exhibited here is intended as a standard approach.”

      Indeed, because the horseshoe prior deals robustly with outliers, whereas the ZIP is intended to model the outliers, any substantial difference between the two should be examined carefully. The referee is right to point out that we have not explained this in any detail and has helpfully listed a few brain regions where there are differences. This is useful, particularly since the examples listed illustrate the opportunities and hazards this sort of data presents. To address this, we have added a new version of Figure 6 to the revised manuscript.

      Previously Figure 6 showed two example brain regions: MPN and TMd. We have now added MH and SCH to the figure, and new text commenting on the insights the plots provide, both in the Results and Discussion.

      Reviewer #2 (Public review):

      “A clearer link between the experimental data and model-structure terminology would be a benefit to the non-expert reader.”

      This is a very good point and we are acutely aware through our own work how difficult it can be moving between fields with different research goals, different scientific cultures and different technical vocabularies. Just as it can be difficult translating from one language to another without losing nuance and meaning, it can be a real challenge finding technical terms that are useful for the non-expert reader while retaining the precision the application requires! In the long run, we hope that, just as some of the very specialized vocabulary that surrounds frequentist statistics has become familiar to working experimental scientists, the precise terminology involved in Bayesian modelling will become familiar and transparent. However, in advance of that day, we have included a glossary of terms at the end of the main text, and have made numerous small tweaks to make sure that the link between data and model terminology is clearer and better explained.

      Reviewer #1 (Recommendations for the authors):

      (1) “I would strongly recommend that the authors include more case studies in the manuscript, and address the qualitative differences between the different versions of the model.”

      We agree that our method will only become established when it is applied to more datasets; we hope to contribute to further analysis, and we know other people are already using the approach on their own data. We do, however, feel that adding more datasets to this paper will make it longer and more complex; the plan, instead, is to use the method on novel datasets to test specific hypotheses, so that the results will include novel scientific findings as well as adding another illustration of the Bayesian approach applied to data that is already well studied.

      (2) “Figure 6 is not discussed in the main text.”

      We had discussed the results presented in Figure 6 in the second paragraph of the section “Case study two – Ontogeny of inhibitory interneurons of the mouse thalamus”, however the reviewer is right in that we did not directly refer to the Figure – this was an oversight. In any case, in the revised manuscript we present a new version of Figure 6 (in response to above comment), which is now explicitly cited in the text.

      Revised Figure 6: Example data and inferences highlighting model discrepancies. On the left under ‘data’: boxplots with medians and interquartile ranges for the raw data for four example brain regions. The shape of each point pairs left and right hemisphere readings in each of the five animals. On the right under ‘inference’: HDIs and confidence intervals are plotted. Purple is the Bayesian horseshoe model, pink is the Bayesian ZIP model, and orange is the sample mean. The Bayesian estimates are not strongly influenced by the zero-valued observations (MPN, SCH, TMd) or large-valued outliers (MH) and have means close to the data median. This explains the advantage of the Bayesian results over the confidence interval.

      Reviewer #2 (Recommendations for the authors):

      (1) “This is a generally well-written methodology paper that also provides the underlying code as a resource. As a reviewer outside both cell-count modelling and hierarchical-Bayesian approaches (though with a general interest in the topics) I found the method a little difficult to follow and would have liked to have been left with a better understanding of how the method is applied to the data. For example, in Figure 1 we are introduced to brain region count, animal count, and “items”. Then in the next line: pooling, model, structure, population and etc in subsequent lines. It is not clear what the subscripts (the pools?) are referring to: are they different regions R or animals N? These terms need to be better linked to the data and/or trimmed. Having said that, the later results look like a solid contribution to the field with a significant reduction in uncertainty from the Bayesian approach over the frequentist one. A future version of the manuscript, therefore, would benefit from greater precision of language as well as an economy and greater focus of terms linking the method to the biology. This is particularly the case around the exposition parts in Figure 1, Figure 2, and the “Hierarchical modelling” section.”

      This is another important point. We have now made numerous small changes to tighten up the text in the paper, in response to both this point and the next point.

      (2) “Language throughout could be sharpened. Subjectivity like “surprising outliers” could be removed and quirky grammar like “often small, ten is a typical” improved. There are also typos “an rate” etc that should be tidied up.”

      As per previous response, we have made numerous tweaks and small improvements and feel that the paper is stronger in this respect.

      (3) “Figure 1 caption. “It is a spectrum that depends” Is spectrum the right word here? Also, “thicker stroke” what does this refer to? Wasn’t immediately clear. In A, why is the whole animal within the R bracket that signifies brain regions, and then the brain regions are within the N bracket that signifies whole animals? Apart from the teal colouring, what are the other coloured regions in the image referring to? Improving this first figure would greatly help a reader unfamiliar with the context of the approach.”

      We have replaced the word “spectrum” with “continuum”. We have replaced “Observed quantities have been highlighted with a thicker stroke in the graphical model.” with “The observed data quantities, y<sub>i</sub> to y<sub>n</sub>, are highlighted with a thick line in the model diagrams”. We have added the following text to describe the red and green lines in panel A: “green and red lines indicate regions labeled as damaged”.

      (4) “On P2 there is no discussion of priors when running through the advantage of the Bayesian approach. Is this a choice or an oversight? Priors do have a role in the later analysis.”

      A short additional paragraph has been added to the introduction outlining the advantage of having a prior, but also noting that the obligation to pick a prior can be intimidating and that suggesting priors is one of the contributions of our paper: “A Bayesian model also includes a set of probability distributions, referred to as the prior, which represent those beliefs it is reasonable to hold about the statistical model parameters before actually doing the experiment. The prior can be thought of as an advantage, it allows us to include in our analysis our understanding of the data based on previous experiments. The prior also makes explicit in a Bayesian model assumptions that are often implicit in other approaches. However, having to design priors is often considered a challenge and here we hope to make this more straightforward by suggesting priors that are suitable for this class of data.”

      (5) “On P4 more explanation would help greatly. Formulas like 23*10*4 or 50*6+50*4 are presented without explanation. What are the various numbers being multiplied? Regions, animals? Again, a clearer link between biological data and model structure would be advantageous.”

      We have now modified this line to clearly state the numbers’ sources: “The index i runs over the full set of samples, which in this case comprises 23 brain regions ×10 animals ×4 groups ≈920 datapoints in the first study, and 50 brain regions × 6 HET animals + 50 brain regions × 4 KO animals ≈500 datapoints in the second.”
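      The sample counts quoted in this response can be checked with a few lines of arithmetic. The variable names below are illustrative only, and the products are the full-design totals, assuming every region is recorded in every animal.

```python
# Case study one: 23 brain regions x 10 animals x 4 groups
study_one = 23 * 10 * 4

# Case study two: 50 brain regions, with 6 HET animals and 4 KO animals
study_two = 50 * 6 + 50 * 4

print(study_one, study_two)  # prints: 920 500
```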

      (6) “P6 and Results. Is it possible to show examples of the data set sampled from? Perhaps an image or two for the two experiments. Both Figures 4 and 5 as they currently are could be made slightly smaller to provide space for a small explanatory sub-panel. This would help ground the results.”

      This is a good idea. We have now added heatmap visualisations of both entire datasets to revised versions of Figures 4 and 5 (assuming that this is what the reviewer was suggesting).

    1. eLife Assessment

      Using single-cell transcriptomic data from adult mouse inner ear hair cells, the authors identify the differences and similarities of the four hair cell types. They make an important finding: that vestibular hair cells can express many ciliary motility-related genes. Some hair cell kinocilia display motility, suggesting that the kinocilium of vestibular hair cells may function as an active force generator to increase sensitivity. The evidence is incomplete as to whether all kinocilia beat and what the function of kinocilia movement is.

    2. Reviewer #1 (Public review):

      Summary

      Xu et al. use transcriptomic comparisons of mouse cochlear and vestibular hair cells to show that the vestibular hair cells alone are enriched in gene expression for proteins necessary for cilia motility and to further argue that such motility is a normal function of the kinocilia.

      Background:

      Cilia are prominent in sensory receptors, including vertebrate photoreceptors, olfactory neurons, and mechanosensitive hair cells of the inner ear and lateral line. Cilia can be motile or nonmotile depending on their axonemal structure: motile cilia require dynein and the inner 2 singlet microtubules of the 9+2 array. Primary cilia, present early in development, are considered to have sensory functions and to be nonmotile (Mill et al., Nature Rev Gen 2023).

      In hair cells, the kinocilium anchors and polarizes the mechanosensitive hair bundle of specialized microvilli. The kinocilium matures from the primary cilium of a newborn hair cell; behind it, the bundle of mechanosensory microvilli rises in a descending staircase of rows. During maturation of the mammalian cochlea, all hair cells lose the kinocilium, though not the associated basal body. The consensus for many years has been that most vertebrate kinocilia, and especially mammalian kinocilia, are nonmotile, based largely on the lack of spontaneous motility in excised mammalian vestibular organs, but also on the impression that the rare examples of spontaneous beating motility even in non-mammalian hair cells are associated with deterioration of the preparation (Rüsch & Thurm 1990).

      Strengths

      In comparing RNA expression across the 4 major types of mouse hair cells - 2 cochlear and 2 vestibular - Xu et al. noted that some ciliary genes related to motility are expressed by vestibular but not cochlear hair cells. They curated the ciliary genes into types known to be associated with different aspects of beating motility, and also investigated the expression of genes typical of primary cilia, which are considered to have sensory and cell signaling functions and to be nonmotile. They add immunostaining to back up some of the RNA data, and also evaluate relative expression by neonatal mouse cochlear and vestibular hair cells from a published dataset. The focus on kinociliary genes is an appropriate use of the comparative expression data for cochlear and vestibular hair cells, and the paper overall is readable and interesting. The transcriptome data are rounded off by comparing the authors' results in adult hair cells with published neonatal mouse cochlear and vestibular transcriptomes.

      Weaknesses:

      (1) Data:

      a) The main weakness in the data is the lack of functional and anatomical data from mouse hair bundles. While the authors compensate in part for this difficulty with bullfrog crista bundles, those data are also fragmentary - one TEM and 2 exemplar videos. Much of the novelty of the EM depends on the different appearance of stretches of a single kinocilium - can we be sure of the absence of the central microtubule singlets at the ends?

      b) While it was a good idea to compare ciliary motility expression in published P2 datasets for mouse cochlear and vestibular hair cells for comparison with the authors' adult hair cell data, the presentation is too superficial to assess (Figure 6C-E; text from line 336) - it is hard to see the basis for concluding that motility genes are specifically lower in P2 cochlear hair cells than vestibular hair cells. Visually, it is striking that CHCs have much darker bands for about 10 motility-related genes.

      (2) Interpretation:

      The authors take the view that kinociliary motility is likely to be normally present but is rare in their observations because the conditions are not right. But while others have described some (rare) kinociliary motility in fish organs (Rüsch & Thurm 1990), they interpreted its occurrence as a sign of pathology. Indeed, in this paper, it is not clear, or even discussed, how kinociliary motility would help with mechanosensitivity in mature hair bundles. Rather, the presence of an autonomous rhythm would actively interfere with generating temporally faithful representations of the head motions that drive vestibular hair cells.

      Could kinociliary beating play other roles, possibly during development - for example, by interacting with forming accessory structures (but see Whitfield 2020) or by activating mechanosensitivity cell-autonomously, before mature stimulation mechanisms are in place? Then a latent capacity to beat in mature vestibular hair cells might be activated by stressful conditions, as speculated regarding persistent Piezo channels that are normally silent in mature cochlear hair cells but may reappear when TMC channel gating is broken (Beurg and Fettiplace 2017). While these are highly speculative thoughts, there is a need in the paper for more nuanced consideration of whether the observed motility is normal and what good it would do.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, the authors compared the transcriptomes of the various types of hair cells contained in the sensory epithelia of the cochlea and vestibular organs of the mouse inner ear. The analysis of their transcriptomic data led to novel insights into the potential function of the kinocilium.

      Strengths:

      The novel findings for the kinocilium gene expression, along with the demonstration that some kinocilia show rhythmic beating as would be seen for known motile cilia, are fascinating. It is possible that the kinocilium, known to play a very important role in the orientation of the stereocilia, has a gene expression pattern that is more like a primary cilium early in development and, later in mature hair cells, more like a motile cilium. Since the kinocilium is retained in vestibular hair cells, it makes sense that it is playing a different role in these mature cells than its role in the cochlea.

      Another major strength of this study, which cannot be overstated, is that for the transcriptome analysis, they are using mature mice. To date, there is a lot of data from many labs for embryonic and neonatal hair cells, but very little transcriptomic data on the mature hair cells. They do a nice job in presenting the differences in marker gene expression between the 4 hair cell types. This information is very useful to those labs studying regeneration or generation of hair cells from ES cell cultures. One of the biggest questions these labs confront is what type of hair cells develop in these systems. The more markers available, the better. These data will also allow researchers in the field to compare developing hair cells with mature hair cells to see what genes are only required during development and not in later functioning hair cells.

    4. Author response:

      Reviewer #1 (Public review):

      Weaknesses:

      (1) Data:

      a) The main weakness in the data is the lack of functional and anatomical data from mouse hair bundles. While the authors compensate in part for this difficulty with bullfrog crista bundles, those data are also fragmentary - one TEM and 2 exemplar videos. Much of the novelty of the EM depends on the different appearance of stretches of a single kinocilium - can we be sure of the absence of the central microtubule singlets at the ends?

      Our single-cell RNA-seq findings show that genes related to motile cilia are specifically expressed in vestibular hair cells. This has not been demonstrated before. We have also provided supporting evidence using electrophysiology and imaging from bullfrogs and mice. Although no ultrastructural images of mouse vestibular kinocilia were provided in our study, a transmission electron micrograph of mouse vestibular kinocilia has been published (O’Donnell and Zheng, 2022). The mouse vestibular kinocilia have a “9+2” microtubule configuration with nine doublet microtubules surrounding two central singlet microtubules. This finding contrasts with a previous study, which demonstrated that the vestibular kinocilia from guinea pigs lack central singlet microtubules and inner dynein arms, whereas outer dynein arms and radial spokes are present (Kikuchi et al., 1989). The central pair of microtubules is absent at the end of the bullfrog saccular kinocilium (Fig. 7A). We would like to point out that the dual identity of primary and motile cilia is not just based on the TEM images. The kinocilium has long been considered a specialized cilium, and its role as a primary cilium during development has been demonstrated before (Moon et al., 2020; Shi et al., 2022).

      In most motile cilia, the central pair complex (CPC) does not originate directly from the basal body; instead, it begins a short distance above the transition zone, a feature that already illustrates variation in CPC assembly across systems (Lechtreck et al., 2013). The CPC can also show variation in its spatial extent: for example, in mammalian sperm axonemes, it can terminate before reaching the distal end of the axoneme (Fawcett and Ito, 1965). In addition, CPC orientation differs across organisms: in metazoans and Trypanosoma, the CPC is fixed relative to the outer doublets, whereas in Chlamydomonas and ciliates it twists within the axoneme (Lechtreck et al., 2013). Such variation has been described in multiple motile cilia and flagella and is therefore not unique to vestibular kinocilia. What appears more unusual in our data is the organization at the distal tip, where a distinct distal head is present, similar to cilia tip morphologies recently described in human islet cells (Polino et al., 2023). Although this feature is intriguing, we interpret it primarily as a structural signature rather than as evidence for a specialized motile adaptation, and we will moderate our interpretation accordingly in the revision.

      b) While it was a good idea to compare ciliary motility expression in published P2 datasets for mouse cochlear and vestibular hair cells for comparison with the authors' adult hair cell data, the presentation is too superficial to assess (Figure 6C-E; text from line 336) - it is hard to see the basis for concluding that motility genes are specifically lower in P2 cochlear hair cells than vestibular hair cells. Visually, it is striking that CHCs have much darker bands for about 10 motility-related genes.

      We aimed to show that kinocilia in neonatal cochlear and vestibular hair cells are largely similar, except that neonatal cochlear hair cells lack key genes and proteins required for the motile apparatus. While these genes (e.g., Dynll1, Dynll2, Dynlrb1, Cetn2, and Mdh1) appear more highly expressed in P2 cochlear hair cells, they are not uniquely associated with the axoneme. For example, Dynll1/2 and Dynlrb1 are components of the cytoplasmic dynein-1 complex (Pfister et al., 2006), Cetn2 has multiple basic cellular functions beyond cilia (e.g., centrosome organization, DNA repair), and Mdh1 encodes a cytosolic malate dehydrogenase involved in central metabolic pathways such as the citric acid cycle and malate–aspartate shuttle. This contrasts with axonemal dyneins, which are uniquely required for cilia motility. To avoid ambiguity, we will mark such cytoplasmic or multifunctional genes with stars in both Figure 5G and Figure 6D, together with a legend, in the revised manuscript.

      Although those genes (i.e., Dynll1, Dynll2, Dynlrb1, Cetn2, and Mdh1) are highly expressed in neonatal cochlear hair cells, key genes for motile machinery are not detected. For example, Dnah6, Dnah5, and Wdr66 are not expressed in the P2 cochlear hair cells. Dnah6 and Dnah5 encode axonemal dyneins and are part of inner and outer dynein arms, while Wdr66 is a component of radial spokes. Importantly, we did not detect the expression of CCDC39 and CCDC40 in kinocilia of P2 cochlear hair cells. Axonemal CCDC39 and CCDC40 are the molecular rulers that organize the axonemal structure in the 96-nm repeating interactome and are required for the assembly of IDAs and N-DRC for ciliary motility (Becker-Heck et al., 2011; Merveille et al., 2011; Oda et al., 2014). We will modify Figure 6D to highlight the key difference between P2 cochlear and vestibular hair cells in the revised manuscript. We will also revise the text so that the key differences are clearly described.

      (2) Interpretation:

      The authors take the view that kinociliary motility is likely to be normally present but is rare in their observations because the conditions are not right. But while others have described some (rare) kinociliary motility in fish organs (Rusch & Thurm 1990), they interpreted its occurrence as a sign of pathology. Indeed, in this paper, it is not clear, or even discussed, how kinociliary motility would help with mechanosensitivity in mature hair bundles. Rather, the presence of an autonomous rhythm would actively interfere with generating temporally faithful representations of the head motions that drive vestibular hair cells.

      Spontaneous flagella-like rhythmic beating of kinocilia in vestibular HCs in frogs and eels (Flock et al., 1977; Rüsch and Thurm, 1990) and in zebrafish early otic vesicle (Stooke-Vaughan et al., 2012; Wu et al., 2011) has been reported previously. Based on Rüsch and Thurm (1990), spontaneous kinocilia motility occurred under non-physiological conditions and was interpreted as a sign of cellular deterioration rather than a normal feature. We speculate that deterioration under non-physiological conditions may lead to the disruption of lateral links between the kinocilium and the stereociliary bundle, effectively unloading the kinocilium and allowing it to move more freely. Additionally, fluctuations in intracellular ATP levels may contribute, as ciliary motility is highly ATP-dependent; when ATP is depleted, beating ceases. Similar phenomena have been documented in respiratory epithelia, where ciliary activity can temporarily pause. Nevertheless, the fact that kinocilia can exhibit spontaneous motility under these conditions indicates that they possess the motile machinery necessary for such beating. Irrespective of the condition, cilia without the molecular machinery required for motility will not be able to move.

      We agree with the reviewer that, based on the present data, it is difficult to know the functional role of kinocilia and whether the presence of such autonomous rhythm would interfere with temporal fidelity. Spontaneous bundle motion, driven by the active process associated with mechanotransduction, was observed in bullfrog saccular hair cells (Benser et al., 1996; Martin et al., 2003). We will revise the discussion to clarify this important point raised by the reviewer. Specifically, we will emphasize that our observations of ciliary beating in the ex vivo conditions may not reflect its properties in the mature in vivo context, but rather a byproduct of motile machinery clearly present in the kinocilia. We speculate that this machinery in mature hair cells could operate in a more subtle mode, modulating the rigor state of dynein arms or related axonemal structures to influence kinociliary mechanics and, in turn, bundle stiffness in response to stimuli or signaling cues. Such a mechanism could either enhance sensitivity or introduce filtering properties, thereby contributing to the fine control of mechanosensory function without compromising temporal fidelity. Future studies using loss-of-function approaches will be needed to reveal the unexplored role(s) of kinocilia for vestibular hair cells in vertebrates.

      Could kinociliary beating play other roles, possibly during development - for example, by interacting with forming accessory structures (but see Whitfield 2020) or by activating mechanosensitivity cell-autonomously, before mature stimulation mechanisms are in place? Then a latent capacity to beat in mature vestibular hair cells might be activated by stressful conditions, as speculated regarding persistent Piezo channels that are normally silent in mature cochlear hair cells but may reappear when TMC channel gating is broken (Beurg and Fettiplace 2017). While these are highly speculative thoughts, there is a need in the paper for more nuanced consideration of whether the observed motility is normal and what good it would do.

      We thank the reviewer for these excellent suggestions. We agree that kinociliary motility could plausibly serve roles during development, for example by guiding hair bundle formation or by contributing to early mechanosensitivity and spontaneous activity before mature stimulation mechanisms are established. It is also possible that the motility machinery represents a latent capacity in mature vestibular hair cells that could be reactivated under stress or pathological conditions. We will revise the Discussion to address these possibilities and to provide a more nuanced consideration of whether the observed motility is normal and what potential functions it might serve.

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors compared the transcriptomes of the various types of hair cells contained in the sensory epithelia of the cochlea and vestibular organs of the mouse inner ear. The analysis of their transcriptomic data led to novel insights into the potential function of the kinocilium.

      Strengths:

      The novel findings for the kinocilium gene expression, along with the demonstration that some kinocilia demonstrate rhythmic beating as would be seen for known motile cilia, are fascinating. It is possible that perhaps the kinocilium, known to play a very important role in the orientation of the stereocilia, may have a gene expression pattern that is more like a primary cilium early in development and later in mature hair cells, more like a motile cilium. Since the kinocilium is retained in vestibular hair cells, it makes sense that it is playing a different role in these mature cells than its role in the cochlea.

      Another major strength of this study, which cannot be overstated, is that for the transcriptome analysis, they are using mature mice. To date, there is a lot of data from many labs for embryonic and neonatal hair cells, but very little transcriptomic data on the mature hair cells. They do a nice job in presenting the differences in marker gene expression between the 4 hair cell types. This information is very useful to those labs studying regeneration or generation of hair cells from ES cell cultures. One of the biggest questions these labs confront is what type of hair cells develop in these systems. The more markers available, the better. These data will also allow researchers in the field to compare developing hair cells with mature hair cells to see what genes are only required during development and not in later functioning hair cells.

      We would like to thank reviewer 2 for his/her comments and hope that the datasets provided in this manuscript will be a useful resource for researchers in the auditory and vestibular neuroscience community.

      Joint Recommendations:

      We will make changes in the revision based on the joint recommendations of the two reviewers.

      References

      Becker-Heck, A., Zohn, I.E., Okabe, N., Pollock, A., Lenhart, K.B., Sullivan-Brown, J., McSheene, J., Loges, N.T., Olbrich, H., Haeffner, K., Fliegauf, M., Horvath, J., Reinhardt, R., Nielsen, K.G., Marthin, J.K., Baktai, G., Anderson, K.V., Geisler, R., Niswander, L., Omran, H., Burdine, R.D., 2011. The coiled-coil domain containing protein CCDC40 is essential for motile cilia function and left-right axis formation. Nat Genet 43, 79–84. https://doi.org/10.1038/ng.727

      Benser, M.E., Marquis, R.E., Hudspeth, A.J., 1996. Rapid, Active Hair Bundle Movements in Hair Cells from the Bullfrog’s Sacculus. J. Neurosci. 16, 5629–5643. https://doi.org/10.1523/JNEUROSCI.16-18-05629.1996

      Fawcett, D.W., Ito, S., 1965. The fine structure of bat spermatozoa. American Journal of Anatomy 116, 567–609. https://doi.org/10.1002/aja.1001160306

      Flock, Å., Flock, B., Murray, E., 1977. Studies on the Sensory Hairs of Receptor Cells in the Inner Ear. Acta Oto-Laryngologica 83, 85–91. https://doi.org/10.3109/00016487709128817

      Kikuchi, T., Takasaka, T., Tonosaki, A., Watanabe, H., 1989. Fine structure of guinea pig vestibular kinocilium. Acta Otolaryngol 108, 26–30. https://doi.org/10.3109/00016488909107388

      Lechtreck, K.-F., Gould, T.J., Witman, G.B., 2013. Flagellar central pair assembly in Chlamydomonas reinhardtii. Cilia 2, 15. https://doi.org/10.1186/2046-2530-2-15

      Martin, P., Bozovic, D., Choe, Y., Hudspeth, A.J., 2003. Spontaneous Oscillation by Hair Bundles of the Bullfrog’s Sacculus. J. Neurosci. 23, 4533–4548. https://doi.org/10.1523/JNEUROSCI.23-11-04533.2003

      Merveille, A.-C., Davis, E.E., Becker-Heck, A., Legendre, M., Amirav, I., Bataille, G., Belmont, J., Beydon, N., Billen, F., Clément, A., Clercx, C., Coste, A., Crosbie, R., de Blic, J., Deleuze, S., Duquesnoy, P., Escalier, D., Escudier, E., Fliegauf, M., Horvath, J., Hill, K., Jorissen, M., Just, J., Kispert, A., Lathrop, M., Loges, N.T., Marthin, J.K., Momozawa, Y., Montantin, G., Nielsen, K.G., Olbrich, H., Papon, J.-F., Rayet, I., Roger, G., Schmidts, M., Tenreiro, H., Towbin, J.A., Zelenika, D., Zentgraf, H., Georges, M., Lequarré, A.-S., Katsanis, N., Omran, H., Amselem, S., 2011. CCDC39 is required for assembly of inner dynein arms and the dynein regulatory complex and for normal ciliary motility in humans and dogs. Nat Genet 43, 72–78. https://doi.org/10.1038/ng.726

      Moon, K.-H., Ma, J.-H., Min, H., Koo, H., Kim, H., Ko, H.W., Bok, J., 2020. Dysregulation of sonic hedgehog signaling causes hearing loss in ciliopathy mouse models. eLife 9, e56551. https://doi.org/10.7554/eLife.56551

      Oda, T., Yanagisawa, H., Kamiya, R., Kikkawa, M., 2014. A molecular ruler determines the repeat length in eukaryotic cilia and flagella. Science 346, 857–860. https://doi.org/10.1126/science.1260214

      O’Donnell, J., Zheng, J., 2022. Vestibular Hair Cells Require CAMSAP3, a Microtubule Minus-End Regulator, for Formation of Normal Kinocilia. Front Cell Neurosci 16, 876805. https://doi.org/10.3389/fncel.2022.876805

      Pfister, K.K., Shah, P.R., Hummerich, H., Russ, A., Cotton, J., Annuar, A.A., King, S.M., Fisher, E.M.C., 2006. Genetic Analysis of the Cytoplasmic Dynein Subunit Families. PLOS Genetics 2, e1. https://doi.org/10.1371/journal.pgen.0020001

      Polino, A.J., Sviben, S., Melena, I., Piston, D.W., Hughes, J.W., 2023. Scanning electron microscopy of human islet cilia. Proceedings of the National Academy of Sciences 120, e2302624120. https://doi.org/10.1073/pnas.2302624120

      Rüsch, A., Thurm, U., 1990. Spontaneous and electrically induced movements of ampullary kinocilia and stereovilli. Hearing Research 48, 247–263. https://doi.org/10.1016/0378-5955(90)90065-W

      Shi, H., Wang, H., Zhang, C., Lu, Y., Yao, J., Chen, Z., Xing, G., Wei, Q., Cao, X., 2022. Mutations in OSBPL2 cause hearing loss associated with primary cilia defects via sonic hedgehog signaling. https://doi.org/10.1172/jci.insight.149626

      Stooke-Vaughan, G.A., Huang, P., Hammond, K.L., Schier, A.F., Whitfield, T.T., 2012. The role of hair cells, cilia and ciliary motility in otolith formation in the zebrafish otic vesicle. Development 139, 1777–1787. https://doi.org/10.1242/dev.079947

      Wu, D., Freund, J.B., Fraser, S.E., Vermot, J., 2011. Mechanistic Basis of Otolith Formation during Teleost Inner Ear Development. Developmental Cell 20, 271–278. https://doi.org/10.1016/j.devcel.2010.12.00

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Summary:

      The authors aim to explore the effects of the electrogenic sodium-potassium pump (Na+/K+-ATPase) on the computational properties of highly active spiking neurons, using the weakly-electric fish electrocyte as a model system. Their work highlights how the pump's electrogenicity, while essential for maintaining ionic gradients, introduces challenges in neuronal firing stability and signal processing, especially in cells that fire at high rates. The study identifies compensatory mechanisms that cells might use to counteract these effects, and speculates on the role of voltage dependence in the pump's behavior, suggesting that Na+/K+-ATPase could be a factor in neuronal dysfunctions and diseases.

      Strengths:

      (1) The study explores a less-examined aspect of neural dynamics: the effects of Na+/K+-ATPase electrogenicity. It offers a new perspective by highlighting the pump's role not only in ion homeostasis but also in its potential influence on neural computation.

      (2) The mathematical modeling used is a significant strength, providing a clear and controlled framework to explore the effects of the Na+/K+-ATPase on spiking cells. This approach allows for the systematic testing of different conditions and behaviors that might be difficult to observe directly in biological experiments.

      (3) The study proposes several interesting compensatory mechanisms, such as sodium leak channels and extracellular potassium buffering, which provide useful theoretical frameworks for understanding how neurons maintain firing rate control despite the pump's effects.

      Weaknesses:

      (1) While the modeling approach provides valuable insights, the lack of experimental data to validate the model's predictions weakens the overall conclusions.

      (2) The proposed compensatory mechanisms are discussed primarily in theoretical terms without providing quantitative estimates of their impact on the neuron's metabolic cost or other physiological parameters.

      Comments on revisions:

      The revised manuscript is notably improved.

      We thank the reviewer for their concise and accurate summary and appreciate the constructive feedback on the article’s strengths and weaknesses. Experimental work is beyond the scope of our modeling-based study. However, we would like our work to serve as a framework for future experimental studies into the role of the electrogenic pump current (and its possible compensatory currents) in disease, and its role in evolution of highly specialized excitable cells (such as electrocytes).

      Quantitative estimates of metabolic costs in this study are limited to the ATP that is required to fuel the Na+/K+ pump. By integrating the net pump current over time and dividing by one elementary charge, one can find the rate of ATP that is consumed by the Na+/K+ pump for either compensatory mechanism. The difference in net pump current is thus proportional to ATP consumption, which allows for a direct comparison of the cost efficiency of the Na+/K+ pump for each proposed compensatory mechanism. The Na+/K+ pump is, however, not the only ATP-consuming element in the electrocyte, and some of the compensatory mechanisms induce other costs related to cell ‘housekeeping’ or presynaptic processes. We have now added a section in the appendix titled ‘Considerations on metabolic costs of compensatory mechanisms’ (section 11.4), where we provide rough estimates of the influence of the compensatory mechanisms on the total metabolic costs of the cell and membrane space occupation. Although we argue that, according to these rough estimates, the impact of the discussed compensatory mechanisms could be significant, the absence of more detailed experimental quantification means that a plausible quantitative cost estimate on the whole-cell level remains beyond the scope of this article.
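      As an illustration of the bookkeeping described above, the following is a minimal sketch, not the code used in the article. It assumes a sampled net pump current in amperes and relies on the stoichiometry that one pump cycle exports one net elementary charge per ATP. The function name, the trapezoidal rule, and the constant 1 nA trace are hypothetical choices made only for this example.

```python
# Elementary charge in coulombs (exact SI value).
ELEMENTARY_CHARGE_C = 1.602176634e-19


def atp_molecules_consumed(times_s, pump_current_a):
    """Estimate ATP molecules used by the pump over a sampled interval.

    Assumption: each pump cycle hydrolyses one ATP and moves one net
    elementary charge out of the cell (3 Na+ out, 2 K+ in), so the
    integrated net pump current divided by e counts pump cycles.
    """
    if len(times_s) != len(pump_current_a):
        raise ValueError("times and currents must have the same length")
    charge_c = 0.0
    # Trapezoidal integration of current over time gives charge in coulombs.
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        charge_c += 0.5 * (pump_current_a[i] + pump_current_a[i - 1]) * dt
    return charge_c / ELEMENTARY_CHARGE_C


# Hypothetical trace: a constant 1 nA net pump current sampled over 1 s.
times = [0.0, 0.5, 1.0]
current = [1e-9, 1e-9, 1e-9]
total_atp = atp_molecules_consumed(times, current)
mean_rate = total_atp / (times[-1] - times[0])  # ATP molecules per second
print(f"ATP consumed: {total_atp:.3e}, mean rate: {mean_rate:.3e} per s")
```

      Running the same function on the pump-current traces of two compensatory mechanisms and taking the ratio of the results would give the kind of relative comparison described above. It covers only the pump's own ATP use, not the other cellular costs.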

      Reviewer #1 (Recommendations for the authors):

      I just have a few recommendations on the updated manuscript.

      (1) When exploring the different roles of Na+/K+-ATPase in the Results section, the authors employed many different models. For instance, the voltage equation on page 15, voltage equation (2) on page 22, voltage equation (12) on page 24, voltage equation (30) on page 32, and voltage equation (38) on page 35 are presented as the master equations for their respective biophysical models. Meanwhile, the phase models are presented on page 29 and page 33. I would recommend that the authors clearly specify which equations correspond to each subsection of the Results section and explicitly state which equations were used to generate the data in each figure. This would help readers more easily follow the connections between the models, the results, and the figures.

      We thank the reviewer for pointing out that the links of the different voltage equations to the results could be expressed more explicitly in the article. All simulations were done using the ‘master equation’  expressed in Eq. 2, and the other voltage equations that are specified in the article (in the new version of the article Eqs. 13, 31, and 39) are reformulations of Eq. 2 to analytically show different properties of the voltage equation (Eq. 2). This has now been mentioned in the article when formulating the voltage equations, and the equation for the total leak current (in the new version Eq. 3) has been added for completeness.

      (2) The authors may want to revisit their description and references concerning Eigenmannia virescens. For example, wave-type weakly electric fish (e.g., Eigenmannia) and pulse-type weakly electric fish (e.g., Gymnotus carapo) exhibit large differences, so references 52-55 may be inappropriate for subsection 4.3.1, as these studies focus on Gymnotus carapo. Additionally, even within wave-type species, chirp patterns vary. For example, Eigenmannia can exhibit short "pause"-type chirps, whereas Apteronotus leptorhynchus (another wave-type fish) does not (https://pubmed.ncbi.nlm.nih.gov/14692494/).

      We thank the reviewer for pointing this out. The citations and phrasing in sections 4.3.1 and 4.3.2 have been updated to specifically refer to the weakly electric fish E. virescens.

      (3) Table on page 21: Please explain why the parameter value (13.5 mM) of [Na^+]_{in} is 10 times larger than its value (1.35 mM) in reference [26]. How does this value (13.5 mM) compare with the range of the variable [Na^+]_{in} in equation (6)?

      The intracellular sodium concentration in reference [26] was reported to be 1.35 mM, but the authors also reported an extracellular sodium concentration of 120 mM, and a sodium reversal potential of 55 mV. Upon calculating the sodium reversal potential, we found that an intracellular sodium concentration of 1.35 mM would give a sodium reversal potential of 113 mV. An intracellular sodium concentration of 13.5 mM, on the other hand, leads to the reported and physiological reversal potential of 55 mV. This has now been clarified in the article, and the connection between this value and Eq. 6 (Eq. 7 in the new version) has also been clarified.
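      The consistency check described above can be reproduced with the Nernst equation. The sketch below is illustrative only. The temperature is an assumption (about 19 °C, chosen here because it yields values close to those quoted), since the response does not state the temperature used. The function name and defaults are hypothetical.

```python
import math

GAS_CONSTANT = 8.314462618      # J / (mol K)
FARADAY_CONSTANT = 96485.33212  # C / mol


def nernst_potential_mv(conc_out_mm, conc_in_mm, temp_k=292.15, valence=1):
    """Nernst reversal potential in mV for one ion species.

    temp_k defaults to an ASSUMED 19 degrees C; the source does not
    specify the temperature behind its 55 mV and 113 mV figures.
    """
    rt_over_zf = GAS_CONSTANT * temp_k / (valence * FARADAY_CONSTANT)
    return 1000.0 * rt_over_zf * math.log(conc_out_mm / conc_in_mm)


# Extracellular sodium of 120 mM, as reported in reference [26].
e_na_with_1_35 = nernst_potential_mv(120.0, 1.35)   # value as printed in [26]
e_na_with_13_5 = nernst_potential_mv(120.0, 13.5)   # value used in the model

print(f"E_Na with 1.35 mM inside: {e_na_with_1_35:.1f} mV")
print(f"E_Na with 13.5 mM inside: {e_na_with_13_5:.1f} mV")
```

      Whatever temperature is assumed, a tenfold change in intracellular concentration shifts the reversal potential by (RT/F) ln 10, roughly 58 mV near room temperature. This matches the gap between the 113 mV and 55 mV values discussed above.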

      Reviewer #2 (Public review):

      Summary:

      The paper by Weerdmeester, Schleimer, and Schreiber uses computational models to present the biological constraints under which electrocytes (specialized, highly active cells that facilitate electro-sensing in weakly electric fish) may operate. The authors suggest potential solutions that these cells could employ to circumvent these constraints.

      Electrocytes are highly active or spiking (greater than 300 Hz) for sustained periods (for minutes to hours), and such activity is possible due to an influx of sodium ions into, and an efflux of potassium ions out of, these cells after each spike. The resulting ion imbalance must be restored, which in electrocytes, as with many other biological cells, is facilitated by the Na-K pumps at the expense of biological energy, i.e., ATP molecules. For each ATP molecule the pump uses, three positively charged sodium ions from the intracellular space are exchanged for two positively charged potassium ions from the extracellular space. This creates a net efflux of positive ions into the extracellular space, resulting in hyperpolarized potentials for the cell over time. For most cells, this does not pose an issue, as their firing rate is much slower, and other compensatory mechanisms and pumps can effectively restore the ion imbalances. However, in the electrocytes of weakly electric fish, which spike at exceptionally high rates, the net efflux of positive ions presents a challenge. Additionally, these cells are involved in critical communication and survival behaviors, underscoring their essential role in reliable functioning.

      In a computational model, the authors test four increasingly complex solutions to the problem of counteracting the hyperpolarized states that occur due to continuous NaK pump action to sustain baseline activity. First, they propose a solution for a well-matched Na leak channel that operates in conjunction with the NaK pump, counteracting the hyperpolarizing states naturally. Their model shows that when such an orchestrated Na leak current is not included, quick changes in the firing rates could have unexpected side effects. Secondly, they study the implications of this cell in the context of chirps-a means of communication between individual fish. Here, an upstream pacemaking neuron entrains the electrocyte to spike, which ceases to produce a so-called chirp - a brief pause in the sustained activity of the electrocytes. In their model, the authors demonstrate that including the extracellular potassium buffer is necessary to obtain a reliable chirp signal. Thirdly, they tested another means of communication in which there was a sudden increase in the firing rate of the electrocyte, followed by a decay to the baseline. For this to occur reliably, the authors emphasize that a strong synaptic connection between the pacemaker neuron and the electrocyte is necessary. Finally, since these cells are energy-intensive, they hypothesize that electrocytes may have energy-efficient action potentials, for which their NaK pumps may be sensitive to the membrane voltages and perform course correction rapidly.

      Strengths:

      The authors extend an existing electrocyte model (Joos et al., 2018) based on the classical Hodgkin and Huxley conductance-based models of sodium and potassium currents to include the dynamics of the sodium-potassium (NaK) pump. The authors estimate the pump's properties based on reasonable assumptions related to the leak potential. Their proposed solutions are valid and may be employed by weakly electric fish. The authors explore theoretical solutions to electrosensing behavior that compound and suggest that all these solutions must be simultaneously active for the survival and behavior of the fish. This work provides a good starting point for conducting in vivo experiments to determine which of these proposed solutions the fish employ and their relative importance. The authors include testable hypotheses for their computational models.

      Weaknesses:

      The model for action potential generation simplifies ion dynamics by considering only sodium and potassium currents, excluding other ions like calcium. The ion channels considered are assumed to be static, without any dynamic regulation such as post-translational modifications. For instance, a sodium-dependent potassium pump could modulate potassium leak and spike amplitude (Markham et al., 2013).

      This work considers only the sodium-potassium (NaK) pumps to restore ion gradients. However, in many cells, several other ion pumps, exchangers, and symporters are simultaneously present and actively participate in restoring ion gradients. When sodium currents dominate action potentials, and thus when NaK pumps play a critical role, such as the case in Eigenmannia virescens, the present study is valid. However, since other biological processes may find different solutions to address the pump's non-electroneutral nature, the generalizability of the results in this work to other fast-spiking cell types is limited. For example, each spike could include a small calcium ion influx that could be buffered or extracted via a sodium-calcium exchanger.

      We thank the reviewer for the detailed summary and the updated identified strengths and weaknesses. The current article indeed focuses on and isolates the interplay between sodium currents, potassium currents, and sodium-potassium pump currents. As discussed in section 5.1, in excitable cells where these currents are the main players in action-potential generation, the results presented in this article are applicable. The contribution of post-translational effects of ion channels, other ionic currents, and other active transporters and pumps could be exciting avenues for further studies.

      Reviewer #2 (Recommendations for the authors):

      Thank you for addressing my comments.

      All the figures are now consistent. The color schema used is clear.

      The methods and discussions expansions improve the paper.

      Including the model assumptions and simplifications is appreciated.

      Including internal references is helpful.

      The equations are clear, and the references have been fixed.

      I am content with the changes. I have updated my review accordingly.

      We thank the reviewer for their initial constructive comments that led to the significant improvement of the article.

      Page : 3 Line : 113 Author : Unknown Author 07/24/2025 

      Although this is technically correct, the article is about electrocommunication signals and does not focus on sensing.

      Page : 3 Line : 153 Author : Unknown Author 07/24/2025

      electrocommunication

      Page : 4 Line : 164 Author : Unknown Author 07/24/2025 

      Judging from the cited article, I think this should be a sodium-dependent potassium current.

    2. Reviewer #2 (Public review):

      Summary:

      The paper by Weerdmeester, Schleimer, and Schreiber uses computational models to present the biological constraints under which electrocytes (specialized, highly active cells that facilitate electro-sensing in weakly electric fish) may operate. The authors suggest potential solutions that these cells could employ to circumvent these constraints.

      Electrocytes are highly active or spiking (greater than 300 Hz) for sustained periods (for minutes to hours), and such activity is possible due to an influx of sodium ions into, and an efflux of potassium ions out of, these cells after each spike. The resulting ion imbalance must be restored, which in electrocytes, as with many other biological cells, is facilitated by the Na-K pumps at the expense of biological energy, i.e., ATP molecules. For each ATP molecule the pump uses, three positively charged sodium ions from the intracellular space are exchanged for two positively charged potassium ions from the extracellular space. This creates a net efflux of positive ions into the extracellular space, resulting in hyperpolarized potentials for the cell over time. For most cells, this does not pose an issue, as their firing rate is much slower, and other compensatory mechanisms and pumps can effectively restore the ion imbalances. However, in the electrocytes of weakly electric fish, which spike at exceptionally high rates, the net efflux of positive ions presents a challenge. Additionally, these cells are involved in critical communication and survival behaviors, underscoring their essential role in reliable functioning.

      In a computational model, the authors test four increasingly complex solutions to the problem of counteracting the hyperpolarized states that occur due to continuous NaK pump action to sustain baseline activity. First, they propose a solution for a well-matched Na leak channel that operates in conjunction with the NaK pump, counteracting the hyperpolarizing states naturally. Their model shows that when such an orchestrated Na leak current is not included, quick changes in the firing rates could have unexpected side effects. Secondly, they study the implications of this cell in the context of chirps-a means of communication between individual fish. Here, an upstream pacemaking neuron entrains the electrocyte to spike, which ceases to produce a so-called chirp - a brief pause in the sustained activity of the electrocytes. In their model, the authors demonstrate that including the extracellular potassium buffer is necessary to obtain a reliable chirp signal. Thirdly, they tested another means of communication in which there was a sudden increase in the firing rate of the electrocyte, followed by a decay to the baseline. For this to occur reliably, the authors emphasize that a strong synaptic connection between the pacemaker neuron and the electrocyte is necessary. Finally, since these cells are energy-intensive, they hypothesize that electrocytes may have energy-efficient action potentials, for which their NaK pumps may be sensitive to the membrane voltages and perform course correction rapidly.

      Strengths:

      The authors extend an existing electrocyte model (Joos et al., 2018) based on the classical Hodgkin and Huxley conductance-based models of sodium and potassium currents to include the dynamics of the sodium-potassium (NaK) pump. The authors estimate the pump's properties based on reasonable assumptions related to the leak potential. Their proposed solutions are valid and may be employed by weakly electric fish. The authors explore theoretical solutions to electrosensing behavior that compound and suggest that all these solutions must be simultaneously active for the survival and behavior of the fish. This work provides a good starting point for conducting in vivo experiments to determine which of these proposed solutions the fish employ and their relative importance. The authors include testable hypotheses for their computational models.

    3. Reviewer #1 (Public review):

      Summary:

      The authors aim to explore the effects of the electrogenic sodium-potassium pump (Na+/K+-ATPase) on the computational properties of highly active spiking neurons, using the weakly-electric fish electrocyte as a model system. Their work highlights how the pump's electrogenicity, while essential for maintaining ionic gradients, introduces challenges in neuronal firing stability and signal processing, especially in cells that fire at high rates. The study identifies compensatory mechanisms that cells might use to counteract these effects, and speculates on the role of voltage dependence in the pump's behavior, suggesting that Na+/K+-ATPase could be a factor in neuronal dysfunctions and diseases.

      Strengths:

      (1) The study explores a less-examined aspect of neural dynamics: the effects of Na+/K+-ATPase electrogenicity. It offers a new perspective by highlighting the pump's role not only in ion homeostasis but also in its potential influence on neural computation.

      (2) The mathematical modeling used is a significant strength, providing a clear and controlled framework to explore the effects of the Na+/K+-ATPase on spiking cells. This approach allows for the systematic testing of different conditions and behaviors that might be difficult to observe directly in biological experiments.

      (3) The study proposes several interesting compensatory mechanisms, such as sodium leak channels and extracellular potassium buffering, which provide useful theoretical frameworks for understanding how neurons maintain firing rate control despite the pump's effects.

      Comments on revisions:

      The revised manuscript is notably improved.

    4. eLife Assessment

      This important study provides new insights into the lesser-known effects of the sodium-potassium pump on how nerve cells process signals, particularly in highly active cells like those of weakly electric fish. The computational methods used to establish the claims in this work are compelling and can be used as a starting point for further studies.

    1. eLife Assessment

      This important study presents a sequence-based method for predicting drug-interacting residues in intrinsically disordered proteins (IDPs), addressing a significant challenge in understanding small-molecule:IDP interactions. The findings have solid support through examples underscoring the role of aromatic interactions. While predicted binding sites remain coarse, validation was done on a total of 10 IDPs at varying depths. The method builds on the authors' previous work and, with ad hoc modifications, is poised to benefit this emerging field.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors developed a sequence-based method to predict drug-interacting residues in IDP, based on their recent work, to predict the transverse relaxation rates (R2) of IDP trained on 45 IDP sequences and their corresponding R2 values. The discovery is that the IDPs interact with drugs mostly using aromatic residues that are easy to understand, as most drugs contain aromatic rings. They validated the method using several case studies, and the predictions are in accordance with chemical shift perturbations and MD simulations. The location of the predicted residues serves as a starting point for ligand optimization.

      Strengths:

      This work provides the first sequence-based prediction method to identify potential drug-interacting residues in IDP. The validity of the method is supported by case studies. It is easy to use, and no time-consuming MD simulations and NMR studies are needed.

      Weaknesses:

      The method does not depend on the information of binding compounds, which may give general features of IDP-drug binding. However, due to the size and chemical structures of the compounds (for example, how many aromatic rings), the number of interacting residues varies, which is not considered in this work. Lacking specific information may restrict its application in compound optimization, aiming to derive specific and potent binding compounds.

      We fully recognize that different compounds may have different interaction propensity profiles along the IDP sequence. In future studies, we will investigate compound-specific parameter values. The limiting factor is training data, but such data are beginning to be available.

      Reviewer #2 (Public review):

      Summary:

      In this work, the authors introduce DIRseq, a fast, sequence-based method that predicts drug-interacting residues (DIRs) in IDPs without requiring structural or drug information. DIRseq builds on the authors' prior work looking at NMR relaxation rates, and presumes that those residues that show enhanced R2 values are the residues that will interact with drugs, allowing these residues to be nominated from the sequence directly. By making small modifications to their prior tool, DIRseq enables the prediction of residues seen to interact with small molecules in vivo.

      Strengths:

      The preprint is well written and easy to follow.

      Weaknesses:

      (1) The DIRseq method is based on SeqDYN, which itself is a simple (which I do not mean as a negative - simple is good!) statistical predictor for R2 relaxation rates. The challenge here is that R2 rates cover a range of timescales, so the physical intuition as to what exactly elevated R2 values mean is not necessarily consistent with "drug interacting". Presumably, the authors are not using the helix boost component of SeqDYN here (it would be good to explicitly state this). This is not necessarily a weakness, but I think it would behove the authors to compare a few alternative models before settling on the DIRseq method, given the somewhat ad hoc modifications to SeqDYN to get DIRseq.

      Actually, the factors that elevate R2 are well-established. These are local interactions and residual secondary structures (if any). The basic assumption of our method is that intra-IDP interactions that elevate R2 convert to IDP-drug interactions. This assumption was supported by our initial observation that the drug interaction propensity profiles predicted using the original SeqDYN parameters already showed good agreement with CSP profiles. We only made relatively small adjustments to the parameters to improve the agreement. Indeed we did not apply the helix boost portion of SeqDYN to DIRseq, and now state as such (p. 4, second last paragraph). We now also compare DIRseq with several alternative models, as summarized in new Table S2.

      Specifically, the authors previously showed good correlation between the stickiness parameter of Tesei et al and the inferred "q" parameter for SeqDYN; as such, I am left wondering if comparable accuracy would be obtained simply by taking the stickiness parameters directly and using these to predict "drug interacting residues", at which point I'd argue we're not really predicting "drug interacting residues" as much as we're predicting "sticky" residues, using the stickiness parameters. It would, I think, be worth the authors comparing the predictive power obtained from DIRseq with the predictive power obtained by using the lambda coefficients from Tesei et al in the model, local density of aromatic residues, local hydrophobicity (note that Tesei et al have tabulated a large set of hydrophobicity scores!) and the raw SeqDYN predictions. In the absence of lots of data to compare against, this is another way to convince readers that DIRseq offers reasonable predictive power.

      We now compare predictions of these various parameter sets, and report the results in Table S2. In short, among all the tested parameter sets, DIRseq has the best performance as measured by (1) strong correlations between prediction scores and CSPs and (2) high true positives and low false positives (p. 7-9).
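
      The two performance measures named in this response can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names, the cutoffs, and the toy numbers are all hypothetical, and the response does not state how the thresholds for calling a residue "predicted" or "observed" were actually chosen.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def confusion_counts(scores, csps, score_cut, csp_cut):
    """Count true positives, false positives and false negatives.

    A residue is 'predicted' if its score reaches score_cut and
    'observed' if its CSP reaches csp_cut (both cutoffs are assumptions).
    """
    tp = fp = fn = 0
    for score, csp in zip(scores, csps):
        predicted = score >= score_cut
        observed = csp >= csp_cut
        if predicted and observed:
            tp += 1
        elif predicted:
            fp += 1
        elif observed:
            fn += 1
    return tp, fp, fn

# Toy per-residue values, invented purely for illustration.
toy_scores = [0.2, 0.9, 0.8, 0.1, 0.4]
toy_csps = [0.01, 0.05, 0.04, 0.00, 0.03]
toy_counts = confusion_counts(toy_scores, toy_csps, score_cut=0.5, csp_cut=0.025)
toy_r = pearson(toy_scores, toy_csps)
```

      Reporting both measures is useful because a correlation summarizes agreement along the whole sequence, whereas the counts only reflect the residues that cross the chosen cutoffs.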

      (2) Second, the DIRseq is essentially SeqDYN with some changes to it, but those changes appear somewhat ad hoc. I recognize that there is very limited data, but the tweaking of parameters based on physical intuition feels a bit stochastic in developing a method; presumably (while not explicitly spelt out) those tweaks were chosen to give better agreement with the very limited experimental data (otherwise why make the changes?), which does raise the question of if the DIRseq implementation of SeqDYN is rather over-parameterized to the (very limited) data available now? I want to be clear, the authors should not be critiqued for attempting to develop a model despite a paucity of data, and I'm not necessarily saying this is a problem, but I think it would be really important for the authors to acknowledge to the reader the fact that with such limited data it's possible the model is over-fit to specific sequences studied previously, and generalization will be seen as more data are collected.

      We have explained the rationale for the parameter tweaks, which were limited to q values for four amino-acid types, i.e., to deemphasize hydrophobic interactions and slightly enhance electrostatic interactions (p. 4-5). We now add that these tweaks were motivated by observations from MD simulations of drug interactions with α-syn (ref 13). As already noted in the response to the preceding comment, we now also present results for the original parameter values as well as for when the four q values are changed one at a time.

      (3) Third, perhaps my biggest concern here is that - implicit in the author's assumptions - is that all "drugs" interact with IDPs in the same way and all drugs are "small" (motivating the change in correlation length). Prescribing a specific length scale and chemistry to all drugs seems broadly inconsistent with a world in which we presume drugs offer some degree of specificity. While it is perhaps not unexpected that aromatic-rich small molecules tend to interact with aromatic residues, the logical conclusion from this work, if one assumes DIRseq has utility, is that all IDRs bind drugs with similar chemical biases. This, at the very least, deserves some discussion.

      The reviewer raises a very important point. In Discussion, we now add that it is important to further develop DIRseq to include drug-specific parameters when data for training become available (p. 12-13). To illustrate this point, we use drug size as a simple example, which can be modeled by making the b parameter dependent on drug molecule size.

      (4) Fourth, the authors make some general claims in the introduction regarding the state of the art, which appear to lack sufficient data to be made. I don't necessarily disagree with the author's points, but I'm not sure the claims (as stated) can be made absent strong data to support them. For example, the authors state: "Although an IDP can be locked into a specific conformation by a drug molecule in rare cases, the prevailing scenario is that the protein remains disordered upon drug binding." But is this true? The authors should provide evidence to support this assertion, both examples in which this happens, and evidence to support the idea that it's the "prevailing view" and specific examples where these types of interactions have been biophysically characterized.

      We now cite nine studies showing that IDPs remain disordered upon drug binding.

      Similarly, they go on to say:

      "Consequently, the IDP-drug complex typically samples a vast conformational space, and the drug molecule only exhibits preferences, rather than exclusiveness, for interacting with subsets of residues." But again, where is the data to support this assertion? I don't necessarily disagree, but we need specific empirical studies to justify declarative claims like this; otherwise, we propagate lore into the scientific literature. The use of "typically" here is a strong claim, implying most IDP complexes behave in a certain way, yet how can the authors make such a claim? 

      Here again we add citations to support the statement.

      Finally, they continue to claim:

      "Such drug interacting residues (DIRs), akin to binding pockets in structured proteins, are key to optimizing compounds and elucidating the mechanism of action." But again, is this a fact or a hypothesis? If the latter, it must be stated as such; if the former, we need data and evidence to support the claim.

      We add citations to both compound optimization and mechanism of action.

      Reviewer #1 (Recommendations for the authors):

      (1) The authors should compare the sequences of the IDPs in the case studies with the 45 IDPs in training the SeqDYN model to make sure that they are not included in the training dataset or are highly homologous.

      Please note that the data used for training SeqDYN were R2 rates, which are independent of the property being studied here, i.e., drug interacting residues. Therefore whether the IDPs studied here were in the training set for SeqDYN is immaterial.

      (2) The authors manually tuned four parameters in SeqDYN to develop the model for predicting drug-interacting residues without giving strict testing or explanations. More explanations, testing of more values, and ablation testing should be given.

      As responded above, we now both expand the explanation and present more test results.

      (3) The authors changed the q values of L, I, and M to the value of V. What are the results if these values are not changed?

      These results are shown in Table S2 (entry named SeqDYN_orig).

      (4) Only one b value is chosen based on the assumption that a drug molecule interacts with 3-4 residues at a time. However, the number of interacting residues is related to the size of the drug molecule. Adjusting the b value with the size of the ligand may provide improvement. It is better to test the influence of adjusting b values. At least, this should be discussed.

      Good point! We now state that b potentially can be adjusted according to ligand size (p. 12-13). In addition, we also show the effect of varying b on the prediction results (Table S2; p. 8, last paragraph).

      (5) The authors add 12 Q to eliminate end effects. However, explanations on why 12 Qs are chosen should be given. How about other numbers of Q or using other residues (e.g., the commonly used residues in making links, like GS/PS or A)?

      As we already explained, “Gln was selected because its 𝑞 value is at the middle of the 20 𝑞 values.” (p. 5, second paragraph). Also, 12 Qs are sufficient to remove any end effects; a higher number of Qs does not make any difference.
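
      The roles of the q values, the b parameter, and the 12-Gln padding discussed in this exchange can be sketched as follows. This is a rough illustration under stated assumptions, not the DIRseq implementation: the Lorentzian-style factor 1 + (q - 1)/(1 + b s^2) is assumed here as the form of each residue's contribution, the self term is included for simplicity, and the q and b values are hypothetical toy numbers rather than the published parameters.

```python
def pad_with_gln(seq, n_pad=12):
    """Flank the sequence with Gln residues on both ends (12 per end, as in the response)."""
    return "Q" * n_pad + seq + "Q" * n_pad

def propensity_profile(seq, q, b, n_pad=12):
    """Per-residue score as a product of distance-damped factors.

    Assumed form: each residue m contributes 1 + (q[m] - 1) / (1 + b * s**2)
    to residue n, where s is the sequence separation |n - m|.
    """
    padded = pad_with_gln(seq, n_pad)
    profile = []
    # Only the original residues are scored; the padding just supplies neighbours.
    for n in range(n_pad, n_pad + len(seq)):
        value = 1.0
        for m, residue in enumerate(padded):
            s = abs(n - m)
            value *= 1.0 + (q[residue] - 1.0) / (1.0 + b * s * s)
        profile.append(value)
    return profile

# Hypothetical toy parameters: q > 1 raises the score, q < 1 lowers it.
toy_q = {"Q": 1.0, "W": 1.2, "G": 0.9}
toy_profile = propensity_profile("GGWGG", toy_q, b=0.3)
```

      In this sketch a larger b makes each factor decay faster with sequence separation, so fewer neighbours contribute to a residue's score. That is the sense in which b could be tied to ligand size, as the response suggests.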

      Reviewer #2 (Recommendations for the authors):

      (1) The authors make reference to the "C-terminal IDR" in cMyc, but the region they note is found in the bHLH DNA binding domain (which falls from residue ~370-420).

      We now clarify that this region is disordered on its own but forms a helix-loop-helix structure upon heterodimerization with Max (p. 11, last paragraph).

      (2) Given the fact that X-seq names are typically associated with sequencing-based methods, it's perhaps confusing to name this method DIRseq?

      We appreciate the reviewer’s point, but by now the preprint posted in bioRxiv is in wide circulation, and the DIRseq web server has been up for several months, so changing its name would cause a great deal of confusion.

      (3) I'd encourage the authors just to spell out "drug interacting residues" and retain an IDR acronym for IDRs. Acronyms rarely make writing clearer, and asking folks to constantly flip between IDR and DIR is asking a lot of an audience (in this reviewer's opinion, anyway).

      The reviewer makes a good point; we now spell out “drug-interacting residues”.

      (4) The assumption here is that CSPs result from direct drug:IDR interactions. However, CSPs result from a change in the residue chemical environment, which could in principle be an indirect effect (e.g., in the unbound state, residues A and B interact; in the bound state, residue A is now free, such that it experiences a CSP despite not engaging directly). While I recognize such assumptions are commonly made, it behoves the authors to explicitly make this point so the reader understands the relationship between CSPs and binding.

      We did add caveats of CSP in Introduction (p. 3, second paragraph).

      (5) On the figures, please label which protein is which figure, as well as provide a legend for the annotations on the figures (red line, blue bar, cyan region, etc.)

      We now label protein names in Fig. 1. The annotations of display items were already explained in the captions of Figs. 2 and 3; we now add them to the Fig. 4 caption.

      (6) abstract: "These successes augur well for deciphering the sequence code for IDP-drug binding." - This is not grammatically correct, even if augur were changed to agree. Suggest rewriting.

      “Augur well” means to be a good sign (for something). We use this phrase here in this meaning.

      (6) page 5: "we raised the 𝑞 value of Asp to be the same as that of Glu" → suggested "increased" instead of raised.

      We have made the suggested change.

      (7) The authors should consider releasing the source code (it is available via the .js implementation on the server, but this is not very transferable/shareable, so I'd encourage the authors to provide a stand-alone implementation that's explicitly shareable).

      We have now added a link for the user to download the source code.

    3. Reviewer #2 (Public review):

      Summary:

      In this work, the authors introduce DIRseq, a fast, sequence-based method that predicts drug-interacting residues (DIRs) in IDPs without requiring structural or drug information. DIRseq builds on the authors' prior work looking at NMR relaxation rates, and presumes that those residues that show enhanced R2 values are the residues that will interact with drugs, allowing these residues to be nominated from the sequence directly. By making small modifications to their prior tool, DIRseq enables the prediction of residues seen to interact with small molecules in vivo.

      Strengths:

      The preprint is well written and easy to follow.

    4. Reviewer #1 (Public review):

      Summary:

      The authors developed a sequence-based method to predict drug-interacting residues in IDP, based on their recent work, to predict the transverse relaxation rates (R2) of IDP trained on 45 IDP sequences and their corresponding R2 values. The discovery is that the IDPs interact with drugs mostly using aromatic residues that are easy to understand, as most drugs contain aromatic rings. They validated the method using several case studies, and the predictions are in accordance with chemical shift perturbations and MD simulations. The location of the predicted residues serves as a starting point for ligand optimization.

      Strengths:

      This work provides the first sequence-based prediction method to identify potential drug-interacting residues in IDP. The validity of the method is supported by case studies. It is easy to use, and no time-consuming MD simulations and NMR studies are needed.

      Weaknesses:

      The method does not depend on the information of binding compounds, which may give general features of IDP-drug binding. However, due to the size and chemical structures of the compounds (for example, how many aromatic rings), the number of interacting residues varies, which is not considered in this work. Lacking specific information may restrict its application in compound optimization, aiming to derive specific and potent binding compounds.

      Comments on revised version:

      I'm satisfied with the authors' response and the public review does not need further changes.

    1. Author response:

      The following is the authors’ response to the current reviews.

      eLife Assessment

      The authors examine the effect of cell-free chromatin particles (cfChPs) derived from human serum or from dying human cells on mouse cells in culture and propose that these cfChPs can serve as vehicles for cell-to-cell active transfer of foreign genetic elements. The work presented in this paper is intriguing and potentially important, but it is incomplete. At this stage, the claim that horizontal gene transfer can occur via cfChPs is not well supported because it is only based on evidence from one type of methodological approach (immunofluorescence and fluorescent in situ hybridization (FISH)) and is not validated by whole genome sequencing.

      We disagree with the eLife assessment that our study is incomplete because we did not perform whole genome sequencing. Tens of thousands of genomes have been sequenced, and yet they have failed to detect the presence of the numerous “satellite genomes” that we describe in our paper. To that extent whole genome sequencing has proved to be an inappropriate technology. Rather, eLife should have commended us for the numerous control experiments that we have done to ensure that our FISH probes and antibodies are target specific and do not cross-react.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Horizontal gene transfer is the transmission of genetic material between organisms through ways other than reproduction. Frequent in prokaryotes, this mode of genetic exchange is scarcer in eukaryotes, especially in multicellular eukaryotes. Furthermore, the mechanisms involved in eukaryotic HGT are unknown. This article by Banerjee et al. claims that HGT occurs massively between cells of multicellular organisms. According to this study, the cell free chromatin particles (cfChPs) that are massively released by dying cells are incorporated in the nucleus of neighboring cells.

      The reviewer is mistaken. We do not claim that the internalized cfChPs are incorporated into the nucleus. We show throughout the paper that the cfChPs perform their novel functions autonomously outside the genome without being incorporated into the nucleus. This is clearly seen in all our chromatin fibre images, metaphase spreads and our video abstract. Occasionally, when the cfChPs fluorescent signals overlie the chromosomes, we have been careful to state that the cfChPs are associated with the chromosomes without implying that they have integrated.

      These cfChPs are frequently rearranged and amplified to form concatemers, they are made of open chromatin, expressed, and capable of producing proteins. Furthermore, the study also suggests that cfChPs transmit transposable elements (TEs) between cells on a regular basis, and that these TEs can transpose, multiply, and invade receiving cells. These conclusions are based on a series of experiments consisting in releasing cfChPs isolated from various human sera into the culture medium of mouse cells, and using FISH and immunofluorescence to monitor the state and fate of cfChPs after several passages of the mouse cell line.

      Strengths:

      The results presented in this study are interesting because they may reveal unsuspected properties of some cell types that may be able to internalize free-circulating chromatin, leading to its chromosomal incorporation, expression, and unleashing of TEs. The authors propose that this phenomenon may have profound impacts in terms of diseases and genome evolution. They even suggest that this could occur in germ cells, leading to within-organism HGT with long-term consequences.

      Again the reviewer makes the same mistake. We do not claim that the internalized cfChPs are incorporated into the chromosomes. We have addressed this issue above.

      We have a feeling that the reviewer has not understood our work – which is the discovery of “satellite genomes” which function autonomously outside the nuclear genome.

      Weaknesses:

      The claims of massive HGT between cells through internalization of cfChPs are not well supported because they are only based on evidence from one type of methodological approach: immunofluorescence and fluorescent in situ hybridization (FISH) using protein antibodies and DNA probes. Yet, such strong claims require validation by at least one, but preferably multiple, additional orthogonal approaches. This includes, for example, whole genome sequencing (to validate concatemerization, integration in receiving cells, transposition in receiving cells), RNA-seq (to validate expression), ChIP-seq (to validate chromatin state).

      We disagree with the reviewer that our study is incomplete because we did not perform whole genome sequencing. Tens of thousands of genomes have been sequenced, and yet they have failed to detect the presence of the numerous “satellite genomes” that we describe in our paper. To that extent whole genome sequencing has proved to be an inappropriate approach. Rather, the reviewer should have commended us for the numerous control experiments that we have done to ensure that our FISH probes and antibodies are target specific and do not cross-react.

      Should HGT through internalization of circulating chromatin occur on a massive scale, as claimed in this study, and as illustrated by the many FISH foci observed on Fig 3 for example, one would expect that the level of somatic mosaicism may be so high that it would prevent assembling a contiguous genome for a given organism. Yet, telomere-to-telomere genomes have been produced for many eukaryote species, calling into question the conclusions of this study.

      The reviewer has raised a related issue below and we have responded to both of them together.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I thank the authors for taking my comments and those of the other reviewer into account and for adding new material to this new version of the manuscript. Among other modifications/additions, they now mention that they think that NIH3T3 cells treated with cfChPs die out after 250 passages because of genomic instability which might be caused by horizontal transfer of cfChPs DNA into the genome of treated cells (pp. 45-46, lines 725-731). However, no definitive formal proof of genomic instability and horizontal transfer is provided.

      We mention that the NIH3T3 cells treated with cfChPs die out after 250 passages in response to the reviewer’s earlier comment “Should HGT through internalization of circulating chromatin occur on a massive scale, as claimed in this study, and as illustrated by the many FISH foci observed in Fig 3 for example, one would expect that the level of somatic mosaicism may be so high that it would prevent assembling a contiguous genome for a given organism”.

      We have agreed with the reviewer and have simply speculated that the cells may die because of extreme genomic instability. We have left it as a speculation without diverting our paper in a different direction to prove genomic instability.

      The authors now refer to an earlier study they conducted in which they Illumina-sequenced NIH3T3 cells treated with cfChPs (p. 48, lines 781-792). This study revealed the presence of human DNA in the mouse cell culture. However, it is unclear to me how the authors can conclude that the human DNA was inside mouse cells (rather than persisting in the culture medium as cfChPs) and it is also unclear how this supports horizontal transfer of human DNA into the genome of mouse cells. Horizontal transfer implies integration of human DNA into mouse DNA, through the formation of phosphodiester bonds between human nucleotides and mouse nucleotides. The previous Illumina-sequencing study and the current study do not show that such integration has occurred. I might be wrong but I tend to think that DNA FISH signals showing that human DNA lies next to mouse DNA do not necessarily imply that human DNA has integrated into mouse DNA. Perhaps such signals could result from interactions at the protein level between human cfChPs and mouse chromatin?

      With due respect, our earlier genome sequencing study that the reviewer refers to was done on two single cell clones developed following treatment with cfChPs. So, the question of cfChPs lurking in the culture medium does not arise.

      The authors should be commended for doing so many FISH experiments. But in my opinion, and as already mentioned in my earlier review of this work, horizontal transfer of human DNA into mouse DNA should first be demonstrated by strong DNA sequencing evidence (multiple long and short reads supporting human/mouse breakpoints; discarding technical DNA chimeras) and only then eventually confirmed by FISH.

      As mentioned earlier, we disagree with the reviewer that our study is incomplete because we did not perform whole genome sequencing. Tens of thousands of genomes have been sequenced, and yet they have failed to detect the presence of the numerous “satellite genomes” that we describe in our paper. To that extent whole genome sequencing has proved to be an inappropriate approach. Rather, the reviewer should have commended us for the numerous control experiments that we have done to ensure that our FISH probes and antibodies are target specific and do not cross-react.

      Regarding my comment on the quantity of human cfChPs that has been used for the experiments, the authors replied that they chose this quantity because it worked in a previous study. Could they perhaps explain why they chose this quantity in the earlier study? Is there any biological reason to choose 10 ng and not more or less? Is 10 ng realistic biologically? Could it be that 10 ng is orders of magnitude higher than the quantity of cfChPs normally circulating in multicellular organisms and that this could explain, at least in part, the results obtained in this study?

      The reviewer again raises the same issue, which we have already addressed in our revised manuscript. To quote “We chose to use 10ng based on our earlier report in which we had obtained robust biological effects such as activation of DDR and activation of apoptotic pathways using this concentration of cfChPs (Mittra I et. al., 2015)”.

      It is also mentioned in the response that RNA-seq has been performed on mouse cells treated with cfChPs, and that this confirms human-mouse fusion (genomic integration). Since these results are not included in the manuscript, I cannot judge how robust they are and whether they reflect a biological process rather than technical issues (technical chimeras formed during the RNA-seq protocol is a well-known artifact). In any case, I do not think that genomic integration can be demonstrated through RNA-seq as junction between human and mouse RNA could occur at the RNA level (i.e. after transcription). RNA-seq could however show whether human-mouse chimeras that have been validated by DNA-sequencing are expressed or not.

      We did perform transcriptome sequencing as suggested earlier by the reviewer, but realized that the amount of material required to be incorporated into the manuscript to include “material and methods”, “results”, “discussion”, “figures” and “legends to figures” and “supplementary figures and tables” would be so massive that it will detract from the flow of our work and hijack it in a different direction. We have, therefore, decided to publish the transcriptome results as a separate manuscript.

      Given these comments, I believe that most of the weaknesses I mentioned in my review of the first version of this work still hold true.

      An important modification is that the work has been repeated in other cell lines, hence I removed this criticism from my earlier review.

      Additional changes made

      (1) We have now rewritten the “Abstract” to 250 words to fit in eLife’s instructions. (It was not possible to reduce the word count further.)

      (2) We have provided the Video 1 as separate file instead of link.

      (3) Some of Figure Supplements (which were stand-alone) are now given as main figures. We have re-arranged Figures and Figure Supplements in accordance with eLife’s instructions.

      (4) We have now provided a list of the various cell lines used in this study, their tissue origin and procurement source in Supplementary File 3.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Horizontal gene transfer is the transmission of genetic material between organisms through ways other than reproduction. Frequent in prokaryotes, this mode of genetic exchange is scarcer in eukaryotes, especially in multicellular eukaryotes. Furthermore, the mechanisms involved in eukaryotic HGT are unknown. This article by Banerjee et al. claims that HGT occurs massively between cells of multicellular organisms. According to this study, the cell free chromatin particles (cfChPs) that are massively released by dying cells are incorporated in the nucleus of neighboring cells. These cfChPs are frequently rearranged and amplified to form concatemers, they are made of open chromatin, expressed, and capable of producing proteins. Furthermore, the study also suggests that cfChPs transmit transposable elements (TEs) between cells on a regular basis, and that these TEs can transpose, multiply, and invade receiving cells. These conclusions are based on a series of experiments consisting in releasing cfChPs isolated from various human sera into the culture medium of mouse cells, and using FISH and immunofluorescence to monitor the state and fate of cfChPs after several passages of the mouse cell line.

      Strengths:

      The results presented in this study are interesting because they may reveal unsuspected properties of some cell types that may be able to internalize free-circulating chromatin, leading to its chromosomal incorporation, expression, and unleashing of TEs. The authors propose that this phenomenon may have profound impacts in terms of diseases and genome evolution. They even suggest that this could occur in germ cells, leading to within-organism HGT with long-term consequences.

      Weaknesses:

      The claims of massive HGT between cells through internalization of cfChPs are not well supported because they are only based on evidence from one type of methodological approach: immunofluorescence and fluorescent in situ hybridization (FISH) using protein antibodies and DNA probes. Yet, such strong claims require validation by at least one, but preferably multiple, additional orthogonal approaches. This includes, for example, whole genome sequencing (to validate concatemerization, integration in receiving cells, transposition in receiving cells), RNA-seq (to validate expression), ChIP-seq (to validate chromatin state).

      We have responded to this criticism under “Reviewer #1 (Recommendations for the authors, item no. 1-4)”.

      Another weakness of this study is that it is performed only in one receiving cell type (NIH3T3 mouse cells). Thus, rather than a general phenomenon occurring on a massive scale in every multicellular organism, it could merely reflect aberrant properties of a cell line that for some reason became permeable to exogenous cfChPs. This begs the question of the relevance of this study for living organisms.

      We have responded to this criticism under “Reviewer #1 (Recommendations for the authors, item no. 6)”.

      Should HGT through internalization of circulating chromatin occur on a massive scale, as claimed in this study, and as illustrated by the many FISH foci observed in Fig 3 for example, one would expect that the level of somatic mosaicism may be so high that it would prevent assembling a contiguous genome for a given organism. Yet, telomere-to-telomere genomes have been produced for many eukaryote species, calling into question the conclusions of this study.

      The reviewer is right in expecting that the level of somatic mosaicism may be so high that it would prevent assembling a contiguous genome. This is indeed the case, and we find that beyond ~ 250 passages the cfChP-treated NIH3T3 cells begin to die out, apparently because their genomes have become too unstable for survival. This point will be highlighted in the revised version (pp. 45-46, lines 725-731).

      Reviewer #2 (Public review):

      I must note that my comments pertain to the evolutionary interpretations rather than the study's technical results. The techniques appear to be appropriately applied and interpreted, but I do not feel sufficiently qualified to assess this aspect of the work in detail.

      I was repeatedly puzzled by the use of the term "function." Part of the issue may stem from slightly different interpretations of this word in different fields. In my understanding, "function" should denote not just what a structure does, but what it has been selected for. In this context, where it is unclear if cfChPs have been selected for in any way, the use of this term seems questionable.

      We agree. We have removed the term “function” wherever we felt we had used it inappropriately.

      Similarly, the term "predatory genome," used in the title and throughout the paper, appears ambiguous and unjustified. At this stage, I am unconvinced that cfChPs provide any evolutionary advantage to the genome. It is entirely possible that these structures have no function whatsoever and could simply be byproducts of other processes. The findings presented in this study do not rule out this neutral hypothesis. Alternatively, some particular components of the genome could be driving the process and may have been selected to do so. This brings us to the hypothesis that cfChPs could serve as vehicles for transposable elements. While speculative, this idea seems to be compatible with the study's findings and merits further exploration.

      We agree with the reviewer’s viewpoint. We have replaced the term “predatory genome” with a more realistic term “satellite genome” in the title and throughout the manuscript. We have also thoroughly revised the discussion section and elaborated on the potential role of LINE-1 and Alu elements carried by the concatemers in mammalian evolution. (pp. 46-47, lines 743-756).

      I also found some elements of the discussion unclear and speculative, particularly the final section on the evolution of mammals. If the intention is simply to highlight the evolutionary impact of horizontal transfer of transposable elements (e.g., as a source of new mutations), this should be explicitly stated. In any case, this part of the discussion requires further clarification and justification.

      As mentioned above, we have revised the “discussion” section taking into account the issues raised by the reviewer and highlighted the potential role of cfChPs in evolution by acting as vehicles of transposable elements.

      In summary, this study presents important new findings on the behavior of cfChPs when introduced into a foreign cellular context. However, it overextends its evolutionary interpretations, often in an unclear and speculative manner. The concept of the "predatory genome" should be better defined and justified or removed altogether. Conversely, the suggestion that cfChPs may function at the level of transposable elements (rather than the entire genome or organism) could be given more emphasis.

      As mentioned above, we have replaced the term “predatory genome” with “satellite genome” and revised the “discussion” section taking into account the issues raised by the reviewer.

      Reviewer #1 (Recommendations for the authors):

      (1) I strongly recommend validating the findings of this study using other approaches. Whole genome sequencing using both short and long reads should be used to validate the presence of human DNA in the mouse cell line, as well as its integration into the mouse genome and concatemerization. Breakpoints between mouse and human DNA can be searched in individual reads. Finding these breakpoints in multiple reads from two or more sequencing technologies would strengthen their biological origin. Illumina and ONT sequencing are now routinely performed by many labs, such that this validation should be straightforward. In addition to validating the findings of the current study, it would allow performance of an in-depth characterization of the rearrangements undergone by both human cfChPs and the mouse genome after internalization of cfChPs, including identification of human TE copies integrated through bona fide transposition events into the mouse genome. New copies of LINE and Alu TEs should be flanked by target site duplications. LINE copies should be frequently 5' truncated, as observed in many studies of somatic transposition in human cells.

      (2) Furthermore, should the high level of cell-to-cell HGT detected in this study occur on a regular basis within multicellular organisms, validating it through a reanalysis of whole genome sequencing data available in public databases should be relatively easy. One would expect to find a high number of structural variants that for some reason have so far gone under the radar.

      (3) Short and long-read RNA-seq should be performed to validate the expression of human cfChPs in mouse cells. I would also recommend performing ChIP-seq on routinely targeted histone marks to validate the chromatin state of human cfChPs in mouse cells.

      (4) The claim that fused human proteins are produced in mouse cells after exposing them to human cfChPs should be validated using mass spectrometry.

      The reviewer has suggested a plethora of techniques to validate our findings. Clearly, it is neither possible to undertake all of them nor to incorporate them into the manuscript. However, as suggested by the reviewer, we did conduct transcriptome sequencing of cfChP-treated NIH3T3 cells and were able to detect the presence of human-human fusion sequences (representing concatemerisation) as well as human-mouse fusion sequences (representing genomic integration). However, we realized that the amount of material required to be incorporated into the manuscript to include “material and methods”, “results”, “discussion”, “figures” and “legends to figures” and “supplementary figures and tables” would be so massive that it would detract from the flow of our work and hijack it in a different direction. We have, therefore, decided to publish the transcriptome results as a separate manuscript. However, to address the reviewer’s concerns we have now referred to the results of our earlier whole genome sequencing study of NIH3T3 cells similarly treated with cfChPs, wherein we had conclusively detected the presence of human DNA and human Alu sequences in the treated mouse cells. These findings have now been added as an independent paragraph (p. 48, lines 781-792).

      (5) It is unclear from what is shown in the paper (increase in FISH signal intensity using Alu and L1 probes) if the increase in TE copy number is due to bona fide transposition or to amplification of cfChPs as a whole, through mechanisms other than transposition. It is also unclear whether human TEs end up being integrated into the neighboring mouse genome. This should be validated by whole genome sequencing.

      Our results suggest that TEs amplify and increase their copy number due to their association with DNA polymerase and their ability to synthesize DNA (Figure 14a and b). Our study design cannot demonstrate transposition, which would require real-time imaging.

      The possibility of incorporation of TEs into the mouse genome is supported by our earlier genome sequencing work, referred to above, wherein we detected multiple human Alu sequences in the mouse genome (pp. 48, lines. 781-792).

      (6) In order to be able to generalize the findings of this study, I strongly encourage the authors to repeat their experiments using other cell types.

      We thank the reviewer for this suggestion. We have now used four different cell lines derived from four different species and demonstrated that horizontal transfer of cfChPs occurs in all of them, suggesting that it is a universal phenomenon (p. 37, lines 560-572; Supplementary Fig. S14a-d).

      We have also mentioned this in the abstract (pp. 3, lines 52-54).

      (7) Since the results obtained when using cfChPs isolated from healthy individuals are identical to those shown when using cfChPs from cancer sera, I wonder why the authors chose to focus mainly on results from cancer-derived cfChPs and not on those from healthy sera.

      Most of the experiments were conducted using cfChPs isolated from cancer patients because of our special interest in cancer, and because our earlier results (Mittra et al., 2015) had shown that cfChPs isolated from cancer patients had significantly greater activity in terms of DNA damage and activation of apoptotic pathways than those isolated from healthy individuals. We have now incorporated the above justification (p. 6, lines 124-128).

      (8) Line 125: how was the 10-ng quantity (of human cfChPs added to the mouse cell culture) chosen and how does it compare to the quantity of cfChPs normally circulating in multicellular organisms?

      We chose to use 10 ng based on our earlier report in which we had obtained robust biological effects such as activation of DDR and apoptotic pathways using this concentration of cfChPs (Mittra I et al., 2015). We have now incorporated the justification for using this dose in our manuscript (pp. 51-52, lines 867-870).

      (9) Could the authors explain why they repeated several of their experiments in metaphase spreads, in addition to interphase?

      We conducted experiments on metaphase spreads in addition to those on chromatin fibres because of the current heightened interest in extrachromosomal DNA in cancer, studies of which have largely been based on metaphase spreads. We were interested to see how the cfChP concatemers might relate to the characteristics of cancer extrachromosomal DNA and whether the latter in fact represent cfChP concatemers acquired from surrounding dying cancer cells. We have now mentioned this on p. 7, lines 150-155.

      (10) Regarding negative controls consisting in checking whether human probes cross-react with mouse DNA or proteins, I suggest that the stringency of washes (temperature, reagents) should be clearly stated in the manuscript, such that the reader can easily see that it was identical for controls and positive experiments.

      We were fully aware of these issues and were careful to ensure that washing steps were conducted meticulously. The careful washing steps have been repeatedly emphasized under the section on “Immunofluorescence and FISH” (pp. 54-55, lines. 922-944).

      (11) I am not an expert in Immuno-FISH and FISH with ribosomal probes but it can be expected that ribosomal RNA and RNA polymerase are quite conserved (and thus highly similar) between humans and mice. A more detailed explanation of how these probes were designed to avoid cross-reactivity would be welcome.

      We were aware of this issue and conducted a negative control experiment to ensure that the human ribosomal RNA probe and RNA polymerase antibody did not cross-react with their mouse counterparts. Please see Supplementary Fig. S4c.

      (12) Finally, I could not understand why the cfChPs internalized by neighboring cells are called predatory genomes. I could not find any justification for this term in the manuscript.

      We agree, and this criticism has also been made by Reviewer #2. We have now replaced the term “predatory” genomes with “satellite” genomes.

      Reviewer #2 (Recommendations for the authors):

      (1) P2 L34: The term "role" seems to imply "what something is supposed to do" (similar to "function"). Perhaps "impact" would be more neutral. Additionally, "poorly defined" is vague: do you mean "unknown"?

      We thank the reviewer for this suggestion. We have now rephrased the sentence to read “Horizontal gene transfer (HGT) plays an important evolutionary role in prokaryotes, but it is thought to be less frequent in mammals.” (pp. 2, lines. 26-27).

      (2) P2 L35: It seems that the dash should come after "human blood."

      Thank you, we have changed the position of the dash (pp. 2, line. 29).

      (3) P2 L37: Must we assume these structures have a function? Could they not simply be side effects of other processes?

      We think this is a matter of semantics, especially since we show that cfChPs once inside the cell perform many functions such as replication, DNA synthesis, RNA synthesis, protein synthesis etc. We, therefore, think the word “function” is not inappropriate.

      (4) Abstract: After reading the abstract, I am unclear on the concept of a "predatory genome." Based on the summarized results, it seems one cannot conclude that these elements provide any adaptive value to the genome.

      We agree. We have now replaced the term “predatory” genomes with a more realistic term viz. “satellite” genomes.

      (5) Video abstract: The video abstract does not currently stand on its own and needs more context to be self-explanatory.

      Thank you for pointing this out. We have now created a new and much more professional video with more context which we hope will meet with the reviewer’s approval.

      (6) P4 L67: Again, I am uncertain that HGT should be said to have "a role" in mammals, although it clearly has implications and consequences. Perhaps "role" here is intended to mean "consequence"?

      We have now changed the sentence to read as follows “However, defining the occurrence of HGT in mammals has been a challenge” (pp. 4, line. 73).

      (7) P6 L111: The phrase "to obtain a new perspective about the process of evolution" is unclear. What exactly is meant by this statement?

      We have replaced this sentence altogether which now reads “The results of these experiments are presented in this article which may help to throw new light on mammalian evolution, ageing and cancer” (pp. 5-6, lines 116-118).

      (8) P38 L588: The term "predatory genome" has not been defined, making it difficult to assess its relevance.

      This issue has been addressed above.

      (9) P39 L604: The statement "transposable elements are not inherent to the cell" suggests that some TEs could originate externally, but this does not rule out that others are intrinsic. In other words, TEs are still inherent to the cell.

      This part of the discussion section has been rewritten and the above sentence has been deleted.

      (10) P39 L609: The phrase "may have evolutionary functions by acting as transposable elements" is unclear. Perhaps it is meant that these structures may serve as vehicles for TEs?

      This sentence has disappeared altogether in the revised discussion section.

      (11) P41 L643: "Thus, we hypothesize ... extensively modified to act as foreign genetic elements." This sentence is unclear. Are the authors referring to evolutionary changes in mammals in general (which overlooks the role of standard mutational processes)? Or is it being proposed that structural mutations (including TE integrations) could be mediated by cfChPs in addition to other mutational mechanisms?

      We have replaced this sentence which now reads “Thus, “within-self” HGT may occur in mammals on a massive scale via the medium of cfChP concatemers that have undergone extensive and complex modifications resulting in their behaviour as “foreign” genetic elements” (pp. 47, lines 763-766).

      (12) P41 L150: The paragraph beginning with "It has been proposed that extreme environmental..." transitions too abruptly from HGT to adaptation. Is it being proposed that cfChPs are evolutionary processes selected for their adaptive potential? This idea is far too speculative at this stage and requires clarification.

      We agree. This paragraph has been removed.

      (13) P43 L681: This summary appears overly speculative and unclear, particularly as the concept of a "predatory genome" remains undefined and thus cannot be justified. It suggests that cfChPs represent an alternative lifestyle for the entire genome, although alternative explanations seem far more plausible at this point.

      We have now replaced the term “predatory” genome with “satellite” genome. The relevant part of the summary section has also been partially revised (pp. 49-50, lines 817-831).

      Changes independent of reviewers’ comments.

      We have made the following additions / modifications.

      (1) The abstract has been modified and its “conclusion” section has been rewritten.

      (2) Section 1.14 has been newly added together with accompanying Figures 15 a,b and c.

      (3) The “Discussion” section has been greatly modified and parts of it have been rewritten.

    1. eLife Assessment

      This fundamental study reveals that aging in yeast leads to chromosome mis-segregation due to asymmetric partitioning of chromosomes, driven by disruption of the nuclear pore complex and pre-mRNA leakage. The findings are convincingly supported by carefully-designed experimental data with a combination of genetic, molecular biology and cell biology approaches.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, the authors explore a novel mechanism linking aging to chromosome mis-segregation and aneuploidy in yeast cells. They reveal that, in old yeast mother cells, chromosome loss occurs through asymmetric partitioning of chromosomes to daughter cells, a process coupled with the inheritance of an old Spindle Pole Body. Remarkably, the authors identify that remodeling of the nuclear pore complex (NPC), specifically the displacement of its nuclear basket, triggers these asymmetric segregation events. This disruption also leads to the leakage of unspliced pre-mRNAs into the cytoplasm, highlighting a breakdown in RNA quality control. Through genetic manipulation, the study demonstrates that removing introns from key chromosome segregation genes is sufficient to prevent chromosome loss in aged cells. Moreover, promoting pre-mRNA leakage in young cells mimics the chromosome mis-segregation observed in old cells, providing further evidence for the critical role of nuclear envelope integrity and RNA processing in aging-related genome instability.

      Strengths:

      The findings presented are not only intriguing but also well-supported by robust experimental data, highlighting a previously unrecognized connection between nuclear envelope integrity, RNA processing, and genome stability in aging cells, deepening our understanding of the molecular basis of chromosome loss in aging.

      Weaknesses:

      The authors have satisfactorily addressed my concerns.

    3. Reviewer #2 (Public review):

      Summary:

      The authors make the interesting discovery of increased chromosome non-disjunction in aging yeast mother cells. The phenotype is quite striking and well supported with solid experimental evidence. This is quite significant to a haploid cell (as used here) - loss of an essential chromosome leads to death soon thereafter. The authors then work to tie this phenotype to other age-associated phenotypes that have been previously characterized: accumulation of extrachromosomal rDNA circles that then correlate with compromised nuclear pore export functions, which correlates with "leaky" pores that permit unspliced mRNA messages to be inappropriately exported to the cytoplasm. They then infer that three intron containing mRNAs that encode portions in resolving sister chromatid separation during mitosis, are unspliced in this age-associated defect and thus lead to the non-disjunction problem.

      Strengths:

      The discovery of age-associated chromosome non-disjunction is interesting, and it is demonstrated in a convincing fashion with "classic" microscopy-based single cell fluorescent chromosome assays that are appropriate and seem robust. The correlation of this phenotype with other age-associated phenotypes - specifically extrachromosomal rDNA circles and nuclear pore dysfunction - is supported by in vivo genetic manipulations that have been well-characterized in the past.

      In addition, the application of the single cell mRNA splicing defect reporter showed very convincingly that general mRNA splicing is compromised in aged cells. Such a pleiotropic event certainly has big implications.

      Weaknesses:

      The authors have addressed my major concerns with experimentation or clarification.

    4. Reviewer #3 (Public review):

      Summary:

      Mirkovic et al. explore the cause underlying development of aneuploidy during aging. This paper provides a compelling insight into the basis of chromosome missegregation in aged cells, tying this phenomenon to the established Nuclear Pore Complex architecture remodeling that occurs with aging across a large span of diverse organisms. The authors first establish that aged mother cells exhibit aberrant error correction during mitosis. As extrachromosomal rDNA circles (ERCs) are known to increase with age and lead to NPC dysfunction that can result in leakage of unspliced pre-mRNAs, Mirkovic et al. search for intron-containing genes in yeast that may be underlying chromosome missegregation, identifying three genes in the aurora B-dependent error correction pathway: MCM21, NBL1, and GLC7. Interestingly, intron-less mutants in these genes suppress chromosome loss in aged cells, with a significant impact observed when all three introns were deleted (3x∆i). The 3x∆i mutant also suppresses the increased chromosome loss resulting from nuclear basket destabilization in a mlp1∆ mutant. The authors then directly test if aged cells do exhibit aberrant mRNA export, using RNA FISH to identify that old cells indeed leak intron-containing pre-mRNA into the cytoplasm, as well as a reporter assay to demonstrate translation of leaked pre-mRNA, and that this is suppressed in cells producing fewer ERCs. Mutants causing increased pre-mRNA leakage are sufficient to induce chromosome missegregation, which is suppressed by the 3x∆i.

      Strengths:

      The finding that deleting the introns of 3 genes in the Aurora B pathway can suppress age-related chromosome missegregation is highly compelling. Additionally, the rationale behind the various experiments in this paper is well-reasoned and clearly explained.

      Weaknesses:

      My main concerns have been thoroughly addressed by the authors.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      In this study, the authors explore a novel mechanism linking aging to chromosome mis-segregation and aneuploidy in yeast cells. They reveal that, in old yeast mother cells, chromosome loss occurs through asymmetric partitioning of chromosomes to daughter cells, a process coupled with the inheritance of an old Spindle Pole Body. Remarkably, the authors identify that remodelling of the nuclear pore complex (NPC), specifically the displacement of its nuclear basket, triggers these asymmetric segregation events. This disruption also leads to the leakage of unspliced pre-mRNAs into the cytoplasm, highlighting a breakdown in RNA quality control. Through genetic manipulation, the study demonstrates that removing introns from key chromosome segregation genes is sufficient to prevent chromosome loss in aged cells. Moreover, promoting pre-mRNA leakage in young cells mimics the chromosome mis-segregation observed in old cells, providing further evidence for the critical role of nuclear envelope integrity and RNA processing in aging-related genome instability. 

      Strengths: 

      The findings presented are not only intriguing but also well-supported by robust experimental data, highlighting a previously unrecognized connection between nuclear envelope integrity, RNA processing, and genome stability in aging cells, deepening our understanding of the molecular basis of chromosome loss in aging. 

      We thank the reviewer for this very positive assessment of our work.

      Weaknesses: 

      Further analysis of yeast aging data from microfluidic experiments will provide important information about the dynamic features and prevalence of the key aging phenotypes, e.g. pre-mRNA leakage and chromosome loss, reported in this work. 

      We thank the reviewer for raising this point, which we have addressed in the revised version of the manuscript. In short, chromosome loss is an abrupt, late event in the lifespan of the cells. To examine its prevalence, we have quantified the combined loss frequency of two chromosomes when both are labelled in the same cell. Whereas single chromosomes are lost at a frequency of 10-15% per cell, fewer than 5% of the cells lose both at the same time. Thus, the different chromosomes are lost largely but not fully independently from each other. Based on these data, and on the fact that yeast cells have 16 chromosomes, we estimate that about half of the cells lose at least one chromosome in their final cell cycle.
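
      As a point of reference for the frequencies quoted above, the following minimal sketch computes what full independence between chromosomes would predict. It is an illustrative baseline only, not the authors' calculation: the function names are hypothetical, and the independence assumption is exactly what the response says does not fully hold. Any positive correlation between losses would pull the "at least one chromosome lost" fraction below this baseline.

```python
def joint_loss_if_independent(p_single: float) -> float:
    """Probability that two given chromosomes are both lost,
    assuming the two loss events are independent (an assumption)."""
    return p_single ** 2


def any_loss_if_independent(p_single: float, n_chromosomes: int = 16) -> float:
    """Probability that at least one of n chromosomes is lost,
    assuming every chromosome is lost independently with the same
    per-chromosome probability (an assumption)."""
    return 1.0 - (1.0 - p_single) ** n_chromosomes


if __name__ == "__main__":
    # 10-15% is the single-chromosome loss range quoted in the response.
    for p in (0.10, 0.15):
        print(
            f"p_single={p:.2f}  "
            f"both lost (independent)={joint_loss_if_independent(p):.4f}  "
            f"any lost (independent)={any_loss_if_independent(p):.4f}"
        )
```

      Comparing the independent-case joint frequency with the observed double-loss frequency is one simple way to gauge how far the data depart from independence.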

      We also tried to estimate the prevalence of the pre-mRNA leakage phenotype, based on the increase in the mCherry to GFP ratio observed between 0 and 24 hours of aging for 146 individual cells. For this analysis, we compared the mCherry/GFP ratio at 0 and 24 hours for the same individual cell. This analysis indicates that 81% of the cells show a fold change strictly above 1 as they age. Furthermore, the data appear to be unimodal. Thus, we can conservatively conclude that a majority of the cells show pre-mRNA leakage at 24 hours. Since not all cells are at the end of their life at that time, this is possibly an underestimate.
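
      The per-cell comparison described above can be sketched as follows. This is a minimal illustration assuming paired mCherry/GFP ratios are available for each cell at both time points; the function name and the toy values are hypothetical and are not data from the study.

```python
from typing import Sequence


def fraction_with_increase(ratio_0h: Sequence[float],
                           ratio_24h: Sequence[float]) -> float:
    """Fraction of cells whose paired fold change (24 h / 0 h) is strictly above 1.

    Both sequences must list the same cells in the same order.
    """
    if len(ratio_0h) != len(ratio_24h):
        raise ValueError("ratio_0h and ratio_24h must be paired per cell")
    if len(ratio_0h) == 0:
        raise ValueError("at least one cell is required")
    fold_changes = [late / early for early, late in zip(ratio_0h, ratio_24h)]
    return sum(fc > 1.0 for fc in fold_changes) / len(fold_changes)


# Toy values for illustration only (not measurements from the manuscript).
toy_0h = [1.0, 2.0, 0.5, 1.0]
toy_24h = [1.5, 2.0, 1.0, 0.8]
toy_fraction = fraction_with_increase(toy_0h, toy_24h)
```

      Pairing the two time points per cell, rather than comparing population means, is what allows a prevalence statement of the form "a given fraction of cells show an increase".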

      In addition, a discussion would be needed to clarify the relationship between "chromosome loss" in this study and "genomic missegregation" reported previously in yeast aging. 

      Genomic mis-segregation is characterized by the entry of both SPBs and all the chromosomes into the daughter cell compartment (PMID: 31714209).  We have observed these events in our movies as well.  However, the chromosome loss phenotype that we are focusing on affects only some chromosomes (as discussed above) and takes place under proper elongation of the spindle, with one SPB remaining in the mother cell whereas the other one goes to the bud, as shown in the manuscript’s Figure 2.  In our movies, chromosome loss is at least three-fold more frequent (for a single chromosome) than full genome mis-segregation (Sup Fig 1A-B). Furthermore, whereas chromosome loss is alleviated by the removal of the introns of MCM21, NBL1 and GLC7, genomic mis-segregation is not (Sup Fig 1B).  Thus, genomic mis-segregation mentioned by the reviewer is a process distinct from the chromosome loss that we report.  This discussion and the relevant data have been added to the manuscript.

      We thank the reviewer for bringing up the possible confusion between these two phenotypes, allowing us to clarify this point.

      Reviewer #2 (Public review): 

      Summary: 

      The authors make the interesting discovery of increased chromosome non-dysjunction in aging yeast mother cells. The phenotype is quite striking and well supported with solid experimental evidence. This is quite significant to a haploid cell (as used here) - loss of an essential chromosome leads to death soon thereafter. The authors then work to tie this phenotype to other age-associated phenotypes that have been previously characterized: accumulation of extrachromosomal rDNA circles that then correlate with compromised nuclear pore export functions, which correlates with "leaky" pores that permit unspliced mRNA messages to be inappropriately exported to the cytoplasm. They then infer that three intron containing mRNAs that encode portions in resolving sister chromatid separation during mitosis, are unspliced in this age-associated defect and thus lead to the non-dysjunction problem. 

      Strengths: The discovery of age-associated chromosome non-disjunction is interesting, and it is demonstrated in a convincing fashion with "classic" microscopy-based single cell fluorescent chromosome assays that are appropriate and seem robust. The correlation of this phenotype with other age-associated phenotypes, specifically extrachromosomal rDNA circles and nuclear pore dysfunction, is supported by in vivo genetic manipulations that have been well-characterized in the past. 

      In addition, the application of the single cell mRNA splicing defect reporter showed very convincingly that general mRNA splicing is compromised in aged cells. Such a pleiotropic event certainly has big implications. 

      We thank the reviewer for this assessment of our work.  To avoid confusion, we would like to stress, however, that our data do not show that splicing per se is defective in old cells.  In fact, we specifically show that the cells are unlikely to have a splicing defect (last figure of the original and the revised version of the manuscript). Our data specifically show that unspliced mRNAs tend to leak out of the nucleus of old cells.

      Weaknesses: 

      The biggest weakness is "connecting all the dots" of causality and linking the splicing defect to chromosome disjunction. I commend the authors for making a valiant effort in this regard, but there are many caveats to this interpretation. While the "triple intron" removal suppressed the non-disjunction defect in aged cells, this could simply be a kinetic fix, where a slowdown in the relevant aspects of mitosis could give the cell time to resolve the syntelic attachment of the chromatids.  

      The possibility that intron-removal leads to a kinetic fix is an interesting idea that we have now considered.  In the revised manuscript, we now provide measurements of mitotic duration in the “triple intron” mutant compared to wild type cells and the duration of their last cell cycle (See supplementary figure 3A-D). There is no evidence that removing these introns slows down mitosis.  Thus, the kinetic fix hypothesis is unlikely to explain our observation about the effect of intron removal.

      To this point, I note that the intron-less version of GLC7, which affects the most dramatic suppression of the three genes, is reported by one of the authors to have a slow growth rate (Parenteau et al, 2008 - https://doi.org/10.1091/mbc.e07-12-1254)

      The reviewer is right: removing the intron of GLC7 reduces the expression level of the gene product (PMID: 16816425) to about 50% of the original value and causes a slow growth phenotype.  However, the cells revert fairly rapidly through duplication of the GLC7-∆i gene (see supplementary Figure 3E-F).  As a consequence, neither the GLC7-∆i nor the 3x∆i mutant strains show noticeable growth phenotypes by spot assays.  We now document these findings in supplementary Figure 3.  

      Lastly, the Herculean effort to perform FISH of the introns in the cytoplasm is quite literally at the statistical limit of this assay. The data were not as robust as the other assays employed through this study. The data show either "no" signal for the young cells or a signal of 0, 1, or 2 FISH foci in the aged cells. In a Poisson distribution, which this follows, it is improbable to distinguish between these differences. 

      This is correct; this experiment was not the easiest of the manuscript. However, despite the limitations of the assay, the data presented in Figure 7B are very clear.  300 cells aged by MEP were analysed, divided into three cohorts of 100 each, and the distribution of foci (nuclear vs cytoplasmic) in these aged cells was compared to the distribution in three cohorts of young cells.  For all 3 aged cohorts, over 70% of the visible foci were cytoplasmic, while in the young cells, this figure was around 3%.  A t-test was conducted to compare these frequencies between young and old cells (Figure 7B). The difference is highly significant.  Therefore, we are clearly not at the statistical limit.
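      As a hedged illustration of the kind of comparison described above, the sketch below computes a Welch t statistic for two groups of three cohort-level fractions. The numbers are illustrative placeholders chosen only to match the orders of magnitude quoted in the response (over 70% versus around 3%); they are not the measured values from Figure 7B. The choice of Welch's variant and the function name `welch_t` are assumptions, since the response does not state which t-test was used.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test on two samples."""
    # Squared standard errors of each sample mean (sample variance / n).
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


# Hypothetical placeholder fractions of cytoplasmic foci per cohort
# (NOT the published data; for illustration only).
aged_cohorts = [0.72, 0.75, 0.71]
young_cohorts = [0.03, 0.02, 0.04]

t_stat, dof = welch_t(aged_cohorts, young_cohorts)
print(f"t = {t_stat:.1f}, df = {dof:.2f}")
```

      With only three cohorts per group, the statistic is driven by the large gap between group means relative to the small cohort-to-cohort spread, which is the point made in the response.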

      What the reviewer refers to is supplementary Figure 4, where we were simply asking (i) whether the signal is lost in cells lacking the intron of GLC7 (the answer is unambiguously yes) and (ii) what the general number of dots per cell is in young and old wild type cells (without distinguishing between nuclear and cytoplasmic).  The information to be taken from this last quantification is indeed that there is no clearly distinguishable difference between these two populations of cells, as the reviewer rightly concludes.  In other words, the reason why there are more dots in the cytoplasm of the old cells in Figure 7B is not that the old cells have many more dots in general (see supplementary Figure 4C).  We hope that these clarifications help understand the data better.  We have edited the manuscript to avoid confusion.

      Reviewer #3 (Public review): 

      Summary: 

      Mirkovic et al explore the cause underlying development of aneuploidy during aging. This paper provides a compelling insight into the basis of chromosome missegregation in aged cells, tying this phenomenon to the established Nuclear Pore Complex architecture remodelling that occurs with aging across a large span of diverse organisms. The authors first establish that aged mother cells exhibit aberrant error correction during mitosis. As extrachromosomal rDNA circles (ERCs) are known to increase with age and lead to NPC dysfunction that can result in leakage of unspliced pre-mRNAs, Mirkovic et al search for intron-containing genes in yeast that may be underlying chromosome missegregation, identifying three genes in the aurora B-dependent error correction pathway: MCM21, NBL1, and GLC7. Interestingly, intron-less mutants in these genes suppress chromosome loss in aged cells, with a significant impact observed when all three introns were deleted (3x∆i). The 3x∆i mutant also suppresses the increased chromosome loss resulting from nuclear basket destabilization in a mlp1∆ mutant. The authors then directly test if aged cells do exhibit aberrant mRNA export, using RNA FISH to identify that old cells indeed leak intron-containing pre-mRNA into the cytoplasm, as well as a reporter assay to demonstrate translation of leaked pre-mRNA, and that this is suppressed in cells producing less ERCs. Mutants causing increased pre-mRNA leakage are sufficient to induce chromosome missegregation, which is suppressed by the 3x∆i. 

      Strengths: 

      The finding that deleting the introns of 3 genes in the Aurora B pathway can suppress age-related chromosome missegregation is highly compelling. Additionally, the rationale behind the various experiments in this paper is well-reasoned and clearly explained. 

      We thank the reviewer for their very positive assessment of our work.

      Weaknesses:  

      In some cases, controls for experiments were not presented or were depicted in other figures. 

      We are sorry about this confusion.  We have improved our presentation of the controls, bringing them back each time they are relevant.  We have also added those that were missing (such as those mentioned by reviewer 2, see above). Note that the frequency of centromeric plasmid loss at 0h in Figure 1C is not meaningful and is therefore not presented. Since the cells were grown on selective medium before loading onto the ageing chip, we cannot report a plasmid loss frequency here. The ageing experiments themselves were subsequently conducted in full medium, to allow for centromeric plasmid loss without killing the cell. We explain this in the materials and methods section.

      High variability was seen in chromosome loss data, leading to large error bars. 

      We thank the reviewer for this comment. The variance in those two figures (3A and 5D) comes from the suboptimal plotting of this data. This is now corrected as follows.  We divided the available data into 4 cohorts and then plotted the average loss frequency across these cohorts for the indicated age groups.  This filters out much of the noise and improves the statistical resolution.

      The text could have been more polished. 

      Thank you for this comment.  We have gone through the manuscript again in detail.

      Reviewer #1 (Recommendations for the authors):

      (1) A previous study (PMID: 31714209) showed that aging yeast cells undergo genomic missegregation in which material was abnormally segregated to the daughter cells, leading to cell cycle arrest. After that, the missegregation is either corrected by returning aberrantly segregated genetic material to the mother cells so that they can resume cell cycles, or, if not corrected, the mother cells will terminally exit the cell cycle and eventually die. That paper also showed that this age-dependent genomic missegregation is related to rDNA instability. Is the chromosome loss in this work related to the genomic missegregation reported before? Is it partially reversible like genomic missegregation? Are all the chromosomes lost in one cell division, like in the case of genomic missegregation? Some additional characterization and a discussion would be helpful. 

      As mentioned above, indeed the phenotype of full genome mis-segregation described by Crane et al. (2019) is observable in our data as well. At 24h ~3% of the cells segregate both SPBs to the bud, as they previously described (Supp Figure 1A and B).  This phenomenon is clearly distinct from asymmetric chromosome partition, where cells undergo anaphase, separate the SPBs and segregate one to the mother cell and one to the bud (Figure 2A).  Also, asymmetric chromosome partitioning affects only a subset of the chromosomes (see below), not the entire genome. Finally, unlike asymmetric chromosome partitioning, the frequency of genome mis-segregation in ageing was not alleviated by intron removal (Supp Figure 1B). Thus, these two processes are clearly distinct and driven by different mechanisms. Note that asymmetric chromosome partitioning appears 3 to 5 times more frequently than genomic mis-segregation.

      Further supporting the notion that these two processes are distinct, chromosome loss seals the end of the life of the cell, as we reported, indicating that this is not a reversible event.  Also, it does not involve all chromosomes at once. In cells that contain the labelled versions of both chromosome II and IV at the same time, the frequency of losing both chromosomes is less than 5%, whereas each chromosome is lost in 10-15% of the cells (Figure 1C). Thus, most cells lose one and keep the other. Furthermore, this indicates that there are many more cells losing at least one chromosome than the 15% that lose chromosome IV for example, probably 50% or more.  Thus, chromosome loss by asymmetric segregation is much more frequent than the partly transient transfer of the entire nucleus to the bud.
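      The reasoning behind "probably 50% or more" can be sketched with a back-of-the-envelope calculation. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the per-chromosome loss frequencies are the 10-15% range quoted above, and the extrapolation to all 16 chromosomes of haploid budding yeast assumes that every chromosome is lost independently and with the same probability, which the response does not claim.

```python
def at_least_one_of_two(p_a, p_b, p_both):
    """Inclusion-exclusion: probability of losing chromosome A or B (or both)."""
    return p_a + p_b - p_both


def at_least_one_independent(p, n_chromosomes=16):
    """Probability of losing at least one of n chromosomes.

    ASSUMPTION (not from the source): each chromosome is lost independently
    with the same per-chromosome probability p.
    """
    return 1.0 - (1.0 - p) ** n_chromosomes


# Two labelled chromosomes: 10% each, at most 5% losing both (upper bound),
# which gives a lower bound on losing at least one of the two.
print(at_least_one_of_two(0.10, 0.10, 0.05))

# Extrapolation to the whole genome under the independence assumption.
for p in (0.10, 0.15):
    print(p, round(at_least_one_independent(p), 3))
```

      Under these assumptions the extrapolated fraction lies well above 50%, consistent with the qualitative statement in the response; correlated losses would lower it.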

      (2) What percentage of aging WT cells undergo pre-mRNA leakage (using the GFP/mCherry reporter) during their entire lifespan? Is it a sporadic, reversible process or an accumulative, one-way deterioration? Previous studies (PMID: 32675375; PMID: 24332850; PMID: 36194205; PMID: 31291577) showed that only a fraction of yeast cells age with rDNA instability and ERC accumulation, as indicated by excessive rRNA transcription and nucleolar enlargement. Are they the same fraction of aging cells that undergo pre-mRNA leakage and chromosome loss? This information will indicate the prevalence of the key aging phenotypes reported in this work and should be readily obtainable from microfluidic experiments. In addition, a careful discussion would be helpful. 

      Pre-mRNA leakage is relatively widespread in the population, but it is difficult to put a precise number on it. Analysis of how the mCherry/GFP ratio changes in 146 individual cells between 0 and 24 hours of imaging in our microfluidics platform indicates that ~80% show an increase and 50% of the cells show an increase above 1.5-fold. Therefore, the frequencies of pre-mRNA leakage and chromosome loss are probably similar.  This would be in the same range as the frequency of aging by ERC accumulation (mode 1) estimated by PMID: 32675375.  We have modified the discussion to account for these considerations. 

      Reviewer #2 (Recommendations for the authors)

      The manuscript could use a bit of editing in places - please go through it once more. 

      Editing suggestions: 

      Line 80 – irrespective

      Corrected.

      Line 97 - these are not "rates" but frequencies. Please correct this error throughout. 

      Replaced “rate” with “frequency” throughout the manuscript and the figures, when pertaining to chromosome loss.

      Line 328 - increase in chromosome... 

      Corrected.

      Line 379 - tampering 

      Reviewer #3 (Recommendations for the authors):

      Specific Feedback to Authors 

      (a) Major Points 

      (i) While the proposed connection between ERC-mediated nuclear basket removal and erroneous error correction was clearly stated, this connection is correlative and was not directly tested. Specifically, although mutants impacting ERC levels were tested for missegregation, it was not directly tested if increased missegregation levels occurred due to ERC tethering to the NPC and subsequent nuclear basket removal. It is possible that the increased ERCs may be driving missegregation via a different pathway. Authors should consider experiments to strengthen this idea, such as looking at chromosome loss frequency in a sir2∆ 3x∆i double mutant, or a sir2∆ sgf73∆ double mutant. 

      This connection is addressed in the original version of the manuscript, where we show that preventing attachment of ERCs to the NPC, by removing the linker protein Sgf73, alleviates chromosome loss.  The link is further substantiated by the fact that removing the basket on its own promotes chromosome loss and that in both cases, namely during normal aging (i.e., upon ERC accumulation) and upon basket removal, the mechanism of chromosome loss is the same.  In both cases, it depends on the introns of the GLC7, MCM21 and NBL1 genes.  

      However, we acknowledge that the mutants tested have pleiotropic effects, making interpretation somewhat difficult, even when examining chromosome loss in multiple mutants that affect ERC formation and NPC remodelling, as we have done.  As recommended by the reviewer, we have characterized the phenotype of the sir2∆ 3x∆i mutant strain. Intron removal in the sir2∆ mutant cells largely rescued the elevated chromosome loss frequency of these cells and slightly extended their replicative lifespan (Figure 6D-E). We conclude that intron removal can remedy the chromosome loss phenotype of the sir2∆ mutant. Although clearly significant, the effect on the replicative lifespan was not very strong, likely because the sir2∆ mutation affects other ageing processes.

      Touching on this question, we added a new set of experiments asking whether any accumulating DNA circle causes chromosome loss in an intron-dependent manner.  Thus, we have introduced a noncentromeric replicative plasmid in wild type and 3x∆i mutant strains carrying the labelled version of chromosome II (Figure 6A-C).  These studies show that these cells age much faster than wild type cells, as expected, and lose chromosomes at a higher frequency than non-transformed cells.  Finally, the effect is at least in part alleviated by removing the introns of NBL1, MCM21 and GLC7.

      Therefore, after adding this new and more direct test of the role of DNA circles in chromosome loss, we confidently conclude that ERC-mediated basket removal is the trigger of chromosome loss in old cells.

      (b) Minor Points 

      (i) In Figure 1C, the text (lines 91-92) argues that chromosome loss happens abruptly as cells age; however the data only show loss at young and old time points, not an intermediate, which leaves open the possibility that chromosome loss is occurring gradually. While cells that lost chromosomes should fail to divide further, we don't know if these events happened and were simply excluded.

      We agree with the reviewer that, formally, the conclusion drawn in lines 91-92 (of the original manuscript), namely that chromosome loss takes place abruptly as cells age, cannot be drawn from Figure 1C alone but only from subsequent observations. However, since chromosome loss is lethal in haploid cells, as we mention in the text and the reviewer notes as well, it is difficult to envision how cells could lose chromosomes before the end of their lifespan; the loss frequency must therefore increase abruptly as the cells reach that point.  This is now underlined in the revised version of the manuscript. Accordingly, the frequency of chromosome loss per age group, which is depicted in Figure 3A, shows that wild type cells that have budded fewer than 10 times show no chromosome loss. The chromosome loss frequency starts to ramp up only past that point. Therefore, chromosome loss does not increase linearly with age.

      Additionally, cells that lost a minichromosome should not arrest. We suggest that the interpretation of these data should be softened in the text, or that the chromosome loss fraction could be more effectively portrayed as a Kaplan-Meier survival curve depicting cells that have not lost chromosomes, if these data are easily available. Alternatively, chromosome loss at an intermediate time point could be depicted. 

      Since we cannot visualize more than 2 chromosomes at a time, it is not possible to plot the Kaplan-Meier curve of cells that have not lost chromosomes. However, as mentioned above, the chromosome loss frequencies at intermediate time points are depicted in Figure 3A and Figure 4B and show that chromosome loss increases with age.

      (ii) Also regarding Figure 1, it would be helpful to expound on the purpose of the minichromosomes, as well as how the Ubi-GFP minichromosome is constructed. 

      We now explain why we tested the loss of the minichromosome, namely as a means to test whether the centromere is necessary and sufficient to drive the loss of the genetic material linked to it, i.e., chromosomes, in old cells.  Concerning the Ubi-GFP minichromosome, the Materials and methods section is now updated and reports the plasmid construction, the backbone used and the primers; the plasmid sequence is available in the supplementary data.

      The purpose of the minichromosome initially appears to be the engineering of an eccDNA (ERC) with a CEN to demonstrate distinct behaviour, but it is unclear whether this was actually conducted or if the minichromosomes are simply CEN plasmids and/or if this was the intended goal. Furthermore, lines 102-103 state that the presence of a centromere was necessary and sufficient for minichromosome loss. However, since no constructs lacking a centromere were tested, necessity cannot be concluded. Please clarify this in the text and include experimental details to help readers understand what was tested. 

      We apologize for having been too brief here. The behaviour of the CEN-less version of this plasmid has been characterized in detail in previous studies (Shcheprova et al., 2008; Denoth-Lippuner 2014, Meinema et al 2022). Here we focused on the behaviour of the CEN+ version of an otherwise identical plasmid.  We now clarify in the text that this plasmid is retained in the mother cell when CEN-less and cite the relevant literature. 

      (iii) It is unclear how cells at 0-3 budding events were identified in assays using the microfluidics platform. Can the authors clarify the known "age" of the cells once captured, i.e. how do the authors know how many divisions a cell has undergone prior to capture? 

      The reviewer is right; we do not know the exact age of these cells.  However, in any asynchronous population of yeast cells, which is what we start from, 50% of the cells are newborn daughters, 25% have budded once, 12.5% have budded twice, 6.25% have budded three times, and so on.  Therefore, at the time of loading, about 94% (93.75%) of the cells have budded between 0 and 3 times.  For this reason, we refer to this population as cells aged 0-3 CBE. We acknowledge that this is an approximation, but it remains a relatively safe one.  
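      The arithmetic behind this estimate can be checked with a short sketch. It assumes the idealized age structure stated above, in which the fraction of cells that have completed exactly k budding events is (1/2)^(k+1); the function name is hypothetical, and the model ignores cell death and differences in cell-cycle duration between mothers and daughters.

```python
def fraction_budded_at_most(k_max):
    """Fraction of an idealized asynchronous population with <= k_max buddings.

    ASSUMPTION: the fraction of cells with exactly k completed budding
    events is (1/2) ** (k + 1), i.e. 50%, 25%, 12.5%, 6.25%, ...
    """
    return sum(0.5 ** (k + 1) for k in range(k_max + 1))


# Cells aged 0-3 completed budding events (CBE) at the time of loading.
print(f"{fraction_budded_at_most(3):.2%}")
```

      The geometric series sums to 1 - (1/2)^(k_max + 1), so raising the cut-off beyond 3 adds only a few percent of older cells.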

      (iv) While the schematic in Figure 2D is generally helpful, a different depiction of the old and new SPBs would be beneficial. In cases where the new SPB and TetR-GFP are depicted as colocalized, it is difficult to see that the red is fainter for the new SPB. 

      We have corrected this issue by completely separating the SPB and the Chromosome signals in the Figure 2D.

      (v) In Figure 2F, the grey colour of the 12h Ipl1-321 data bar did not have high enough contrast when the manuscript was printed-would recommend changing this to a darker shade. 

      We have corrected this issue by using a darker shade of grey.

      (vi) In Figure 3A, 'Budding' is misspelled on X-axis label  

      We have corrected this error.

      (vii) In Figure 4, the authors should clarify the differences between the analyses in panels B and C. The distinction is not immediately clear and may be difficult to grasp upon initial reading. 

      We have corrected this issue in the main text as well as figure legend.

      (viii) In Figure 5, It would aid comparisons to depict the 3x∆i only as well on panels B, D, and E. 

      We have added 3x∆i data to Figures 5, 6 and 8.

      (ix) In Figure 6D, it is unclear why there was an appreciable level of unspliced RNA in the wild-type and sir2∆ young cells. Additionally, it is unclear why there is so much signal observed in the Merge image for the old wild-type cell, especially regarding the apparent bright spot. Is that nuclear signal? Please clarify. 

      The pre-mRNA processing reporter is not very efficiently spliced. It was selected as such during design (Sorenson et al 2014; DOI: 10.1261/rna.042663.113) to provide sensitivity. As for the bright spot occurring, translation of the unspliced reporter produces the N-terminal part of a ribosomal protein, a fraction of which forms some sort of nuclear aggregate in a fraction of the population. 

      (x) In Figure 6E, why does the sir2∆ exhibit higher mCherry/GFP than the wild-type and fob1∆ at "young age"? Is this due to disrupted proteostasis in the sir2∆, or a different pleiotropic effect of sir2∆? Please comment on this observation in the text.

      Indeed, as we have stated in the text, the sir2∆ mutation already perturbs pre-mRNA processing in young cells. We do not know the reason for this, but it most probably reflects the pleiotropic function of Sir2. Following the reviewer’s request, we now state this in the text. For example, Sir2 may regulate the acetylation state of the basket itself.  The genetic interactions observed between sir2∆ and quite a few nucleoporin mutations seem to support this possibility. 

      (xi) Throughout, the authors switch between depicting aging in Completed Budding Events versus hours, which made it difficult to compare data across figures.

      Ideally, all the data in this manuscript should be plotted according to the CBE age of the cell. To ensure that the major findings are plotted in such a way, we have done so for ~3000 combined cells and thousands of replicative divisions in Figures 3 and 5-7. All the measurements of chromosome loss at a specific CBE had to be done manually, due to the absence of algorithms able to accurately detect chromosome loss and replicative age. Therefore, doing this for the entirety of our dataset, which encompasses well over 50 ageing chips and tens of thousands of cells, is not easily doable at this stage. 

      (xii) Typo on line 12 (Sindle Pole Body) 

      We have corrected this error.

      (xiii) The phrase should be 'chromosome partitioning' rather than 'chromosome partition' throughout; for example, line 17. 

      Replaced “chromosome partition” with “chromosome partitioning” throughout the text.

      (xiv) There are inconsistencies between plural and singular references throughout sentences-example, lines 35-37, and lines 44-45. 

      We carefully combed through the manuscript again and hope that we caught all inconsistencies.

    1. eLife Assessment

      This important study of artificial selection in microbial communities shows that the possibility of selecting a desired fraction of slow and fast-growing types is impacted by their initial fractions. The evidence, which relies on mathematical analysis and simulations of a stochastic model, is compelling. It highlights the tension between selection at the strain and the community level. This study should be of interest to researchers interested in ecology, both theoretical and experimental.

    2. Reviewer #1 (Public review):

      Summary:

      The authors demonstrate with a simple stochastic model that the initial composition of the community is important in achieving a target frequency during the artificial selection of a community.

      Strengths:

      To my knowledge, the intra-collective selection during artificial selection has not been seriously theoretically considered. However, in many cases, the species dynamics during the incubation of each selection cycle is important and relevant to the outcome of the artificial selection experiment. Stochasticity from birth and death (demographic stochasticity) plays a big role in these species' abundance dynamics. This work uses a simple framework to tackle this idea meticulously.

      This work may or may not be related to hysteresis (path dependency). If this is true, maybe it would be nice to have a discussion paragraph talking about how this may be the case. Then, this work would even attract the interest of people studying dynamical systems.

      Weaknesses:

      (1) Connecting structure and function.
      Most of the artificial selection literature selects communities based on collective function. In this paper, the authors instead select for a target composition. Although a schematic cartoon in main Figure 1 illustrates the relationship between collective function (y-axis) and community composition, there is no explicit explanation or justification of what may be the origin of this relationship. I think giving the readers a naïve idea of how this structure-function relationship arises in the introduction section would help. This is because the conclusion of this paper is that intra-collective selection makes it hard to artificially select for a community that has an intermediate frequency of f (or s). If there is really evidence or a theoretical derivation from this framework that the highest function indeed comes from an intermediate frequency of f, then the impact of this paper would increase, because the conclusions of this stochastic model could point to the reasons for the prevalent failures of artificial selection in the literature.

      (2) Explain intra-collective and inter-collective selection better for readers.
      The abstract, the introduction, and the results section use the terms intra-collective and inter-collective selection without much explanation. For the wide readership of eLife, a clear definition at the beginning would help the audience grasp the importance of this paper, because these concepts are at the core of this work.

      (3) Achievable target frequency strongly depending on the degree of demographic stochasticity.
      I would expect that experimentalists would find these results interesting and would want to consider them during their artificial selection experiments. Main Figure 4 indicates that the Newborn size N0 is a very important factor to consider during an artificial selection experiment. This would be equivalent to how strong a bottleneck is imposed on the artificial selection process in every iteration step (i.e., the dilution ratio of a serial dilution experiment). However, with a low population size, all target frequencies can be achieved, and therefore in these regimes the initial frequency does not matter much. It would be great for the authors to explain what the N0 parameter actually means during artificial selection experiments, perhaps relative to some other parameter in the model. I know this could be very hard. But without this, the main result of this paper (initial frequency matters) cannot be taken advantage of by experimentalists.

      (4) Consideration of environmental stochasticity.
      The success (gold area of Figure 2d) in this framework mainly depends on the size of the demographic stochasticity (birth-only model) during the intra-collective selection. However, during experiments, a lot of environmental stochasticity appears to be occurring during artificial selection. This may be out of the scope of this study. But it would definitely be exciting to see how much environmental stochasticity relative to the demographic stochasticity (variation in the Gaussian distribution of F and S) matters in succeeding in achieving the target composition from artificial selection.

      (5) Assumption about mutation rates.
      If setting the mutation rates to zero does not change the result of the simulations and the conclusion, what is the purpose of having the mutation rates \mu? Also, is the unidirectional (S -> F -> FF) mutation realistic? I didn't quite understand how the mutations could fit into the story of this paper.

      (6) Minor points
      In Figure 3b, it is not clear to me how the frequency difference for the Intra-collective and the Inter-collective selection is computed.
      In Figure 5b, the gold region (success) near the FF is not visible. Maybe increase the size of the figure or have an inset for zoom-in. Why is the region not as big as the bottom gold region?

      Comments on revisions:

      I thank the authors for addressing many points raised by the reviewers. Overall, the readability of the manuscript has improved with more context provided around why they were solving this specific problem. However, I've found many of the responses to be too terse. It would have been nicer if there had been more discussion and description of the thought process that led up to the conclusions they made for each comment or question. Instead, many of the responses only showed the screenshot of the text they added.

      Most of my comments or questions were answered. Below are my comments on some of the authors' responses.

      (2) Explain intra-collective and inter-collective selection better for readers.
      In the Abstract and Introduction, you've added more sentences about the intra-collective or inter-collective selection. However, these are either making analogies to the waterfall or just describing the result of the intra/inter-collective selection. I would still appreciate a proper definition of those terms, which is paramount for readers to understand the entire paper.

      (4) Consideration of environmental stochasticity.
      I think providing the reason 'why' the paper focuses on demographic stochasticity and not environmental stochasticity will greatly justify the paper's work. For example, citing papers that actually performed artificial selection and pointing out that your model captures the stochasticity from those kinds of experiments would be great.

      (5) Assumption about mutation rates.
      It would be great if you could add a citation in the added sentence to support your claim: "This scenario is encountered in biotechnology: .....".

    3. Reviewer #3 (Public review):

      The authors address the process of community evolution under collective-level selection for a prescribed community composition. They mostly consider communities composed of two types that reproduce at different rates, and that can mutate one into the other. Due to such difference in 'fitness' and to the absence of density dependence, within-collective selection is expected to always favour the fastest grower, but collective-level selection can oppose this tendency, to a certain extent at least. By approximating the stochastic within-generation dynamics and solving it analytically, the authors show that not only high frequencies of fast growers can be reproducibly achieved, aligned with their fitness advantage. Small target frequencies can also be maintained, provided that the initial proportion of fast growers is sufficiently small. In this regime, similar to the 'stochastic corrector' model, variation upon which selection acts is maintained by a combination of demographic stochasticity and of sampling at reproduction. These two regions of achievable target compositions are separated by a gap, encompassing intermediate frequencies that are only achievable when the bottleneck size is small enough or the number of communities is (disproportionately) large.

      A similar conclusion, that stochastic fluctuations can maintain the system over evolutionary time far from the prevalence of the faster-growing type, is then confirmed by analyzing a three-species community, suggesting that the qualitative conclusions of this study are generalizable to more complex communities.

      I expect that these results will be of broad interest to the community of researchers who strive to improve community-level selection but are often limited to numerical explorations, with prohibitive costs for a full characterization of the parameter space of such embedded populations. The realization that not all target collective functions can be as easily achieved and that they should be adapted to the initial conditions and the selection protocol is also a sobering message for designing concrete applications.

      A major strength of this work is that the qualitative behaviour of the system is captured by an analytically solvable approximation so that the extent of the 'forbidden region' can be directly and generically related to the parameters of the selection protocol.

      The phenomenon the authors characterize is ecological in nature, though it is maintained even when switching between types is possible. Calling this dynamics community evolution reflects a widespread ambiguity in the field, not ascribable just to this work.

      Although different types compete for being represented in the next generation's propagules, within-generation ecology is here representative of exponential growth. As species interactions are commonly manifest in lab serial dilution experiments, it would be interesting if future work explores the extent of the robustness of these results to density-dependent demography.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Common comments

      (1) Significance of zero mutation rate

      Reviewers asked why we included mutation rate even though setting mutation rate to zero doesn’t change results. We think that including non-zero mutation rate makes our results more generalisable, and thus is a strength rather than weakness. To better motivate this choice, we have added a sentence to the beginning of Results:

      (2) Writing the mu=0 case first

      Reviewers suggested that we should first focus on the mu=0 case, and then generalize the result. The suggestion is certainly good. However, given the large amount of work involved in a reorganization, we have decided to keep our current narrative. That said, we now include only the mu=0 equations in the main text, and have moved the case of nonzero mutation rate to Supplementary Information.

      (3) Making equations more accessible

      We have taken three steps to make equations more readable.

      ● Equations in the main text correspond to the case of zero-mutation rate.

      ● The original section on equation derivation is now in a box in the main text so that readers have the choice of skipping it but interested readers can still get a gist of where equations came from.

      ● We have provided a much more detailed interpretation of the equation (see page 10).

      (4) Validity of the Gaussian approximation

      Reviewers raised concerns about the validity of the Gaussian approximation of the F frequency 𝑓(𝜏). The fact that our calculations closely match simulations suggests that this approximation is reasonable. Still, we added a discussion about the validity of this approximation in Box 1.

      We also added a figure to the SI covering various initial S and F sizes. This figure shows that when either initial S or initial F is small, the distribution of 𝑓(𝜏) is not normal. However, if initial S and F are both on the order of hundreds, then the distribution of 𝑓(𝜏) is approximately Gaussian.
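
      The dependence on initial sizes described above can be probed with a short simulation. The following Python sketch is illustrative only and is not the code used in the study. It assumes zero mutation rate and a pure-birth (Yule) process for each population; the parameter values (`r`, `omega`, `tau`) and all function names are hypothetical choices made for this example. It draws 𝑓(𝜏) repeatedly and reports the sample skewness, which should be close to zero when the distribution is approximately Gaussian.

```python
import math
import random
import statistics


def yule_size(n0, rate, tau, rng):
    """Size at time tau of a pure-birth (Yule) process started from n0 cells.

    Each founder leaves a geometric number of descendants on {1, 2, ...}
    with success probability p = exp(-rate * tau). Requires rate * tau > 0.
    """
    p = math.exp(-rate * tau)
    log_q = math.log1p(-p)  # log(1 - p), negative
    total = 0
    for _ in range(n0):
        u = 1.0 - rng.random()  # uniform on (0, 1]
        total += 1 + int(math.log(u) / log_q)
    return total


def sample_f(s0, f0, r, omega, tau, n_samples, rng):
    """Draw n_samples values of the F frequency at the end of maturation."""
    draws = []
    for _ in range(n_samples):
        s = yule_size(s0, r, tau, rng)
        f = yule_size(f0, r + omega, tau, rng)
        draws.append(f / (s + f))
    return draws


def skewness(xs):
    """Sample skewness (third standardized moment)."""
    mean = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return sum((x - mean) ** 3 for x in xs) / (len(xs) * sd ** 3)


rng = random.Random(1)
# Hypothetical parameters: S grows at rate r, F at rate r + omega.
r, omega, tau = 0.5, 0.05, 4.0
for s0, f0 in [(998, 2), (500, 500)]:
    draws = sample_f(s0, f0, r, omega, tau, 300, rng)
    print(f"S0={s0}, F0={f0}: skewness of f(tau) = {skewness(draws):.2f}")
```

      A strongly positive skewness in the first case and a value near zero in the second would be consistent with the statement that the Gaussian approximation needs both initial sizes to be reasonably large.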

      Public Reviews:

      Summary:

      The authors demonstrate with a simple stochastic model that the initial composition of the community is important in achieving a target frequency during the artificial selection of a community.

      Strengths:

      To my knowledge, the intra-collective selection during artificial selection has not been seriously theoretically considered. However, in many cases, the species dynamics during the incubation of each selection cycle are important and relevant to the outcome of the artificial selection experiment. Stochasticity from birth and death (demographic stochasticity) plays a big role in these species' abundance dynamics. This work uses a simple framework to tackle this idea meticulously.

      This work may or may not be hysteresis (path dependency). If this is true, maybe it would be nice to have a discussion paragraph talking about how this may be the case. Then, this work would even attract the interest of people studying dynamic systems.

      We have added this clarification in the main text:

      “Note that here, selection outcome is path-dependent in the sense of being sensitive to initial conditions. This phenomenon is distinct from hysteresis where path-dependence results from whether a tuning parameter is increased or decreased.”

      Weaknesses:

      (1) Connecting structure and function

      In the typical artificial selection literature, most studies select communities based on collective function. In this paper, the authors instead select for a target composition. Although there is a schematic cartoon illustrating the relationship between collective function (y-axis) and the community composition in the main Figure 1, there is no explicit explanation or justification of what may be the origin of this relationship. I think giving the readers a naïve idea about how this structure-function relationship arises in the introduction section would help. This is because the conclusion of this paper is that the intra-collective selection makes it hard to artificially select a community that has an intermediate frequency of f (or s). If there is really evidence or theoretical derivation from this framework that indeed the highest function comes from the intermediate frequency of f, then the impact of this paper would increase because the conclusions of this stochastic model could allude to the reasons for the prevalent failures of artificial selection in the literature.

      We have added this to introduction: “This is a common quest: whenever a collective function depends on both populations, collective function is maximised, by definition, at an intermediate frequency (e.g. too little of either population will hamper function [23]).”

      (2) Explain intra-collective and inter-collective selection better for readers.

      The abstract, the introduction, and the result section use the terms intra-collective and inter-collective selection without much explanation. For the wide readership of eLife, a clear definition in the beginning would help the audience grasp the importance of this paper, because these concepts are at the core of this work.

      This is a great point. We have added in Abstract:

      “Such collective selection is dictated by two opposing forces: during collective maturation, intra-collective selection acts like a waterfall, relentlessly driving the S-frequency to lower values, while during collective reproduction, inter-collective selection resembles a rafter striving to reach the target frequency. Due to this model structure, maintaining a target frequency requires the continued action of inter-collective selection.”

      and in Introduction

      “A selection cycle consists of three stages (Fig. 1). During collective maturation, intra-collective selection favors fast-growing individuals within a collective. At the end of maturation, inter-collective selection acts on collectives and favors those achieving the target composition. Finally during collective reproduction, offspring collectives sample stochastically from the parents, a process dominated by genetic drift.”
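
      The three stages quoted above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: mutation rate is zero, maturation is a pure-birth (Yule) process, the single matured collective closest to the target is chosen as the parent, and each offspring draws its cells independently from the parent's F frequency (a binomial approximation). The term "adult", the parameter values, and all function names are hypothetical.

```python
import math
import random


def yule_size(n0, rate, tau, rng):
    """Size at time tau of a pure-birth process started from n0 cells."""
    p = math.exp(-rate * tau)  # requires rate * tau > 0
    log_q = math.log1p(-p)
    total = 0
    for _ in range(n0):
        u = 1.0 - rng.random()  # uniform on (0, 1]
        total += 1 + int(math.log(u) / log_q)
    return total


def f_frequency(s, f):
    """Fraction of fast growers in a collective with s slow and f fast cells."""
    return f / (s + f)


def run_cycle(newborns, target, r, omega, tau, rng):
    """One selection cycle on a list of (S, F) Newborn collectives."""
    # Stage 1, maturation: F grows at rate r + omega, S at rate r,
    # so intra-collective selection raises the F frequency on average.
    adults = [
        (yule_size(s, r, tau, rng), yule_size(f, r + omega, tau, rng))
        for s, f in newborns
    ]
    # Stage 2, inter-collective selection: keep the adult closest to target.
    parent = min(adults, key=lambda sf: abs(f_frequency(*sf) - target))
    f_parent = f_frequency(*parent)
    # Stage 3, reproduction: each offspring samples a fixed number of cells.
    n0 = sum(newborns[0])
    offspring = []
    for _ in newborns:
        f_cells = sum(rng.random() < f_parent for _ in range(n0))
        offspring.append((n0 - f_cells, f_cells))
    return offspring, f_parent


rng = random.Random(0)
collectives = [(90, 10)] * 10  # ten Newborns, each with 100 cells
for cycle in range(5):
    collectives, f_parent = run_cycle(collectives, 0.1, 0.5, 0.05, 4.0, rng)
    print(f"cycle {cycle}: selected F frequency = {f_parent:.3f}")
```

      Because the number of cells passed to each offspring is fixed, the only variation that inter-collective selection can act on comes from the stochastic growth in stage 1 and the sampling in stage 3.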

      (3) Achievable target frequency strongly depending on the degree of demographic stochasticity.

      I would expect that the experimentalists would find these results interesting and would want to consider these results during their artificial selection experiments. The main Figure 4 indicates that the Newborn size N0 is a very important factor to consider during the artificial selection experiment. This would be equivalent to how much bottleneck is imposed on the artificial selection process in every iteration step (i.e., the ratio of serial dilution experiment). However, with a low population size, all target frequencies can be achieved, and therefore in these regimes, the initial frequency now does not matter much. It would be great for the authors to provide what the N0 parameter actually means during the artificial selection experiments. Maybe relative to some other parameter in the model. I know this could be very hard. But without this, the main result of this paper (initial frequency matters) cannot be taken advantage of by the experimentalists.

      We have added an analytical approximation for N0˘, the Newborn size below which all target frequencies can be achieved in SI.

      Also, we have added lines indicating N0˘ in Fig4a.

      (4) Consideration of environmental stochasticity.

      The success (gold area of Figure 2d) in this framework mainly depends on the size of the demographic stochasticity (birth-only model) during the intra-collective selection. However, during experiments, a lot of environmental stochasticity appears to be occurring during artificial selection. This may be out of the scope of this study. But it would definitely be exciting to see how much environmental stochasticity relative to the demographic stochasticity (variation in the Gaussian distribution of F and S) matters in succeeding in achieving the target composition from artificial selection.

      You are correct that our work considers only demographic stochasticity.

      Indeed, considering other types of stochasticity will be an exciting future research direction. We added in the main text:

      “Overall our model considers mutational stochasticity, as well as demographic stochasticity in terms of stochastic birth and stochastic sampling of a parent collective by offspring collectives. Other types of stochasticity, such as environmental stochasticity and measurement noise, are not considered and require future research.”

      (5) Assumption about mutation rates

      If setting the mutation rates to zero does not change the result of the simulations and the conclusion, what is the purpose of having the mutation rates \mu? Also, is the unidirectional (S -> F -> FF) mutation realistic? I didn't quite understand how the mutations could fit into the story of this paper.

      This is a great point. We have added this to the beginning of Results to better motivate our study:

      “We will start with a complete model where S mutates to F at a nonzero mutation rate µ. We made this choice because it is more challenging to attain or maintain the target frequency when the abundance of fast-growing F is further increased via mutations. This scenario is encountered in biotechnology: an engineered pathway will slow down growth, and breaking the pathway (and thus faster growth) is much easier than the other way around. When the mutation rate is set to zero, the same model can be used to capture collectives of two species with different growth rates.”

      See answer on common question 1.

      (6) Minor points

      In Figure 3b, it is not clear to me how the frequency difference for the Intra-collective and the Inter-collective selection is computed.

      We added a description in caption 3b.

      In Figure 5b, the gold region (success) near the FF is not visible. Maybe increase the size of the figure or have an inset for zoom-in. Why is the region not as big as the bottom gold region?

      We increased the resolution of Fig 5b so that the gold region near FF is more visible.

      We have added Fig 5c and the following explanation to the main text:

      “From numerical simulations, we identified two accessible regions: a small region near FF and a band region spanning from S to F (gold in Fig. 5b i). Intuitively, the rate at which FF grows faster than S+F is greater than the rate at which F grows faster than S (see section VIII in Supplementary Information). Thus, the problem can initially be reduced to a two-population problem (i.e. FF versus F+S; Fig. 5c left), and then expanded to a three-population problem (Fig. 5c right).”

      Recommendations For The Authors

      Since the conclusion of the model greatly depends on the noise (variation) of F and S in the Gaussian distribution, it would be nice to have a plot where the y-axis is the variation in terms of frequency and the x-axis is the s_0 or f_0 (frequency). In the plot, I would love to see how the variation in the frequency depends on the initial frequency of S and F. Maybe this is just trivial.

      In the SI, we added Fig6a, as per your request. Previous Fig6 became Fig6b.

      Reviewer #2 (Public review):

      The authors provide an analytical framework to model the artificial selection of the composition of communities composed of strains growing at different rates. Their approach takes into account the competition between the targeted selection at the level of the meta-community and the selection that automatically favors fast-growing cells within each replicate community. Their main finding is a tipping point or path-dependence effect, whereby compositions dominated by slow-growing types can only be reached by community-level selection if the community does not start in, and never crosses into, a range of compositions dominated by fast growers during the dynamics.

      These results seem to us both technically correct and interesting. We commend the authors on their efforts to make their work reproducible even when it comes to calculations via extensive appendices, though perhaps a table of contents and a short description of these appendices at the start of SI would help navigate them.

      Thank you for the suggestion. We have added a paragraph at the beginning of SI.

      The main limitation in the current form of the article is that it could clarify how its assumptions and findings differ from and improve upon the rest of the literature:

      -  Many studies discuss the interplay between community-level evolution and species- or strain-level evolution. But "evolution" can be a mix of various forces, including selection, drift/randomness, and mutation/innovation.

      - This work's specificity is that it focuses strictly on constant community-level selection versus constant strain-level selection, all other forces being negligible (neither stochasticity nor innovation/mutation matter at either level, as we try to clarify now).

      Note that intra-collective selection is not strictly “constant” in the sense that selection favoring F is the strongest at intermediate F frequency (Fig 3). However, we think that you mean that intra- and inter-collective selection are present in every cycle, and this is correct for our case, and for community selection in general.

      -  Regarding constant community-level selection, it is only briefly noted that "once a target frequency is achieved, inter-collective selection is always required to maintain that frequency due to the fitness difference between the two types" [pg. 3 §2]. In other words, action from the selector is required indefinitely to maintain the community in the desired state. This assumption is found in a fraction of the literature, but is still worth clarifying from the start as it can inform the practical applicability of the results.

      This is a good point. We have added to abstract:

      “Such collective selection is dictated by two opposing forces: during collective maturation, intra-collective selection acts like a waterfall, relentlessly driving the S-frequency to lower values, while during collective reproduction, inter-collective selection resembles a rafter striving to reach the target frequency. Due to this model structure, maintaining a target frequency requires the continued action of inter-collective selection.”

      - More importantly, strain-level evolution also boils down here to pure selection with a constant target, which is less usual in the relevant literature. Here, (1) drift from limited population sizes is very small, with no meaningful counterbalancing of selection, (2) pure exponential regime with constant fitness, no interactions, no density- or frequency-dependence, (3) there is no innovation in the sense that available types are unchanging through time (no evolution of traits such as growth rate or interactions) and (4) all the results presented seem unchanged when mutation rate mu = 0 (as noted in Appendix III), meaning that the conclusions are not "about" mutation in any meaningful way.

      With regard to point (1), Figure 4a (reproduced below) shows how Newborn size affects the region of achievable targets. Indeed at large Newborn size (e.g. 5000 and above), no target frequency is achievable (since drift is too small to generate sufficient inter-community variation and consequently all communities are dominated by fast-growing F). However at Newborn size of for example 1000, there are two regions of accessible target frequencies. At smaller Newborn size, all target frequencies become achievable due to drift becoming sufficiently strong.
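
      The link between Newborn size and the strength of drift can be made concrete with a standard sampling result. The expression below is a sketch under an assumption that is not taken from the paper's derivation: each Newborn draws its N_0 cells independently from a parent with F frequency f (binomial sampling), and stochasticity during maturation is ignored.

```latex
% Assumption: binomial sampling of N_0 cells from a parent with F frequency f.
\[
  \mathrm{Var}\left(f_{\mathrm{Newborn}}\right) = \frac{f\,(1-f)}{N_0},
  \qquad
  \sigma_f = \sqrt{\frac{f\,(1-f)}{N_0}} .
\]
% Worked example with f = 0.5:
%   N_0 = 1000  gives  sigma_f \approx 0.0158
%   N_0 = 5000  gives  sigma_f \approx 0.0071
```

      The spread among offspring collectives therefore shrinks as 1/sqrt(N_0), which is in line with the statement that large Newborn sizes generate too little inter-community variation for selection to act on.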

      With regard to points (2) and (3), we have added to Introduction

      “To enable the derivation of an analytical expression, we have made the following simplifications.

      First, growth is always exponential, without complications such as resource limitation, ecological interactions between the two populations, or density-dependent growth. Thus, the exponential growth equation can be used. Second, we consider only two populations (genotypes or species): the fast-growing F population with size F and the slow-growing S population with size S. We do not consider a spectrum of mutants or species, since with more than two populations, an analytical solution becomes very difficult.”
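
      Under these simplifications, and with zero mutation rate, the expected change in F frequency over one maturation period follows directly from exponential growth. The Python sketch below is an illustration rather than the paper's equations: it assumes S grows at rate r and F at rate r + ω, so that the F/S ratio is multiplied by exp(ωτ) during maturation, and the function names are hypothetical. It ignores all stochasticity.

```python
import math


def f_after_maturation(f0, omega, tau):
    """Deterministic F frequency after maturation, starting from f0.

    With F(tau) = F0 * exp((r + omega) * tau) and S(tau) = S0 * exp(r * tau),
    the common factor exp(r * tau) cancels, leaving only omega * tau.
    """
    w_tau = math.exp(omega * tau)  # assumed fold-change in F/S
    return f0 * w_tau / (f0 * w_tau + (1.0 - f0))


def delta_f(f0, omega, tau):
    """Increase in F frequency during one maturation period."""
    return f_after_maturation(f0, omega, tau) - f0


# Hypothetical values: the increase is largest at intermediate f0.
for f0 in (0.01, 0.1, 0.3, 0.5, 0.7, 0.9, 0.99):
    print(f"f0 = {f0:4.2f}  ->  increase = {delta_f(f0, 0.05, 4.0):.4f}")
```

      The increase vanishes at f0 = 0 and f0 = 1 and peaks in between, which matches the remark elsewhere in this response that selection favoring F is strongest at intermediate F frequency.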

      With regard to point (4), we view this as a strength rather than weakness. We have added the following to the beginning of Results and Discussions:

      “We will start with a complete model where S mutates to F at a nonzero mutation rate µ. We made this choice because it is more challenging to attain or maintain the target frequency when the abundance of fast-growing F is further increased via mutations.”

      “When the mutation rate is set to zero, the same model can be used to capture collectives of two species with different growth rates.”

      See Point 1 of Common comments.

      - Furthermore, the choice of mutation mechanism is peculiar, as it happens only from slow to fast grower: more commonly, one assumes random non-directional mutations, rather than purely directional ones from less fit to fitter (which is more of a "Lamarckian" idea). Given that mutation does not seem to matter here, this choice might create unnecessary opposition from some readers or could be considered as just one possibility among others.

      We have added the following justification:

      “This scenario is encountered in biotechnology: an engineered pathway will slow down growth, and breaking the pathway (and thus faster growth) is much easier than the other way around.”

      It would be helpful to have all these points stated clearly so that it becomes easy to see where this article stands in an abundant literature and contributes to our understanding of multi-level evolution, and why it may have different conclusions or focus than others tackling very similar questions.

      Finally, a microbial context is given to the study, but the assumptions and results are in no way truly tied to that context, so it should be clear that this is just for flavor.

      We have deleted “microbial” from the title, and revised our abstract:

      Recommendations For The Authors

      (1) More details concerning our main remark above:

      - The paragraph discussing refs [24, 33] is not very clear in how they most importantly differ from this study. Our impression is that the resource aspect is not very important for instance, and the main difference is that these other works assume that strains can change in their traits.

      We are fairly sure that resource depletion is important in Rainey group’s study, as the attractor only evolved after both strains grew fast enough to deplete resources by the end of maturation. Indeed, evolution occurred in interaction coefficients which dictate the competition between strains for resources.

      Regardless, you raised an excellent point. As discussed earlier, we have added the following:

      “To enable the derivation of an analytical expression, we have made the following simplifications.

      First, growth is always exponential, without complications such as resource limitation, ecological interactions between the two populations, or density-dependent growth. Thus, the exponential growth equation can be used. Second, we consider only two populations (genotypes or species): the fast-growing F population with size F and the slow-growing S population with size S. We do not consider a spectrum of mutants or species, since with more than two populations, an analytical solution becomes very difficult.”

      - We would advise the main text to focus on mu = 0, and only say in discussion that results can be generalized.

      Your suggestion is certainly good. However, given the large amount of work involved in a reorganisation, we have decided to keep our current narrative. As discussed earlier, we have added this at the beginning of Results to help orient readers:

      “We will start with a complete model where S mutates to F at a nonzero mutation rate µ. We made this choice because it is more challenging to attain or maintain the target frequency when the abundance of fast-growing F is further increased via mutations.”

      “When the mutation rate is set to zero, the same model can be used to capture collectives of two species with different growth rates.”

      (2) We think the material on pg. 5 "Intra-collective evolution is the fastest at intermediate F frequencies, creating the "waterfall" phenomenon", although interesting, could be presented in a different way. The mathematical details on how to find the probability distribution of the maximum of independent random variables (including Equation 1) will probably be skipped by most readers: for experienced theoreticians, it is standard content; for experimentalists, it is not the most relevant. I would therefore recommend moving them to the SM and reporting only the important results.

      This is an excellent suggestion. We have put a sketch of our calculations in a box in the main text to help orient interested readers. As before, details are in SI.

      Similarly, Equations 2, 3, and 4 are hard to read given the large amount of parameters and the low amount of simplification. Although exploring the effect of the different parameters through Figures 3 and 4 is useful, I think the role of the equations should be reconsidered:

      i. Is it possible to rewrite them in terms of effective variables in a more concise way?

      See Point 3 of Common comments.

      ii. Is it possible to present extreme/particular cases in which they are easier to interpret?

      We have focused on the case where the mutation rate is zero. This makes the mathematical expressions much simpler (see above).

      (3) Is it possible to explain more in detail why the distribution of f_k+1 conditional to f_k^* is well approximated by a Gaussian? Also, have you explored to what extent the results would change if this were not true (in light of the few universal classes for the maximum of independent variables)?

      Despite the appeal to the CLT and the histograms in the Appendix suggesting that the distribution looks a bit like a Gaussian at a certain scale, fluctuations on that scale are not necessarily what is relevant for the results - a rapid (and maybe wrong) attempt at a characteristic function calculation suggests that in your case, one does not obtain convergence to Gaussians unless we renormalize by S(t=0) and F(t=0), so it seems there is a justification missing in the text as is for the validity of this approximation (or that it is simply assumed).

      See point 4 of Common comments.

      Reviewer #3 (Public Reviews):

      The authors address the process of community evolution under collective-level selection for a prescribed community composition. They mostly consider communities composed of two types that reproduce at different rates, and that can mutate one into the other. Due to such differences in 'fitness' and to the absence of density dependence, within-collective selection is expected to always favour the fastest grower, but the collective-level selection can oppose this tendency, to a certain extent at least. By approximating the stochastic within-generation dynamics and solving it analytically, the authors show that not only high frequencies of fast growers can be reproducibly achieved, aligned with their fitness advantage. Small target frequencies can also be maintained, provided that the initial proportion of fast growers is sufficiently small. In this regime, similar to the 'stochastic corrector' model, variation upon which selection acts is maintained by a combination of demographic stochasticity and of sampling at reproduction. These two regions of achievable target compositions are separated by a gap, encompassing intermediate frequencies that are only achievable when the bottleneck size is small enough or the number of communities is (disproportionately) larger.

      A similar conclusion, that stochastic fluctuations can maintain the system over evolutionary time far from the prevalence of the faster-growing type, is then confirmed by analyzing a three-species community, suggesting that the qualitative conclusions of this study are generalizable to more complex communities.

      I expect that these results will be of broad interest to the community of researchers who strive to improve community-level selection, but are often limited to numerical explorations, with prohibitive costs for a full characterization of the parameter space of such embedded populations. The realization that not all target collective functions can be as easily achieved and that they should be adapted to the initial conditions and the selection protocol is also a sobering message for designing concrete applications.

      A major strength of this work is that the qualitative behaviour of the system is captured by an analytically solvable approximation so that the extent of the 'forbidden region' can be directly and generically related to the parameters of the selection protocol.

      Thanks so much for these positive comments.

      I however found the description of the results too succinct and I think that more could be done to unpack the mathematical results in a way that is understandable to a broader audience. Moreover, the phenomenon the authors characterize is of purely ecological nature. Here, mutations of the growth rate are, in my understanding, neither necessary (non-trivial equilibria can be maintained also when \mu =0) nor sufficient (community-level selection is necessary to keep the system far from the absorbing state) for the phenomenon described. Calling this dynamics community evolution reflects a widespread ambiguity, and is not ascribable just to this work. I find that here the authors have the opportunity to make their message clearer by focusing on the case where the 'mutation' rate \mu vanishes (Equations 39 & 40 of the SI) - which is more easily interpretable, at least in some limits - while they may leave the more general equations 3 & 4 in the SI.

      See points 1-4 of Common comments.

      Combined with an analysis of the deterministic equations, that capture the possibility of maintaining high frequencies of fast growers, the authors could elucidate the dynamics that are induced by the presence of a second level of selection, and speculate on what would be the result of real open-ended evolution (not encompassed by the simple 'switch mutations' generally considered in evolutionary game theory), for instance discussing the invasibility (or not) of mutant types with slightly different growth rates.

      Indeed, evolution is not restricted to two types. However, our main goal here is to derive an analytical expression, and it was difficult for even two types. For three-type collectives, we had to resort to simulations. Investigating the case where fitness effects of mutations are continuously distributed is beyond the scope of this study.

      The single most important model hypothesis that I would have liked to be discussed further is that the two types do not interact. Species interactions are not only essential to achieve inheritance of composition in the course of evolution but are generally expected to play a key role even on ecological time scales. I hope the authors plan to look at this in future work.

      In our system, S and F do interact in a competitive fashion: even though S and F are not competing for nutrients (which are always in excess), they are competing for space. This is because a fixed number of cells is transferred to the next cycle. Thus, the presence of F will, for example, reduce the chance of S being propagated. We have added this clarification to our main text:

      “Note that even though S and F do not compete for nutrients, they compete for space: because the total number of cells transferred to the next cycle is fixed, an overabundance of one population will reduce the likelihood of the other being propagated.”

      Recommendations For The Authors

      I felt the authors could put some additional effort into making their theoretical results meaningful for a population of readers who, though not as highly mathematically educated as they are, can nonetheless appreciate the implications of simple relations or scaling. Below, you find some suggestions:

      (1) In order to make it clear that there is a 'natural' high-frequency equilibrium that can be reached even in the absence of selection, the authors could examine first the dynamics of the deterministic system in the absence of mutations, and use its equilibria to elucidate the combined role of the 'fitness' difference \omega and of the generation duration \tau in setting its value. The fact that these parameters always occur in combination (when there are no mutations) is a general and notable feature of the stochastic model as well. Moreover, this model would justify why you only focus on decreasing the frequency in the new generation.

      Note that the ‘natural’ high-frequency equilibrium in the absence of collective selection is when fast grower F becomes fixed in the population. Following your suggestion, we have introduced two parameters 𝑅τ and 𝑊τ to reflect the coupling between ‘fitness’ and ‘generation duration’:

      (2) Since the phenomenon described in the paper is essentially ecological in nature (as the author states, it does not change significantly if the 'mutation rate' \mu is set to zero), I would put in the main text Equations 39 & 40 of the SI in order to improve intelligibility.

      See Point 2 at the beginning of this letter.

      These equations can be discussed in some detail, especially in the limit of small f^*_k, where I think it is worth discussing the different dependence of the mean and the variance of the frequency distribution on the system's parameters.

      This is a great suggestion. We have added the following:

      “In the limit of small , Equation (3) becomes f while Equation (4) becomes . Thus, both Newborn size (N<sub>0</sub>) and fold-change in F/S during maturation (W<sub>τ</sub>) are important determinants of selection progress.

      (3) I would have appreciated an explanation in words of what are the main conceptual steps involved in attaining Equation 2, the underlying hypotheses (notably on community size and distributions), and the expected limits of validity of the approximation.

      See points 3 and 4 at the beginning of this letter.

      (4) I think that some care needs to be put into explaining where extreme value statistics is used, and why is the median of the conditional distribution the most appropriate statistics to look at for characterizing the evolutionary trajectory (which seems to me mostly reliant on extreme values).

      Great point! We added an explanation of using median value in Box 1.

      We also added Figure 7 to the SI to explain it.

      Showing in a figure the different distributions you are considering (for instance, plotting the conditional distribution for one generation in the trajectories displayed in Figure 2) would be useful to understand what information \bar f provides on a sequence of collective generations, where in principle there may be memory effects.

      Thanks for this suggestion. We have added panel d to Fig. 2 to illustrate the shape and position of the F frequency distributions at each step of the first two selection cycles.

      (5) Similarly, I do not understand why selecting the 5% best communities should push the system's evolution towards the high-frequency solution, instead of just slowing down the improvement (unless you are considering the average composition of the top best communities - which should be justified). I think that such sensitivity to the selection intensity should be appropriately referenced and discussed in the main text, as it is a parameter that experimenters are naturally led to manipulate.

      In the main text, we have added this explanation:

      “In contrast with findings from an earlier study [23], choosing top 1 is more effective than the less stringent “choosing top 5%”. In the earlier study, variation in the collective trait is partly due to nonheritable factors such as random fluctuations in Newborn biomass. In that context, a less stringent selection criterion proved more effective, as it helped retain collectives with favorable genotypes that might have exhibited suboptimal collective traits due to unfavorable nonheritable factors. However, since this study excludes nonheritable variations in collective traits, selecting the top 1 collective is more effective than selecting the top 5% (see Fig. 11 in Supplementary Information).”

      (6) Equation 1 could be explained in simpler terms as the product between the probability that one collective reaches the transmitted value times the probability that all others do worse than that. The current formulation is unclear, perhaps just a matter of English formulation.

      We have revised our description to state:

      “Equation (1) can be described as the product between two terms related to probability: (i) describes the probability density that any one of the g Adult collectives achieves f given , and (ii) describes the probability that all other g – 1 collectives achieve frequencies above f and are thus not selected.”
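
      The two-term structure described above matches the standard order-statistics density for the lowest of g independent draws. The sketch below illustrates that generic form only, not the authors' exact Equation (1): the function name `selected_density`, the uniform test distribution, and the choice g = 10 are all assumptions made for illustration.

```python
import numpy as np

def selected_density(f, pdf, cdf, g):
    """Density that the lowest of g i.i.d. frequencies equals f.

    Term (i):  g * pdf(f)            -> any one of g collectives achieves f
    Term (ii): (1 - cdf(f))**(g - 1) -> the other g - 1 collectives land above f
    """
    return g * pdf(f) * (1.0 - cdf(f)) ** (g - 1)

# Illustrative assumption: Adult frequencies are uniform on [0, 1].
def uniform_pdf(f):
    return np.ones_like(f)

def uniform_cdf(f):
    return f

g = 10
grid = np.linspace(0.0, 1.0, 100001)
density = selected_density(grid, uniform_pdf, uniform_cdf, g)

# Trapezoid rule written out by hand to avoid version-specific NumPy helpers.
dx = grid[1] - grid[0]
total = np.sum((density[:-1] + density[1:]) / 2.0) * dx
weighted = grid * density
mean = np.sum((weighted[:-1] + weighted[1:]) / 2.0) * dx

print(total)  # should be close to 1 (a proper density)
print(mean)   # should be close to 1 / (g + 1) for the uniform case
```

      For the uniform case the density integrates to one and its mean is 1/(g + 1), which shows how a larger number of collectives g pulls the selected value toward the lower tail.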

      (7) I think that the discussion of the dependence of the boundaries of the 'waterfall' region with the difference in growth rate \omega is important and missing, especially if one wants to consider open-ended evolution of the growth rate - which can occur at steps of different magnitude.

      We added a new chapter and figure in supplementary information on the threshold values when \omega varies. As expected, smaller \omega enlarges the success area.

      We have also added a new figure panel to show how maturation time affects selection efficacy.

      (8) Notations are a bit confusing and could be improved. First of all, in most equations in the main text and SI, what is initially introduced as \omega appears as s. This is confusing because the letter s is also used for the frequency of the slow type.

      The letter S is used to denote an attribute of cells (S cells), the type of cells (Equations 1-3 of the SI) and the number of these cells in the population, sometimes with different meanings in the same sentence. This is confusing, and I suggest referring to slow cells or fast cells instead (or at least to S-cells and F-cells), and keeping S and F as variables for the number of cells of the two types.

      All typos related to the notation have been fixed. We use S and F as types, and S and F (italic) as population numbers.

      (9) On page 3, when introducing the sampling of newborns as ruled by a binomial distribution, the information that you are just transmitting one collective is needed, while it is conveyed later.

      We have added this emphasis:

      “At the end of a cycle, a single Adult with the highest function (with F frequency f closest to the target frequency ) is chosen to reproduce g Newborn collectives each with N<sub>0</sub> cells (‘Selection’ and ’Reproduction’ in Fig. 1).”
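
      The reproduction step quoted above can be sketched as binomial sampling from the single selected Adult. This is a minimal illustration under stated assumptions, not the authors' simulation code: the function name `reproduce_newborns` and the parameter values (f = 0.3, N<sub>0</sub> = 1000, g = 10) are hypothetical.

```python
import numpy as np

def reproduce_newborns(f_selected, n0, g, rng):
    """Sample g Newborn collectives of n0 cells each from one selected Adult.

    Each Newborn's number of F cells is drawn from Binomial(n0, f_selected);
    the function returns the resulting F frequencies, one per Newborn.
    """
    f_counts = rng.binomial(n0, f_selected, size=g)
    return f_counts / n0

rng = np.random.default_rng(0)
newborn_f = reproduce_newborns(f_selected=0.3, n0=1000, g=10, rng=rng)
print(newborn_f)
```

      Because every Newborn descends from the same Adult, the spread among Newborn frequencies comes only from sampling noise, which shrinks as N<sub>0</sub> grows.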

      (10) I found that the abstract talks too early about the 'waterfall' phenomenon. As this is a concept introduced here, I suggest the authors first explain what it is, then use the term. It is a useful metaphor, but it should not obscure the more formal achievements of the paper.

      We feel that the “waterfall” analogy offers a gentle helping hand to orient those who have not thought much about the phenomenon. We view the abstract as an opportunity to attract readership, and thus the more accessible the better.

      (11) In the SI there are numerous typos and English language issues. I suggest the authors read carefully through it, and add line numbers to the next version so that more detailed feedback is possible.

      Thank you for going through the SI. We have gone through the SI and fixed the problems.

    1. eLife Assessment

      In this important quantitative study of HIV-1 evolution in humans and rhesus macaques, selection coefficients are inferred at scale over the HIV genome. Selection coefficients are similar in humans and macaques, providing compelling evidence that these coefficients are representative of the fitness landscapes of these viruses within hosts. This work will be of interest to the community working on quantitative evolution and fitness landscape inference, and the finding that rapid fitness gains in the HIV population predict bNAb emergence has significant implications for HIV vaccine design.

    2. Reviewer #1 (Public review):

      Summary:

      The present work studies the coevolution of HIV-1 and the immune response in clinical patient data. Using the Marginal Path Likelihood (MPL) framework, they infer selection coefficients for HIV mutations from time-series data of virus sequences as they evolve in a given patient.

      Strengths:

      The authors analyze data from two human patients, consisting of HIV population sequence samples at various points in time during the infection. They inferred selection coefficients from the observed changes in sequence abundance using MPL. Most beneficial mutations appear in viral envelope proteins. The authors also analyze SHIV samples in rhesus macaques, and find selection coefficients that are compatible with those found in the corresponding human samples.

      The manuscript is well written and organized.

      Comments on revisions:

      In their revised version the authors have addressed most of these points satisfactorily.

    3. Reviewer #2 (Public review):

      This paper combines a biological topic of interest with the demonstration of important theoretical/methodological advances. Fitness inference is the foundation of the quantitative analysis of adapting systems. It is a hard and important problem and this paper highlights a compelling approach (MPL) first presented in (1) and refined in (2), roughly summarized in equation 3.

      The authors find that positive selection shapes the variable regions of env in shared patterns across two patient donors. The patterns of positive selection are interesting in and of themselves, they confirm the intuition that hyper-variation in env is the result of immune evasion rather than a broadly neutral landscape (flatness). They show that the immune evasion patterns due to CD8 T and naive B-cell selection are shared across patients. Furthermore, they suggest that a particular evolutionary history (larger flux to high fitness states) is associated with bNAb emergence. Mimicking this evolutionary pattern in vaccine design may help us elicit bNAbs in patients in the future.

      The fitness landscape of env in multiple hosts is immensely valuable, especially because of how often SHIV has been used as a proxy for HIV. The strength of reversion-to-consensus selection is a known pattern of HIV post-infection populations, but it is nicely quantified here. Agreement between SHIV and HIV evolution is shown. They find selection is larger for autologous antibodies than the bNAbs themselves (perhaps bNAbs are just too small a component of the host response to drive the bulk of selection?), and that big fitness increases precede antibody breadth in rhesus macaques, suggesting that this fitness increase is the immune challenge required to draw forth a bNAb. All of high interest to HIV researchers.

      (1) Sohail, M. S., Louie, R. H., McKay, M. R. & Barton, J. P. MPL resolves genetic linkage in fitness inference from complex evolutionary histories. Nature Biotechnology 39, 472-479 (2021).

      (2) Shimagaki, K. & Barton, J. P. Bézier interpolation improves the inference of dynamical models from data. Physical Review E 107, 024116 (2023).

      Strength of evidence:

      Equation 3 is a beautiful and intuitive tool that accounts for linkage and can be solved precisely even in the presence of detailed mutational and selection models. They have addressed my earlier concerns about the effects of incomplete frequency observations, which could bias fitness inference at rare sites.

      Whether fitness increases occurred before or after the presence of the bNAb remains incompletely known. bNAb detection is different from bNAb presence and the possibility that fitness increases occurred after the bNAbs appeared remains. Still, their conclusion is plausible and fits in with the other observations which form a coherent and compelling picture.

      Overall this is a convincing paper. It is a valuable introduction to a practical method of fitness inference at the scale of the entire env gene and how this information can be leveraged to learn some interesting biology.

    4. Reviewer #3 (Public review):

      Summary:

      Shimagaki et al. investigate the virus-antibody coevolutionary processes that drive the development of broadly neutralizing antibodies (bnAbs). The study's primary goal is to characterize the evolutionary dynamics of HIV-1 within hosts that accompany the emergence of bnAbs, with a particular focus on inferring the landscape of selective pressures shaping viral evolution. To assess the generality of these evolutionary patterns, the study extends its analysis to rhesus macaques (RMs) infected with simian-human immunodeficiency viruses (SHIV) incorporating HIV-1 Env proteins derived from two human individuals.

      Strengths:

      A key strength of the study is its rigorous assessment of the similarity in evolutionary trajectories between humans and macaques. This cross-species comparison is particularly compelling, as it quantitatively establishes a shared pattern of viral evolution using a sophisticated inference method. The finding that similar selective pressures operate in both species adds robustness to the study's conclusions and suggests broader biological relevance. In the revised version, the Authors included a simple but clear explanation of the statistical method for inferring the model's parameters in the main text. Moreover, I find the potential implications of the methodology absent in the original submission very interesting.

      Conclusions:

      Overall, the study presents a compelling analysis of HIV-1 evolution and its parallels in SHIV-infected macaques.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The present work studies the coevolution of HIV-1 and the immune response in clinical patient data. Using the Marginal Path Likelihood (MPL) framework, they infer selection coefficients for HIV mutations from time-series data of virus sequences as they evolve in a given patient.

      Strengths:

      The authors analyze data from two human patients, consisting of HIV population sequence samples at various points in time during the infection. They infer selection coefficients from the observed changes in sequence abundance using MPL. Most beneficial mutations appear in viral envelope proteins. The authors also analyze SHIV samples in rhesus macaques, and find selection coefficients that are compatible with those found in the corresponding human samples.

      Weaknesses:

      The MPL method used by the authors considers only additive effects of mutations, thus ignoring epistasis.

      As suggested, we have now addressed this limitation by inferring epistatic fitness landscapes for CH505, CH848, SHIV.CH505, and SHIV.CH848. Indeed, the computational burden of the epistasis inference procedure was one constraint that motivated us to consider only additive fitness in the previous version of our paper. The original approach developed by Sohail et al. (2022) tested only sequences with <50 sites due to this limitation, far smaller than the ones we consider. Beyond this computational constraint, we also believed that 1) an additive fitness model may suffice to capture local fitness landscapes, and practically, 2) epistatic interactions are more challenging to validate than the effects of individual mutations, making the interpretation of the model more complex.

      However, after performing the analyses described in this paper, we developed a new approach for identifying epistatic interactions that can scale to much longer sequences (Shimagaki et al., Genetics, in press). We therefore applied this method to infer an epistatic fitness landscape for the HIV and SHIV data sets that we studied. As in that work, we focused on short-range (<50 bp) interactions which we could more confidently estimate from data. We have added a section in the SI describing the epistatic fitness model and our analysis. 

      Overall, we found substantial agreement between the epistatic and purely additive models in terms of the estimated fitness effects of individual mutations (new Supplementary Fig. 8) and overall fitness (Supplementary Fig. 9). Consistent with our prior work, we did not find substantial evidence for very strong epistatic interactions (Supplementary Fig. 10). This does not necessarily mean that strong epistatic interactions do not exist; rather, this shows that strong interactions don’t substantially improve the fit of the model to data, and thus many are regularized toward zero. While the biological validation of epistatic interactions is challenging, we found that the largest epistatic interactions, which we defined as the top 1% of all short-range interactions, were modestly but significantly enriched in the CD4 binding site, V1, and V5 regions for CH505 and in the CD4 binding site, V4, and V5 for CH848. In addition, mutation pairs N280S/V281A and E275K/V281G, which confer resistance to CH235, ranked in the top 15% of all epistatic interactions in CH505.

      We have now included an additional section in the Results, “Robustness of inferred selection to changes in the fitness model and finite sampling”, which discusses our epistatic analyses (page 6, lines 415-464), along with the above Supplementary Figures and a technical section in the SI summarizing the epistasis inference approach.

      Although the evolution of broadly neutralizing antibodies (bnAbs) is a motivating question in the introduction and discussion sections (and the title), the relevance of the analysis and results to better understanding how bnAbs arise is not clear. The only result presented in direct connection to bnAbs is Figure 6.

      It is true that, while bnAb development is a major motivator of our study, our analysis focuses on HIV-1 and does not directly consider antibody evolution. We have now brought attention to this point as a limitation directly in the Discussion. Following the suggestion below in the “Recommendations for the authors,” we have edited our manuscript to place more emphasis on viral fitness and somewhat reduce the emphasis on bnAbs, though this remains an important motivating factor. Specifically, the Abstract now begins

      Human immunodeficiency virus (HIV)-1 evolves within individual hosts to escape adaptive immune responses while maintaining its capacity for replication. Coevolution between the HIV-1 and the immune system generates extraordinary viral genetic diversity. In some individuals, this process also results in the development of broadly neutralizing antibodies (bnAbs) that can neutralize many viral variants, a key focus of HIV-1 vaccine design. However, a general understanding of the forces that shape virus-immune coevolution within and across hosts remains incomplete. Here we performed a quantitative study of HIV-1 evolution in humans and rhesus macaques, including individuals who developed bnAbs.

      We have similarly modified the Discussion to focus first on viral fitness. In response to comments from Reviewer 3, we have also more clearly articulated how our work might contribute to the understanding of bnAb development in the Discussion.

      Questions or suggestions for further discussion:

      I list here a number of points for which I believe the paper would benefit if additional discussion/results were included.

      The MPL method used by the authors considers only additive effects of mutations, thus ignoring epistasis. In Sohail et al (2022) MBE 39(10), p. msac199  (https://doi.org/10.1093/molbev/msac199) an extension of MPL is developed allowing one to infer epistasis. Can the authors comment on why this was not attempted here?

      I presume one possible reason is that epistasis inference requires considerably more computational effort (and more data). However, since the authors find most beneficial mutations occurring in Env, perhaps restricting the analysis to Env genes only (e.g. the trimer shown in Figure 2) can lead to tractable inference of epistasis within this segment (instead of the full genome).

      As described above, we have now addressed this comment by inferring epistatic fitness landscapes for the data sets that we consider. Our overall results using the epistatic fitness model are consistent with the ones that we previously obtained with an additive model.

      Do the authors find correlations in the inferred selection coefficients of the two samples CH505 and CH848? I could not find any discussion of this in the manuscript. Only correlations between Humans and RM are discussed.

      To address this question, we compared the fitness values and individual selection coefficients across CH505 and CH848 data sets. We found little correlation between CH505 and CH848 fitness values (shown in a new Supplementary Fig. 6) or selection coefficients. We found only 199 common mutations between HIV-1 amino acid sequences from CH505 and CH848 out of 868 and 1,406 total mutations, respectively. Thus, we were not surprised to find no strong relationship between fitness estimates from CH505 and CH848 data sets. 
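
      A comparison of this kind can be sketched by restricting both sets of selection coefficients to the mutations they share and computing a Pearson correlation. The sketch below is illustrative only: the function name `shared_mutation_correlation`, the mutation labels, and the numerical values are made up for the example and are not taken from the CH505 or CH848 data.

```python
import numpy as np

def shared_mutation_correlation(coeffs_a, coeffs_b):
    """Pearson correlation of selection coefficients over shared mutations.

    coeffs_a, coeffs_b: dicts mapping a mutation label to its inferred
    selection coefficient in each data set.
    """
    shared = sorted(set(coeffs_a) & set(coeffs_b))
    a = np.array([coeffs_a[m] for m in shared])
    b = np.array([coeffs_b[m] for m in shared])
    r = np.corrcoef(a, b)[0, 1]
    return shared, r

# Toy values, invented purely to exercise the function.
toy_a = {"m1": 0.01, "m2": 0.03, "m3": -0.02, "m4": 0.05}
toy_b = {"m2": 0.02, "m3": -0.01, "m4": 0.04, "m5": 0.00}

shared, r = shared_mutation_correlation(toy_a, toy_b)
print(shared, r)
```

      Only mutations present in both dictionaries enter the correlation, so a small overlap between data sets limits how much such a comparison can reveal.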

      Reviewer #2 (Public review):

      Summary:

      This paper combines a biological topic of interest with the demonstration of important theoretical/methodological advances. Fitness inference is the foundation of the quantitative analysis of adapting systems. It is a hard and important problem and this paper highlights a compelling approach (MPL) first presented in (1) and refined in (2), roughly summarized in equation 12.

      (1) Sohail, M. S., Louie, R. H., McKay, M. R. & Barton, J. P. MPL resolves genetic linkage in fitness inference from complex evolutionary histories. Nature Biotechnology 39, 472-479 (2021).

      (2) Shimagaki, K. & Barton, J. P. Bézier interpolation improves the inference of dynamical models from data. Physical Review E 107, 024116 (2023).

      The authors find that positive selection shapes the variable regions of env in shared patterns across two patient donors. The patterns of positive selection are interesting in and of themselves, they confirm the intuition that hyper-variation in env is the result of immune evasion rather than a broadly neutral landscape (flatness). They show that the immune evasion patterns due to CD8 T and naive B-cell selection are shared across patients. Furthermore, they suggest that a particular evolutionary history (larger flux to high fitness states) is associated with bNAb emergence. Mimicking this evolutionary pattern in vaccine design may help us elicit bNAbs in patients in the future.

      There is a lot of information to be found in the full fitness landscape of env. The enormous strength of reversion-to-consensus in the patterns is a known pattern of HIV post-infection populations but they are nicely quantified here. Agreement between SHIV and HIV evolution is shown. They find selection is larger for autologous antibodies than the bNAbs themselves (perhaps bNAbs are just too small a component of the host response to drive the bulk of selection?), and that big fitness increases precede antibody breadth in rhesus macaques, suggesting that this fitness increase is the immune challenge required to draw forth a bNAb. This is all of high interest to HIV researchers.

      Strength of evidence:

      One limitation is, of course, that the fitness model is constant in time when the immune challenge is variable and changing. This simplification may complicate some interpretations.

      We agree that this is a limitation of our current approach. In prior work, we have found that the constant fitness effects of mutations that we infer typically reflect the time-averaged fitness effect when the selection changes over time (Gao and Barton, PNAS 2025; Lee et al., Nat Commun 2025). It could be difficult, however, to capture changes in selection that fluctuate rapidly with underlying immune responses. We have added a new paragraph in the Discussion that more clearly sets out some of the limitations of our analysis, including our assumption of constant selection coefficients.

      There are additional methodological and technical limitations that should be considered in the interpretation of our results. Most notably, we assume that the viral fitness landscape is static in time. While we do not expect selection for effective replication (“intrinsic” fitness) to change substantially over time, pressure for immune escape could vary along with the immune responses that drive them. In prior work, we have found that constant selection coefficients typically reflect the average fitness effect of a mutation when its true contribution to fitness is time-varying [42,43]. This may not adequately describe mutational effects that undergo large or rapid shifts in time. Future work should also examine temporal patterns in selection for individual mutations.

      Equation 12 in the methods is really a beautiful tool because it is so simple, but accounts for linkage and can be solved precisely even in the presence of detailed mutational and selection models. However, the reliance on incomplete observations of the frequency leads to complications that must be carefully (re)addressed here.

      For instance, the consistent finding of strong selection in hypervariable regions is biologically intuitive but so striking, that I worry that it might be the result of a bias for selection in high entropy regions. 

      Thank you for this suggestion. We agree that it is important to carefully interrogate these results. To assess the effects of general sequence variability on inferred selection, we computed a position-specific entropy measure, H<sub>i</sub>, for each site i. We first defined the time-dependent entropy H<sub>i</sub>(t) = - ∑<sub>a</sub> x<sub>i</sub>(a, t) log x<sub>i</sub>(a, t), where x<sub>i</sub>(a, t) represents the frequency of amino acid/nucleotide a at position i and time t, at each sample time. We then computed H<sub>i</sub> as the average of H<sub>i</sub>(t) across all sample times. A new Supplementary Fig. 1 plots the entropy against the inferred selection coefficients. Although some sequence variation must be observed in order for us to infer that a mutation is beneficial, we did not find a systematic bias toward larger (more beneficial) selection coefficients at more variable sites. Overall, we found only a modest correlation between inferred selection coefficients and entropy (Pearson’s r = 0.33 and 0.29 for CH505 and CH848, respectively), which appears to be partly driven by the tendency for mutations inferred to be significantly deleterious to occur at sites with low entropy. In addition to the new Supplementary Figure, we have added a reference to this analysis in the main text:
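
      The entropy measure defined above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `site_entropy`, the array layout (time, site, state), and the use of the natural logarithm are assumptions, and the convention 0 log 0 = 0 is applied for states that are absent at a given time.

```python
import numpy as np

def site_entropy(freqs):
    """Time-averaged entropy H_i for each site.

    freqs: array of shape (T, L, A) holding x_i(a, t), the frequency of
    state a at site i and sample time t (assumed layout).
    Returns an array of shape (L,).
    """
    x = np.asarray(freqs, dtype=float)
    # Replace zeros by 1 inside the log so that 0 * log(0) contributes 0.
    safe = np.where(x > 0, x, 1.0)
    h_t = -np.sum(x * np.log(safe), axis=2)  # H_i(t), shape (T, L)
    return h_t.mean(axis=0)                  # average over sample times

# Toy example: 2 sample times, 2 sites, 2 states.
toy_freqs = [
    [[1.0, 0.0], [0.5, 0.5]],  # t = 0: site 0 fixed, site 1 evenly split
    [[1.0, 0.0], [1.0, 0.0]],  # t = 1: both sites fixed
]
entropy = site_entropy(toy_freqs)
print(entropy)
```

      In the toy example the first site never varies and has zero entropy, while the second site is evenly split at one of two sample times and so averages to half of log 2.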

      To test whether our results might be biased by overall sequence variability, we examined the relationship between our inferred selection coefficients and entropy, a common measure of sequence variability. Overall, we found only a modest correlation between selection and entropy, suggesting that the signs of selection that we observe are not due to increased sequence variability alone (Supplementary Fig. 1).

      Mutational and covariance terms in equation 12 might be underestimated, due to finite sampling effect in highly diverse populations. Sampling effects lead to zeros in x(t) when actual frequency zeros might be rare at the population sizes of HIV viral loads and mutation rates. Both mutational flux and C underestimation will bias selection upward in eq. 12. 

      The prior papers (1) and (2) seem to show robustness to finite sampling effects, but, again, more care needs to be shown that this robustness transfers to the amino acid inference under these conditions. That synonymous sites are rarely selected for in the nucleotide level is a good sign, and it may be a matter of simply fully explaining the amino-acid level model.

      As above, we agree that these tests are important. To assess the robustness of our results to finite sampling, we performed bootstrap sampling on the viral sequences and inferred selection coefficients using the resampled sequences. Specifically, we resampled the same number of sequences as in the original data at each time point and repeated this for all time points across all HIV-1 and SHIV data sets. A new Supplementary Fig. 11 shows a typical comparison of the original selection coefficients vs. those obtained through bootstrap resampling. Overall, we observe a high degree of consistency between the selection coefficients in each case, which is surely aided by the long time series in these data sets. As pointed out by the reviewer, uncertainty in low-frequency mutations is a particular concern, though the effects on inferred selection are mitigated by regularization. 

      We have added a section in the Results, “Robustness of inferred selection to changes in the fitness model and finite sampling”, which includes this analysis:

      Finite sampling of sequence data could also affect our analyses. To further test the robustness of our results, we inferred selection coefficients using bootstrap resampling, where we resample sequences from the original ensemble, maintaining the same number of sequences for each time point and subject. The selection coefficients from the bootstrap samples are consistent with the original data (see Supplementary Fig. 11), with Pearson’s r values of around 0.85 for HIV-1 data sets and 0.95 for SHIV data sets, respectively.
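
      The resampling step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `bootstrap_time_series` and the dictionary layout (sample time mapped to a list of sequences) are hypothetical, the toy sequences are invented, and the selection-coefficient inference that would be rerun on each resampled data set is not shown.

```python
import numpy as np

def bootstrap_time_series(samples_by_time, rng):
    """Resample sequences with replacement, separately at each time point.

    samples_by_time: dict mapping a sample time to a list of sequences.
    The number of sequences at each time point is preserved.
    """
    resampled = {}
    for t, seqs in samples_by_time.items():
        idx = rng.integers(0, len(seqs), size=len(seqs))
        resampled[t] = [seqs[i] for i in idx]
    return resampled

# Toy sequences, invented purely to exercise the function.
toy_samples = {
    0: ["ACGT", "ACGT", "ACGA"],
    30: ["ACGA", "ACTA", "ACTA", "ACGT"],
}
rng = np.random.default_rng(1)
boot = bootstrap_time_series(toy_samples, rng)
print({t: len(seqs) for t, seqs in boot.items()})
```

      Repeating this with different random seeds and rerunning the inference on each resampled set would give a spread of selection coefficients to compare against the original estimates.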

      Uncertainty propagates to the later parts of the paper, e.g., HIV and SIV shared patterns might be the result of shared biases in the method application. However, this worry does not extend to the apples-to-apples comparison of fitness trajectories across individuals (Figures 5 and 6), which I think are robust (for these sample sizes).

      One way to address this uncertainty is to compare the fitness values and individual selection coefficients across CH505 and CH848 data sets, which was also requested by Reviewer 1. Overall, we found little correlation between CH505 and CH848 fitness values (shown in a new Supplementary Fig. 6) or selection coefficients. This suggests that similarities between HIV-1 and SHIV landscapes are not solely determined by potential biases in the inference approach. We have now added a reference to this point in the main text:

      In contrast, the inferred fitness landscapes of CH505 and CH848, which share few mutations in common, are poorly correlated (Supplementary Fig. 6). This suggests that the similarities between viral fitness values in humans and RMs are not artifacts of the model, but rather stem from similarities in underlying evolutionary drivers.

      The timing evidence is slightly weakened by the fact that bNAb detection is different from bNAb presence and the possibility that fitness increases occurred after the bNAbs appeared remains. Still, their conclusion is plausible and fits in with the other observations which form a coherent and compelling picture.

      Yes, we agree that this is a limitation of our analysis — bNAbs may have been present at low levels before they were detected, and we cannot definitively reject selection by bNAbs. Nonetheless, in at least one case (RM5695), rapid fitness gains were substantially separated in time from bNAb detection (roughly 2 weeks after infection vs. 16 weeks, respectively). We have now added this point in a new paragraph in the Discussion:

      While we found a strong relationship between viral fitness dynamics and the emergence of bnAbs, it may not be true that the former stimulates the latter. For example, bnAbs may have been present within each host before they were experimentally detected. Rapid viral fitness gains within hosts that developed broad antibody responses could then have been driven by undetected bnAb lineages. However, we did not find strong selection for known bnAb resistance mutations, and in at least one case (RM5695), rapid fitness gains (roughly 2 weeks after infection) substantially preceded bnAb detection (16 weeks). Still, given the limited size of the data set that we studied, it is unclear the extent to which our results will transfer to larger and broader data sets.

      Overall this is a convincing paper, part of a larger admirable project of accurately inferring complete fitness landscapes.

      Reviewer #3 (Public review):

      Summary:

      Shimagaki et al. investigate the virus-antibody coevolutionary processes that drive the development of broadly neutralizing antibodies (bnAbs). The study's primary goal is to characterize the evolutionary dynamics of HIV-1 within hosts that accompany the emergence of bnAbs, with a particular focus on inferring the landscape of selective pressures shaping viral evolution. To assess the generality of these evolutionary patterns, the study extends its analysis to rhesus macaques (RMs) infected with simian-human immunodeficiency viruses (SHIV) incorporating HIV-1 Env proteins derived from two human individuals.

      Strengths:

      A key strength of the study is its rigorous assessment of the similarity in evolutionary trajectories between humans and macaques. This cross-species comparison is particularly compelling, as it quantitatively establishes a shared pattern of viral evolution using a sophisticated inference method. The finding that similar selective pressures operate in both species adds robustness to the study's conclusions and suggests broader biological relevance.

      Weaknesses:

      However, the study has some limitations. The most significant weakness is that the authors do not sufficiently discuss the implications of the observed similarities. While the identification of shared evolutionary patterns (e.g., Figure 5) is intriguing, the study would benefit from a more explicit discussion of what these findings mean, for instance, in the context of HIV vaccine design, immunotherapy, or fundamental viral-host interactions. Even speculative interpretations could provide valuable insights into the broader significance of these results.

      Thank you for this suggestion. We have now clarified the potential implications of our work in several areas. While speculative, one possible application is in vaccine design: it may be beneficial to design sequential immunogens to mimic the patterns of viral evolution associated with rapid fitness gains. This “population-based” design principle is different from typical approaches, which have focused on molecular details of virus surface proteins. 

      We have extended our discussion of our results in the context of viral evolution within and across hosts and related host species. Overall, our work suggests that there may be relatively few paths to significantly higher viral fitness in vivo. Evolutionary “contingencies” such as shifting immune pressure or epistatic interactions could influence the direction of evolution, but not so dramatically that the dynamics that we see in different hosts are not comparable. We have also connected our work more broadly to the literature in evolutionary parallelism in HIV-1 in different contexts.

      A secondary, albeit less critical, limitation is the placement of methodological details in the Supplementary Information. While it is understandable that the authors focus on results in the main text - especially since the methodology is not novel and has been previously described in earlier publications - some readers might benefit from a more thorough presentation of the method within the main paper.

      We have now modified the main text to add a new section, “Model overview,” that lays out the key steps of our approach. While we reserve technical details for the Methods, we believe that this new section provides more intuition about how our results were obtained (including a discussion of the important Eq. 12, now Eq. 3 in the main text) and our underlying assumptions.

      Conclusions:

      Overall, the study presents a compelling analysis of HIV-1 evolution and its parallels in SHIV-infected macaques. While the quantitative comparison between species is a notable contribution, a deeper discussion of its broader implications would strengthen the paper's impact.

      Reviewer #1 (Recommendations for the authors):

      I suggest de-emphasizing bnAbs and focusing on selection landscape inference, which seems to be the actual focus of the paper.

      While we do not directly study antibody development in this work, bnAb development is certainly an important motivating factor. As described in the responses above, we have now modified the Abstract and Discussion to place relatively more emphasis on fitness comparisons and relatively less focus on bnAb development.

      Reviewer #2 (Recommendations for the authors):

      Please make sure that the MPL method is defined in this paper and its limitations are at least partially repeated.

      As noted in responses above, we have now included more methodological details in the main text of the paper, which we hope will make the intuition and assumptions involved in our analysis clearer.

      I'd like the code to better show or describe the model, I could not figure out the model details by looking at the code. It seems mostly just to be csv exporting for use with preexisting MPL code. A longer code readme would be helpful.

      We have now updated the README on GitHub to include a conceptual overview of our inference approach, which references how each step is implemented in the code.

      Reviewer #3 (Recommendations for the authors):

      Try to give some more details (not necessarily giving the full mathematical derivation) on the statistical method utilized.

      As noted above, we have now expanded our discussion of the statistical methods and assumptions in the main text.

      Figures 3 and 4 are somewhat 'messy'. Although I do not have a constructive suggestion here, I feel that with a little more effort maybe the authors could come up with something more clean.

      It is true that the mutation frequency dynamics are somewhat “choppy” and difficult to follow intuitively. To attempt to make these figures easier to parse visually, we have increased the transparency on the lines and added exponential smoothing to the mutation frequencies, resulting in smoother trajectories. The trajectories without smoothing are retained in Supplementary Fig. 3. Here we also note that this smoothing is for visual purposes only; we use the original frequency trajectories for inference, rather than the smoothed ones.
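      For readers unfamiliar with the technique, the following is a minimal sketch of simple exponential smoothing applied to a frequency trajectory. It is an illustration only, not the code used for the figures: the function name, the smoothing factor `alpha`, and the example trajectory are all hypothetical, and the sketch assumes evenly spaced time points.

```python
def exponential_smoothing(values, alpha=0.5):
    """Simple exponential smoothing of a sequence.

    s[0] = x[0]
    s[t] = alpha * x[t] + (1 - alpha) * s[t - 1]

    alpha close to 1 follows the raw data closely; alpha close to 0
    smooths more strongly. (alpha is an assumed, illustrative parameter.)
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    for x in values:
        if not smoothed:
            # The first smoothed value is the first observation itself.
            smoothed.append(float(x))
        else:
            smoothed.append(alpha * x + (1.0 - alpha) * smoothed[-1])
    return smoothed


# Made-up frequency trajectory for a single mutation (not real data).
raw = [0.0, 0.4, 0.1, 0.6, 0.5, 0.9, 1.0]
smooth = exponential_smoothing(raw, alpha=0.5)
```

      Because each smoothed value is a weighted average of the current observation and the previous smoothed value, the result stays within the range of the raw frequencies. This kind of smoothing is only suitable for display; as stated above, inference uses the original trajectories.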

    1. eLife Assessment

      This valuable study characterizes the emergence of the membrane-associated periodic cytoskeleton (MPS) in the axons of human motor neurons derived from induced pluripotent stem cells. Super-resolution imaging of beta-II spectrin provides convincing evidence for the patterned assembly of spectrin-poor gaps and spectrin-rich MPS in the medial region of the axons and its enhancement by the kinase inhibitor staurosporine. The data advocates against gap formation by cytoskeleton disassembly in a continuous MPS. Instead, a continuous MPS may result from nascent MPS patches and their maturation, a model that would benefit from live imaging for validation.

    2. Reviewer #1 (Public review):

      Summary:

      Ever since the surprising discovery of the membrane-associated Periodic Skeleton (MPS) in axons, a significant body of published work has been aimed at trying to understand its assembly mechanism and function. Despite this, we still lack a mechanistic understanding of how this amazing structure is assembled in neuronal cells. In this article, the authors report a "gap-and-patch" pattern of labelled spectrin in iPSC-derived human motor neurons grown in culture. The mid-sections of these axons exhibit patches with reasonably well-organized MPS that are separated by gaps lacking any detectable MPS and having low spectrin content. Further, they report that the intensity modulation of spectrin is correlated with intensity modulations of tubulin as well. However, neurofilament fluorescence does not show any correlation. Using DIC imaging, the authors show that often the axonal diameter remains uniform across segments, showing a patch-gap pattern. Gaps are seen more abundantly in the midsection of the axon, with the proximal section showing continuous MPS and the distal segment showing continuous spectrin fluorescence but no organized MPS. The authors show that spectrin degradation by caspase/calpain is not responsible for gap formation, and the patches are nascent MPS domains. The gap and patch pattern increases with days in culture and can be enhanced by treating the cells using the general kinase inhibitor staurosporine. Treatment with the actin depolymerizing agent Latrunculin A reduces gap formation. The reasons for the last two observations are not well understood/explained.

      Strengths:

      The claims made in the paper are supported by extensive imaging work and quantification of MPS. Overall, the paper is well written and the findings are interesting. Although much of the reported data are from axons treated with staurosporine, this may be a convenient system to investigate the dynamics of MPS assembly, which is still an open question.

      Weaknesses:

      Much of the analysis is on staurosporine-treated cells, and the effects of this treatment can be broad. The increase in patch-gap pattern with days in culture is intriguing, and the reason for this needs to be checked carefully. It would have been nice to have live cell data on the evolution of the patch and gap pattern using a GFP tag on spectrin. The evolution of individual patches and possible coalescence of patches can be observed even with confocal microscopy if live cell super-resolution observation is difficult.

      Some more comments:

      (1) Axons can undergo transient beading or regularly spaced varicosity formation during media change if changes in osmolarity or chemical composition occur. Such shape modulations can induce cytoskeletal modulations as well (the authors report modulations in microtubule fluorescence). The authors mention axonal enlargements in some instances. Although they present DIC images to argue that the axons showing gaps are often tubular, possible beading artefacts need to be checked. Beading can be transient and can be checked by doing media changes while observing the axons on a microscope.

      (2) Why do microtubules appear patchy? One would imagine the microtubule lengths to be greater than the patch size and hence to be more uniform.

      (3) Why do axons with gaps increase with days in culture? If patches are nascent MPS that progressively grow, one would have expected fewer gaps with increasing days in culture. Is this indicative of some sort of degeneration of axons?

      (4) It is surprising that Latrunculin A reduces gap formation induced by staurosporine (also seems to increase MPS correlation) while it decreases actin filament content. How can this be understood? If the idea is to block actin dynamics, have the authors tried using Jasplakinolide to stabilize the filaments?

      (5) The authors speculate that the patches are formed by the condensation of free spectrins, which then leaves the immediate neighborhood depleted of these proteins. This is an interesting hypothesis, and exploring this in live cells using spectrin-GFP constructs will greatly strengthen the article. Will the patch-gap regions evolve into continuous MPS? If so, do these patches expand with time as new spectrin and actin are recruited and merge with neighboring patches, or can the entire patch "diffuse" and coalesce with neighboring patches, thus expanding the MPS region?

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Gazal et al. describe the presence of unique gaps and patches of BetaII-spectrin in medial sections of long human motor neuron axons. BII-spectrin, along with Alpha-spectrin, forms horizontal linkers between 180nm spaced F-actin rings in axons. These F-actin rings, along with the spectrin linkers, form membrane periodic structures (MPS) which are critical for the maintenance of the integrity, size, and function of axons. The primary goal of the authors was to address whether long motor axons, particularly those carrying familial mutations associated with the neurodegenerative disorder ALS, show defects in gaps and patches of BetaII-spectrin, ultimately leading to degradation of these neurons.

      Strengths:

      The experiments are well-designed, and the authors have used the right methods and cutting-edge techniques to address the questions in this manuscript. The use of human motor neurons and the use of motor neurons with different familial ALS mutations is a strength. The use of isogenic controls is a positive. The induction of gaps and patches by the kinase inhibitor staurosporine and their rescue by Latrunculin A is novel and well-executed. The use of biochemical assays to explore the role of calpains is appropriate and well-designed. The use of STED imaging to define the periodicity of MPS in the gaps and patches of spectrin is a strength.

      Weaknesses:

      The primary weakness is the lack of rigorous evaluation to validate the proposed model of spectrin capture from the gaps into adjacent patches by the use of photobleaching and live imaging. Another point is the lack of investigation into how gaps and patches change in axons carrying the familial ALS mutations as they age, since 2 weeks is not a time point when neurodegeneration is expected to start.

    4. Reviewer #3 (Public review):

      Summary:

      Gazal et al present convincing evidence supporting a new model of MPS formation where a gap-and-patch MPS pattern coalesces laterally to give rise to a lattice covering the entire axon shaft.

      Strengths:

      (1) This is a very interesting study that supports a change in paradigm in the model of MPS lattice formation.

      (2) Knowledge on MPS organization is mainly derived from studies using rat hippocampal neurons. In the current manuscript, Gazal et al use human IPS-derived motor neurons, a highly relevant neuron type, to further the current knowledge on MPS biology.

      (3) The quality of the images provided, specifically of those involving super-resolution, is of a high standard. This adequately supports the conclusions of the authors.

      Weaknesses:

      (1) The main concern raised by the manuscript is the assumption that staurosporine-induced gap and patch formation recapitulates the physiological assembly of gaps and patches of betaII-spectrin.

      (2) One technical challenge that limits a more compelling support of the new model of MPS formation is that fixed neurons are imaged, which precludes the observation of patch coalescence.

    5. Author response:

      Reviewer #1 (Public review)

      Summary:

      Ever since the surprising discovery of the membrane-associated Periodic Skeleton (MPS) in axons, a significant body of published work has been aimed at trying to understand its assembly mechanism and function. Despite this, we still lack a mechanistic understanding of how this amazing structure is assembled in neuronal cells. In this article, the authors report a "gap-and-patch" pattern of labelled spectrin in iPSC-derived human motor neurons grown in culture. The mid-sections of these axons exhibit patches with reasonably well-organized MPS that are separated by gaps lacking any detectable MPS and having low spectrin content. Further, they report that the intensity modulation of spectrin is correlated with intensity modulations of tubulin as well. However, neurofilament fluorescence does not show any correlation. Using DIC imaging, the authors show that often the axonal diameter remains uniform across segments, showing a patch-gap pattern. Gaps are seen more abundantly in the midsection of the axon, with the proximal section showing continuous MPS and the distal segment showing continuous spectrin fluorescence but no organized MPS. The authors show that spectrin degradation by caspase/calpain is not responsible for gap formation, and the patches are nascent MPS domains. The gap and patch pattern increases with days in culture and can be enhanced by treating the cells using the general kinase inhibitor staurosporine. Treatment with the actin depolymerizing agent Latrunculin A reduces gap formation. The reasons for the last two observations are not well understood/explained.

      We thank the reviewer for the detailed and accurate description of the data shown and its relevance to further our understanding of MPS assembly mechanism and function.

      Strengths:

      The claims made in the paper are supported by extensive imaging work and quantification of MPS. Overall, the paper is well written and the findings are interesting. Although much of the reported data are from axons treated with staurosporine, this may be a convenient system to investigate the dynamics of MPS assembly, which is still an open question.

      We thank the reviewer for the positive comments on the manuscript, the techniques used and the proposed model.

      Weaknesses:

      Much of the analysis is on staurosporine-treated cells, and the effects of this treatment can be broad. The increase in patch-gap pattern with days in culture is intriguing, and the reason for this needs to be checked carefully. It would have been nice to have live cell data on the evolution of the patch and gap pattern using a GFP tag on spectrin. The evolution of individual patches and possible coalescence of patches can be observed even with confocal microscopy if live cell super-resolution observation is difficult.

      We will consider the inclusion of live imaging experiments using the expression of C-terminus-tagged human beta2-spectrin in the revised version of the manuscript. We are familiar with live-imaging and FRAP experiments and we will explore how to develop these experiments to generate data for inclusion in a revised submission.

      Some more comments:

      (1) Axons can undergo transient beading or regularly spaced varicosity formation during media change if changes in osmolarity or chemical composition occur. Such shape modulations can induce cytoskeletal modulations as well (the authors report modulations in microtubule fluorescence). The authors mention axonal enlargements in some instances. Although they present DIC images to argue that the axons showing gaps are often tubular, possible beading artefacts need to be checked. Beading can be transient and can be checked by doing media changes while observing the axons on a microscope.

      We do not rule out the presence of “nano beads” in these axons. It was recently suggested that the normal morphology of axons indeed resembles “pearls-on-a-string” (Griswold et al., 2025), with “nano beads” separated by thin tubular "connectors" (also referred to as NSV, for non-synaptic varicosities). However, it is unlikely that the gap-patch pattern of beta2-spectrin can be attributed to such a morphology, given that we used formaldehyde as a fixative and Griswold and colleagues show that aldehyde-based fixatives do not preserve NSVs. We are able to see scattered axonal enlargements (“micro beads”), as we described in distal portions in Fig. 1C(C2) and E. However, the number, appearance and staining of these are not compatible with the gap-patch pattern in beta2-spectrin. Moreover, we would have expected to see these NSVs in our extensive STED imaging, yet we did not. We will discuss this further in the resubmission.

      (2) Why do microtubules appear patchy? One would imagine the microtubule lengths to be greater than the patch size and hence to be more uniform.

      Our stainings are for tubulin protein isoforms beta-III and alpha-II. That is, they would label microtubules, but free tubulin as well. The slight decrease in intensity for tubulin within gaps is indeed something to investigate, but we don't interpret this as “patchy microtubules”. If the Reviewer refers to Fig. 2C-D, it is actually difficult to discern the slight decrease in intensity with the naked eye. To further support this, we will consider including stainings and quantitative analyses for microtubules in the resubmission. We are familiar with the use of permeabilizing conditions during fixation (in protocols known as “cytoskeletal fixation”) to label microtubules (and not free tubulin).

      (3) Why do axons with gaps increase with days in culture? If patches are nascent MPS that progressively grow, one would have expected fewer gaps with increasing days in culture. Is this indicative of some sort of degeneration of axons?

      We agree that there is an apparent discrepancy. However, one has to take into account that these axons are still elongating even at 2 weeks in culture. Hence, at any time point, there is a new axonal compartment recently added, and hence, with low beta2-spectrin and no MPS. Also, the dynamical evolution of the MPS has to take into account beta2-spectrin supply. If supply is somehow lower than a given threshold, it is expected that there will be more gaps, given that the new, more distant parts of the axons have a lower supply of beta2-spectrin. To explore this formally, we are working on simulations of these multifactorial dynamic systems, which, together with key experimental observations, would enhance our understanding of overall MPS assembly in growing axons. However, findings for this project will be the subject of another manuscript.

      (4) It is surprising that Latrunculin A reduces gap formation induced by staurosporine (also seems to increase MPS correlation) while it decreases actin filament content. How can this be understood? If the idea is to block actin dynamics, have the authors tried using Jasplakinolide to stabilize the filaments?

      The results with the co-treatment with Latrunculin A and Staurosporine are indeed intriguing, and provide clear evidence that the gap-and-patch pattern arises from local assembly of the MPS, requiring new actin filaments. However, the fact that F-actin within the pre-formed MPS seems unaffected is not surprising. There are many different populations of F-actin in axons (i.e. MPS rings, longitudinal filaments, actin patches, actin trails). Latrunculin A affects filaments indirectly. The target of Latrunculin A is not actin filaments, but free monomers. It ultimately affects actin filaments as they end up losing monomers, and devoid of new monomers, filaments get shorter and eventually disappear. The drastic decrease in F-actin in our axons reflects that. The fact that F-actin in the MPS is preserved only speaks to the fact that these filaments are stable: if they are not losing monomers in the time frame of the treatment, the filament remains unaffected. We will support this with more observations and imaging and with a more extensive discussion summarizing the literature on the matter in the resubmission.

      On the other hand, the use of F-actin stabilizing drugs (like Jasplakinolide) would have a different effect. We will study how an experiment with these drugs could be informative of the process under investigation for the resubmission.

      (5) The authors speculate that the patches are formed by the condensation of free spectrins, which then leaves the immediate neighborhood depleted of these proteins. This is an interesting hypothesis, and exploring this in live cells using spectrin-GFP constructs will greatly strengthen the article. Will the patch-gap regions evolve into continuous MPS? If so, do these patches expand with time as new spectrin and actin are recruited and merge with neighboring patches, or can the entire patch "diffuse" and coalesce with neighboring patches, thus expanding the MPS region?

      We agree with the reviewer's interpretation. A virtue of our experimental model and our interpretations of the observations in fixed cells is that it gives rise to informative questions such as the ones posed by the reviewer. As stated above, we will consider the inclusion of live imaging experiments using the expression of C-terminus-tagged human beta2-spectrin in the revised version of the manuscript. We are familiar with live-imaging and FRAP experiments and we think we can provide the evidence suggested.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Gazal et al. describe the presence of unique gaps and patches of BetaII-spectrin in medial sections of long human motor neuron axons. BII-spectrin, along with Alpha-spectrin, forms horizontal linkers between 180nm spaced F-actin rings in axons. These F-actin rings, along with the spectrin linkers, form membrane periodic structures (MPS) which are critical for the maintenance of the integrity, size, and function of axons. The primary goal of the authors was to address whether long motor axons, particularly those carrying familial mutations associated with the neurodegenerative disorder ALS, show defects in gaps and patches of BetaII-spectrin, ultimately leading to degradation of these neurons.

      We thank the reviewer for the detailed and accurate description of the data shown.

      Strengths:

      The experiments are well-designed, and the authors have used the right methods and cutting-edge techniques to address the questions in this manuscript. The use of human motor neurons and the use of motor neurons with different familial ALS mutations is a strength. The use of isogenic controls is a positive. The induction of gaps and patches by the kinase inhibitor staurosporine and their rescue by Latrunculin A is novel and well-executed. The use of biochemical assays to explore the role of calpains is appropriate and well-designed. The use of STED imaging to define the periodicity of MPS in the gaps and patches of spectrin is a strength.

      We thank the reviewer for the positive comments on the manuscript, the techniques used and the proposed model.

      Weaknesses:

      The primary weakness is the lack of rigorous evaluation to validate the proposed model of spectrin capture from the gaps into adjacent patches by the use of photobleaching and live imaging. Another point is the lack of investigation into how gaps and patches change in axons carrying the familial ALS mutations as they age, since 2 weeks is not a time point when neurodegeneration is expected to start.

      We will consider the inclusion of live imaging experiments using the expression of tagged human beta2-spectrin in the revised version of the manuscript. We are familiar with live-imaging and FRAP experiments and we believe we can provide the evidence suggested. We do not rule out the notion that axons carrying familial ALS mutations will show defects in MPS formation and/or stability when observed at longer culture times, or under culture conditions that promote neuronal aging (Guix et al., 2021). Thus, we will continue to work with these cells, but the goal of that project lies well beyond the primary message of the present manuscript, and we anticipate that the revised version will not include new data on this matter.

      Reviewer #3 (Public review):

      Summary:

      Gazal et al present convincing evidence supporting a new model of MPS formation where a gap-and-patch MPS pattern coalesces laterally to give rise to a lattice covering the entire axon shaft.

      Strengths:

      (1) This is a very interesting study that supports a change in paradigm in the model of MPS lattice formation.

      (2) Knowledge on MPS organization is mainly derived from studies using rat hippocampal neurons. In the current manuscript, Gazal et al use human IPS-derived motor neurons, a highly relevant neuron type, to further the current knowledge on MPS biology.

      (3) The quality of the images provided, specifically of those involving super-resolution, is of a high standard. This adequately supports the conclusions of the authors.

      We thank the reviewer for the positive comments on the manuscript, the techniques used and the proposed model.

      Weaknesses:

      (1) The main concern raised by the manuscript is the assumption that staurosporine-induced gap and patch formation recapitulates the physiological assembly of gaps and patches of betaII-spectrin.

      We will further explore the inclusion of more measurements of other parameters and variables towards establishing whether these gaps-and-patches patterns are equivalent structures in control and staurosporine-treated cells. 

      (2) One technical challenge that limits a more compelling support of the new model of MPS formation is that fixed neurons are imaged, which precludes the observation of patch coalescence.

      As stated before regarding similar comments by other reviewers, we will consider the inclusion of live imaging experiments in the revised version of the manuscript.

      Nicolas Unsain, PhD, and Thomas Durcan, PhD.

      References

      Griswold, J.M., Bonilla-Quintana, M., Pepper, R. et al. Membrane mechanics dictate axonal pearls-on-a-string morphology and function. Nat Neurosci 28, 49–61 (2025). https://doi.org/10.1038/s41593-024-01813-1

      Guix F.X., Marrero Capitán A., Casadomé-Perales A., Palomares-Pérez I., López Del Castillo I., Miguel V., Goedeke L., Martín M.G., Lamas S., Peinado H., Fernández-Hernando C., Dotti C.G. Increased exosome secretion in neurons aging in vitro by NPC1-mediated endosomal cholesterol buildup. Life Sci Alliance. 2021 Jun 28;4(8):e202101055. doi: 10.26508/lsa.202101055. Print 2021 Aug.

    1. eLife Assessment

      The effort is timely and the paper carries valuable insights into the function of UTR mutations. There are still significant concerns about both the quality of the screen data, and its ability to detect significant changes in translation and their direction. Therefore, the ability of the screen to support the extensive downstream statistical analysis is limited and leaves the paper incomplete.

    2. Reviewer #1 (Public review):

      The authors describe a massively parallel reporter assay (MPRA) screen aimed at identifying polymorphisms in 5' and 3' UTRs that affect translation efficiency and thus might have a functional impact on cells. The topic is of timely interest, and indeed, several related efforts have recently been published and preprinted (e.g., https://pubmed.ncbi.nlm.nih.gov/37516102/ and https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10635273/). This study has several major issues with the results and their presentation.

      Major comments:

      • The main issue remains that it appears that the screen has largely failed, and the reasons for that remain unclear, which makes it difficult to interpret how useful the resulting data are. The authors mention batch effects as a potential contributor. The authors start with a library that includes ~6,000 variants, which makes it a medium-size MPRA. But then, only 483 pairs of WT/mutated UTRs yield high confidence information, which is already a small number for any downstream statistical analysis, particularly since most don't actually affect translation in the reporter screen setting (which is not unexpected). It is unclear why >90% of the library did not give high-confidence information. The profiles presented as base-case examples in Fig. 2B don't look very informative or convincing. All the subsequent analysis is done on a very small set of UTRs that have an effect, and it is unclear to this reviewer how these can yield statistically significant and/or biologically-relevant associations.

      • From the variants that had an effect, the authors go on to carry out some protein-level validations, and see some changes, but it is not clear if those changes are in the same direction as was observed in the screen. In their rebuttal the authors explain that they largely cannot infer directionality of changes from the screen, which further limits its utility.

      • It is particularly puzzling how the authors can build a machine learning predictor with >3,000 features when the dataset they use for training the model has just a few dozen translation-shifting variants.

      Comments on revisions:

      It appears that the authors have extracted the information they could from the problematic dataset they obtained. Repeating the experiments in a cleaner setting, obtaining data for the >6000 UTRs they intended will allow the authors to achieve the goals they set out to achieve in establishing the screen.

    3. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The authors describe a massively parallel reporter assay (MPRA) screen aimed at identifying polymorphisms in 5' and 3' UTRs that affect translation efficiency and thus might have a functional impact on cells. The topic is of timely interest, and indeed, several related efforts have recently been published and preprinted (e.g., https://pubmed.ncbi.nlm.nih.gov/37516102/ and https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10635273/). This study has several major issues with the results and their presentation.

      Major comments:

      • The main issue remains that it appears that the screen has largely failed, and the reasons for that remain unclear, which makes it difficult to interpret how useful the resulting data are. The authors mention batch effects as a potential contributor. The authors start with a library that includes ~6,000 variants, which makes it a medium-size MPRA. But then, only 483 pairs of WT/mutated UTRs yield high-confidence information, which is already a small number for any downstream statistical analysis, particularly since most don't actually affect translation in the reporter screen setting (which is not unexpected). It is unclear why >90% of the library did not give high-confidence information. The profiles presented as base-case examples in Fig. 2B don't look very informative or convincing. All the subsequent analysis is done on a very small set of UTRs that have an effect, and it is unclear to this reviewer how these can yield statistically significant and/or biologically relevant associations.

      • From the variants that had an effect, the authors go on to carry out some protein-level validations, and see some changes, but it is not clear if those changes are in the same direction as observed in the screen. In their rebuttal the authors explain that they largely cannot infer directionality of changes from the screen, which further limits its utility.

      • It is particularly puzzling how the authors can build a machine learning predictor with >3,000 features when the dataset they use for training the model has just a few dozen translation-shifting variants.

      We recognize that RNA distribution within polysomes is inherently less stable than the associated protein components. This instability has been noted in previous studies, including those cited by the reviewer, which used RNA from bulk polysomes to infer the translatome without fractionation. Acknowledging this limitation, we purposely adopted a conservative strategy: (i) performing gross fractionation of polysomes, and (ii) collaborating with biostatisticians at the Institute of Statistical Science, Academia Sinica, to design a conservative yet optimized analysis pipeline that minimized batch effects.

      This approach proved robust: representative cases in Fig. 2B clearly demonstrate distinct distributions of reference and alternative alleles. From our high-confidence dataset, we applied a well-established statistical framework specifically designed to accommodate multiple influencing factors in relatively small datasets (Elements of Statistical Learning by Hastie, Tibshirani, and Friedman). We further conducted sensitivity analyses to select an optimal QC cutoff across a range of stringencies, ensuring maximal reliability of our results. We have therefore successfully shortlisted UTR variants which have strong effect on translation.

      Building upon these conservative measures, we developed a predictive model for translation effects of UTR variants. Importantly, this model was validated not only with our internal test dataset but also with independent external datasets. In addition, the sequence features identified by the model were validated through reporter assays and in vivo CRISPR editing. These external and functional validations establish the generalizability and robustness of our approach.

      A more detailed analysis of the directionality of changes in translation efficiency is under active investigation. These results will be reported in a separate manuscript currently in preparation.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors describe a massively parallel reporter assay (MPRA) screen focused on identifying polymorphisms in 5' and 3' UTRs that affect translation efficiency and thus might have a functional impact on cells. The topic is of timely interest, and indeed, several related efforts have recently been published and preprinted (e.g., https://pubmed.ncbi.nlm.nih.gov/37516102/ and https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10635273/). This study has several major issues with the results and their presentation.

      Major comments:

      (1) The main issue is that it appears that the screen has largely failed, yet the reasons for that are unclear, which makes it difficult to interpret. The authors start with a library that includes approximately 6,000 variants, which makes it a medium-sized MPRA. But then, only 483 pairs of WT/mutated UTRs yield high-confidence information, which is already a small number for any downstream statistical analysis, particularly since most don't actually affect translation in the reporter screen setting (which is not unexpected). It is unclear why >90% of the library did not give high-confidence information. The profiles presented as base-case examples in Figure 2B don't look very informative or convincing. All the subsequent analysis is done on a very small set of UTRs that have an effect, and it is unclear to this reviewer how these can yield statistically significant and/or biologically relevant associations.

      To make sure our final results are technically and statistically sound, we applied stringent selection criteria and cutoffs in our analytics workflow. First, from our RNA-seq dataset, we filtered the UTRs with at least 20 reads in a polysome profile across all three repeated experiments. Secondly, in the following main analysis using a negative binomial generalized linear model (GLM), we further excluded the UTRs that displayed batch effect, i.e. their batch-related main effect and interaction are significant. We believe our measure has safeguarded the filtered observations (UTRs) from the (potential) high variation of our massively parallel translation assays and thus gives high confidence to our results.

      Regarding the interpretation of Figure 2B, since we aimed to identify the UTRs whose interaction term of genotype and fractions is significant in our generalized linear model, it is statistically conventional to double-check the interaction of the two variables using such a graph. For instance, in the top left panel of Figure 2B (5'UTR of ANK2:c.-39G>T), we can see that read counts of WT samples congruously decreased from Mono to Light, whereas the read counts of mutant samples were roughly the same in the two fractions – the trend is different between WT and mutant. Ergo, the distinct distribution patterns of two genotypes across three fractions in Figure 2B offer the readers a convincing visual supplement to our statistics from GLM.

      In contrast to Figure 2B, the graphs of nonsignificant UTRs (shown below) reveal that the trends between the two genotypes are similar across the 'Mono and Light' and 'Light and Heavy' polysome fractions. Importantly, our analysis remains unaffected by differential expression levels between WT and mutant, as it specifically distinguishes polysome profiles with different distributions. This consistent trend further supports the lack of interaction between genotype and polysome fractions for these UTRs.

      Author response image 1.

      Examples of non-significant UTR pairs in massively parallel polysome profiling assays.
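
The interaction test described in this response can be written out as a short model statement. This is only a sketch under stated assumptions: the symbols below ($Y$, $\mu$, $\theta$, and the $\beta$ terms) are illustrative notation, not taken from the manuscript, and the authors' actual pipeline may include offsets, normalisation factors, or additional batch terms that are not shown here.

```latex
% Read count Y for one UTR pair, with genotype g (WT or mutant),
% polysome fraction f (Mono, Light, Heavy) and batch b.
% All symbols are assumed notation for illustration.
\begin{align}
  Y_{gfb} &\sim \mathrm{NB}\!\left(\mu_{gfb},\, \theta\right), \\
  \log \mu_{gfb} &= \beta_0 + \beta^{G}_{g} + \beta^{F}_{f} + \beta^{B}_{b}
                   + \beta^{GF}_{gf}, \\
  H_0 &: \beta^{GF}_{gf} = 0 \quad \text{for all } g, f .
\end{align}
```

Under this reading, a UTR pair is called polysome-shifting when $H_0$ is rejected, that is, when the WT and mutant profiles are not parallel across fractions. A difference in overall abundance between genotypes is absorbed by $\beta^{G}_{g}$ and does not by itself produce a significant interaction, which matches the statement above that the analysis is unaffected by differential expression between WT and mutant.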

      (2) From the variants that had an effect, the authors go on to carry out some protein-level validations and see some changes, but it is not clear if those changes are in the same direction as observed in the screen.

      To infer the directionality of translation efficiency from polysome profiles, a common approach involves pooling polysome fractions and comparing them with free or monosome fractions to identify 'translating' fractions. However, this method has two major potential pitfalls: (i) it sacrifices resolution and does not account for potential bias toward light or heavy polysomes, and (ii) it fails to account for discrepancies between polysome load and actual protein output (as discussed in https://doi.org/10.1016/j.celrep.2024.114098 and https://doi.org/10.1038/s41598-019-47424-w). Therefore, our analysis focused on the changes within polysome profiles themselves. 'Significant' candidates were identified based on a significant interaction between genotype and polysome distribution using a negative binomial generalized linear model, without presupposing the direction of change on protein output. 

      (3) The authors follow up on specific motifs and specific RBPs predicted to bind them, but it is unclear how many of the hits in the screen actually have these motifs, or how significant motifs can arise from such a small sample size.

      We calculated the Δmotif enrichment in significant UTRs versus nonsignificant UTRs using Fisher’s exact test. For example, the enrichment of the Δ‘AGGG’ motif in 3’ UTRs is shown below:

      Author response table 1.

      This test yields a P-value of 0.004167 by Fisher’s exact test. The P-values and Odds ratios of Δmotifs in relation to polysome shifting are included in Supplementary Table S4, and we will update the detailed motif information in the revised Supplementary Table S4.
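
For readers who want to reproduce this kind of enrichment test, the following is a minimal sketch of a two-sided Fisher's exact test on a 2×2 table (significant vs. nonsignificant UTRs, with vs. without the Δmotif). The function name and the example counts are hypothetical; the actual counts from Author response table 1 are not reproduced on this page, so the numbers below are placeholders and are not expected to return the reported P-value.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns (sample odds ratio, p-value). The p-value sums the
    hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(k):
        # Probability that the top-left cell equals k given fixed margins.
        return comb(row1, k) * comb(row2, col1 - k) / comb(total, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance guards against floating-point ties.
    p_value = sum(
        prob(k)
        for k in range(lowest, highest + 1)
        if prob(k) <= p_observed * (1 + 1e-9)
    )
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(p_value, 1.0)


# Hypothetical counts (NOT the values from the authors' table):
# rows = significant / nonsignificant 3' UTRs,
# columns = variant changes the motif / variant does not change the motif.
odds, p = fisher_exact_two_sided(6, 21, 20, 250)
print(f"odds ratio = {odds:.2f}, p = {p:.4g}")
```

The same function applies to the single-nucleotide comparison discussed later in the response; only the table counts change.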

      (4) It is particularly puzzling how the authors can build a machine learning predictor with >3,000 features when the dataset they use for training the model has just a few dozen translation-shifting variants.

      We understand the concern regarding the relatively small number of translation-shifting variants compared to the large number of features. To address this, we employed LASSO regression, which, according to The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman, is particularly suitable for datasets where the number of features p is much larger than the number of samples N. LASSO effectively performs feature selection by shrinking less important coefficients to zero, allowing us to build a robust and generalizable model despite the limited number of variants.
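
To illustrate how an L1 penalty drives coefficients to exactly zero when p exceeds N, here is a minimal pure-Python sketch of LASSO fitted by coordinate descent. It is a generic textbook formulation, not the authors' implementation: the objective (squared error divided by 2N plus the penalty times the sum of absolute coefficients), the function names, and the toy data are all assumptions made for illustration.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; return 0.0 inside the dead zone."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_sweeps=200):
    """Minimise (1/(2N)) * ||y - X b||^2 + penalty * ||b||_1.

    X is a list of N rows with p features each; no intercept is fitted,
    so the data are assumed to be centred already.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X b for the initial b = 0
    for _ in range(n_sweeps):
        for j in range(p):
            column = [X[i][j] for i in range(n)]
            scale = sum(v * v for v in column) / n
            if scale == 0.0:
                continue  # an all-zero feature keeps a zero coefficient
            # Covariance of feature j with the residual that excludes feature j.
            rho = sum(column[i] * residual[i] for i in range(n)) / n + scale * beta[j]
            updated = soft_threshold(rho, penalty) / scale
            change = updated - beta[j]
            if change != 0.0:
                for i in range(n):
                    residual[i] -= column[i] * change
                beta[j] = updated
    return beta


# Toy data with more features (p = 6) than samples (N = 4); values are made up.
X_toy = [
    [1.0, 0.2, -0.5, 0.1, 0.0, 0.3],
    [-1.0, 0.1, 0.4, -0.2, 0.1, -0.3],
    [0.5, -0.3, 0.1, 0.2, -0.1, 0.0],
    [-0.5, 0.0, 0.0, -0.1, 0.0, 0.0],
]
y_toy = [2.0, -2.1, 1.0, -0.9]
coefficients = lasso_coordinate_descent(X_toy, y_toy, penalty=0.1)
selected = [j for j, b in enumerate(coefficients) if b != 0.0]
print("non-zero coefficients at feature indices:", selected)
```

Sparsity alone does not address the reviewer's concern about overfitting; in practice the penalty would be chosen by cross-validation and the model checked on held-out data, which the authors state they did with internal and external test sets.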

      (5) The lack of meaningful validation experiments altering the SNPs in the endogenous loci by genome editing limits the impact of the results.

      Following the reviewer’s suggestion, we assessed the endogenous mutant effect by generating CRISPR knock-in clones carrying the IRF6:c.-4609G>A variant. We showed that this G>A variant generates a deleterious upstream open reading frame, which dramatically reduced protein expression of the main open reading frame (Fig. 7B-D). The genome editing further demonstrated that the G>A variant reduced endogenous IRF6 protein expression to 23% or 44% in two independent clones. We have incorporated the genome editing results in the revised main text and the new Figure 7E&F:

      “To further validate the endogenous effect of the novel upstream ATG (uATG), we generated CRISPR knock-in clones carrying the IRF6:c.-4609G>A variant and examined its impact on gene expression. The introduction of the uATG reduced RNA levels to 88% and 37% of the wild-type in two independent clones (Fig. 7E), and protein levels to 44% and 23%, respectively (Fig. 7F), resulting in an overall reduction of translation efficiency to 50–62%.” (p.18)
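
The 50–62% range quoted above is consistent with taking translation efficiency as the ratio of relative protein level to relative RNA level for each clone. The short sketch below reproduces that arithmetic; the ratio definition is an assumption inferred from the reported numbers, and the clone labels and function name are hypothetical.

```python
def relative_translation_efficiency(protein_rel, rna_rel):
    """Protein level divided by RNA level, both expressed relative to wild type."""
    return protein_rel / rna_rel


# Values quoted in the revised text (fractions of wild-type levels).
clones = {
    "clone A": {"rna": 0.88, "protein": 0.44},
    "clone B": {"rna": 0.37, "protein": 0.23},
}

for name, levels in clones.items():
    te = relative_translation_efficiency(levels["protein"], levels["rna"])
    print(f"{name}: translation efficiency = {te:.0%} of wild type")
# prints:
# clone A: translation efficiency = 50% of wild type
# clone B: translation efficiency = 62% of wild type
```

Normalising protein to RNA in this way separates the translational effect of the uATG from the accompanying drop in transcript abundance.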

      Reviewer #2 (Public Review):

      Summary:

      In their paper "Massively Parallel Polyribosome Profiling Reveals Translation Defects of Human Disease-Relevant UTR Mutations" the authors use massively parallel polysome profiling to determine the effects of 5' and 3' UTR SNPs (from dbSNP/ClinVar) on translational output. They show that some UTR SNPs cause a change in the polysome profile with respect to the wild-type and that pathogenic SNPs are enriched in the polysome-shifting group. They validate that some changes in polysome profiles are predictive of differences in translational output using transiently expressed luciferase reporters. Additionally, they identify sequence motifs enriched in the polysome-shifting group. They show that 2 enriched 5' UTR motifs increase the translation of a luciferase reporter in a protein-dependent manner, highlighting the use of their method to identify translational control elements.

      Strengths:

      This is a useful method and approach, as UTR variants have been more difficult to study than coding variants. Additionally, their evidence that pathogenic mutations are more likely to cause changes in polysome association is well supported.

      Weaknesses:

      The authors acknowledge that they "did not intend to immediately translate the altered polysome profile into an increase or decrease in translation efficiency, as the direction of the shift was not readily evident. Additionally, sedimentation in the sucrose gradient may have been partially affected by heavy particles other than ribosomes." However, shifted polysome distribution is used as a category for many downstream analyses. Without further clarity or subdivision, it is very difficult to interpret the results (for example in Figure 5A, is it surprising that the polysome shifting mutants decrease structure? Are the polysome "shifts" towards the untranslated or heavy fractions?)

      Our approach, combining polysome fractionation of the UTR library with negative binomial generalized linear model (GLM) analysis of RNA-seq data, systematically identifies variants that affect translational efficiency. The GLM model is specifically designed to detect UTR pairs with significant interactions between genotype and polysome fractions, relying solely on changes in polysome profiles to identify variants that disrupt translation. Consequently, our analytical method does not determine the direction of translation alteration.

      Following the massively parallel polysome profiling, we sought to understand how these polysome-shifting variants influence the translation process. To do this, we examined their effects on RNA characteristics related to translation, such as RBP binding and RNA structure. In Figure 5A, we observed a notable trend in significant hits within 5’ UTRs—they tend to increase ΔG (weaker folding energy) in response to changes in polysome profiles, regardless of whether protein production increases or decreases (Fig. 3).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor comments:

      (1) Figure 3A - the claim that 5'UTR variants had a stronger effect than 3'UTR is based on the two UTRs with the strongest effect. It is unclear how these differences between 5' and 3'UTRs are significant.

      We carried out a Wilcoxon rank-sum test to compare the mut/WT fold change of translation efficiency between the 3’ and 5’ UTR variants. The results showed that the 5’ UTR variants exhibited a greater change of translation efficiency. We have inserted this result in the revised Figure 3C and refer to this figure in the main text: “Furthermore, we observed that 5’ UTR variants had a greater impact on translation activity relative to 3’ UTR variants (Fig. 3C).” (p. 12)

      (2) Figures 2B and S1, S2 - what is the meaning of less signal for a light chain and a similar signal for a heavy chain? How can this situation, while being a significant difference between the profiles, lead to a biologically relevant difference in eventual protein output?

      Taking 3’UTR ACADSB:c.*4177G>A (bottom-left panel in Figure 2B) as an example: WT transcripts have less read count (in the unit of log(CPM)) compared with the transcripts carrying the mutant UTR in the light polysome-containing fraction, whereas the read counts of the two genotypes are approximately the same in the heavy polysome-containing fraction.

      In line with our reply to Reviewer 1’s major comment 1, we aimed to identify the UTRs whose interaction term of genotype and fractions is significant in our generalized linear model (GLM). That is, the UTR pairs whose WT and mutant have different trends across the fractions (Mono to Light & Light to Heavy) are our targets. In Figure 2B, 3’UTR ACADSB:c.*4177G>A is a perfect example of our significant hits, as it displays the clear distinction of the trends of the two genotypes across three fractions.

      It is widely known that an altered polysome profile distribution indicates a change in translational efficiency. Our GLM helped us identify the UTR pairs whose WT and mutant have different polysome profiling patterns and thus likely have distinct translational efficiency. Nevertheless, since we only had a limited number of polysome fractions in our experiments, we further validated our significant hits and confirmed the direction of regulation using luciferase reporter assays.

      (3) The paragraph starting with "Even with the high confidence dataset, we did not intend to immediately translate the altered polysome profile into an increase or decrease in translation efficiency" is confusing. The whole premise of the screen used by the authors is that polysome profiling is a useful proxy for estimating levels of translation, so claiming that it doesn't necessarily measure translation is counterintuitive.

      In line with our reply to the last question, our goal is to use the alteration of polysome profiling patterns as a proxy for the change of translational efficiency. However, due to the limited number of fractions in our experiment, we could not directly infer the direction of regulation, i.e. increase or decrease of translational efficiency, of the statistically significant variants. That is why we refrained from drawing any conclusion about the direction of the regulation for the significant hits and proceeded to validate them using luciferase reporter assays.

      (4) Figure S5A - this is normalized to the nucleotide distribution in 5' or 3'UTRs? Is this statistic being applied to 27 SNPs in 3'UTRs?

      To identify sequence features associated with altered polysome association, we systematically analyzed both significant and nonsignificant UTRs for nucleotide and motif-level changes. Fisher’s exact test was employed to evaluate whether specific nucleotide or motif alterations were enriched or depleted in polysome-shifting UTRs, compared to nonsignificant UTR pairs. For example, in the case of nucleotide C (see table below; also Table S4 and new Fig. S6A), only four significant 3’ UTRs involved a change in C, resulting in a significant depletion of this nucleotide change among polysome-shifting 3’ UTRs (odds ratio = 0.22, p = 0.0069). Expanding this approach to all 1-7 nt motifs, we identified multiple motif and nucleotide changes that were significantly associated with altered polysome association.

      Author response table 2.

      (5) "uATG in the 5' UTR was not identified by the model as a widespread feature explaining polysome shifting". Is this because of the method of ribosome profiling or because of the sequences in the library? Can having more sequences in the library specifically looking at 5'UTR give more power for such an effect to emerge?

      Our assay design accounted for the presence of upstream ATG codons and the strength of adjacent Kozak sequences. However, additional factors known to influence the function of upstream open reading frames (uORFs)—such as the reading frame of the uORF relative to the main coding sequence, and the use of non-ATG initiation codons—were not systematically included. As a result, the current assay may have limited sensitivity in detecting uORF-related regulatory effects. A dedicated design specifically tailored to uORF variants is likely to enhance the detection power and better capture their contribution to translational control.

      (6) Figure 7B - it is not clear whether the luciferase reporter and the GFP reporter in the library function in a similar manner; is it creating an out-of-frame or an in-frame uORF? Also, it is not clear if the differences are statistically significant.

      In the MPRA library, the IRF6 uORF is out of frame relative to the GFP coding sequence. To directly assess its translational impact, we employed a luciferase reporter assay by fusing luciferase downstream of the IRF6 uORF. These constructs revealed a significant reduction in protein production, as shown in Figures 3 and 7B–F. Although the clinically relevant IRF6 uORF is out-of-frame with the main ORF, we engineered an in-frame uORF variant to validate translation initiation at the upstream ATG (uATG) (Fig. 7B-D). The in-frame construct confirmed uATG usage and led to a significant reduction in luciferase protein expression. Together, these results support the conclusion that the IRF6:c.-4609G>A variant gives rise to an active uORF that suppresses translation of the main ORF.

      Reviewer #2 (Recommendations For The Authors):

      (1) It would be helpful for the authors to subcategorize their data in ways that they consider meaningful and interpretable (e.g. shifts from all monosome to heavy, all heavy to monosome/free, etc.) Relatedly, what do the authors think the functional meaning is when a given transcript has high mono/heavy occupancy but low light occupancy (like what is shown in Figure 2B for ANK2) in the polysome profiling experiment? It is not apparent why a transcript with a high ribosome occupancy (heavy) would also have light occupancy (light).

      From the amplicon sequencing data, we obtained read counts for each UTR variant across the monosome, light, and heavy polysome fractions. Notably, this approach does not preserve the original relative abundance of transcripts among the three fractions. That is, despite a greater abundance of mRNAs in the heavy polysome fraction, comparable numbers of sequencing reads were recovered from the monosome and light fractions. As a result, this method is not suitable for interpreting the global directionality of translational shifts but is well-suited for detecting relative differences in polysome association. Therefore, our experimental and analytical design—combining targeted amplicon sequencing with generalized linear modeling (GLM)—was optimized to identify UTR variants that alter polysome association, independently of absolute transcript abundance in each fraction.

      (2) The method put forward in Figure 2 would be more convincing if there was data showing reproducibility in the massively parallel reporter assay. Perhaps the mut/WT ratio for all transcripts can be plotted against each other and a statistical test of correlation can be performed.

      Thank you for pointing this out. To demonstrate the reproducibility of our massively parallel reporter assay, we have plotted scatter plots of the ratios of all transcripts (summing the monosome, light, and heavy fractions) across different batches using our high-confidence dataset. We calculated the Pearson correlation coefficients and corresponding p-values for these comparisons. The results show strong correlation between each batch, supporting the reproducibility of our assay. We have incorporated this analysis in the main text as well as Supplemental Figure 3: “Pearson correlation analysis revealed R coefficients ranging from 0.59 to 0.71 for the mut-to-WT transcript ratios across three independent experiments (Supplemental Fig. 3).”
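
As a sketch of the reproducibility check described here, the snippet below computes a Pearson correlation coefficient between the mut/WT ratios from two batches. The function is a plain implementation of the standard formula; the variable names and the example ratios are hypothetical and are not the authors' data. Whether the ratios were log-transformed before correlation is not stated in the response, so the sketch uses raw ratios.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical mut/WT read-count ratios for the same UTR pairs in two batches,
# each ratio computed from counts summed over the Mono, Light and Heavy fractions.
batch_1 = [0.8, 1.1, 1.5, 0.6, 1.0, 2.0]
batch_2 = [0.9, 1.0, 1.3, 0.7, 1.2, 1.6]
print(f"Pearson R = {pearson_r(batch_1, batch_2):.2f}")
```

With three batches, the same function would be applied to each of the three pairwise comparisons, giving the range of R values reported in the response.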

      (3) The dots in Figure 2B indicate separate experiments, but the y-axis is log(counts). Values could be normalized (perhaps a ratio of mut/WT) for comparison between experiments.

      We aimed to compare UTR distribution across polysome fractions and recognized the importance of presenting the distribution patterns for both genotypes. This approach allows us to more clearly illustrate the differences or similarities in polysome association between the two genotypes.

      (4) When describing the 5' UTRs used for the validation experiments in Figure 3, more information about the 5' UTR sequence used is necessary. It is not clear how much or what part of the 5' UTRs were removed, or why this was necessary considering the same experiment was conducted using full-length UTRs.

      In the initial library design, technical limitations of bulk oligonucleotide synthesis constrained the UTRs to 155 nucleotides, comprising 115-nt of endogenous human UTR sequence flanked by 20-nt priming sites on both ends. Variants were centered at the 58th nucleotide within the 115-nt UTR sequence. When one flanking region of the native UTR was shorter than 57 nt, the variant was shifted accordingly toward the shorter arm to maintain the 115-nt UTR length (Fig. 2A).
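
The windowing rule described above can be sketched as a small function that, given the length of the native UTR and the 1-based position of the variant, returns the 115-nt segment and where the variant falls inside it. This is an illustration of the stated rule only: the function and constant names are hypothetical, and the handling of UTRs shorter than 115 nt is not described in the response, so the sketch simply rejects that case.

```python
WINDOW_NT = 115   # endogenous UTR sequence per oligo
FLANK_NT = 57     # nucleotides on each side when the variant sits at position 58
PRIMER_NT = 20    # priming site on each end of the oligo


def variant_window(utr_length, variant_pos):
    """Return (start, end, position_in_window), all 1-based and inclusive.

    The variant is centred at position 58 when both flanks allow it;
    otherwise the window is shifted so it stays inside the UTR and keeps
    its full 115-nt length.
    """
    if not 1 <= variant_pos <= utr_length:
        raise ValueError("variant position lies outside the UTR")
    if utr_length < WINDOW_NT:
        # Assumption: this case is not covered by the described design.
        raise ValueError("UTR is shorter than the 115-nt window")
    start = variant_pos - FLANK_NT
    start = max(1, min(start, utr_length - WINDOW_NT + 1))
    end = start + WINDOW_NT - 1
    return start, end, variant_pos - start + 1


oligo_length = WINDOW_NT + 2 * PRIMER_NT
print("oligo length:", oligo_length)        # 155
print(variant_window(300, 150))             # centred: (93, 207, 58)
print(variant_window(300, 10))              # short 5' flank: (1, 115, 10)
```

In the second call the variant sits at position 10 of the window rather than 58, which is the shift toward the shorter arm that the response describes.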

      Given that endogenous UTRs in the human genome are often longer than 155 nt, we further evaluated the functional consequences of variants within full-length UTR sequences (Fig. 3B). While the mutant effects observed in the library setting were largely recapitulated, their magnitude was diminished in the full-length context, likely due to the increased sequence and structural complexity.

      To clarify the experimental design related to Figure 3, we modified the text as the following: “The variants significantly altering the polysome profile were then individually validated by means of high-sensitivity luciferase reporter assays (Fig. 3A). To that end, we resynthesized both the variant and corresponding wildtype alleles in the same library format - 115-nt native UTR segments centered on the variant and flanked by 20-nt priming sites. These UTRs were then cloned upstream (5’) or downstream (3’) of the firefly luciferase coding sequence, depending on their genomic location.” (p. 11)

      (5) The conclusions from inserting RBP-binding motifs into 5' UTRs and assaying translational output (Figure 4) would be strengthened by including luciferase reporters containing endogenous 5' UTRs containing these motifs, and versions where the motifs are disrupted.

      Several variants that altered translation efficiency were validated in their native sequence contexts, including 5’ UTR variants in DMD and NF1 that affect SRSF1/2 binding sites, as well as a 3’ UTR variant in AL049650.1 that impacts a KHSRP binding site (Fig. 3 and Supplemental Figs. S1 & S2). To address the functional relevance of these variants within their native regulatory landscapes, we have incorporated the following clarification into the text (p. 13): “This observation is consistent with additional findings where variants that create or disrupt specific RBP binding sites—such as SRSF1/2 (e.g., in DMD and NF1; Fig. 2 and Supplementary Fig. S4) and KHSRP (e.g., in AL049650.1; Fig. 2 and Supplementary Figs. S4 & S5)—led to significant changes in translation efficiency within their native UTR contexts.”

      (6) Figure 5C shows that 5' UTR SNPs that form an uAUG are associated with greater structural changes, but this does not "indicate" that "structure‐modifying UTR variants may control primary ORF translation partly by interfering with translation initiation from a uORF." The data presented in Figure 5 and luciferase/polysome data presented previously do not distinguish whether translation is occurring at an uAUG or canonical AUG. The statement quoted above is speculative and it should be clear that it is a hypothesis generated by the data and is not conclusive.

      We appreciate the reviewer’s suggestion. We have therefore modified our text to: “Therefore, while changes in uATG may not be common explanatory factors for polysome-shifting mutations, our results suggest that structure-modifying UTR variants may control primary ORF translation partly by interfering with translation initiation from a uORF.” (p. 14)

      Minor points/questions

      (1) The authors should clarify whether during library construction for massively parallel polysome profiling the 3' UTR constructs contain a common 5' UTR? Likewise, do the 5' UTR constructs contain a common 3' UTR? Perhaps the lack of a 5' UTR in the 3' UTR constructs, which is implied by Figure 2A, would influence differences seen between 3' UTR pairs (and likewise for 5' UTR pairs).

      There are short common 5’ UTRs appended to the 3’ UTR library, and likewise, a common short 3’ UTR is included in the 5’ UTR library. The common 5’ UTR comprises partial sequences from the CMV promoter and the plasmid backbone of pEGFP-N1 vector. The common 3’ UTR includes sequences from the pEGFP-N1 backbone and a short polyadenylation signal from HBA1 (hemoglobin subunit alpha 1). While we cannot entirely rule out potential crosstalk between 5’ and 3’ UTRs, the design ensures that all constructs are compared in a controlled and consistent context, enabling valid pairwise comparisons between variant and wildtype alleles.

      To clarify the library design, we have revised the main text to include this explanation: 

      “The entire library of UTR oligonucleotides (UTR library) was subsequently ligated upstream or downstream of an enhanced GFP (EGFP) coding region, along with a CMV promoter and a common UTR sequence on the opposite end. Cells transfected with the UTR library were treated with cycloheximide 14 hours post transfection and then subjected to polysome fractionation (see Methods).” (p.11) 

      “The variants significantly altering the polysome profile were then individually validated through high-sensitivity luciferase reporter assays (Fig. 3A). To this end, we resynthesized both the variant and corresponding wildtype alleles in the same library format - 115-nt native UTR segments centered on the variant and flanked by 20-nt priming sites. These UTRs were then cloned upstream (5’) or downstream (3’) of the firefly luciferase coding sequence, depending on their genomic location. As in the initial library design, the test UTR segment differs only by one nucleotide, while a shared short UTR fragment is present on the opposite end of the coding sequence to ensure consistency across constructs (Fig. 2A).” (p. 12)

      (2) The lines connecting the polysome distribution points make the plots appear busy and difficult to read, the data would be easier to interpret if they were removed.

      We employed a generalized linear model (GLM) to identify the variants that altered the polysome association of the corresponding transcripts. Statistically speaking, we were looking for the variants which led to significant interaction between genotype and polysome fractions. Ergo, displaying the lines as they are in our plots offers readers a convincing visualization of the interaction: lines from WT and Mut groups were not parallel, which indicates the interaction between genotype and polysome fractions. Moreover, showing the lines from three batches of experiments also helps us ascertain the reproducibility of our experiments. Taken together, the presence of the lines makes our plots more informative.

    1. eLife Assessment

      In their study, Cummings et al. provide a valuable advance in understanding the hierarchical regulation of tubulin polyglycylation, demonstrating that TTLL8 initiates monoglycylation which is a prerequisite for TTLL10-mediated polyglycylation. The evidence supporting these mechanistic insights is solid, relying on a compelling combination of purified biochemical assays, mass spectrometry, and microscopy. The work is further valued for revealing an unexpected crosstalk between polyglycylation and polyglutamylation that ensures a balanced post-translational modification landscape for proper cilia function.

    2. Reviewer #1 (Public review):

      Summary:

      In their current study, Cummings et al have approached this fundamental biochemical problem using a combination of purified enzyme-substrate reactions, MS/MS and microscopy in vitro to provide key insights into the hierarchy of generating polyglycylation in cilia and flagella. They first establish that TTLL8 is a monoglycylase, with the potential to add multiple mono glycine residues on both α- and β-tubulin. They then go on to establish that the monoglycylation is essential for TTLL10 binding and catalytic activity, which progressively reduces as the level of polyglycylation increases. This provides an interesting mechanism of how level of polyglycylation is regulated in the absence of a deglycylase. Finally, the authors also establish that for efficient TTLL10 activity, it is not just monoglycylation, but also polyglutamylation that is necessary, giving a key insight into how both these modifications interact with each other to ensure there is a balanced level of PTMs on the axonemes for efficient cilia function.

      Strengths:

      The manuscript is well written, and the experiments are succinctly planned and outlined. The experiments support the authors' hypotheses and provide novel possible mechanistic insights into the regulation of tubulin glycylation in motile cilia.

      Weaknesses:

      There were some weaknesses in the initial submission of the manuscript, but the authors have addressed these in their revised version either by giving clear explanations in the text or through additional experiments.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      “In their current study, Cummings et al have approached this fundamental biochemical problem using a combination of purified enzyme-substrate reactions, MS/MS, and microscopy in vitro to provide key insights into the hierarchy of generating polyglycylation in cilia and flagella. They first establish that TTLL8 is a monoglycylase, with the potential to add multiple mono glycine residues on both α- and β-tubulin. They then go on to establish that monoglycylation is essential for TTLL10 binding and catalytic activity, which progressively reduces as the level of polyglycylation increases. This provides an interesting mechanism of how the level of polyglycylation is regulated in the absence of a deglycylase. Finally, the authors also establish that for efficient TTLL10 activity, it is not just monoglycylation, but also polyglutamylation that is necessary, giving a key insight into how both these modifications interact with each other to ensure there is a balanced level of PTMs on the axonemes for efficient cilia function.”

      Strengths: 

      “The manuscript is well-written, and experiments are succinctly planned and outlined. The experiments were used to provide the conclusions to what the authors were hypothesising and provide some new novel possible mechanistic insights into the whole process of regulation of tubulin glycylation in motile cilia.”

      We thank the reviewer for their support of our study and recognition of its importance to understanding microtubule glycylation and its regulation.  

      “The initial part of the manuscript where the authors discuss the requirement of monoglycylation by TTLL8 is not new. This was established back in 2009 when Rogowski et al (2009) showed that polyglycylation of tubulin by TTLL10 occurs only when co-expressed in cells with TTLL3 or TTLL8. So, this part of the study adds very little new information to what was known.”

      Our study provides the first in vitro evidence with purified recombinant components that human TTLL8 is exclusively a monoglycylase (Figure 1) and that polyglycylation by TTLL10 requires previous priming with monoglycylation (Figure 2). Studies with purified recombinant components are the gold standard for establishing the activity of an enzyme, as cellular work can be obfuscated by the activity of other regulators. In our original submission we did cite the 2009 work by Rogowski, Gaertig and Janke (reference 15 in the original submission) as well as the Ikegami and Setou 2009 work (reference 26 in the original submission), which established that TTLL10 polyglycylase activity requires co-expression with TTLL8 in cells. Specifically, we stated in our original submission and in the revised manuscript:

      “Cellular overexpression studies coupled with the use of antibodies that recognize mono- and polyglycylation indicate that TTLL8 is also a glycyl-initiase, while TTLL10 a glycyl-elongase (15, 26). However, direct biochemical evidence with purified enzymes for segregated initiation and elongation activity for glycylases is still lacking, as is knowledge of their substrate specificity and regulation.”

      In addition to citing the Setou study, we now cite again the Rogowski, Gaertig and Janke 2009 study later in the manuscript when the cellular data are mentioned again.  Specifically, we state in the revised manuscript: 

      “This is consistent with cellular overexpression data which showed that polyglycylation signal was detected via antibody only in tubulin from cells that co-expressed TTLL8 and TTLL10, but not TTLL10 alone (15, 26).”

      “The study also fails to discuss the involvement of the other monoglycylase, TTLL3 in the entire study, which is a weakness as in vivo, in cells, both the monoglycylases act in concert and so, may play a role in regulating the activity of TTLL10.”

      We previously showed that purified recombinant TTLL3, like TTLL8, adds only monoglycines, with a preference for the β-tubulin tail (Garnham et al., PNAS 2017). Given that TTLL10 requires priming by monoglycylation, we expect that, similarly to TTLL8, TTLL3 will allow elongation of the initial monoglycine chains by TTLL10.

      “(1) From the mass spec data, it appears that the Xenopus laevis TTLL10 can add up to 18 residues. However, the numbers indicated in Figure 2E seem to suggest that it is a maximum of 2-3 residues only at a particular position. Does this mean that the 13-18 residues observed are a collection of multiple short-chain polyglycylations, or are there positions that the authors observed where there were chains of longer than 3 glycine residues? This would be an interesting point to note as when it was discovered in Paramecium, the polyglycyl chains were reported to be up to 34 residues (Redeker et al., Science 1994). If the authors could test the TTLL10 from Paramecium to observe if this is a consistent phenomenon across evolution or is there a biologically significant difference that is being developed, would be interesting to know.”

      Figure 2E shows a subset of the modified tails that we identified and for which the posttranslationally added glycine can be mapped to a specific position, or range of positions. Additional species exist. We note that the mass spectra in Figure 2B are intact LC/MS, while those in Figure 2E are MS/MS. The ionization of tubulin tail peptides with larger numbers of glycines is not as efficient as for shorter glycine chains, reducing the sensitivity of detection of species that have higher numbers of glycines. This is not as pronounced when the mass spectra are obtained from the intact protein (Figure 2B). In summary, our data support the conclusion that TTLL10 elongates polyglycine chains at multiple positions in the tubulin tail (shown in Figure 2E); however, we cannot ascertain the maximum polyglycine chain length, only the total number of glycines added.

      Testing the enzyme from Paramecium is an interesting proposal but outside the scope of this manuscript. 

      “(2) While it is interesting to know that the TTLL10 binds to TTLL8-modified tubulin with a much higher affinity than unmodified tubulin, in vivo, the microtubules will be a mixture of both TTLL3- and TTLL8-modified tubulin. It would be good to see the binding of the enzyme to a tubulin that is modified by both TTLL3 and TTLL8 if the two have a greater influence on TTLL10 binding.”

      Our previous work showed that purified recombinant TTLL3 has purely monoglycylase activity, with a preference for β-tubulin (Garnham et al., PNAS 2017). The sites of monoglycylation by TTLL3 overlap with those introduced by TTLL8 on β-tubulin (the difference being mainly that TTLL3 is more selective towards β-tubulin and thus has lower activity on α-tubulin). TTLL8 introduces additional monoGlys on the α-tubulin tail. Therefore, it is unlikely that TTLL10 will have a different response to microtubules that carry similar numbers of Gly residues, regardless of whether introduced by TTLL8 or TTLL3 and 8. Our data show that TTLL10 binding increases with Gly number, but that the gains in affinity plateau as the density of glycine residues on the tails increases above a certain threshold, likely because one TTLL10 molecule recognizes one monoGly branch, and steric hindrance on the tubulin tail prevents further recruitment of additional TTLL10 molecules.

      “(3) The authors have always increased the number of monoglycines in beta-tubulin more than in alpha-tubulin. Is there a rationale for this? Since TTLL8 is known to predominantly modify alpha-tubulin (Rogowski et al., 2009; Gadadhar et al., 2017) why did the authors not check for the increased binding of the TTLL10 on dimers where the number of monoglycines on alpha-tubulin is higher than 1.1? Especially when they themselves observe in their mass spec that even on alpha-tubulin there are 1, 2, and 3 glycines added. I would like to see what happens if the ratio is high alpha-G + low beta-G”

      As our spectra in Figure 1 show, we find that TTLL8 is able to robustly modify both α- and β-tubulin in vitro, but that it shows a slight preference for β-tubulin (Figure 1B). The work from the Janke group that the reviewer is referring to (Rogowski et al., 2009 and Gadadhar et al., 2017) did not use recombinant, purified enzymes and unmodified microtubules as substrates; it used axonemal tubulin (which carries many modifications). It is therefore possible that the α-tubulin preference observed in that system when TTLL8 is overexpressed is due to other factors that do not reflect the biochemical property of the enzyme alone (for example, it could be because β-tubulin sites are not available because they are already glutamylated). As can be seen from Figure 3D, the gain in affinity when increasing the number of glycines from one glycine is small, compared to the initial monoglycine added to the α- and the β-tubulin tail, likely reflecting that one tail cannot bind more than one TTLL10 at one time because of steric hindrance. Moreover, it is important to note here that glutamylases and glycylases compete for the same sites on the tubulin tails, as we have for example shown for TTLL3 and TTLL7 (Garnham et al., 2017); therefore, the activity of these enzymes in vivo or with non-naïve substrates is context dependent and also influences what sites are available for TTLL10 to modify. In conclusion, by using recombinant enzymes and naïve tubulin we gain insight into the intrinsic property of these enzymes and therefore provide a framework for the interpretation of in vitro and in vivo observations.

      “(4) I wonder why the authors did not use the human TTLL10 to test if this also shows similar binding to the glycylated tubulin despite the fact that it is enzymatically inactive. If it does, then it would be interesting to see the kinetics of binding of this enzyme to see if the fall off of the enzyme from the tubulin is solely driven by the level of polyglycylation only, or if it has any other mechanism involved as well.”

      Work with human recombinant TTLL10, a TTLL10 homolog which was proposed to be inactive, will be an interesting future direction but is outside the scope of this manuscript. We did note in our previous manuscript (Garnham et al., 2017, Figure S5) that the residues which are mutated in the human enzyme compared to other mammals are on the dorsal face of the enzyme, far away from the active site, raising an interesting question of how they inactivate the enzyme. We need, however, to emphasize that our work clearly shows that it is polyglycylation on the microtubules that reduces binding of TTLL10 to microtubules. Experiments done in the absence of glycylating activity, i.e. with enzyme that was incubated with microtubules that were pre-modified with polyglycine chains but in the absence of glycine substrate (precluding any glycylation activity during the binding assay), show that the binding decreases monotonically with the number of polyglycines on the microtubule (Figures 4A, B).

      (5) In Figure 5, the authors use monoglycylated tubulin that is either glutamylated or not to show that the activity of TTLL10 is enhanced by the extent of polyglutamylation present on the tubulin. However, there is no evidence of the enzyme binding to microtubules that are only glutamylated. It would be good to test this to determine if the binding is also dependent on both monoglycylation and glutamylation or is it only the enzyme activity.

      Figure 5E shows that TTLL10 binding increases with monoglycylation alone and that glutamylation is additive, and Figures 4A, B show that it is not the enzyme activity that affects the binding, but the glycylation state of the microtubule. We did not determine binding to microtubules that were only glutamylated, because TTLL10 would not be able to elongate polyglycine chains on those microtubules, even if it bound.

      (6) The level of polyglycylation used in Figure 5 is quite low. It would be good to see how the length of the polyglycine chain impacts TTLL10 activity in the presence of polyglutamylation, and whether this has any cooperative effect leading to longer chain polyglycylation than what is seen with only monoglycylated tubulin.

      We expect longer chain polyglycylation to have an inhibitory effect as we show in Figure 4. 

      “(7) In the overall study, the authors fail to discuss whether the activity of both the glycylases at different sites on tubulin is sequential, or modifications at different residues happen all at once. If the authors were to do a sequential time course of the modification followed by MS/MS analysis, they could get some indications about this.”

      As the data in Figure 3D show, adding more monoGly sites on a tubulin tail has a muted effect on binding, indicating that the additional mono-Gly branches do not lead to more TTLL10 recruitment because of steric hindrance, i.e. multiple TTLL10 enzymes cannot be accommodated efficiently on the same tail at the same time. This is consistent with the overall dimensions of the enzyme and the positions of its active site, which were modeled initially in our previous publication (Garnham et al., PNAS 2017). The site of TTLL10 action is pre-determined by the position of the mono-Gly branch introduced by TTLL3 or TTLL8. The length of the tubulin tail and the proximity of mono-Gly sites to each other preclude TTLL10 acting at multiple positions at once on the same tail.

      “(8) Do the modifications have any cooperative effect with respect to the sites of modification? Does modifying a particular site enhance the kinetics of modification of the other sites? Can the authors test this?”

      This would be an interesting line of future investigations.  

      “Minor points:

      (1’) The authors opine that the level of polyglycylation is regulated by the decreased binding of the TTLL10 to the polyglycylated tubulin. While this is an interesting argument, which could be a possibility based on the data they present, it would still not answer if this is a mechanism followed by TTLL10 of all species or not. If they could test the efficacy of TTLL10 from another species, to see the binding efficiency of that enzyme, it could potentially strengthen their argument of this possible mechanism.”

      The differences between the properties of TTLL10 from different organisms will be an interesting focus of future investigations, but are outside the scope of the present study. However, we would like to point out that the level of sequence conservation among TTLL10 enzymes makes it unlikely that other TTLL10 enzymes do not follow a similar mechanism, albeit with possible differences in the extent of the response. We also note that we have shown that polyglycylation also inhibits binding to the microtubule of the severing enzyme katanin (Szczesna et al., Dev. Cell 2022). Therefore, these studies suggest that polyglycylation might be a more general mechanism for reducing microtubule binding affinity, since glycylation reduces the negative charge on the tubulin tails, which frequently interact with positively charged domains or interfaces in microtubule associated proteins.

      “(2) The authors indicate that glycylases act on pre-glutamylated microtubules. However, in their assays, they use unmodified tubulin, which I would presume is also not glutamylated. If this is the case, how can they justify that the enzymes prefer pre-glutamylated microtubules? This is a bit unclear. Do they mean that their tubulin is already pre-glutamylated? Have they tested this?”

      The statement regarding the action of these enzymes on glutamylated microtubules refers to the in vivo situation where polyglycylated microtubules appear in cilia biogenesis after the microtubules in the axoneme are already glutamylated. In vitro, by using microtubules that are only monoglycylated and microtubules that are both glutamylated and monoglycylated, we show that glutamylation further increases recruitment of TTLL10 to microtubules that are monoglycylated. Therefore, glutamylated microtubules will be polyglycylated preferentially over those that are not glutamylated.

      We state: “Axonemal microtubules are abundantly glutamylated. Glutamylation appears during cilia development first, followed by glycylation (12, 13), indicating that in this scenario glycylases act on pre-glutamylated microtubule substrates.”

      “(3) In continuation with the previous point, an immunoblot of their purified tubulin showing no reactivity to anti-glycylation or anti-glutamylation antibodies, which upon treatment with TTLL8 reacts to the anti-glycylation antibody would be confirmatory evidence to show that the isolated tubulin was indeed unmodified.”

      We have now included a Western blot of our TOG-purified tubulin as Figure S3 in our revised manuscript. This shows a faint signal with the pep-G1 antibody and a very strong signal after TTLL8 treatment. We are not sure whether the low signal with the pep-G1 antibody for the unmodified tubulin is due to low bona fide monoglycylation-specific signal or a low affinity nonspecific interaction of this antibody (raised against mono-glycylated tubulin tail peptides) with the unmodified tubulin. We note that this signal is clearly visible only when loading at least 0.2 micrograms of the purified tubulin. At this loading level the signal for the glycylated species is saturated. It is also important to note that we have not detected glycylated species in this tubulin either by LC-MS or MS/MS. Therefore, our data strongly indicate that the tubulin purified from tsA201 cells is not glycylated or has at most extremely low levels of glycylation. Importantly, this potential trace level of monoglycylated tubulin does not affect any of the conclusions in this study. The Western blot also shows no detectable signal with the polyglycylation antibody in the unmodified tubulin and a very strong, saturated signal after the tubulin was treated with both TTLL8 and TTLL10. We also added an additional Figure S8 that shows that the tsA201 tubulin does not give a detectable signal for glutamylation. Please see also Figure 3 from Vemu et al., Methods in Enzymology 2017, where we also published a Western blot from our TOG-purified tubulin using anti-glutamylation antibodies.

      “(4) In their study, the authors have used polyglycylation of up to 10-13 residues. This brings me to my first point that in the case of Paramecium, the number was identified to be up to 34, which would mean that this enzyme has higher binding or catalytic activity. I would like to know the authors' perspective on this, as to what could potentially determine the difference in the activities of TTLL10 across species.”

      The Xenopus TTLL10 enzyme can add more glycines than the 10-13 range that we show here if the enzyme is incubated for longer periods. The fact that glycine numbers as high as 34 were detected in Paramecium does not necessarily mean that the Paramecium enzyme is more active since there is no equivalent data to compare it with from Xenopus. The only way to address potential species differences in enzyme specific activity is to purify enzymes from different species and compare their activity side-by-side.  

      (5) How was the completion of the reaction of monoglycylation and polyglycylation determined? If the enzymes were left for more than 20 minutes, did TTLL8/ TTLL10 add more glycines? What is the reason for using less tubulin (1:20 enzyme:tubulin molar ratio) for monoglycylation by TTLL8, and more tubulin (1:50 enzyme:tubulin molar ratio) for polyglycylation by TTLL10?

      Yes, if the enzymes were incubated longer, they added more glycines. The extent of glycylation was determined from the LC-MS and the incubation time was varied to obtain samples with fewer or more glycines.   The lower ratio used for TTLL10 is because of the higher specific activity of that enzyme compared to TTLL8.  

      (6) Figure S2 A, b2 ion is not indicated in the peptide sequence, while it is shown in the m/z graph.

      We thank the reviewer for the careful reading. We have corrected this in our MS/MS spectrum. 

      Reviewer #2 (Public review):

      “In their manuscript, Cummings et al. focus on the enzymatic activities of TTLL3, TTLL8, and TTLL10, which catalyze the glycylation of tubulin, a crucial posttranslational modification for cilia maintenance and motility. The experiments are beautifully performed, with meticulous attention to detail and the inclusion of appropriate controls, ensuring the reliability of the findings. The authors utilized in vitro reconstitution to demonstrate that TTLL8 functions exclusively as a glycyl initiase, adding monoglycines at multiple positions on both α- and β-tubulin tails. In contrast, TTLL10 acts solely as a tubulin glycyl elongase, extending existing glycine chains. A notable finding is the differential substrate recognition between TTLL glycylases and TTLL glutamylases, highlighting a broader substrate promiscuity in glycylases compared to the more selective glutamylases. This observation aligns with the greater diversification observed among glutamylases. The study reveals a hierarchical mechanism of enzyme recruitment to microtubules, where TTLL10 binding necessitates prior monoglycylation by TTLL8. This binding is progressively inhibited by increasing polyglycine chain length, suggesting a self-regulatory mechanism for polyglycine chain length control. Furthermore, TTLL10 recruitment is enhanced by TTLL6-mediated polyglutamylation, illustrating a complex interplay between different tubulin modifications. In addition, they uncover that polyglutamylation stimulates TTLL10 recruitment without necessarily increasing glycylation on the same tubulin dimer, due to the potential for TTLLs to interact with neighboring tubulin dimers. This mechanism could lead to an enrichment of glycylation on the same microtubule, contributing to the complexity of the tubulin code. The article also addresses a significant challenge in the field: the difficulty of generating microtubules with controlled posttranslational modifications for in vitro studies. By identifying the specific modification sites and the interplay between TTLL activities, the authors provide a valuable tool for creating differentially glycylated microtubules. This advancement will facilitate further studies on the effects of glycylation on microtubule-associated proteins and the broader implications of the tubulin code. In summary, this study substantially contributes to our knowledge of posttranslational enzymes and their regulation, offering new insights into the biochemical mechanisms underlying microtubule modifications. The rigorous experimental approach and the novel findings presented make this a pivotal addition to the field of cellular and molecular biology.”

      We thank the reviewer for their support of our work.

    1. eLife Assessment

      This study provides convincing evidence of coordinated spiking activity of neurons in the anterior cingulate cortex (ACC), and correlated activity in the CA1 subregion of the hippocampus, during observational learning. The authors also show coordinated ACC-CA1 neural activity during rest periods prior to the performance of the observationally learned task. The important findings significantly advance the field's understanding of neural mechanisms underlying social learning.

    2. Reviewer #1 (Public review):

      Summary:

      Mou and Ji investigate the relationship between firing rates in the anterior cingulate cortex (ACC) and CA1 neurons during observational learning. They found trajectory-selective responses in the ACC, coordinated activity between ACC and CA1 place cells for specific trajectories, and reactivation of these ensembles during sharp-wave ripples (SWRs), particularly during hippocampal replay events. The study is methodologically sound, the data are clearly presented, and the conclusions are well supported. The work is both novel and highly relevant to our understanding of social learning. Compared to the previous version of the paper, they have added substantial characterization of neuronal properties related to their firing during the task and replay events. I believe that the authors have therefore addressed most of my concerns and recommend the paper for publication as is.

      Strengths:

      The study is well designed, the data presented is very clear and the conclusions are appropriate regarding their results. The study is novel and of high relevance for the understanding of social learning.

      Weaknesses:

      All previous weaknesses have been addressed.

    3. Reviewer #2 (Public review):

      Summary:

      In the manuscript, Xiang Mou and Daoyun Ji investigate how ACC neurons activated by observational learning communicate with the hippocampus. They assess this line of communication through a complex behavioral technique, in vivo electrophysiology, pharmacological approaches, and data analytical techniques. First, the authors find that observational performance is dependent on the ACC, and that the ACC possesses neurons that show side selectivity (trajectory related) in both the observation box, when shuttling to reward, and during subsequent maze running, shuttling to the corresponding same side for reward. The side-selective activation appears stronger for correct trials compared to error trials specifically during observation of Demo rats. They compare how the CA1 of the hippocampus encodes these two environments and find that ACC side-selective neurons show correlation with side-selective CA1 ensembles during maze behavior, water consumption, and sharp-wave ripples.

      Strengths:

      Overall, the paper provides strong evidence that ACC neurons are activated by observational learning and that this activation seems to be correlated with CA1 activity.

      Weaknesses:

      Concerns, however, surround the strength of evidence that links ACC and CA1 activity during observational learning. Only weak correlations between the two regions are shown, and it is unclear if the ACC may lead CA1 activity or vice versa. It is possible that these processes reflect two parallel pathways. Without manipulation of ACC, it is difficult to assess whether ACC activity influences hippocampal replay.

      Comments on revisions:

      Lines 361-362: R and P values do not match that of Figure 5C.

    4. Reviewer #3 (Public review):

      Summary:

      Mou and Ji investigated neuro-computational mechanisms behind observational spatial learning in rats and reported several signs of functional coupling between the ACC and CA1 at the single neuron level. Using multi-site tetrode recording, they found that ACC cells encoding a path in a maze were activated while a rat observed another rat taking that path. This activation was also correlated with the activation of CA1 cells encoding the same path and facilitated their replay during sharp-wave ripples (SWRs) before the recording rat ran on the maze by itself. These activity patterns were associated with correct path choice during self-running and were absent in control conditions where the recording rat did not learn the correct choice through observations. Based on these findings, the authors argue that ACC cells capture the critical information during observation to organize hippocampal cell activity for subsequent spatial decisions.

      Strengths:

      The authors used multiple outcome measures to build a strong case for path-specific spike coordination between ACC and CA1 cells. The analyses were conducted carefully, and proper control measures were used to establish the statistical significance of the detected effects. The authors also demonstrated tight correlations between the spike coordination patterns and the successful use of observed information for future decisions.

      Weaknesses:

      (1) As evidence for the activation of path information in the ACC during observation, the authors showed positive correlations between firing rates during observation and those during self-running. This argument will be solidified if the authors use a decoding approach to demonstrate the activation of path-selective ACC ensemble activity patterns during observation. This approach will also open the door to uncovering how the activation of ACC path representation is related to the moment-to-moment position of the demonstrator rat and whether it is coupled with the timing of SWRs.

      (2) The authors argued that the ACC biases the content of awake replay in CA1 during SWRs in the observation period. The reviewer wonders if a similar bias also occurs during SWRs in the self-run period (i.e., water consumption after the correct choice). This analysis will help test whether the biased replay occurs due to the need to convert observed information into future choices.

      (3) Although the authors demonstrated the necessity of the ACC for the task, it remains to be determined whether firing coordination between the ACC and CA1 during observation is necessary for the correct path choice during self-runs. Some discussion on this point, along with future direction, would be beneficial for readers.

      Comments on revisions:

      The authors fully addressed my recommendations. I do not have any further comments.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Minor:

      (1) In Figure 2, only the right- or left-selective neurons are presented for the comparison; it would be helpful to also compare these with the neurons that are not selective for either side and maybe include them in the supplemental materials.

      We have included all non-selective neurons in Figure 2D and supplemental Figure 2B. Their differences in firing rate between left and right sides are quantified by their selective indices (SIs). 

      (2) The authors should provide controls of speed during NMDA infusion and vehicle.

      We have quantified and compared the duration of running laps, which is equivalent to speed.

      (3) In Figure 1d, the trend shows that even during NMDA infusion, the animals learn as shown by a higher proportion of correct trials in the 3rd compared to the 1st trial

      We thank the reviewer for pointing that out. We noticed that NMDA-lesioned animals showed a trend of improved performance on the track, and we believe this is due to re-learning of the task, which we point out in the main text. However, we emphasize that, compared to the Vehicle control, the overall performance of NMDA-lesioned animals was significantly impaired.

      (4) Clarify the implications of the NMDA experiments, as it is not straightforward to interpret that an interplay between ACC-CA1 is involved in this task as per this experiment.

      Rather than stating the involvement of ACC-CA1 interplay, we use the results of NMDA lesion experiment to demonstrate that ACC is also required, besides CA1, for the task.

      (5) In Figure 4b, there seems to be a lag between CA1 and ACC correlations; the authors could provide a quantification of this temporal delay between CA1 and ACC.

      Figure 4B shows the cross-correlation between one example ACC cell and its associated CA1 ensembles on the left and opposite sides. There was a broad peak around time lag 0. Our further investigation did not identify a significant, systematic delay for all ACC cells, which led us to quantify the correlation at time lag 0 in Figure 4C and D.
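
      For readers unfamiliar with lagged correlation, the following is a minimal sketch of how a correlation-versus-lag curve can be computed from two binned firing-rate series of equal length. The function names, the binning, and the toy data are hypothetical; this is not the authors' analysis code.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def lagged_correlation(acc, ca1, max_lag):
    """Return {lag: r}. A positive lag pairs acc[t] with ca1[t + lag],
    i.e. CA1 activity follows ACC activity by `lag` bins."""
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, c = acc[:len(acc) - lag], ca1[lag:]
        else:
            a, c = acc[-lag:], ca1[:len(ca1) + lag]
        result[lag] = pearson(a, c)
    return result


# Toy data: ca1 is acc delayed by one bin
acc = [0, 1, 0, 0, 2, 0, 0, 1, 0, 0]
ca1 = [0, 0, 1, 0, 0, 2, 0, 0, 1, 0]
cc = lagged_correlation(acc, ca1, max_lag=2)
peak_lag = max(cc, key=cc.get)
```

      A consistent temporal delay across cells would show up as peak lags clustered away from 0, whereas a broad peak around 0 supports summarizing the curve by its value at lag 0.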

      (6) The example correlation provided in 5c for the opposite, doesn't seem representative of the population trend as shown in 5d, since both the Same and the Opposite for the demo show a positive trend. It would be best to choose an example that represents the population better.

      Following the reviewer’s suggestion, we have replaced the original plot with another ACC cell in Figure 5C.

      (7) Almost the same can be applied to Figure 6.

      Following the reviewer’s suggestion, we have replaced the original plot with another ACC cell in Figure 6E.

      (8) The results in Figure 7 are convincing, in my opinion, as they show that the trend is lost for the opposite side (contrary to the coactivation shown in Figures 5 and 6 that showed the same trends for the same and opposite during Demo). Do the authors have any interpretation of this? Is it due to co-activity reflecting other task-relevant features different than the spatial trajectory being observed?

      The correlation on the opposite side between CA1 and ACC shown in Figure 5C-D and Figure 6E-F is likely due to a general interaction between CA1 activities around SWRs with prefrontal cortical areas including ACC, as shown in previous studies (Jadhav et al., 2016; Remondes and Wilson, 2015). We would like to point out that this correlation only quantifies the coactivation between CA1 ensemble firing rates and individual ACC cells’ firing rate. This raw correlation does not consider the content of spikes generated by CA1 ensembles, neglecting the sequential firing patterns of CA1 cells. The replay analysis in Fig. 7 examines the order of spikes generated by individual CA1 cells. The result in Fig. 7 shows that the sequential activation of CA1 place cells more accurately reflects the distinction between the same- and opposite-side trajectories. We consider Fig. 7 a more refined analysis than Figs. 5 and 6.

      (9) For all the figures regarding SWR activities, the authors should provide average PSTH for CA1 as well as ACC, perhaps also examples of neurons that are selectively active during one side or the opposite side runs.

      Following the reviewer’s suggestion, we have added data to show PSTH for CA1 and ACC cells surrounding SWR peaks (Figure S5E, F). 

      Reviewer #2 (Recommendations For The Authors):

      Below are additional notes for improvements.

      (1) Figure 1C. Unclear what Time 0 indicates.

      We specify it (OB's poke time) in the figure legend. 

      (2) Figure 2C. Unclear what the numbers above datapoints mean.

      Those numbers are selection indices (SIs), as specified in the legend. 

      (3) Figure 5: Line 374-375. Given the repetitive nature of the task, it is unclear whether SWRs are encoding upcoming or past spatial trajectories or whether they are encoding trajectories at all. The authors would need to show that SWRs-ACC communication is predictive of task outcome to claim it is specifically necessary for future outcomes rather than consolidating past trajectories.

      We agree with the reviewer and have made changes to reflect that the ACC-CA1 correlation in Fig. 5 is specific to the same side of their selectivity, not exactly to future trajectories. Regarding the repetitive nature of the task (same-side rule), we have specifically addressed the advantage and limitation of this task design in the discussion. Regarding the observer's own past vs. future trajectories, our past publication (Mou et al., 2022) shows that the CA1 replay in SWRs is more likely to encode the correct, future trajectories.

      (4) Figure 7. It appears that the correlation was conducted between ACC activity and CA1 replays recorded at distinct time windows (delay period vs. water consumption). It is unclear how ACC activity could influence CA1 replays when they occur hundreds of milliseconds apart or even longer.

      We thank the reviewer for raising this important question. We have shown that the higher same-side ACC activity during observation continues during water consumption. However, our added data in Fig.S5E show that this enhancement did not occur precisely within SWRs. We thus propose a possibility that the overall enhanced activity of same-side ACC cells during water consumption provides an overall, background excitation boost to same-side CA1 cells to enhance their replay within SWRs. We have revised the discussion section to present this model. 

      (5) Abstract: lines 24-25 Discussion: lines 475-476 Based on the data there is no certainty whether ACC biases or coordinates CA1 replays. The data simply shows that they are correlated with one another.

      We have modified those sentences to clarify the non-causal nature of the interaction.

      Reviewer #3 (Recommendations For The Authors):

      Please see below for the list of minor corrections and suggestions:

      (1) Line 136-143: On the data shown in Figure 1D, I recommend using two-way mixed ANOVA with sessions as a within-subjects factor and groups as a between-subjects factor.

      We thank the reviewer for this point. We indeed used two-way ANOVA for those comparisons. We have specified this in the text.

      (2) Line 219-228: I recommend expanding the explanation of two control conditions here. It was written in the method section, but the readers would appreciate the gist of these conditions in the result section. In particular, it was unclear how box SI was calculated in the Empty condition. Also, the plots of poke rates in the control conditions will be useful to show that rats did not learn the correct choice from observation in these control conditions.

      We have added more explanation of the two control conditions in the text. The quantifications of poke rates for Demo and two control conditions (Object, Empty) are provided in our previous publication (Mou et al., 2022).

      (3) Line 610: Please specify the number of three types of sessions each rat underwent and the order of these session types.

      We have revised the text in the Methods section and provided the numbers.

      (4) In Figure 2c legend, please specify what the number (e.g., -0.41) indicates.

      Those numbers are selection indices (SIs), as specified in the legend.

    1. eLife Assessment

      This valuable study introduces a data augmentation approach based on generative unsupervised models to address data imbalance in immune receptor modeling. Support for the findings is solid, showing that the use of generated data increases the performance of downstream supervised prediction tasks, e.g., TCR-peptide interaction prediction. However, the validation, mainly relying on synthetic data, could be completed, especially regarding unseen epitopes, and given the exclusive focus on CDR3β. The results should be of interest to the communities working on immunology and biological sequence data analysis.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript presents a deep learning framework for predicting T cell receptor (TCR) binding to antigens (peptide-MHC) using a combination of data augmentation techniques to address class imbalance in experimental datasets, and introduces both peptide-specific and pan-specific models for TCR-MHC-I binding prediction. The authors leverage a large, curated dataset of experimentally validated TCR-MHC-I pairs and apply a data augmentation strategy based on generative modeling to generate new TCR sequences. The approach is evaluated on benchmark datasets, and the resulting models demonstrate improved accuracy and robustness.

      Strengths:

      The most significant contribution of the manuscript lies in its data augmentation approach to mitigate class imbalance, particularly for rare but immunologically relevant epitope classes. The authors employ a generative strategy based on two deep learning architectures:

      (1) a Restricted Boltzmann Machine (RBM) and

      (2) a BERT-based language model, which is used to generate new CDR3β sequences of TCRs that are used as synthetic training data for creating a class balance of TCR-pMHC binding pairs.

      The distinction between peptide-specific (HLA allele-specific) and pan-specific (generalized across HLA alleles) models is well-motivated and addresses a key challenge in immunogenomics: balancing specificity and generalizability. The peptide-specific models show strong performance on known HLA alleles, which is expected, but the pan-specific model's ability to generalize across diverse HLA types, especially those not represented in training, is critical.

      Weaknesses:

      The paper would benefit from a more rigorous analysis of the biological validity of the augmented data. Specifically, how do the synthetic CDR3β sequences compare to real CDR3β sequences in terms of sequence similarity and motif conservation? The authors should provide a quantitative assessment of real vs. augmented sequences, for example via t-SNE or UMAP projections or by measuring the overlap in known motif positions before and after augmentation. Without such validation, the risk of introducing "hallucinated" sequences that distort model learning remains a concern. Moreover, it would strengthen the argument if the authors demonstrated that performance gains are not merely due to overfitting on synthetic data, but reflect genuine generalization to unseen real data. Ultimately, this can only be performed through elaborate experimental wet-lab validation experiments, which may be outside the scope of this study.

      While generative modeling for sequence data is increasingly common, the choice of RBM, which is a relatively older architecture, could benefit from stronger justification, especially given the emergence of more powerful and scalable alternatives (e.g., ProGen, ESM, or diffusion-based models). While BERT was used, it will be valuable in the future to explore other architectures for data augmentation.

      The manuscript would be more compelling if the authors performed a deeper analysis of the pan-specific model's behavior across HLA supertypes and allele groups. Are the learned representations truly "pan" or merely a weighted average of the most common alleles? The authors should assess whether the pan-specific model learns shared binding motifs (anchor residue preferences) and whether these features are interpretable through attention maps. A failure to identify such patterns would raise concerns about the model's interpretability and biological relevance.

      The exclusive focus on CDR3β for TCR modeling is biologically problematic. TCRs are heterodimers composed of α and β chains, and the CDR1, CDR2, and CDR3 regions of both chains contribute to antigen recognition. The CDR3β loop is often more diverse and critical, but CDR3α and the CDR1/2 loops also play significant roles in binding affinity and specificity. By generating only CDR3β sequences and not modeling the full TCR αβ heterodimer, the authors risk introducing a systematic bias toward β-chain-dominated recognition, which will not reflect the full complexity of TCR-peptide-MHC interactions.

    3. Reviewer #2 (Public review):

      Summary:

      This paper presents a thoughtful and well-motivated strategy to address a major challenge in TCR-epitope binding prediction: data imbalance, particularly the scarcity of positive (binding) TCR-peptide pairs. The authors introduce a two-step pipeline combining data balancing, via undersampling and generative augmentation, and a supervised CNN-based classifier. Notably, the use of Restricted Boltzmann Machines (RBMs) and BERT-style transformer models to generate synthetic CDR3β sequences is shown to improve model performance. The proposed method is applied to both peptide-specific and pan-specific settings, yielding notable performance improvements, especially for in-distribution peptides. Generative augmentation also leads to measurable gains for out-of-distribution epitopes, particularly those with high sequence similarity to the training set.

      Strengths:

      (1) The authors tackle the well-known but under-addressed issue of class imbalance in TCR-epitope binding data, where negatives vastly outnumber positive (binding) pairs. This imbalance undermines classifier reliability and generalization.

      (2) The model is tested on both in-distribution (seen epitopes) and out-of-distribution (unseen epitopes) scenarios. Including a synthetic lattice protein benchmark allows the authors to dissect generalization behavior in a controlled environment.

      (3) The paper shows a measurable benefit of generative augmentation. For example, AUC improvements of up to +0.11 are observed for peptides closely related to those seen during training, demonstrating the method's practical impact.

      (4) A direct comparison between RBM- and Transformer-based sequence generators adds value, offering the community guidance on trade-offs between different generative architectures in TCR modeling applications.

      Weaknesses:

      (1) Generalization degrades with epitope dissimilarity

      The performance drops substantially as the test epitope becomes more dissimilar to the training set. This is expected, but it highlights an essential limitation of the generative models: they help only when the test epitope is similar to one already seen. Table 1 shows that the performance gain from generative augmentation decreases as the test epitope becomes more dissimilar to the training epitopes. For epitopes with a Levenshtein distance of 1 from the training set, the average AUC improvement is approximately +0.11. This gain drops to around +0.06 for epitopes at distance 2. It becomes minimal for those at distance 4, indicating an explicit limitation in the model's ability to generalize to more distant epitopes. The authors should quantify more explicitly how far the model can generalize effectively. What is the performance degradation threshold as a function of Levenshtein distance?
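
      To make the distance measure concrete, here is a minimal sketch of how the Levenshtein distance between a test epitope and its nearest training epitope could be computed. The function names and the example peptide strings are illustrative assumptions, not the authors' code or data.

```python
def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def distance_to_training_set(test_epitope, training_epitopes):
    """Smallest edit distance from a test epitope to any training epitope."""
    return min(levenshtein(test_epitope, t) for t in training_epitopes)


# Illustrative peptide strings only
train = ["GILGFVFTL", "NLVPMVATV"]
d = distance_to_training_set("GILGFVFTM", train)
```

      Binning test epitopes by this nearest-neighbor distance and reporting the AUC gain per bin is one way to express the degradation threshold asked about above.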

      (2) What is the minimal number of positive samples needed for data augmentation to help?

      The approach has an intrinsic catch-22: generative models require data to learn the underlying distribution and cannot be applied to epitopes with insufficient data. As a result, the method is unlikely to be effective for completely new epitopes. Could the authors quantify the minimum number of real binders needed for effective generative augmentation? This would be particularly relevant for zero-shot or few-shot prediction scenarios, where only 0-10 positive samples are available. Such experiments would help clarify the practical limits of the proposed strategy.

      (3) Lack of end-to-end evaluation on unseen epitopes as inputs

      The authors frame peptide-specific models as classification over a few known epitopes, a closed-set formulation. While this is useful for evaluating generation effects, it's not representative of the more practical open-set task of predicting binding to truly novel epitopes. A stronger test would include models that take peptides as input (e.g., pan-specific, peptide-conditioned classifiers), including unseen epitopes at test time. Could the authors attempt an evaluation on benchmarks like IMMREP25 or other datasets where test epitopes are excluded from training?

      (4) Focus on β-chain limits generalizability

      The current pipeline is trained exclusively on CDR3β sequences. However, the field is increasingly moving toward single-cell sequencing, which provides paired α/β TCR chain data. Understanding how the proposed approach performs when both chains are available would be valuable. Could the authors evaluate the performance gains on paired α/β information, even in a small subset of single-cell data?

      (5) Synthetic lattice proteins (LPs) have limited biological fidelity

      While the LP-based benchmark presented in Figure 5 is a clever and controlled tool for probing model generalization, it remains conceptually and biophysically distant from real TCR-peptide interactions. Its utility as a toy model is valid, but its limitations should be more explicitly acknowledged:

      a) Over-simplified binding landscape: The LP system is designed for tractability, with a simplified sequence-structure mapping and fixed lattice constraints. As shown in Figure 5c, the LP binding landscape is linearly separable, in stark contrast to the complex and often degenerate nature of real TCR-epitope interactions, where multiple structurally distinct TCRs can bind the same peptide and vice versa.

      b) Absence of immunological context: The LP model abstracts away key biological factors such as MHC restriction, α/β chain pairing, peptide presentation, and structural constraints of the TCR-pMHC complex. These are essential for understanding binding specificity in actual immune repertoires.

      c) Overestimation of generalization: While performance drops on more distant LP structures, even these are structurally and statistically more similar to the training data than truly novel biological epitopes. Thus, the LP benchmark likely underestimates the true difficulty of out-of-distribution generalization in real-world TCR prediction tasks.

      d) Simplified biophysics: The LP simulations rely on coarse-grained energy models and empirical potentials that do not capture conformational dynamics, side-chain flexibility, or realistic binding energetics of TCR-peptide interfaces.

      In summary, while the LP benchmark helps isolate specific generalization behaviors and sanity-check model performance under controlled perturbations, its biological relevance is limited. The authors should explicitly frame these assumptions and limitations to prevent overinterpreting results from this synthetic system.

    4. Reviewer #3 (Public review):

      Summary:

      The authors present a method to address class imbalance in T cell receptor (TCR)-epitope binding datasets by generating synthetic positive binding examples using generative models, specifically BERT-based architectures and Restricted Boltzmann Machines (RBMs). They hypothesize that improving class balance can enhance model performance in predicting TCR-peptide binding.

      Strengths:

      (1) Interesting biological as well as technical topic.

      (2) Solid technical foundations.

      Weaknesses:

      (1) Fundamental Biological Oversight:

      While the computational strategy of augmenting positive samples via generative models is technically interesting, the manuscript falls short in addressing key biological considerations. Specifically, the authors simulate and evaluate only CDR3β-peptide binding interactions. However, antigen recognition by T cells involves both the α- and β-chains of the TCR. The omission of CDR3α undermines the biological realism and limits the generalizability of the findings.

      (2) Validation of Simulated Data:

      The central claim of the manuscript is that simulated positive examples improve predictive performance. However, there is no rigorous validation of the biological plausibility or realism of the generated TCR sequences. Without independent evaluation (e.g., testing whether synthetic TCR-peptide pairs are truly binding), it remains unclear whether the performance gains are biologically meaningful or merely reflect artifacts of the generation process.

      (3) Risk of Bias and Overfitting:

      Training and evaluating models with generated data introduces a risk of circularity and bias. The observed improvements may not reflect better generalization to real-world TCR-epitope interactions but could instead arise from overfitting to synthetic patterns. Additional testing on independent, biologically validated datasets would help clarify this point.

    5. Author response:

      We would like to thank the editors and reviewers for the time spent on our work, their fair assessments, and their constructive criticism. We plan to address their concerns in the future revision as follows, detailed by topic.

      (1) Limitations of focusing on CDR3β only

      In its current state, our work tested the proposed pipeline of data augmentation for binding prediction on benchmark datasets limited to peptide+CDR3β sequence pairs only. As pointed out by all the reviewers, the TCR-peptide interaction is more complex and also involves other regions of the receptor (such as the CDR3α chain) as well as the MHC presenting the peptide. To investigate how the inclusion of additional information impacts results, we plan to apply our pipeline in a setting where the generative protocol is extended to generate paired α and β chains. The supervised classifier will then receive a concatenation of α+β chains as inputs. We will compare the performance of this classifier with the one using β chains only, and add this analysis to the revised manuscript.

      (2) Validation of generated sequences and interpretation of the features learned by the generative model

      The reliability of the generative model in augmenting the training set with biologically sensible sequences is a crucial assumption of our approach, and we agree with the reviewers raising this as a main concern. Before stating our strategy to improve the soundness of the method, let us first point out a few aspects already considered in the present manuscript:

      • The test set of the classifier is always composed of real sequences: in this way, an increase in performance due to data augmentation cannot be due to overfitting to synthetic, possibly unrealistic, sequences.

      • The generative protocol is initialized from real sequences, and used to generate sequences not too far from them. In this respect, it could be taken as a way to “regularize” the simplest strategy of data augmentation, random oversampling (taking multiple copies of sequences at random to rebalance the data). This procedure avoids generating “wildly hallucinated” sequences with unreliable models. We will better quantify this statement (see below).

      • The training protocol is tailored to push the generative model towards learning binding features between peptide and CDR3β sequences (and not merely fitting their local statistics separately). For example, in the pan-specific setting, during training of the generative model on peptide+CDR3β sequences, the masked language modeling task is modified to force the model to recover the missing amino acid using only the other sequence context.

      We will better stress these points in the revised manuscript. To further validate the generative protocol in the future revision, we will carry out additional sanity checks on the generated data to confirm that the synthetic sequences remain biologically plausible and comparable to real ones.
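
      For reference, the random oversampling baseline mentioned above (duplicating minority-class sequences at random until classes are balanced) can be sketched as follows. This is a generic illustration with hypothetical names and toy data, not the authors' implementation.

```python
import random
from collections import defaultdict


def random_oversample(samples, labels, seed=0):
    """Balance classes by drawing random duplicates of minority-class samples."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # Draw copies with replacement until this class reaches the target size
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Toy data: four non-binders (0) and one binder (1)
seqs = ["a", "b", "c", "d", "e"]
labs = [0, 0, 0, 0, 1]
bal_seqs, bal_labs = random_oversample(seqs, labs)
```

      Because every added example is an exact copy of a real one, this baseline adds no new sequences; generative augmentation differs in that the added sequences are new but kept close to real ones.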

      (3) Assessment of the performance of the pan-specific protocol for out-of-distribution data:

      To better clarify how the degradation in performance of a classifier tested on out-of-distribution data is impacted by the dissimilarity between test and training data distribution, we will improve the synthetic analysis currently reported in Table 1 by adding confidence intervals for accuracy, quantifying thresholds on the distance for the method to work, and providing t-SNE embeddings of in- and out-of-distribution data.
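
      One common way to attach a confidence interval to an accuracy estimate is a percentile bootstrap over test examples. The sketch below is a generic illustration under that assumption; the response does not specify which interval method will be used, and all names and data here are hypothetical.

```python
import random


def bootstrap_accuracy_ci(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    """Return (accuracy, lower, upper) using a percentile bootstrap."""
    rng = random.Random(seed)
    correct = [int(t == p) for t, p in zip(y_true, y_pred)]
    n = len(correct)

    stats = []
    for _ in range(n_boot):
        # Resample test examples with replacement and recompute accuracy
        resample = [correct[rng.randrange(n)] for _ in range(n)]
        stats.append(sum(resample) / n)
    stats.sort()

    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return sum(correct) / n, lower, upper


# Toy labels: 8 of 10 predictions are correct
truth = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
preds = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
acc, lo, hi = bootstrap_accuracy_ci(truth, preds)
```

      With small test sets the interval is wide, which is itself useful when comparing gains across distance bins.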

      (4) Quantification of the threshold for the number of examples per class in order to train the generative model and obtain a performance increase

      In the paper, we adopted an operative common-sense threshold of at least 100 sequences per class in order to apply our data augmentation pipeline. We will quantify this effect by testing this threshold in the revised manuscript, in order to (i) emphasize the limits of this two-step generative protocol in the low-data regime and (ii) assess whether the generative model falls back to a random oversampling strategy (due to strong overfitting) when few data are available for training.

      (5) Motivation for the use of RBMs:

      While RBMs have known limitations, their use in our pipeline (together with the more modern TCR-BERT, which we also test) is mainly motivated by the fact that they provide measurable increases in performance with data augmentation despite their simple 2-layer architecture. We stress that simpler generative (profile) models are unable to show this increase, see Appendix 3. In this respect, the RBM provides a minimal generative model allowing us to augment data successfully, and a lower bound to the increase of performance with respect to more complex architectures trained on more data. We will report this point of view in the text.

      (6) Clarification on the role of lattice proteins as an oversimplified toy model for protein interaction

      We agree with the points raised by Reviewer #2 on the limitations of lattice proteins as a model for protein interaction. Indeed, we used it merely as a toy model for phenomenology, a strategy whose validity has been fairly acknowledged by the reviewer. We will report in the main text all the drastic simplifications and reasons why the reader should take the comparison to real data with great care.

    1. eLife Assessment

      This valuable study combines microscopy and CRISPR screening in two different cell lines to identify factors involved in global chromatin organization, using centromere clustering as a proxy. Follow-up cell cycle synchronisation studies confirm roles in centromere clustering in mitosis. However, incomplete characterisation of the cell lines used limits the interpretation of the findings. The study will interest researchers studying genome organisation in mitosis.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Guin and colleagues establish a microscopy-based CRISPR screen to find new factors involved in global chromatin organization. As a proxy of global chromatin organization, they use centromere clustering in two different cell lines. They find 52 genes whose CRISPR depletion leads to centromere clustering defects in both cell lines. Using cell cycle synchronisation, they demonstrate that centromere redistribution upon depletion of these hits necessitates cell cycle progression through mitosis.

      Strengths:

      This manuscript explores the mechanisms of global chromatin organization, which is a scale of chromatin organization that remains poorly understood. The imaging-based CRISPR screen is very elegant, and the use of appropriate positive and negative controls reinforces the solidity of the findings.

      Weaknesses:

      Although the data are generally solid and well interpreted, a control showing that protein depletion works properly in cell-cycle arrested cells is lacking, both when using siRNAs and degron-based depletion.

    3. Reviewer #2 (Public review):

      The authors begin by highlighting the importance of genome organisation in cellular compartmentalisation and identity. They focus their study on centromeres - key chromosomal features required for segregation - and aim to identify proteins responsible for their spatial distribution in interphase nuclei. However, none of the experimental data addresses broader aspects of genome architecture, such as individual chromosome territories or A/B compartments. As such, the title of the article may be misleading and would benefit from being more specific, for example: "Identification of factors influencing centromere positioning in interphase."

      Strengths:

      One of the strengths of the paper is the comprehensive CRISPR-based screening and the comparative analysis between two distinct cell lines.

      Including further investigation into factors that behave differently across these cell lines - particularly in relation to expression levels or the unique "inverted architecture" of RPE cells - would have added valuable depth.

      Weaknesses:

      The filtering strategy used in the screen imposes significant constraints, as it selects only for non-essential or functionally redundant genes. This is a critical point, as key regulators of chromatin organisation - such as components of the condensin and cohesin complexes - are typically essential for viability. Similarly, perturbing known effectors of centromere behaviour (e.g., work by the Fachinetti lab) often leads to aneuploidy, micronuclei formation, and cell cycle arrest in G1. The implication of this selection criterion should be clearly discussed, as it fundamentally shapes the interpretation of the study's findings.

      A major limitation of the study is the lack of connection between centromere clustering and its biological significance. It remains unclear whether this clustering is a meaningful proxy for higher-order genome organisation. Additionally, the study does not explore potential links to cell identity or transcriptional landscapes. Readers may struggle to grasp the broader relevance of the findings: if gene knockouts that alter centromere positioning do not affect cell viability or cell cycle progression, does this imply that centromere clustering - and by extension, interphase genome organisation - is not biologically significant?

      Another point requiring clarification is the conclusion that the four identified genes represent independent pathways regulating centromere clustering. In reality, all of these proteins localise to centromeres. For example, SPC24 and NUF2 are components of the NDC80 complex; Ki-67, a chromosome periphery protein, has been mapped to centromeres; and CAP-H2, a subunit of the condensin II complex, promotes CENP-A deposition during G1. Given their shared localisation, it would be informative to assess aneuploidy indices following depletion of each factor. Chromosome-specific probes could help determine whether centromere dysfunction leads to general mis-segregation or reflects distinct molecular mechanisms. Additionally, exploring whether Ki-67 mutants that affect its surfactant-like properties influence centromere clustering could provide a more mechanistic insight.

      Finally, the additive effects observed in double knockdowns do not necessarily confirm pathway independence. It is possible that mild mis-segregation effects are amplified when two proteins within the same pathway are depleted. This possibility should be considered in the interpretation of the data.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, Guin et al. use a CRISPR KO screen of ~1000 candidates in two human cell lines, along with high-throughput image analysis, to demonstrate that orderly progression through mitosis shapes centromere organization. They identify ~50 genes that perturb centromere clustering when depleted in both RPE1 and HCT116 cells and validate many of these hits using RNAi. They then use auxin-mediated acute depletion of four factors (NCAPH2, KI67, SPC24, and NUF2) to demonstrate that their effects on centromere clustering require passage through mitosis. They further suggest that the lack of these factors during mitosis leads to the disorganization of centromeres on the mitotic spindle, and these effects persist in the subsequent interphase. Overall, the manuscript is clear, well-written, the experiments performed are appropriate, and the data are interpreted accurately. In my opinion, the main strength of this manuscript is the discovery of several hits associated with altered centromere organization. These hits will serve as a solid foundation for future work investigating genome organization in human cells. On the other hand, how the changes in centromere organization relate to other aspects of interphase genome architecture (A/B compartments, chromosome territories, etc) remains unclear and represents the main shortcoming of this manuscript.

      Comments:

      (1) Given the authors' suggestion that disorderly mitotic progression underlies the changes in centromere clustering in the subsequent interphase, I think it would be beneficial to showcase examples of disorderly mitosis in the AID samples and perhaps even quantify the misalignment on the metaphase plate.

      (2) I don't quite agree with the description that centromeres cluster into chromocenters (p4 para 2, p17 para 1, and other instances in the manuscript). To the best of my knowledge, chromocenters primarily consist of clustered pericentromeric heterochromatin, while the centromeres are studded on the chromocenter surface. This has been beautifully demonstrated in mouse cells (Guenatri et al., JCB, 2004), but it is true in other systems like flies and plants as well.

    1. eLife Assessment

      This important study reports that two distinct waves of ovarian follicles contribute to oocyte production in mice. The paper provides large amounts of data that will benefit future studies, although the methods and analysis are considered incomplete at present. Justification for the criteria of wave 1 follicles would benefit from further explanation and discussion. This work will be of interest to ovarian biologists and physicians working on female infertility.

    2. Reviewer #1 (Public review):

      Multiple waves of follicles have been proven to exist in multiple species, and different waves of follicles contribute differently to oogenesis and fertility. This work comprehensively characterizes the wave 1 follicles in the mouse and compares different waves of follicles regarding their cellular and molecular features. Elegant mouse genetics methods are applied to provide lineage tracing of the wave 1 folliculogenesis, together with sophisticated microscopic imaging and analyses. Single-cell RNA-seq is further applied to profile the molecular features of cells in mouse ovaries from week 2 until week 6. While extensive details about the wave 1 follicles, especially the atresia process, are provided, the authors also identified another group of follicles located in the medullary-cortical boundary, which could also be labeled by the FoxL2-mediated lineage tracing method. The "boundary" or "wave 1.5" follicles are proposed by the authors to be the earliest wave 2 follicles, which contribute to the early fertility of pubertal mice, instead of the wave 1 follicles, which undergo atresia with very few oocytes generated. The wave 1 follicle atresia, which degrades most oocytes, on the other hand, expands the number of theca cells and generates the interstitial gland cells in the medulla, where the wave 1 follicles are located. These gland cells likely contribute to the generation of androgen and estrogen, which shape oogenesis and animal development. By comparing scRNA-seq data from cells collected from week 2 until week 6 ovaries, the authors profiled the changes in numbers of different cells and identified key genes that differ between wave 1 and wave 2 follicles, which could potentially be another driver of different waves of folliculogenesis.
In summary, the authors provide a large body of new, good-quality results that illustrate the molecular and cellular features of different waves of mouse follicles and that could be reused by other researchers in related fields. The findings on the boundary follicles could lead to many new findings related to oogenesis.

      This paper is overall well-written with solid and intriguing conclusions that are well supported. The reviewer only has some minor comments for the authors' consideration that could potentially help with the readability of the paper.

      (1) The authors identify the wave 1.5 follicles at the medullary-cortex boundary, which begin to develop shortly after 2 weeks. Since the authors already collected scRNA-seq data from week 2 until week 6, could any special gene expression patterns be identified that make wave 1.5 follicle cells different from wave 1 and wave 2?

      (2) Are Figures 1C and 1E Z projections from multiple IF slices? If so, please provide representative IF slice(s) from Figures 1C and 1E and clearly label wave 1 and wave 2 follicles to better illustrate how the wave 1 follicles are clarified and quantified.

      (3) For Figure 3D, please also provide an image showing the whole ovary section, like in Figures 3A and 3C, to better illustrate the localization and abundance of different cells.

      (4) In Figure 4H, expression of HSD3B1 and PLIN1 seems to be detected in almost all medulla cells. Does this mean all medulla cells gain gland cell features? Or is there only a subset of the medulla cells that actively express these 2 proteins? Please provide image(s) with higher magnification to show more clearly how the expression of these 2 proteins differs among different cells.

      (5) Figure 5: The authors discussed cell number changes for different types of cells from week 2 to week 6. A table, or some plots, visualizing numbers of different cell types, instead of just providing original clusters in Dataset S6, at different time points, would make the changes easier to observe.

      (6) Figure S7: It would be more helpful to directly show the number of wave 1 follicles.

      (7) Did the fluorescence cryosection staining (Line 587 - 595) use the same buffers as in the whole-mount staining (Line 575 - 586)? Please clarify.

      (8) In Line 618, what tissue samples were collected? Please point out clearly.

    3. Reviewer #2 (Public review):

      Summary:

      This study explores an important question concerning the developmental trajectory of wave 1 ovarian follicles, leveraging valuable tools such as lineage tracing and single-cell RNA sequencing. These approaches position the authors well to dissect early follicle dynamics. The study would benefit from more in-depth analysis, including quantification using the lineage-traced ovaries, and comparison of wave 1 and 2 follicular cells per stage within the single cell dataset.

      Strengths:

      This study aims to address an important question regarding the developmental trajectories of wave 1 ovarian follicles and how they differ from wave 2 follicles that contribute to long-term fertility. This is an important topic, as many studies on ovarian follicle development rely on samples collected at perinatal timepoints in the mouse, which primarily represent wave 1 follicles, to infer later fertility. The research group has the tools and expertise necessary to tackle these questions.

      Weaknesses:

      Wave 1 follicles are quantified based on the criteria of oocytes larger than 20 µm located within the medullary region, using whole-mount staining. However, the boundary between the medulla and cortex appears somewhat arbitrary. Quantification using FOXL2-lineage-traced ovaries provides a more reliable method for identifying wave 1 follicles. As the developmental trajectory of wave 1 follicles has been well described in Zhang et al. 2013, it would be valuable to provide a more detailed quantification of both labeled and unlabeled follicles by specific follicle stages. In fact, in Zhang et al. 2013, the authors demonstrated that lineage-labeled primordial follicles can be found at the cortex-medulla boundary, suggesting that the observation of labeled "border follicles" is not unexpected. Quantification by follicle stage would provide greater insight into the timing and development of these follicles.

      Similarly, the analysis of wave 1 follicle loss should be performed on lineage-traced ovaries using cell death markers to demonstrate the loss of oocytes and granulosa cells, while confirming the preservation of theca and interstitial cells. In particular, granulosa cell loss should be assessed directly with cell death markers in lineage-traced ovaries, rather than from the loss of tamoxifen-labeled cells, as labeling efficiency varies between follicles (Figure 2G).

      Single-cell RNA sequencing presents a valuable dataset capturing the development of first-wave follicles. The use of a 40µm cell strainer during cell collection for the 10x platform may explain the exclusion of larger oocytes. However, it is still surprising that no oocytes were captured at all. The central question, how wave 1 follicular cells differ from wave 2 cells, should be investigated in more depth, with results validated on FOXL2-lineage-traced ovaries (i.e., Wnt4 staining in wave 1 antral follicles versus wave 2 using lineage-traced ovaries). This analysis should span all stages of follicle development. It also appears to be a missed opportunity that the single-cell sequencing analysis was not performed on lineage-traced ovaries, which would have enabled more definitive identification of wave 1-derived cells.

      Finally, this study does not directly assess fertility outcomes and should therefore refrain from drawing conclusions about the fertility potential of wave 1 follicles.

    4. Author response:

      The eLife assessment states that our manuscript is important only as a source of data for others to use in the future. Our methods and analysis of wave 1 follicles were said to be "incomplete" because one of two reviewers claimed we did not prove that 80% of wave 1 oocytes turn over by 5 wk.

      We believe that this assessment is simply wrong because critical supporting data already present in the existing manuscript was not understood by one reviewer. Wave 1 follicular oocyte turnover was said to be unproven and to remain uncertain because evidence of death was based only on a lack of Ddx4 staining. New experiments documenting expression of cell death markers were said to be needed to show the oocytes died. However, our work was not based on the analysis of sectioned material, but used whole mount 3D reconstruction microscopy of cleared ovary preparations. Oocyte death was determined by the absence of an oocyte in fully reconstructed follicles and its replacement with an empty cavity, not just the absence of antibody staining. We included images and complete 3D reconstruction movies documenting these methods. The paper also documents that the holes frequently still contained zona pellucida remnants indicating the former presence of an oocyte. Moreover, we observed many intermediates of oocyte death (shrunken and deformed oocytes) and deformations of follicle structures due to the presence of the empty cavities. Controls showed that Ddx4 staining in the context of 3D imaging always revealed an obvious giant labeled oocyte in 100% of wave 1 follicles prior to death, and in wave 1.5 and wave 2 follicles at all stages. Thus, our methodology is already fully reliable. The reviewer is correct that the entire program of wave 1 development including their programmed turnover would be interesting to explore further. We already provided a large amount of new gene expression information, and documented the first examples of wave 1-specific gene expression. Further studies are not needed for the major conclusions of the paper and can wait for a follow-up study.

      Secondly, the existence of wave 1.5 is not "speculative," as stated by the reviewing editor. We extensively validated and quantified the existence of wave 1.5 primordial follicles following Foxl2-cre activation at E16.5, and analysis at 2 wks in multiple experiments. Additionally, we showed wave 1.5 follicles were present at the medulla/cortex border at 2 wks even after activation of Foxl2-cre at E14.5. Our paper also connected for the first time wave 1.5 follicles to a population of non-growing, "poised" primordial follicles described at this identical location near the medulla/cortex boundary by Meinsohn et al. in 2021. These follicles had not started to develop yet, and their ultimate fate was not known. We followed the development of these follicles and determined several differences in wave 1.5 follicle gene expression compared to wave 1. As noted in the assessment, our findings on wave 1.5 are now already being extended to other systems such as primate ovaries (adopting our name "wave 1.5" from our bioRxiv manuscript). The simultaneous claims that the existence of wave 1.5 is speculative and that other people are finding wave 1.5 follicles in the species they are studying are logically incompatible.

      Response to reviewer 2:

      Line 239-245: Please note that Zhang et al. 2013 also show that lineage-labeled primordial follicles can be found at the cortex-medulla boundary (see their Figure 1B).

      The single image in the Zheng et al. 2014 paper may or may not show mosaic primordial follicles, but it would not be surprising since the experiment was identical to experiments in our paper. However, that single picture is only meaningful in the context of our subsequent work reported in the current manuscript. There was no mention of these follicles in the text of Zheng et al. 2014, no documentation or quantitation of their numbers, and no discussion or understanding of their significance. The incorrect conclusion of that paper was that wave 1 follicles, meaning rapidly developing follicles in the medulla, give rise to most early offspring. This conclusion reversed the previously accepted (and essentially correct) view that wave 1 follicles did not contribute significantly to fertility.

      "Finally, this study does not directly assess fertility outcomes and should therefore refrain from drawing conclusions about the fertility potential of wave 1 follicles." 

      We showed by lineage marking that only about 25 of about 200 wave 1 follicles survive even to wk 5. This clearly does prove our conclusion that the great majority of wave 1 follicles do not contribute to fertility.

    1. eLife Assessment

      This important study reports that higher genetically predicted BMI is associated with a modestly increased risk of head and neck cancer. The convincing evidence is supported by rigorous Mendelian Randomization approaches, using multiple genetic instruments and models that reduce sensitivity to pleiotropy. However, results from pleiotropy-robust analyses were less consistent, which limits the strength of causal inference. The work will be of interest to researchers studying cancer risk factors and genetic epidemiology.

    2. Reviewer #1 (Public review):

      Summary:

      The authors have conducted the largest Mendelian Randomization (MR) analysis to date of the association between genetically predicted measures of adiposity and risk of head and neck cancer (HNC), overall and by subsite within HNC. MR uses genetic predictors of an exposure, such as gene variants associated with high BMI or tobacco use, rather than data from individual physical exams or questionnaires, and if it can be done in its idealized state, there should be no problems with confounding. Traditional epidemiologic studies have reported a variety of associations between BMI (and a few other measures of adiposity) and risk of HNC that typically differ by the smoking status of the subjects. Those findings are controversial given the complex relationship between tobacco and both BMI and HNC risk. Tobacco smokers are often thinner than non-smokers, so this could create an artificial ('confounded') association that may not be fully adjusted away in risk models. The findings of a BMI-HNC association are often attributed to residual confounding, and this seems ripe for an MR approach if suitable genetic instrumental variables can be created. Here the authors built a variety of genetic instrumental variables for BMI and other measures of adiposity as well as two instrumental variables for smoking habits and then tested their hypotheses in a large case-control set of HNC cases and controls with genetic data.

      The authors found that the genetic model for BMI was associated with HNC risk in simple models, but this association disappeared when using models that better accounted for pleiotropy, the condition when genetic variants are associated with more than one trait such as both BMI and tobacco use. When they used both adiposity and tobacco use genetic instruments in a single model, there was a strong association with genetically predicted tobacco use (as is expected) but there was no remaining association with genetic predictors of adiposity. They conclude that high BMI/adiposity is not a risk factor for HNC.

      Strengths:

      The primary strength was the expansive use of a variety of different genetic instruments for BMI/adiposity/body size along with employing a variety of MR model types, several of which are known to be less sensitive to pleiotropy. They also used the largest case-control sample size to date.

      Weaknesses:

      The lack of pleiotropy is an unconfirmable assumption of MR and the addition of those models is therefore quite important as this is a primary weakness of the MR approach. Given that concern, I read the sensitivity analyses using pleiotropy-robust models as the main result and in that case, they are more limited in their ability to test their hypothesis as these models do not show a robust BMI instrumental variable association.

      Comments on the revised manuscript:

      After the first round of review, the authors have improved the manuscript by (1) adding the requested power calculations and adding text to help the reader integrate that additional information; (2) adding the main effects for the tobacco instruments; (3) updating the comparison of their results to the prior literature; and (4) making some other edits to the text. They have declined to include the smoking-stratified estimates and provide a rationale for this decision that references the potential for collider bias. While it is true that yet another bias might be introduced, that gets added to the list and the careful reader would know that. Many important questions in cancer etiology can only be addressed via observational approaches, and each observational approach has the potential for a long list of biases. The best inference comes from integrating the totality of the data and realizing that most conclusions are subject to updating as we conduct more work and learn more.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Joint Public Review:

      Weaknesses:

      The lack of pleiotropy is an unconfirmable assumption of MR, and the addition of those models is therefore quite important, as this is a primary weakness of the MR approach. Given that concern, I read the sensitivity analyses using pleiotropy-robust models as the main result, and in that case, they can't test their hypotheses as these models do not show a BMI instrumental variable association. The other weakness, which might be remedied, is that the power of the tests here is not described. When a hypothesis is tested with an under-powered model, the apparent lack of association could be due to inadequate sample size rather than a true null. Typically, when a statistically significant association is reported, power concerns are discounted as long as the study is not so small as to create spurious findings. That is the case with their primary BMI instrumental variable model - they find an association so we can presume it was adequately powered. But the primary models they share are not the pleiotropy-robust methods MR-Egger, weighted median, and weighted mode. The tests for these models are null, and that could mean a couple of things: (1) the original primary significant association between the BMI genetic instrument was due to pleiotropy, and they therefore don't have a robust model to explore the effects of the tobacco genetic instrument. (2) The power for the sensitivity analysis models (the pleiotropy-robust methods) is inadequate, and the authors share no discussion about the relative power of the different MR approaches. If they do have adequate power, then again, there is no need to explore the tobacco instrument.

      Reviewing Editor Comments:

      We suggest that the authors add power estimates to assess whether the sample size is sufficient, given the strength and variability of the genetic instruments. It would also be helpful to present effect estimates for the tobacco instruments alone, to clarify their independent contribution and improve the interpretation of the joint models. In addition, the role of pleiotropy should be addressed more clearly, including which model is considered primary. Stratified analyses by smoking status are encouraged, as prior studies indicate that BMI-HNC associations may differ between smokers and non-smokers. Finally, the comparison with previous studies should be revised, as most reported null findings without accounting for tobacco instruments. If this study finds an association, it should not be framed as a replication.

      We would like to highlight that post-hoc power calculations are often considered redundant since the statistical power estimated for an observed association is directly related to its p-value[1]. In other words, the uncertainty of the association is already reflected in its 95% confidence interval. However, we understand power calculations may still be of interest to the reader, so we have incorporated them in the revised manuscript. We have edited the text as follows (lines 151-155): “Consequently, we used the total R² values to examine the statistical power in our study[42]. However, we acknowledge that the value of post-hoc power calculations is limited, since the statistical power estimated for an observed association is already reflected in the 95% confidence interval presented alongside the point estimate[43].” We have also added supplementary figures 1 and 2.
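The link between post-hoc ("observed") power and the p-value mentioned above can be illustrated with a short sketch. It assumes a two-sided z-test and treats the observed effect as if it were the true effect; the function name `observed_power` is hypothetical and is not taken from the manuscript or from any MR package.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def observed_power(p_value: float, alpha: float = 0.05) -> float:
    """Post-hoc power of a two-sided z-test, computed from the p-value alone.

    Assumption: the observed z-statistic is taken as the true standardized
    effect, which is what a post-hoc power calculation implicitly does.
    """
    z_obs = _STD_NORMAL.inv_cdf(1 - p_value / 2)   # |z| implied by the p-value
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)    # rejection threshold
    # Probability of landing in either rejection region.
    return _STD_NORMAL.cdf(z_obs - z_crit) + _STD_NORMAL.cdf(-z_obs - z_crit)
```

Under these assumptions the result is a one-to-one function of the p-value: a result sitting exactly at p = alpha has an observed power of about 50%, and smaller p-values always map to higher observed power. This is why such a calculation adds no information beyond the p-value or confidence interval.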

      We can see that when using the latest HEADSpAcE data we were able to detect BMI-HNC ORs as small as 1.16 with 80% power, while the GAME-ON dataset only permitted the detection of ORs as small as 1.26 using the same BMI instruments (Figure B). We have explained these figures in the results section as follows (lines 257-263): “Using the BMI genetic instruments (total R²= 4.8%) and an α of 0.05, we had 80% statistical power to detect an OR as small as 1.16 for HNC risk (Supplementary Figure 1). For WHR (total R²= 3.1%) and WC (total R²= 4.4%), we could detect odds ratios (ORs) as small as 1.20 and 1.17, respectively. This is an improvement in terms of statistical power compared to the GAME-ON analysis published by Gormley et al.[28], for which there was 80% power to detect an OR as small as 1.26 using the same BMI genetic instruments (Supplementary Figure 2).”
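The power figures quoted above can be approximated with a commonly used normal approximation for MR with a binary outcome, in which power depends on the variance explained by the instruments (R²), the total sample size, and the case fraction. The sketch below assumes that approximation; the response does not state which tool produced the supplementary figures, so this is an illustration and not the authors' calculation. The function names are hypothetical, and the sample sizes are the N=31,523 and 12,264 cases quoted elsewhere in the response.

```python
from math import exp, log, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def mr_power_binary(odds_ratio, r2, n_total, n_cases, alpha=0.05):
    """Approximate power of an MR analysis with a binary outcome.

    Assumes the test statistic is roughly normal with mean
    |log(OR)| * sqrt(R^2 * N * K * (1 - K)), where K is the case fraction
    and the OR is per 1-SD increase in the exposure.
    """
    case_fraction = n_cases / n_total
    ncp = abs(log(odds_ratio)) * sqrt(
        r2 * n_total * case_fraction * (1 - case_fraction)
    )
    return _STD_NORMAL.cdf(ncp - _STD_NORMAL.inv_cdf(1 - alpha / 2))


def min_detectable_or(r2, n_total, n_cases, alpha=0.05, power=0.80):
    """Smallest OR (> 1) detectable with the requested power (same approximation)."""
    case_fraction = n_cases / n_total
    z_sum = _STD_NORMAL.inv_cdf(1 - alpha / 2) + _STD_NORMAL.inv_cdf(power)
    return exp(z_sum / sqrt(r2 * n_total * case_fraction * (1 - case_fraction)))
```

With R² values of 0.048, 0.031 and 0.044, `min_detectable_or(r2, 31523, 12264)` should return values close to the 1.16, 1.20 and 1.17 reported for BMI, WHR and WC, which suggests the quoted thresholds are consistent with this approximation.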

      The reason we use inverse variance weighted (IVW) Mendelian randomization (MR) to obtain our main results rather than the pleiotropy-robust methods mentioned by the reviewer/editors (i.e., MR-Egger, weighted median and weighted mode) is that the former has greater statistical power than the latter[2]. Hence, instead of focussing on the statistical significance of the pleiotropy-robust analyses, we consider it is of more value to compare the consistency of the effect sizes and direction of the effect estimates across methods. Any evidence of such consistency increases our confidence in our main findings, since each method relies on different assumptions. As we cannot be sure about the presence and nature of horizontal pleiotropy, it is useful to compare results across methods even though they are not equally powered. It is true that our results for the genetically predicted effects of body mass index (BMI) on the risk of head and neck cancer (HNC) differ across methods. This is precisely what led us to question the validity of our main finding (suggesting a positive effect of BMI on HNC risk). We have now clarified this in the methods section of the revised manuscript as advised. Lines 165-171:

      “Because the IVW method assumes all genetic variants are valid instruments[44], which is unlikely the case, three pleiotropy-robust two-sample MR methods (i.e., MR-Egger[45], weighted median[46] and weighted mode[47]) were used in sensitivity analyses. When the magnitude and direction of effect estimates are consistent across methods that rely on different assumptions, the main findings are more convincing. As we cannot be sure about the presence and nature of horizontal pleiotropy, it is useful to compare results across methods even if they are not equally powered.”
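The reasoning for comparing estimates across methods can be made concrete with a minimal sketch of two of the estimators named above, computed from per-variant summary statistics. It uses first-order weights (1 / SE² of the variant-outcome association) and omits standard errors; the function names and the toy numbers in the usage note are hypothetical and are not drawn from the study.

```python
def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Inverse variance weighted estimate: weighted regression through the origin.

    Forcing the intercept to zero encodes the assumption that every variant
    is a valid instrument (no directional pleiotropy).
    """
    weights = [1.0 / se ** 2 for se in se_outcome]
    num = sum(w * bx * by for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    den = sum(w * bx * bx for w, bx in zip(weights, beta_exposure))
    return num / den


def mr_egger_estimate(beta_exposure, beta_outcome, se_outcome):
    """MR-Egger: the same weighted regression, but with a free intercept.

    Returns (intercept, slope). A non-zero intercept signals directional
    pleiotropy; the slope is the pleiotropy-adjusted effect estimate.
    Assumes variants are oriented so that beta_exposure is positive.
    """
    weights = [1.0 / se ** 2 for se in se_outcome]
    total = sum(weights)
    mean_x = sum(w * bx for w, bx in zip(weights, beta_exposure)) / total
    mean_y = sum(w * by for w, by in zip(weights, beta_outcome)) / total
    sxx = sum(w * (bx - mean_x) ** 2 for w, bx in zip(weights, beta_exposure))
    sxy = sum(
        w * (bx - mean_x) * (by - mean_y)
        for w, bx, by in zip(weights, beta_exposure, beta_outcome)
    )
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope
```

As a toy check, take variant-exposure effects `[0.02, 0.04, 0.06, 0.08, 0.10]` with a true effect of 0.2 and equal standard errors. If every variant-outcome effect also carries a constant pleiotropic shift of 0.05, IVW is pulled well above 0.2 while MR-Egger recovers a slope of 0.2 and an intercept of 0.05. Agreement between the two would raise confidence in the IVW result; disagreement, as in the BMI-HNC analysis, points to possible pleiotropy.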

      We understand that the reviewer/editors are concerned that we do not have a robust model to explore the role of tobacco consumption in the link between BMI and HNC. However, we have a different perspective on the matter. If indeed, the main IVW finding for BMI and HNC is due to pleiotropy (since some of the pleiotropy-robust methods suggest conflicting results), then the IVW multivariable MR method is a way to explore the potential source of this bias[3]. We were particularly interested in exploring the role of smoking in the observed association because smoking and adiposity are known to influence each other [4-9] and share a genetic basis[10, 11].

      We agree that it would be useful to present the univariable MR effect estimates for smoking behaviour and HNC risk alongside those obtained using multivariable MR. We have now included the univariable MR estimates for both smoking behaviour variables as a note under Supplementary Table 11 and in the manuscript (lines 316-318): “In univariable IVW MR, both CSI and SI were linked to an increased risk of HNC (CSI OR=4.47 per 1-SD higher CSI, 95%CI 3.31–6.03, p<0.001; SI OR=2.07 per 1-SD higher SI 95%CI 1.60–2.68, p<0.001) (Additional File 2: note in Supplementary Table 11).”

      We understand the appeal of conducting stratified MR analyses by smoking status. However, we anticipate such analyses would hinder the interpretation of our findings as they can induce collider bias which could spuriously lead to different effect estimates across strata[12, 13].

      We thank the reviewer/editors for their comment regarding the way we frame our findings. We have now edited the discussion section to highlight that our study results differ from those obtained in studies that do not account for smoking behaviour. Lines 398-401: “With a much larger sample (N=31,523, including 12,264 cases), our IVW MR analysis suggested BMI may play a role in HNC risk, in contrast to previous studies. However, our sensitivity analyses implied that causality was uncertain.”

      Reviewer #1 (Recommendations for the authors):

      The authors do share a table of the percent variance explained of the different genetic instruments, which vary widely, and that table is very welcome because we can get some sense of their utility. The problem is that they don't translate that into a power estimate for the case-control study size that they use. They say that it is the biggest to date, which is good, but without some formal power estimate, it is not particularly reassuring. A framework for MR study power estimates was reported in PMID: 19174578, but that was using very simple MR constructs in use in 2009, and it isn't clear to me if that framework can be used here. That power paper suggests that weak genetic instruments need very large sample sizes, far larger than what is used in the current manuscript. I am unable to estimate the true strength of the instruments used here, and so I am unsure of whether power is an issue or not.

      We have now included power calculations in our manuscript to address the reviewer’s concerns. Nevertheless, as mentioned above, post-hoc power calculations are of limited value, as statistical power is already reflected in the uncertainty around the point estimates (the 95% confidence intervals). Hence, it is important to avoid drawing conclusions regarding the likelihood of true effects or false negatives based on these calculations.

      Although the hypothesis here is that smoking accounts for the apparent BMI association previously reported for HNC, it would have been preferable to see the estimates for their 2 genetic instruments for tobacco alone. The current results only show the BMI instruments alone and then with the tobacco instruments. I would like to see what the risk estimates are for the tobacco instrument alone, so that I can judge for myself what happens in the joint models. As presented, one can only do that for the BMI instruments.

      We thank the reviewer for this comment. The univariable IVW MR estimate of smoking initiation was OR=2.07 (95%CI 1.60 to 2.68, p<0.001), while the one for comprehensive smoking index was OR=4.47 (95%CI 3.31 to 6.03, p<0.001). We have included this information in the manuscript as requested (please see response to reviewing editor above).

      On line 319, they write that "We did not find evidence against bias due to correlated pleiotropy..." I find this difficult to parse, but I think it means that they should believe that correlated pleiotropy remains a problem. So again, they seem to see their primary model as compromised, and so do I. This limitation is again stated by the authors on lines 351-352.

      We apologise if the wording of the sentence was not easy to understand. When using the CAUSE method, we did not find evidence to reject the null hypothesis that the sharing (correlated pleiotropy) model fits the data at least as well as the causal model. In other words, our CAUSE finding and the inconsistencies observed across our other sensitivity analyses led us to believe that our main IVW MR estimate for BMI-HNC was likely biased by correlated pleiotropy. We believe it is important to explore the source of this bias, which is why we used multivariable MR to investigate the direct effect of BMI on HNC risk while accounting for smoking behaviour.

      In the following paragraphs (lines 358-369), the authors state that their findings are consistent with prior reports, but that doesn't seem to be the case if we take their primary BMI instrument as representing the outcome of this manuscript. Here, they find an association between the BMI instrument and HNC risk, but in each of the other papers they present the primary finding was null without the extensive model changes or the aim of accounting for tobacco with another instrument. I don't see that as replication.

      This is a good point. We have now edited the discussion of our manuscript to avoid giving the impression that our findings replicate those from studies that do not account for smoking behaviour in their analyses. We have edited lines 384-401 as follows:

      “Previous MR studies suggest adiposity does not influence HNC risk[27-29]. Gormley et al.[28] did not find a genetically predicted effect of adiposity on combined oral and oropharyngeal cancer when investigating either BMI (OR=0.89 per 1-SD, 95% CI 0.72–1.09, p=0.26), WHR (OR=0.98 per 1-SD, 95% CI 0.74–1.29, p=0.88) or waist circumference (OR=0.73 per 1-SD, 95% CI 0.52–1.02, p=0.07) as risk factors. Similarly, a large two-sample MR study by Vithayathil et al.[29] including 367,561 UK Biobank participants (of which 1,983 were HNC cases) found no link between BMI and HNC risk (OR=0.98 per 1-SD higher BMI, 95% CI 0.93–1.02, p=0.35). Larsson et al.[27] meta-analysed Vithayathil et al.’s[29] findings with results obtained using FinnGen data to increase the sample size even further (N=586,353, including 2,109 cases), but still did not find a genetically predicted effect of BMI on HNC risk (OR=0.96 per 1-SD higher BMI, 95% CI 0.77–1.19, p=0.69). With a much larger sample (N=31,523, including 12,264 cases), our IVW MR analysis suggested BMI may play a role in HNC risk, in contrast to previous studies. However, our sensitivity analyses implied that causality was uncertain.”

      We also deleted part of a sentence in the discussion section, so lines 416-418 now look as follows: “An important strength of our study was that the HEADSpAcE consortium GWAS used had a large sample size which conferred more statistical power to detect effects of adiposity on HNC risk compared to previous MR analyses[27-29].”

      On lines 384-386 they note a strength is that this is the largest study to date, but I would reiterate that larger and more powerful does not equate to adequately powered.

      This is true. We have included power calculations in the manuscript as requested.

      It's well known that different HNC subsites have different etiologies, as they mention on lines 391-392, and it is implicit in their use of data on HPV positive and negative oropharyngeal cancer. They say that they did not find evidence for heterogeneity in this study, but that would only be true for the null BMI instrument. The effect sizes for their smoking instruments are strikingly different between the subsites.

      We agree and are sorry for the confusion we may have caused by the way we worded our findings. We have edited the text to clarify that the lack of subsite heterogeneity applied only to our results for the effects of BMI, WHR and WC on HNC risk. Lines 418-424 now read as follows:

      “Furthermore, the availability of data on more HNC subsites, including oropharyngeal cancers by HPV status, allowed us to investigate the relationship between adiposity and HNC risk in more detail than previous MR studies which limited their subsite analyses to oral cavity and overall oropharyngeal cancers[28, 68]. This is relevant because distinct HNC subsites are known to have different aetiologies[69], although we did not find evidence of heterogeneity across subsites in our analyses investigating the genetically predicted effects of BMI, WHR and WC on HNC risk.”

      Finally, the literature on mutational patterns gives us strong reason to believe that HNC caused by tobacco are biologically distinct from tumors not caused by tobacco. The authors report in the introduction that traditional observational studies of BMI and HNC have reported different findings in smokers versus never smokers, so I would assume there is a possibility that the BMI instrument could have different associations with tumors of the tobacco-induced phenotype and tumors with a non-tobacco induced phenotype. I would assume that authors have access to the data on self-reported tobacco use behavior, even if they can't separate these tumors by molecular types. Stratifying their analysis by tobacco users or not might reveal different results with the BMI instrument.

      We appreciate the reviewer’s comment. We agree that it would have been interesting to present stratified analyses by smoking status alongside our main findings. However, we decided against this because of the risk of inducing collider bias in our MR analyses, i.e., stratifying on smoking status may induce spurious associations between the adiposity instruments and confounding factors. Multivariable MR is considered a better way of investigating the direct effects of an exposure (adiposity) on an outcome (HNC) accounting for a third variable (smoking)[14], which is why we opted for this method instead.

      References:

      (1) Heinsberg LW, Weeks DE: Post hoc power is not informative. Genet Epidemiol 2022, 46(7):390-394.

      (2) Burgess S, Butterworth A, Thompson SG: Mendelian randomization analysis with multiple genetic variants using summarized data. Genet Epidemiol 2013, 37(7):658-665.

      (3) Burgess S, Davey Smith G, Davies NM, Dudbridge F, Gill D, Glymour MM, Hartwig FP, Kutalik Z, Holmes MV, Minelli C et al: Guidelines for performing Mendelian randomization investigations: update for summer 2023. Wellcome Open Res 2019, 4:186.

      (4) Morris RW, Taylor AE, Fluharty ME, Bjorngaard JH, Asvold BO, Elvestad Gabrielsen M, Campbell A, Marioni R, Kumari M, Korhonen T et al: Heavier smoking may lead to a relative increase in waist circumference: evidence for a causal relationship from a Mendelian randomisation meta-analysis. The CARTA consortium. BMJ Open 2015, 5(8):e008808.

      (5) Taylor AE, Morris RW, Fluharty ME, Bjorngaard JH, Asvold BO, Gabrielsen ME, Campbell A, Marioni R, Kumari M, Hallfors J et al: Stratification by smoking status reveals an association of CHRNA5-A3-B4 genotype with body mass index in never smokers. PLoS Genet 2014, 10(12):e1004799.

      (6) Taylor AE, Richmond RC, Palviainen T, Loukola A, Wootton RE, Kaprio J, Relton CL, Davey Smith G, Munafo MR: The effect of body mass index on smoking behaviour and nicotine metabolism: a Mendelian randomization study. Hum Mol Genet 2019, 28(8):1322-1330.

      (7) Asvold BO, Bjorngaard JH, Carslake D, Gabrielsen ME, Skorpen F, Smith GD, Romundstad PR: Causal associations of tobacco smoking with cardiovascular risk factors: a Mendelian randomization analysis of the HUNT Study in Norway. Int J Epidemiol 2014, 43(5):1458-1470.

      (8) Carreras-Torres R, Johansson M, Haycock PC, Relton CL, Davey Smith G, Brennan P, Martin RM: Role of obesity in smoking behaviour: Mendelian randomisation study in UK Biobank. BMJ 2018, 361:k1767.

      (9) Freathy RM, Kazeem GR, Morris RW, Johnson PC, Paternoster L, Ebrahim S, Hattersley AT, Hill A, Hingorani AD, Holst C et al: Genetic variation at CHRNA5-CHRNA3-CHRNB4 interacts with smoking status to influence body mass index. Int J Epidemiol 2011, 40(6):1617-1628.

      (10) Thorgeirsson TE, Gudbjartsson DF, Sulem P, Besenbacher S, Styrkarsdottir U, Thorleifsson G, Walters GB, Consortium TAG, Oxford GSKC, consortium E et al: A common biological basis of obesity and nicotine addiction. Transl Psychiatry 2013, 3(10):e308.

      (11) Wills AG, Hopfer C: Phenotypic and genetic relationship between BMI and cigarette smoking in a sample of UK adults. Addict Behav 2019, 89:98-103.

      (12) Coscia C, Gill D, Benitez R, Perez T, Malats N, Burgess S: Avoiding collider bias in Mendelian randomization when performing stratified analyses. Eur J Epidemiol 2022, 37(7):671-682.

      (13) Hamilton FW, Hughes DA, Lu T, Kutalik Z, Gkatzionis A, Tilling K, Hartwig FP, Davey Smith G: Non-linear Mendelian randomization: evaluation of effect modification in the residual and doubly-ranked methods with simulated and empirical examples. Eur J Epidemiol 2025.

      (14) Sanderson E, Davey Smith G, Windmeijer F, Bowden J: An examination of multivariable Mendelian randomization in the single-sample and two-sample summary data settings. Int J Epidemiol 2019, 48(3):713-727.

    1. eLife Assessment

      This important work examines how microexons contribute to brain activity, structure, and behavior. The authors find that loss of microexon sequences generally has subtle impacts on these metrics in larval zebrafish, with few exceptions. The evidence is solid, using modern high-throughput phenotyping methodology in zebrafish. Overall, this work will be of interest to neuroscientists and generate further studies of interest to the field.

    2. Reviewer #1 (Public review):

      Summary:

      The authors use high-throughput gene editing technology in larval zebrafish to address whether microexons play important roles in the development and functional output of larval circuits. They find that individual microexon deletions rarely impact behavior, brain morphology, or activity, and raise the possibility that behavioral dysregulation occurs only with more global loss of microexon splicing regulation. Other possibilities exist: perhaps microexon splicing is more critical for later stages of brain development, perhaps microexon splicing is more critical in mammals, or perhaps the behavioral phenotypes observed when microexon splicing is lost are associated with loss of splicing in only a few genes.

      Strengths:

      - The authors provide a qualitative analysis of microexon inclusion during early zebrafish development

      - The authors provide comprehensive phenotyping of microexon mutants, addressing the role of individual microexons in the regulation of brain morphology, activity, and behavior.

    3. Reviewer #3 (Public review):

      Summary:

      This paper sought to understand how microexons influence early brain function. By selectively deleting a large number of conserved microexons and then phenotyping the mutants with behavior and brain activity assays, the authors find that most microexons have minimal effects on the global brain activity and broad behaviors of the larval fish, although a few do have phenotypes.

      Strengths:

      The work takes full advantage of the scale that is afforded in zebrafish, generating a large mutant collection that is missing microexons and systematically phenotyping them with high throughput behaviour and brain activity assays. The work lays an important foundation for future studies that seek to uncover the likely subtle roles that single microexons will play in shaping development and behavior.

      Weaknesses:

      Although the manuscript includes evidence for many mutants that microexon deletion has minimal effect on full-length transcript levels, some of the microexon loss does alter transcript levels. Since the mutations usually yielded no phenotype, these effects on full-length transcripts are unlikely to be a major confound. For microexon mutants displaying phenotypes, future work will have to tease apart whether secondary effects on the transcripts are contributing to the phenotype.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors use high-throughput gene editing technology in larval zebrafish to address whether microexons play important roles in the development and functional output of larval circuits. They find that individual microexon deletions rarely impact behavior, brain morphology, or activity, and raise the possibility that behavioral dysregulation occurs only with more global loss of microexon splicing regulation. Other possibilities exist: perhaps microexon splicing is more critical for later stages of brain development, perhaps microexon splicing is more critical in mammals, or perhaps the behavioral phenotypes observed when microexon splicing is lost are associated with loss of splicing in only a few genes.

      A few questions remain:

      (1) What is the behavioral consequence for loss of srrm4 and/or loss-of-function mutations in other genes encoding microexon splicing machinery in zebrafish?

      It has been established that srrm4 mutants exhibit no overt morphological phenotypes and are not visually impaired (Ciampi et al., 2022). We are coordinating our publication with Lopez-Blanch et al. (https://doi.org/10.1101/2024.10.23.619860), which shows that srrm4 mutants also have minimal behavioral phenotypes. In contrast, srrm3 mutants have severe vision loss, early mortality, and numerous neural and behavioral phenotypes (Ciampi et al., 2022; Lopez-Blanch et al., 2024). We now point out the phenotypes of srrm3/srrm4 mutants in the manuscript.

      We chose not to generate and characterize the behavior and brain activity of srrm3/srrm4 mutants for two reasons: 1) we were aware of two other labs in the zebrafish community that had generated srrm3 and/or srrm4 mutants (Ciampi et al., 2022 and Gupta et al., 2024, https://doi.org/10.1101/2024.11.29.626094; Lopez-Blanch et al., 2024, https://doi.org/10.1101/2024.10.23.619860), and 2) we were far more interested in determining the importance of individual microexons to protein function, rather than loss of the entire splicing program. Microexon inclusion can be controlled by different splicing regulators, such as srrm3 (Ciampi et al., 2022) and possibly other unknown factors. Genetic compensation in srrm4 mutants could also result in microexons still being included through actions of other splicing regulators, complicating the analysis of these regulators. We mention srrm4 in the manuscript to point out that some selected microexons are adjacent to regulatory elements expected of this pathway. We did not, however, choose microexons to mutate based on whether they were regulated by Srrm4, making the characterization of srrm3/srrm4 mutants disconnected from our overarching project goal.

      We have edited the Introduction as follows to clarify our goal: “Studies of splicing regulators such as srrm4 impact the entire splicing program, making it impossible to determine the importance of individual microexons to protein function. Further, microexons could still be differentially included in a regulatory mutant via compensation by other splicing factors ...”

      (2) What is the consequence of loss-of-function in microexon splicing genes on splicing of the genes studied (especially those for which phenotypes were observed).

      We are unclear whether “microexon splicing genes” refers to the splicing regulators srrm3/srrm4, which we chose not to study in this work (see response to point #1 above), or the genes that contain microexons. The severe visual phenotypes of srrm3 mutants confound the study of microexon splicing in this line because altered splicing levels could be due to downstream changes in this significantly different developmental context. A detailed discussion of splicing consequences on removal of microexons from microexon-containing genes is in the response to point #4 below.

      (3) For the microexons whose loss is associated with substantial behavioral, morphological, or activity changes, are the same changes observed in loss-of-function mutants for these genes?

      In the first version of the manuscript, we had included two explicit comparisons of microexon loss with a standard loss-of-function allele, one with a phenotype and one without, in Figure S1 (now Figures S3 and S4) of this manuscript. Beyond the two pairs we had included, Lopez-Blanch et al. (https://doi.org/10.1101/2024.10.23.619860) described mild behavioral phenotypes for a microexon removal for kif1b, and we showed developmental abnormalities for the kif1b loss-of-function allele (now Figure S3). We have now added a predicted protein-truncating allele for ppp6r3. This new line has phenotypes that are similar but slightly stronger in brain activity and structure than the mutant that lacks only the microexon. The prior Figure S1 (now Figures S3 and S4) was only briefly mentioned in the first version of the manuscript, and we now clarify this point in the Results: “Protein-truncating mutations in eleven additional genes that contain microexons revealed developmental and neural phenotypes in zebrafish (Figure S3, Figure S4), indicating that the genes themselves are involved in biologically relevant pathways. Three of these genes– tenm4, sptan1, and ppp6r3 – are also in our microexon line collection.”

      Additionally, we can draw expected conclusions from the literature, as some genes with our microexon mutations have been studied as typical mutants in zebrafish or mice. We have modified our manuscript to include a discussion of both loss-of-function zebrafish and mouse mutants. See the response to point #4 below.

      (4) Do "microexon mutations" presented here result in the precise loss of those microexons from the mRNA sequence? E.g. are there other impacts on mRNA sequence or abundance?

      We acknowledge that unexpected changes to the mRNA of the tested mutants could occur following microexon removal. In particular, all regulatory elements should be removed from the region surrounding the microexon, as any remaining elements could drive the inclusion of unexpected exons that result in premature stop codons.

      First, we have clarified our generated mutant alleles by adding a figure (Figure S1) that details the location of the gRNA cut sites in relation to the microexon, its predicted regulatory elements, and its neighboring exons.

      Second, we have experimentally determined whether the mRNA was modified as expected for a subset of mutants with phenotypes. In all eight tested lines (Figure S2), the microexon was precisely eliminated without causing any other effects on the sequence of the transcript in the neighboring region. We did, however, observe an effect on transcript abundance for one homozygous mutant (vav2). It is possible that complex forms of genetic regulation are occurring that are not induced by unexpected isoforms or premature stop codons. Interestingly, Lopez-Blanch et al. (https://doi.org/10.1101/2024.10.23.619860) eliminated a different microexon in vav2 and also observed a subtle well center preference. If their allele from an entirely different intronic region also results in transcript downregulation, it would support the hypothesis of genetic compensation through atypical pathways. If not, it is likely this phenotype is due specifically to removal of the microexon protein sequence. Not all mutants with phenotypes could be assessed with qRT-PCR because some were no longer present in the lab. All lines were generated in a similar way, however, removing both the microexon and neighboring regulatory elements while avoiding the neighboring exons. Accordingly, we now also explicitly point out those where the clean loss of the microexon was confirmed (eif4g3b, ppp6r3, sptan1, vti1a, meaf6, nrxn1a, tenm3) and those with possibly interesting phenotypes that were not confirmed (ptprd-1, ptprd-2, rapgef2, dctn4, dop1a, mapk8ip3).

      Third, we have further emphasized in the manuscript that these observed phenotypes are extremely mild compared to those observed in over one hundred protein-truncating mutations we have assessed in previous (Thyme et al., 2019; Capps et al., 2024) and unpublished ongoing work. We showed data for one mutant, tcf7l2, which we consider to have moderately strong neural phenotypes, and we have extended this comparison in the revision (new Figure 3G). Additionally, loss-of-function alleles for some microexon-containing genes have strong developmental phenotypes, as we showed in Figure S1 (now Figures S3 and S4) of this manuscript in addition to our published work (Thyme et al., 2019; Capps et al., 2024). It is known from the literature that the loss-of-function mutants for mapk8ip3 are stronger than we observed here (Tuttle et al., 2019), suggesting that only the microexon is removed in our line. The microexons in Ptprd are also well-studied in mice, and we expect that only the microexon was removed in our lines. Both Dctn4 and Rapgef2 are completely lethal prior to weaning in mice (the International Mouse Phenotyping Consortium).

      (5) Microexons with a "canonical layout" (containing TGC / UC repeats) were selected based on the likelihood that they are regulated by srrm4. Are there other parallel pathways important for regulating the inclusion of microexons? Is it possible to speculate on whether they might be more important in zebrafish or in the case of early brain development?

      The microexons were not selected based on the likelihood that they were regulated by Srrm4. We have clarified the manuscript regarding this point. There are parallel pathways that can control the inclusion of microexons, such as Srrm3 (Ciampi et al., 2022). It is well known that loss of srrm3 has a stronger impact on zebrafish development than srrm4 (Ciampi et al., 2022). The goal of our work was not to investigate these splicing regulators but instead to determine the individual importance of these highly conserved protein changes.

      Strengths:

      (1) The authors provide a qualitative analysis of splicing plasticity for microexons during early zebrafish development.

      (2) The authors provide comprehensive phenotyping of microexon mutants, addressing the role of individual microexons in the regulation of brain morphology, activity, and behavior.

      We thank the reviewer for their support. The pErk brain activity mapping method is highly sensitive, significantly minimizing the likelihood that the field has simply not looked hard enough for a neural phenotype in these microexon mutants. In our published work (Thyme et al., 2019), we show that brain activity can be drastically impacted without manifesting in differences in those behaviors assessed in a typical larval screen (e.g., tcf4, cnnm2, and more).

      Weaknesses:

      (1) It is difficult to interpret the largely negative findings reported in this paper without knowing how the loss of srrm4 affects brain activity, morphology, and behavior in zebrafish.

      See response to point 1.

      (2) The authors do not present experiments directly testing the effects of their mutations on RNA splicing/abundance.

      See response to point 4.

      (3) A comparison between loss-of-function phenotypes and loss-of-microexon splicing phenotypes could help interpret the findings from positive hits.

      See response to points 3 and 4.

      Reviewer #2 (Public review):

      Summary:

      The manuscript from Calhoun et al. uses a well-established screening protocol to investigate the functions of microexons in zebrafish neurodevelopment. Microexons have gained prominence recently due to their enriched expression in neural tissues and misregulation in autism spectrum disease. However, screening of microexon functionality has thus far been limited in scope. The authors address this lack of knowledge by establishing zebrafish microexon CRISPR deletion lines for 45 microexons chosen in genes likely to play a role in CNS development. Using their high throughput protocol to test larval behaviour, brain activity, and brain structure, a modest group of 9 deletion lines was revealed to have neurodevelopmental functions, including 2 previously known to be functionally important.

      Strengths:

      (1) This work advances the state of knowledge in the microexon field and represents a starting point for future detailed investigations of the function of 7 microexons.

      (2) The phenotypic analysis using high-throughput approaches is sound and provides invaluable data.

      We thank the reviewer for their support.

      Weaknesses:

      (1) There is not enough information on the exact nature of the deletion for each microexon.

      To clarify the nature of our mutant alleles, we have added a figure (Figure S1) that details the location of the microexon in relation to its predicted neighboring exons, deletion boundaries, guide RNAs, and putative regulatory elements.

      (2) Only one deletion is phenotypically analysed, leaving space for the phenotype observed to be due to sequence modifications independent of the microexon itself.

      We have determined whether the mRNA is impacted in unanticipated ways for a subset of mutants with mild phenotypes (see point #4 responses to Reviewer 1 for details). Our findings for three microexon mutants (ap1g1, vav2, and vti1a) are corroborated by Lopez-Blanch et al. (https://doi.org/10.1101/2024.10.23.619860). We have also already compared the microexon removal to a loss-of-function mutant for two lines (Figures S3 and S4), and we have made this comparison more obvious as well as increasing the discussion of the expected phenotypes from typical loss-of-function mutants (see point #3 response to Reviewer 1).

      Unlike protein-coding truncations, clean removal of the microexon and its regulatory elements is unlikely to yield different phenotypic outcomes if independent lines are generated (with the exception of genetic background effects). When generating a protein-truncating allele, the premature stop codon can have different locations and a varied impact on genetic compensation. In previous work (Capps et al., 2024), we have observed different amounts of nonsense-mediated decay-induced genetic compensation (El-Brolosy et al., 2019) depending on the location of the mutation. As they lack variable premature stop codons (the expectation of a clean removal), two mutants for the same microexons should have equivalent impacts on the mRNA.

      We now address the concern of subtle genetic background effects in the Methods: “Even with using sibling controls and collecting multiple biological replicates from individual parents, the possibility remains that linked genetic variation may have contributed to the mild phenotypes we observed, as only a single line was generated.”

      Reviewer #3 (Public review):

      Summary:

      This paper sought to understand how microexons influence early brain function. By selectively deleting a large number of conserved microexons and then phenotyping the mutants with behavior and brain activity assays, the authors find that most microexons have minimal effects on the global brain activity and broad behaviors of the larval fish, although a few do have phenotypes.

      Strengths:

      The work takes full advantage of the scale that is afforded in zebrafish, generating a large mutant collection that is missing microexons and systematically phenotyping them with high throughput behaviour and brain activity assays. The work lays an important foundation for future studies that seek to uncover the likely subtle roles that single microexons will play in shaping development and behavior.

      We thank the reviewer for their support.

      Weaknesses:

      The work does not make it clear enough what deleting the microexon means, i.e. is it a clean removal of the microexon only, or are large pieces of the intron being removed as well-- and if so how much? Similarly, for the microexon deletions that do yield phenotypes, it will be important to demonstrate that the full-length transcript levels are unaffected by the deletion. For example, deleting the microexon might have unexpected effects on splicing or expression levels of the rest of the transcript that are the actual cause of some of these phenotypes.

      To clarify the nature of our mutant alleles, we have added a figure (Figure S1) that details the location of the microexon in relation to its predicted neighboring exons, deletion boundaries, guide RNAs, and putative regulatory elements. We have determined whether the mRNA is impacted in unanticipated ways for a subset of mutants with mild phenotypes (see point #4 responses to Reviewer 1 for details).

      Reviewer #1 (Recommendations for the authors):

      (1) For most ME mutations, 4 guide sequences are provided. More description / a diagram could be helpful to interpret how ME mutations were generated.

      We have added diagrams to the Supplementary Materials (new Figure S1) to show where the guide RNAs, cut sites, and putative regulatory elements are in relationship to the microexon and its neighboring exons. We have also added the following point to the text: “Four guide RNAs were used, two on each side of the microexon (Table S2, Figure S1).”

      (2) Figure 1 indicates that there are 45 microexons (MEs) but the text initially indicates that there are 44 that exist in a canonical layout (the text later indicates there are 45). This could be made more clear.

      The 45 refers to the mutants that were generated, not the microexons with putative Srrm4 regulatory elements. We did not choose microexons to mutate based on whether they were regulated by Srrm4. We have clarified these points in the manuscript as follows: “Of these 95 microexons, 42 exist in a canonical layout in the zebrafish genome, with both a UGC and UC repeat – or similar polypyrimidine tract – directly upstream of the alternatively spliced exon (Gonatopoulos-Pournatzis et al., 2018) (Table S1), indicating that Srrm4 likely controls their inclusion. Of the remaining microexons, 44 are organized similarly to the canonical layout, typically with either a UGC or UC repeat. Thus, they may also be regulated by Srrm4.” and “Using CRISPR/Cas9, we generated lines that removed 45 conserved microexons (Table S2) and assayed larval brain activity, brain structure, and behavior (Figure 1A). Four guide RNAs were used, two on each side of the microexon (Table S2, Figure S1). For microexons with upstream regulatory elements that are likely important for splicing, these elements were also removed (Figure S1).”

      (3) The description of the "canonical layout" as containing TGC / UC repeats could be rewritten as either "containing a UGC motif and UC repeats" or "containing a TGC motif and TC repeats."

      This error has been corrected.

      (4) Why was tcf7l2 selected as a control for MAP mapping?

      The mutant for tcf7l2 is an example of a moderately strong phenotype from a recent study we completed (Capps et al., 2025). This mutant was selected because it has both increased and decreased activity and structure and is ideal for setting the range of the graph. We now include a comparison to additional mutants from this study of autism genes (Capps et al., 2025) to further demonstrate how mild the phenotypes are in the microexon removal mutants (new Figure 3G). We also include the activity and structure maps of tcf7l2 mutants in Supplementary Figures 9 and 11.

      (5) What does it mean that of the remaining microexons, most are similar to canonical layout?

      Typically, they would have one of the two regulatory elements instead of both, or the location of the possible elements would be slightly farther away than expected. We have clarified this point in the manuscript as follows: “Of these 95 microexons, 42 exist in a canonical layout in the zebrafish genome, with both a UGC and UC repeat – or similar polypyrimidine tract – directly upstream of the alternatively spliced exon (Gonatopoulos-Pournatzis et al., 2018) (Table S1), indicating that Srrm4 likely controls their inclusion. Of the remaining microexons, 44 are organized similarly to the canonical layout, typically with either a UGC or UC repeat. Thus, they may also be regulated by Srrm4.”
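      The distinction between a canonical layout (both elements present) and a similar layout (only one element present) can be illustrated with a minimal sketch. The motif patterns, the 50-nucleotide window, the three-unit repeat threshold, and the function name `classify_layout` are all illustrative assumptions, not the criteria used in the manuscript or in Gonatopoulos-Pournatzis et al. (2018).

```python
import re

# Hypothetical motif definitions (assumptions for illustration only).
UGC_MOTIF = re.compile(r"UGC")
UC_REPEAT = re.compile(r"(?:UC){3,}")  # at least three consecutive UC units

def classify_layout(upstream_seq: str, window: int = 50) -> str:
    """Label the intronic sequence directly upstream of a microexon.

    Accepts RNA or DNA letters; only the last `window` nucleotides
    (those closest to the microexon) are inspected.
    """
    region = upstream_seq.upper().replace("T", "U")[-window:]
    has_ugc = bool(UGC_MOTIF.search(region))
    has_uc_repeat = bool(UC_REPEAT.search(region))
    if has_ugc and has_uc_repeat:
        return "canonical"  # both elements present
    if has_ugc or has_uc_repeat:
        return "similar"    # only one of the two elements present
    return "other"

print(classify_layout("AAUCUCUCUCAAUGCAG"))  # both elements
print(classify_layout("AAAAUGCAAAG"))        # UGC only
```

      A real classifier would also need to handle the case mentioned above where an element lies slightly farther from the microexon than expected, for example by widening the window.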

      (6) Figure 2A is very difficult to see - most are either up or down - suggest splitting into 2 figures - one = heat map, second can summarize values that were both up and down.

      We prefer to retain this information for accuracy. The bubble location is offset to effectively share the box between the orange (decreased) and purple (increased) measures. For example, and as noted in the methods and now expanded upon, a measure can change between 4 and 6 dpf or a measure such as bout velocity could be increased while the distance traveled is decreased (both are magnitude measures). The offset of the bubbles is consistently 0.2 data units in x and y from the center of the box.

      (7) The authors apply rigorous approaches to testing the importance of microexons. I especially appreciate the inclusion of separate biological replicates in the main figures!

      We thank the reviewer for their positive feedback.

      (8) Page 5 line 5 - suggest "compared to homozygous mutants".

      The change has been made.

      (9) For the eif4g3b dark flash phenotype, it's not clear what "p-values are not calculated for response plots" means. A p-value is provided in the plot for ppp6r3 response freq.

      The eif4g3b plot is the actual response trace, measured through pixel changes, whereas the ppp6r3 plot is the frequency of response. While informative, the response plot is time-based data with a wide dynamic range, making the average signal across the entire time window meaningless. We include the p-values for a related measure, the latency for the first 10 dark flashes in block 1 (day6dpfdf1a_responselatency), in the legend.

      (10) The ptprd phenotype in 2D is not described in the text.

      The change has been made.

      (11) Page 7 line 7: "mild" is repeated.

      This error has been corrected.

      Reviewer #2 (Recommendations for the authors):

      Specific points for needed improvement:

      (1) The title should be adjusted to more accurately describe the results. The term 'minimal' is under-representing the findings. 9/45 (20%) of targets in their screen have some phenotype, indicating that a significant number have indeed an important function. Moreover, the phenotypic analysis is limited, leaving space for missed abnormalities (as discussed by the authors). I would therefore suggest a more neutral title such as 'Systematic genetic deletion of microexons uncovers their roles in zebrafish brain development and larval behaviour'.

      While some microexon mutants do have repeatable phenotypes, these phenotypes are far milder than phenotypes observed in other mutant sets. We now include a comparison to additional mutants from this study of autism genes (Capps et al., 2025) to further demonstrate how mild the phenotypes are in the microexon removal mutants (new Figure 3G). The title states that these microexons have a minimal impact on larval zebrafish brain morphology and function, leaving room for the possibility of adult phenotypes. Thus, we prefer to retain this title.

      (2) Do the 45 chosen microexons correspond to the 44 with a canonical layout with TGC and UC repeats? If so, it needs to be explicitly stated in the text that exons were chosen for mutation based on the potential for SRRM4 regulation. If not, then the rationale for the choice of the 45 mutants from the 95 highly conserved events needs to be explained further.

      The 45 refers to the mutants that were generated, not the microexons with putative Srrm4 regulatory elements. We did not choose microexons to mutate based on whether they were regulated by Srrm4. We have clarified these points in the manuscript as follows: “Of these 95 microexons, 42 exist in a canonical layout in the zebrafish genome, with both a UGC and UC repeat – or similar polypyrimidine tract – directly upstream of the alternatively spliced exon (Gonatopoulos-Pournatzis et al., 2018) (Table S1), indicating that Srrm4 likely controls their inclusion. Of the remaining microexons, 44 are organized similarly to the canonical layout, typically with either a UGC or UC repeat. Thus, they may also be regulated by Srrm4.” and “Using CRISPR/Cas9, we generated lines that removed 45 conserved microexons (Table S2) and assayed larval brain activity, brain structure, and behavior (Figure 1A). Four guide RNAs were used, two on each side of the microexon (Table S2, Figure S1). For microexons with upstream regulatory elements that are likely important for splicing, these elements were also removed (Figure S1).”

      There was no clear rationale for those that were selected. We attempted to generate all 95 and some mutants were not successfully generated in our initial attempt. As we found minimal phenotypes, we elected to not continue to make the remaining ones on the list.

      (3) More detail regarding the design of guides for CRISPR is required in the text in the methods section. From Table S2, 4 guides were used per microexon. Were these designed to flank the microexon? How far into the intronic sequence were the guides designed? Were the splicing regulatory sequences (polypyrimidine tract, branchpoint) also removed? The flanking sequences of each of the 45 deletion lines need to be provided.

      We have added diagrams to the Supplementary Materials (new Figure S1) to show where the guide RNAs, cut sites, and putative regulatory elements are in relationship to the microexon and its neighboring exons. We removed the microexon and the surrounding area that contains the putative regulatory elements.

      (4) Following on from the previous point, to ascertain that the phenotype observed is truly due to lack of microexon (rather than other event linked to removed intronic sequences) - for the 7 exons newly identified as functionally important, at least one added deletion line has to be shown, presenting the same phenotype. If making 7 more lines can't be achieved in a reasonable time (we are aware this is a big ask), a MO experiment blocking microexon splicing needs to be provided (may not be ideal for analysis at 6 dpf). For the existing mutants and the new ones (or morphants), sequencing of the mRNAs for the 7 genes in mutants and siblings also needs to be added to check any possible change in other variants.

      Unlike protein-coding truncations, clean removal of the microexon and its regulatory elements is unlikely to yield different phenotypic outcomes if independent lines are generated (with the exception of genetic background effects). When generating a protein-truncating allele, the premature stop codon can have different locations and a varied impact on genetic compensation. In previous work (Capps et al., 2024), we have observed different amounts of nonsense-mediated decay-induced genetic compensation (El-Brolosy et al., 2019) depending on the location of the mutation. As they lack variable premature stop codons (the expectation of a clean removal), two mutants for the same microexon should have equivalent impacts on the mRNA. We acknowledge that we inadequately described the generation of these alleles, and we now provide Figure S1 to show the microexon’s relationship to possible regulatory elements that could impact splicing in unexpected ways if they remained.

      We now acknowledge the concern of subtle genetic background effects in the Methods: “Even with using sibling controls and collecting multiple biological replicates from individual parents, the possibility remains that linked genetic variation may have contributed to the mild phenotypes we observed, as only a single line was generated.”

      Given the caveats of MOs and transient microinjection for the study of 6 dpf phenotypes, we disagree that this suggested experiment would provide value. The phenotypic assays we use are highly sensitive, and we would not even trust CRISPANTs to yield reliable data. We have added an additional loss-of-function allele for ppp6r3 from the Sanger knockout project, which has a similar but stronger size change to the ppp6r3 microexon-removal line. In addition, our findings for three microexon mutants (ap1g1, vav2, and vti1a) are corroborated by Lopez-Blanch et al. (https://doi.org/10.1101/2024.10.23.619860).

      To support that we generated clean removals of these microexons, we experimentally determined whether the mRNA is impacted in unanticipated ways for a subset of mutants with mild phenotypes (see the point #4 public response to Reviewer 1). We have also already compared the microexon removal to a loss-of-function mutant for two lines (Figure S1), and we have made that outcome more obvious as well as increasing the discussion of the expected phenotypes from typical loss-of-function mutants (see point #3 public response to Reviewer 1).

      (5) Figure 3: An image of control tcf7l2 mutant brain activity as a reference should be included.

      We now include the activity and structure maps of tcf7l2 mutants in Supplementary Figures 9 and 11.

      (6) Figure 3a/b. The gene names on the y-axis of the pERK and structure comparisons should be reordered to be alphabetical so that phenotypes can be compared by the reader for the same microexon across the two assays.

      These data are clustered so that any similarities between maps can be recognized. We prefer to retain the clustering to compare lines to each other.

      (7) Figure S6 legend. Including graph titles like "day3msdf_dpix_numberofbouts_60" is not comprehensible to the reader so should be replaced with more descriptive text. As should jargon such as "combo plot" and "habituation_day5dpfhab1post_responsefrequency_1_a1f1000d5p" etc.

      The legend has been edited to describe the experiments. Subsections of the prior names are maintained in parentheses to enable the reader to connect the plots in this figure to the specific image and underlying data in Zenodo.

      (8) Page 2 line 21 "to enable proper".

      The change has been made.

      (9) Page 7 line 7. Repeatable phenotypes were mild mild.

      This error has been corrected.

      Reviewer #3 (Recommendations for the authors):

      (1) Figure 1B is confusingly laid out.

      We are unclear how to modify Figure 1B, as it is a bar plot. We have modified several figures to improve clarity.

      (2) Figure 1E-there are some pictures of zebrafish but to what end? They aren't labelled. The dark "no expression" looks really similar to the dark green, "high expression".

      The zebrafish images represent the ages assessed for microexon inclusion. We have added labels to clarify this point.

      (3) The main text says "microexons were removed by Crispr" but there is no detail in the main text about this at all-- and barely any in the methods. What does it mean to be removed? Cleanly? Or including part of the introns on either side? Etc. How selected, raised, etc? I can glean some of this from the Table S2 if I do a lot of extra work, but at least some notes about this would be important.

      We have added diagrams to the Supplementary Materials (new Figure S1) to show where the guide RNAs, cut sites, and putative regulatory elements are in relationship to the microexon and its neighboring exons. We removed the microexon and the surrounding area that contains the putative regulatory elements.

      (4) Figure 2 - There are no Ns, at least for the plots on the right. The reader shouldn't have to dig deep in Table S2 to find that. It is also unclear why heterozygous fish are not included in these analyses, since there are sibling data for all. Removed for readability of the plots might be warranted, but this should be made explicitly clear.

      The Ns for these plots have been added to the legend. The legend was also modified as follows: “Comparisons to the heterozygous larvae are removed for clarity and available in the Supplementary Materials, as they often have even milder phenotypes than homozygous.”

      (5) Needed data: for those with phenotypes, some evidence should be presented that the full-length transcripts that encode proteins without the microexons are still expressed at the same level and without splicing errors/NMD. Otherwise, some of these phenotypes that were found could be due to knockdown or LOF (or I suppose even overexpression) of the targeted gene.

      We have added a new Supplementary Figure S2 confirming clean removal of the microexons with RT-PCR for a subset of mutants with phenotypes. This figure also includes qRT-PCR for the same subset. We now discuss these findings: Results: “For eight mutant lines, we confirmed that the microexon was eliminated from the transcripts as expected (Figure S2). Although our genomic deletion did not yield unexpected isoforms, qRT-PCR on these eight lines revealed significant downregulation for the homozygous vav2 mutant (Figure S2), indicating possibly complex genetic regulation.”

    1. eLife Assessment

      This fundamental study explores a novel cellular mechanism underlying the degeneration of locus coeruleus neurons during chronic restraint stress. The evidence supporting the overexcitation of LC neurons after chronic stress is compelling. The topic is timely, the proposed mechanistic pathway is innovative, and the findings have translational relevance, particularly regarding therapeutic strategies targeting α2A-AR internalization in neurodegenerative diseases.

    2. Reviewer #1 (Public review):

      This study aims to elucidate the mechanisms by which stress-induced α2A-adrenergic receptor (α2A-AR) internalization leads to cytosolic noradrenaline (NA) accumulation and subsequent neuronal dysfunction in the locus coeruleus (LC). While the manuscript presents an interesting but ambitious model involving calcium dynamics, GIRK channel rundown, and autocrine NA signaling, several key limitations undermine the strength of the conclusions.

      First, the revision does not include new experiments requested by reviewers to validate core aspects of the mechanism. Specifically, there is no direct measurement of cytosolic NA levels or MAO-A enzymatic activity to support the link between receptor internalization and neurochemical changes. The authors argue that such measurements are either not feasible or beyond the scope of the study, leaving a significant gap in the mechanistic chain of evidence.

      Second, the behavioral analysis remains insufficient to support claims of cognitive impairment. The use of a single working memory test following an anxiety test is inadequate to verify memory dysfunction behaviors. Additional cognitive assays, such as the Morris Water Maze or Novel Object Recognition, are recommended but not performed.

      Third, concerns regarding the lack of rigor in differential MAO-A expression in fluorescence imaging were not addressed experimentally. Instead of clarifying the issue, the authors moved the figure to supplementary data without providing further evidence (e.g., an enzymatic assay or quantitative reanalysis of Western blot, or re-staining of IF for MAO-A) to support their interpretation.

      Fourth, concerns regarding TH staining remain unresolved. In Figure S7, the α2A-AR signal appears to resemble TH staining, and vice versa, raising the possibility of labeling errors. It is recommended that the authors re-examine this issue by either double-checking the raw data or repeating the immunostaining to validate the staining.

      Overall, the manuscript offers a potentially interesting framework but falls short in providing the experimental rigor necessary to establish causality. The reliance on indirect reasoning and reorganizing existing data, rather than generating new evidence, limits the overall impact and interpretability of the study.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript investigates the mechanism by which chronic stress induces degeneration of locus coeruleus (LC) neurons. The authors demonstrate that chronic stress leads to the internalization of α2A-adrenergic receptors (α2A-ARs) on LC neurons, causing increased cytosolic noradrenaline (NA) accumulation and subsequent production of the neurotoxic metabolite DOPEGAL via monoamine oxidase A (MAO-A). The study suggests a mechanistic link between stress-induced α2A-AR internalization, disrupted autoinhibition, elevated NA metabolism, activation of asparagine endopeptidase (AEP), and Tau pathology relevant to Alzheimer's disease (AD). The conclusions of this paper are well-supported mainly by the data, but some aspects of image acquisition require further examination.

      Strengths:

      This study clearly demonstrates the effects of chronic stimulation on the excitability of LC neurons using electrophysiological techniques. It also elucidates the role of α2-adrenergic receptor (α2-AR) internalization and the associated upstream and downstream signaling pathways of GIRK-1, using a range of pharmacological agents, highlighting the innovative nature of the work. Additionally, the study identifies the involvement of the MAO-A-DOPEGAL-AEP pathway in this process. The topic is timely, the proposed mechanistic pathway is compelling, and the findings have translational relevance, particularly in relation to therapeutic strategies targeting α2A-AR internalization in neurodegenerative diseases.

      Weaknesses:

      (1) The manuscript reports that chronic stress for 5 days increases MAO-A levels in LC neurons, leading to the production of DOPEGAL, activation of AEP, and subsequent tau cleavage into the tau N368 fragment, ultimately contributing to neuronal damage. However, the authors used wild-type C57BL/6 mice, and previous literature has indicated that AEP-mediated tau cleavage in wild-type mice is minimal and generally insufficient to cause significant behavioral alterations. Please clarify and discuss this apparent discrepancy.

      (2) It is recommended that the authors include additional experiments to examine the effects of different durations and intensities of stress on MAO-A expression and AEP activity. This would strengthen the understanding of stress-induced biochemical changes and their thresholds.

      (3) Please clarify the rationale for the inconsistent stress durations used across Figures 3, 4, and 5. In some cases, a 3-day stress protocol is used, while in others, a 5-day protocol is applied. This discrepancy should be addressed to ensure clarity and experimental consistency.

      (4) The abbreviation "vMAT2" is incorrectly formatted. It should be "VMAT2," and the full name (vesicular monoamine transporter 2) should be provided at first mention.

      Comments on revisions:

      The authors have addressed all of the reviewers' comments.

    4. Reviewer #3 (Public review):

      Summary:

      The authors present a technically impressive dataset showing that repeated excitation or restraint stress internalises somatodendritic α2A adrenergic autoreceptors (α2A ARs) in locus coeruleus (LC) neurons. Loss of these receptors weakens GIRK-dependent autoinhibition, raises neuronal excitability, and is accompanied by higher MAO A, DOPEGAL, AEP, and tau N368 levels. The work combines rigorous whole-cell electrophysiology with barbadin-based trafficking assays, qPCR, Western blotting, and immunohistochemistry. The final schematic is appealing and, in principle, could explain early LC hyperactivity followed by degeneration in ageing and Alzheimer's disease.

      Strengths:

      - Multi-level approach - The study integrates electrophysiology, pharmacology, mRNA quantification, and protein-level analysis.

      - Use of barbadin to block β-arrestin/AP-2-dependent internalisation is both technically precise and mechanistically informative.

      - Well-executed electrophysiology

      - Translational relevance

      - Converges to a model that peers discussed (scientists can only discuss models - not data!)

      Weaknesses:

      Nevertheless, the manuscript currently reads as a sequence of discrete experiments rather than a single causal chain.

    5. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      This study aims to elucidate the mechanisms by which stress-induced α2A-adrenergic receptor (α2A-AR) internalization leads to cytosolic noradrenaline (NA) accumulation and subsequent neuronal dysfunction in the locus coeruleus (LC). While the manuscript presents an interesting but ambitious model involving calcium dynamics, GIRK channel rundown, and autocrine NA signaling, several key limitations undermine the strength of the conclusions. 

      (1) First, the revision does not include new experiments requested by reviewers to validate core aspects of the mechanism. Specifically, there is no direct measurement of cytosolic NA levels or MAO-A enzymatic activity to support the link between receptor internalization and neurochemical changes. The authors argue that such measurements are either not feasible or beyond the scope of the study, leaving a significant gap in the mechanistic chain of evidence. 

      Although the reviewer #1 commented that “The authors argue that such measurements are either not feasible or beyond the scope of the study, leaving a significant gap in the mechanistic chain of evidence”, we believe that this comment may be unfair. 

      It may be unfair for reviewer #1 to neglect our responses to the original reviewer comments regarding the direct measurement of cytosolic NA levels. It is true that none of the recommended methods to directly measure cytosolic NA levels is feasible, as described in the original authors’ response (see the original authors’ response to the comment raised by the Reviewer #1 as Recommendations for the authors (2)). To measure extracellular NA with GRAB-NE photometry, α2A-ARs must be expressed in the cell membrane. GRAB-NE photometry is not applicable unless α2A-ARs are expressed, whereas increases in cytosolic NA levels are caused by internalization of α2A-ARs in our study.

      In our study, we endeavored to detect the change in MAO-A protein by Western blot, instead of examining MAO-A enzymatic activity. Because the relative quantification of active AEP and Tau N368 proteins by Western blot analysis should accurately reflect the change in MAO-A enzymatic activity, an enzymatic assay may not necessarily be required, although we acknowledge that an enzymatic assay would better demonstrate MAO-A activity, as discussed in the previously revised manuscript (R1, page 10, lines 314-315).

      We used the phrase “beyond the scope of the current study” for “the mechanism how Ca<sup>2+</sup> activates MAO-A” as described in the original authors’ responses (see the original authors’ response to the comment raised by the Reviewer #1 as Weakness (3)). We do not think that this mechanism must be investigated in the present study because the Ca<sup>2+</sup> dependent nature of MAO-A activity is already known (Cao et al., 2007). 

      On the other hand, because it is not possible to measure cytosolic NA levels with currently available methods, the quantification of the connection between α2A-AR internalization and increased cytosolic NA levels must be considered outside the scope of the study. However, our study demonstrated the qualitative relationship between α2A-AR internalization and active AEP/Tau N368, which reflect increased cytosolic NA levels, leaving “a small gap in the mechanistic chain of evidence.” Therefore, it may be unreasonable to criticize our study as “leaving a significant gap in the mechanistic chain of evidence” with the phrase “beyond the scope of the current study.”

      (2) Second, the behavioral analysis remains insufficient to support claims of cognitive impairment. The use of a single working memory test following an anxiety test is inadequate to verify memory dysfunction behaviors. Additional cognitive assays, such as the Morris Water Maze or Novel Object Recognition, are recommended but not performed.

      As described in the original authors’ response (see the original authors’ response to the comment raised by the Reviewer #1 as Weakness (4)), we had already done another behavioral test using elevated plus maze (EPM) test. By combining the two tests, it may be possible to more accurately evaluate the results of Y-maze test by differentiating the memory impairment from anxiety. However, the results obtained by these behavioral tests showed that chronic RS mice displayed both anxiety-like and memory impairment-like behaviors. Accordingly, we have softened the implication of anxiety and memory impairment (page 13, lines 396-399) and revised the abstract (page 2, line 59) in the revised manuscript (R2).  

      (3) Third, concerns regarding the lack of rigor in differential MAO-A expression in fluorescence imaging were not addressed experimentally. Instead of clarifying the issue, the authors moved the figure to supplementary data without providing further evidence (e.g., an enzymatic assay or quantitative reanalysis of Western blot, or re-staining of IF for MAO-A) to support their interpretation.

      Because the quantification of MAO-A expression can be performed with greater accuracy by means of Western blot than by immunohistochemistry, we have moved the immunohistochemical results (shown in Figure 5) to the supplemental data (Figure S8) following the suggestion made by the Reviewer #3. As the relative quantification of active AEP and Tau N368 proteins by Western blot analysis may accurately reflect changes in the MAO-A enzymatic activity which is consistent with the result of Western blot analysis of MAO-A, enzymatic assay or re-staining of immunofluorescence for MAO-A may not be necessarily required. We do not think that a new experiment of Western blot analysis is necessary to re-evaluate MAO-A just because of the lack of the less-reliable quantification of immunohistochemical staining.

      (4) Fourth, concerns regarding TH staining remain unresolved. In Figure S7, the α2A-AR signal appears to resemble TH staining, and vice versa, raising the possibility of labeling errors. It is recommended that the authors re-examine this issue by either double-checking the raw data or repeating the immunostaining to validate the staining.

      The reviewer #3 is misunderstanding Figure S7. In Figure S7, there are two types of α2A-AR expressing neurons; one is TH-positive LC neuron and the other is TH-negative neuron in mesencephalic trigeminal nucleus (MTN). This clearly indicates that TH staining is specific. Furthermore, α2A-AR staining was much more extensive in MTN neurons than in LC neurons. Thus, α2A-AR signal is not similar to TH signal and there are no labeling errors, which is also evident in the merged image (Figure S7C).

      (5) Overall, the manuscript offers a potentially interesting framework but falls short in providing the experimental rigor necessary to establish causality. The reliance on indirect reasoning and reorganizing of existing data, rather than generating new evidence, limits the overall impact and interpretability of the study.

      Overall, reviewer #1 was not satisfied with our revision regardless of the authors’ responses. As detailed in our responses to points (1)-(4) above, we believe that we have effectively addressed the criticisms raised by reviewer #1, both in the original authors’ responses and in the responses given here.

      Reviewer #2 (Public review): 

      Comments on revisions: 

      The authors have addressed all of the reviewers' comments.

      We appreciate the constructive and helpful comments made by reviewer #2.

      Reviewer #3 (Public review): 

      Weaknesses:  

      Nevertheless, the manuscript currently reads as a sequence of discrete experiments rather than a single causal chain. Below, I outline the key points that should be addressed to make the model convincing.

      Please see the responses to the recommendation for the authors made by reviewer #3.

      Reviewer #3 (Recommendations for the authors):

      (1) Causality across the pathway  

      Each step (α2A internalisation, GIRK rundown, Ca<sup>2+</sup> rise, MAO-A/AEP upregulation) is demonstrated separately, but no experiment links them in a single preparation. Consider in vivo Ca<sup>2+</sup> or GRAB NE photometry during restraint stress while probing α2A levels with i.p. clonidine injection or optogenetic over excitation coupled to biochemical readouts. Such integrated evidence would help to overcome the correlational nature of the manuscript to a more mechanistic study. 

      Authors response: It is not possible to measure free cytosolic NA levels with GRAB NE photometry when α2A AR is internalized as described above (see the response to the comment made by reviewer #1 as the recommendation for the authors).

      The core idea behind my comment, as well as that of Reviewer 1, was to encourage integrating your individual findings into a more cohesive in vivo experiment. Using GRAB-NE to measure extracellular NA could serve as an indirect readout of NA uptake via NAT, and ultimately, cytosolic NA levels. Connecting these experiments would significantly strengthen the manuscript and enhance its overall impact. 

      It may be true that the measurement of extracellular NA could serve as an indirect readout of NA uptake via NAT, and ultimately cytosolic NA levels. However, reviewer #3 still misunderstands the applicability of the GRAB-NE method for detecting NA in our study. As described in the original authors’ response, there appears to be no fluorescent probe for labeling cytosolic NA at present. In particular, the GRAB-NE method recommended by reviewers #1 and #3 can detect NA only when α2A-AR is expressed in the cell membrane. Therefore, when increases in cytosolic NA levels are caused by internalization of α2A-ARs, NA measurement with GRAB-NE photometry is not applicable.

      (2) Pharmacology and NE concentration  

      The use of 100 µM noradrenaline saturates α and β adrenergic receptors alike. Please provide ramp measurements of GIRK current in dose-response at 1-10 µM NE (blocked by atipamezole) to confirm that the rundown really reflects α2A activity rather than mixed receptor effects. 

      Authors response: It is true that 100 µM noradrenaline activates both α and β adrenergic receptors alike. However, it was clearly shown that enhancement of GIRK-I by 100 µM noradrenaline was completely antagonized by 10 µM atipamezole and that the Ca<sup>2+</sup>-dependent rundown of NA-induced GIRK-I was prevented by 10 µM atipamezole. Considering the Ki values of atipamezole for α2A AR (=1~3 nM) (Vacher et al., 2010, J Med Chem) and β AR (>10 µM) (Virtanen et al., 1989, Arch Int Pharmacodyn Ther), these results really reflect α2A AR activity but not β AR activity (Figure S5). Furthermore, because it is already well established that NA-induced GIRK-I is mediated by α2A AR activity in LC neurons (Arima et al., 1998, J Physiol; Williams et al., 1985, Neuroscience), it is not necessary to re-examine the effect of 1-10 µM NA on GIRK-I.

      While the milestone papers by Williams remain highly influential, they should be re-evaluated in light of more recent findings, given that they date back over 40 years. Advances in our understanding now allow for a more nuanced interpretation of some of their results. For example, see McKinney et al. (eLife, 2023). This study demonstrates that presynaptic β-adrenergic receptors-particularly β2-can enhance neuronal excitability via autocrine mechanisms. This suggests that your post-activation experiments using atipamezole may not fully exclude a contribution of β-adrenergic signaling. Such a role might become apparent when conducting more detailed titration experiments.

      Reviewer #3 may be misunderstanding the report by McKinney et al. (eLife, 2023). This paper did not demonstrate that presynaptic β-adrenergic receptors, particularly β2, can enhance neuronal excitability via autocrine mechanisms. It is impossible for LC neurons to increase their excitability by activating β-adrenergic receptors, as we have clearly shown that enhancement of GIRK-I by 100 µM noradrenaline was completely antagonized by 10 µM atipamezole. Considering the difference in Ki values of atipamezole for α2-AR (= 2~4 nM) (Vacher et al., 2010, J Med Chem) and β-AR (>10 µM) (Virtanen et al., 1989, Arch Int Pharmacodyn Ther), such a complete antagonization (of 100 µM NA-induced GIRK-I) by 10 µM atipamezole really reflects α2A-AR activity but not β-AR activity (Figure S5). Furthermore, it is already well established that NA-induced GIRK-I is mediated by α2-AR activity in LC neurons (Arima et al., 1998, J Physiol). McKinney et al. (eLife, 2023) merely reported the absence of lateral inhibition of adjacent LC neurons by NA released in an autocrine manner through their respective spike activity. This has nothing to do with autoinhibition.
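      The Ki argument above can be illustrated with a short calculation. The sketch below is not part of the study: it assumes simple one-site equilibrium binding of the antagonist with no competing agonist, so it ignores the 100 µM NA that was present in the experiments. The function and constant names are hypothetical, and the Ki values are taken from the figures quoted in this response (3 nM for α2-AR, and 10 µM as the lower bound quoted for β-AR).

```python
def fractional_occupancy(antagonist_molar: float, ki_molar: float) -> float:
    """Equilibrium fraction of receptors bound by an antagonist alone.

    Simple one-site binding: occupancy = [A] / ([A] + Ki).
    Assumption: no competing agonist is present.
    """
    if antagonist_molar < 0 or ki_molar <= 0:
        raise ValueError("concentration must be >= 0 and Ki must be > 0")
    return antagonist_molar / (antagonist_molar + ki_molar)


# Illustrative values taken from the figures quoted in the response above.
ATIPAMEZOLE_MOLAR = 10e-6      # 10 µM atipamezole
KI_ALPHA2_MOLAR = 3e-9         # within the quoted nanomolar range for α2-AR
KI_BETA_LOWER_BOUND = 10e-6    # quoted only as ">10 µM" for β-AR

alpha2_occupancy = fractional_occupancy(ATIPAMEZOLE_MOLAR, KI_ALPHA2_MOLAR)

# Because the true β-AR Ki is above the bound, the real occupancy is lower
# than this value; it is an upper bound, not an estimate.
beta_occupancy_upper_bound = fractional_occupancy(
    ATIPAMEZOLE_MOLAR, KI_BETA_LOWER_BOUND
)

print(f"alpha2-AR occupancy: {alpha2_occupancy:.4%}")
print(f"beta-AR occupancy (upper bound): {beta_occupancy_upper_bound:.1%}")
```

      Under these assumptions, α2-AR sites would be almost fully occupied by 10 µM atipamezole, while β-AR occupancy is bounded only from above. A fuller treatment would need the affinity of NA for each receptor, which is not given here.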

      (4) Age mismatch and disease claims 

      All electrophysiology and biochemical data come from juvenile (< P30) mice, yet the conclusions stress Alzheimer-related degeneration. Key endpoints need to be replicated in adult or aged mice, or the manuscript should soften its neurodegenerative scope. 

      Authors response: As described in the Conclusion section, we never stress Alzheimer-related degeneration, but we may have given such an impression. To avoid such a misunderstanding, we have added a description “However, the present mechanism must be proven to be valid in adult or old mice, to validate its involvement in the pathogenesis of AD.” (R1, page 14, lines 448-450).

      It would be great to see this experiment performed in aged mice-you are the one who has everything in place to do it right now! 

      In our future separate studies, we would like to prove that the present mechanism is valid in aged mice, to validate its involvement in the pathogenesis of AD. This is partly because the patch-clamp study in aged mice is extremely difficult and takes much time.

      Authors response: In the abstract, you suggest that internalization of α2A-adrenergic receptors could represent a therapeutic target for Alzheimer's disease. "...Thus, it is likely that internalization of α2A-AR increased cytosolic NA, as reflected in AEP increases, by facilitating reuptake of autocrine-released NA. The suppression of α2A-AR internalization may have a translational potential for AD treatment."

      α2A-AR internalization was involved in the degeneration of LC neurons. Because we confirmed that spike-frequency adaptation reflecting α2A-AR-mediated autoinhibition can be induced in adult mice as prominently as in juvenile mice (Figure S10), it is reasonable to suggest that the suppression of α2A-AR internalization may have translational potential for anxiety/AD treatment (see Discussion; R2, page 14, lines 445-449).

      (6) Quantitative histology  

      Figure 5 presents attractive images, but no numerical analysis is provided. Please provide ROI-based fluorescence quantification (with n values) or move the images to the supplement and rely on the Western blots. 

      Author response: We have moved the immunohistochemical results in Fig. 5 to the supplement, as we believe the quantification of immunohistochemical staining is not necessarily correct.   

      What do you mean by "...immunohistochemical staining is not necessarily correct"?

      It is evident that in terms of quantification, Western blot analysis is a more accurate method than immunohistochemical staining. In this sense, it is the contention of our study that the ROI-based fluorescence quantification of immunohistochemical staining is not necessarily an accurate or correct procedure, compared to the quantification by Western blot analysis.

    1. eLife Assessment

      The analysis of neural morphology across Heliconiini butterfly species revealed brain area-specific changes associated with new foraging behaviours. While the volume of the centre for learning and memory, the mushroom bodies, was known to vary widely across species, new, valuable results show conservation of the volume of a center for navigation, the central complex. The presented evidence is convincing for both volumetric conservation in the central complex and fine neuroanatomical differences associated with pollen feeding, delivered by experimental approaches that are applicable to other insect species. This work will be of interest to evolutionary biologists, entomologists, and neuroscientists.

    2. Reviewer #1 (Public review):

      The authors previously reported that Heliconius, one genus of the Heliconiini butterflies, evolved to be efficient foragers that feed on the pollen of specific plants and have massively expanded mushroom bodies. Using the same image dataset, the authors segmented the central complex and associated brain regions and found that the volume of the central complex relative to the rest of the brain is largely conserved across the Heliconiini butterflies. By performing immunostaining to label a specific subset of neurons, the authors found several potential sites of evolutionary divergence in the central complex neural circuits, including the number of GABAergic ellipsoid body ring neurons and the innervation patterns of Allatostatin A expressing neurons in the noduli. These neuroanatomical data will be helpful to guide future studies to understand the evolution of the neural circuits for vector-based navigation.

      Strengths:

      The authors used a sufficiently large dataset, from 307 individuals of 41 species of Heliconiini butterflies, to solidify the quantitative conclusions and present new microscopy data for fine neuroanatomical comparison of the central complex.

      Weaknesses:

      (1) Although the figures display a concise summary of anatomical findings, it would be difficult for non-experts to learn from this manuscript how to identify the same neuronal processes in the raw confocal stacks. It would be helpful to have instructive movies that show a step-by-step guide for identification of neurons of interest, segmentations, and 3D visualizations (rotation) for several examples, including ER neurons (to supplement the text in lines 347-353) and Allatostatin A neurons.

      (2) Related to (1), it was difficult for me to assess whether the data in Figure 7 support the authors' conclusion that ER neuron number increased in Heliconius melpomene. By my understanding, the resolution of this dataset isn't high enough to trace individual axons, and therefore the authors do not rule out that a portion of the "ER ring neurons" in Heliconius may not innervate the ER, as stated in Line 635: "Importantly, we also found that some ER neurons bypass the ellipsoid body and give rise to dense branches within distinct layers in the fan-shaped body (ER-FB)". If they don't innervate the ellipsoid body, why are they named "ER neurons"?

      (3) Discussions around the lines 577-584 require the assumption that each ellipsoid body (EB) ring neuron typically arborises in a single microglomerulus to form a largely one-to-one connection with TuBu neurons within the bulb (BU), and therefore, the number of BU microglomeruli should provide an estimation of the number of ER neurons. Explain this key assumption or provide an alternative explanation.

      (4) The details of antibody information are missing in the Key resource table. Instead of citing papers, list the catalogue numbers and identifier for commercially available antibodies, and describe the antigen, and whether they are monoclonal or polyclonal. Are antigens conserved across species?

      (5) I did not understand why the authors assume that foraging for pollen is a more difficult cognitive task than foraging for nectar. Would it be possible that they are equally demanding tasks, but pollen feeding allows Heliconius to pass more proteins and nucleic acids to their offspring, and therefore they can develop larger mushroom bodies?

    3. Reviewer #2 (Public review):

      Summary:

      In this study, Farnsworth et al. ask whether the previously established expansion of mushroom bodies in the pollen foraging Heliconius genus of Heliconiini butterflies co-evolved with adaptations in the central complex. Heliconius trap line foraging strategies to acquire pollen as a novel resource require advanced spatial memory mediated by larger mushroom bodies, but the authors show that related navigation circuits in the central complex are highly conserved across the Heliconiini tribe, with a few interesting exceptions. Using general immunohistochemical stains and 3D reconstruction, the authors compared volumes of central complex regions, and unlike the mushroom bodies, there was no evidence of expansion associated with pollen feeding. However, a second dataset of neuromodulator and neuropeptide antibody labeling reveals more subtle differences between pollen and non-pollen foragers and highlights sub-circuits that may mediate species-specific differences in behavior. Specifically, the authors found an expansion of GABAergic ER neurons projecting to the fan-shaped body in Heliconius, which may enhance their ability to path-integrate. They also found differences in Allatostatin A immunoreactivity, particularly increased expression in the noduli associated with pollen feeding. These differences warrant closer examination in future studies to determine their functional implication on navigation and foraging behaviors.

      Strengths:

      The authors leveraged a large morphological data set from the Heliconiini to achieve excellent phylogenetic coverage across the tribe with 41 species represented. Their high-quality histology resolves anatomical details to the level of specific, identifiable tracts and cell body clusters. They revealed differences at a circuit level, which would not be obvious from a volumetric comparison. The discussion of these adaptations in the context of central complex models is useful for generating new hypotheses for future studies on the function of ER-FB neurons and the role of Allatostatin A modulation in navigation.

      The conclusions drawn in this paper are measured and supported by rigorous statistics and evidence from micrographs.

      Weaknesses:

      The majority of results in this study do not reveal adaptations in the central complex associated with pollen foraging. However, reporting conserved traits is useful and illustrates where developmental or functional constraints may be acting. The implied hypothesis in the introduction is that expansion of mushroom bodies in Heliconius co-evolved with central complex adaptations, so it may be helpful to set up the alternate hypotheses in the beginning.

      In the main text, the authors describe differences in GABAergic neurons "across several species" but only one Heliconius and one outgroup species seem to be represented in the figures. ER numbers in Figure 7H are only compared for these two species. If this data is available for other species, it would strengthen the paper to add them to the analysis, since this was one of the most intriguing findings in the study. I would want to know if the increased ER number is a trend in Heliconius or specific to H. melpomene.

    4. Author response:

      We thank the two reviewers for their constructive criticisms which we will address in the coming weeks, and we are confident doing so will benefit the manuscript.

      We will aim to address all comments, but there are two main areas in particular that we highlight here:

      (1)  Both reviewers make important suggestions to improve the readers’ understanding of the anatomical complexities and raw files we provide. We will generate annotated confocal stacks and simplify the nomenclature to better guide the reader through the more complex details of the anatomy of the central complex, and the neuron types we characterized more closely.

      (2)  Both reviewers also pointed to several parts of our interpretations and discussion that should be clarified. We will do so by making the language in certain sections more precise, and by offering alternative explanations where possible.

    1. eLife Assessment

      This study offers a valuable theoretical framework for quantifying molecular transport across interfaces between coexisting liquid phases, emphasizing interfacial resistance as a central factor governing transport kinetics. The mathematical derivations are solid. To enhance the paper's relevance and broaden its appeal, it would be helpful to clarify how the key equations connect to existing literature and to elucidate the physical mechanisms underlying scenarios that give rise to substantial interfacial resistance.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors theoretically address the topic of interface resistance between a phase-separated condensate and the surrounding dilute phase. In a nutshell, "interface resistance" occurs if material in the dilute phase can only slowly pass through the interface region to enter the dense phase. There is some evidence from FRAP experiments that such a resistance may exist, and if it does, it could be biologically relevant insofar as the movement of material between dense and dilute phases can be rate-limiting for biological processes, including coarsening. The current study theoretically addresses interface resistance at two levels of description: first, the authors present a simple way of formulating interface resistance for a sharp interface model. Second, they derive a formula for interface resistance for a finite-width interface and present two scenarios where the interface resistance might be substantial.

      Strengths:

      The topic is of broad relevance to the important field of intracellular phase separation, and the work is overall credible.

      Weaknesses:

      There are a few problems with the study as presented - mainly that the key formula for the latter section has already been derived and presented in Reference 6 (notably also in this journal), and that the physical basis for the proposed scenarios leading to a large interface resistance is not clearly supported.

      (1) As noted, Equation 32 of the current study is entirely equivalent to Equation 8 of Reference 6, with a very similar derivation presented in Appendix 1 of that paper. In fact, Equation 8 in Reference 6 takes one more step by combining Equations 32 and 35 to provide a general expression for the interface resistance in an integral form. These prior results should be properly cited in the current work - the existing citations to Reference 6 do not make this overlap apparent.

      (2) The authors of the current study go on to examine cases where this shared equation (here Equation 32) might imply a large interface resistance. The examples are mathematically correct, but physically unsupported. In order to produce a substantial interface resistance, the current authors have to suppose that in the interface region between the dense and dilute phases, either there is a local minimum of the diffusion coefficient or a local minimum of the density. I am not aware of any realistic model that would produce either of these minima. Indeed, the authors do not present sufficient examples or physical arguments that would support the existence of such minima.

      In my view, these two issues limit the general interest of the latter portion of the current manuscript. While point 1 can be remedied by proper citation, point 2 is not so simple to address. The two ways the authors present to produce a substantial interface resistance seem to me to be mathematical exercises without a physical basis. The manuscript will improve if the authors can provide examples or compelling arguments for a minimum of either diffusion coefficient or density between the dense and dilute phases that would address point 2.
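
      To illustrate the kind of scenario under debate, the sketch below compares a smooth diffusivity profile with one that has a local minimum at the interface. It is not the manuscript's Equation 32 or Equation 8 of Reference 6: it assumes a generic series-resistance integral, R = ∫ dx / D(x), and the profile shape, parameter values, and function names are all hypothetical. In SI units this integral carries units of s/m.

```python
import math


def diffusivity(x, d_in=0.1, d_out=1.0, width=1.0, dip_depth=0.0, dip_width=0.5):
    """Hypothetical D(x): tanh step from d_in (x < 0) to d_out (x > 0),
    optionally multiplied by a Gaussian dip centred on the interface at x = 0.
    dip_depth must be below 1 so that D stays positive."""
    step = d_in + (d_out - d_in) * 0.5 * (1.0 + math.tanh(x / width))
    dip = 1.0 - dip_depth * math.exp(-((x / dip_width) ** 2))
    return step * dip


def series_resistance(profile, x_min=-10.0, x_max=10.0, n=4000):
    """Midpoint-rule estimate of the integral of dx / D(x) over [x_min, x_max]."""
    dx = (x_max - x_min) / n
    total = 0.0
    for i in range(n):
        x_mid = x_min + (i + 0.5) * dx
        total += dx / profile(x_mid)
    return total


r_smooth = series_resistance(lambda x: diffusivity(x))
r_dip = series_resistance(lambda x: diffusivity(x, dip_depth=0.99))

# The difference isolates the contribution of the local minimum in D(x)
print(f"smooth profile:  R = {r_smooth:.1f}")
print(f"profile with dip: R = {r_dip:.1f}")
print(f"excess from dip:  {r_dip - r_smooth:.1f}")
```

      In this toy calculation the excess resistance comes entirely from the narrow region where D(x) is small, which is why the physical justification for such a minimum matters for the argument.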

    3. Reviewer #2 (Public review):

      Summary:

      This work provides a general theoretical framework for understanding molecular transport across liquid-liquid phase boundaries, focusing on interfacial resistance arising from deviations from local equilibrium. By bridging sharp and continuous interface descriptions, the authors demonstrate how distinct microscopic mechanisms can yield similar effective kinetics and propose practical experimental validation strategies.

      Strengths:

      (1) Conceptually rich and physically insightful interface resistance formulation in sharp and continuous limits.

      (2) Strong integration of non-equilibrium thermodynamics with biologically motivated transport scenarios.

      (3) Thorough numerical and analytical support, with thoughtful connection to current and emerging experimental techniques.

      (4) Relevance to various systems, including biomolecular condensates and engineered aqueous two-phase systems.

      Weaknesses:

      (1) The work remains mainly theoretical, with limited direct comparison to quantitative experimental data.

      (2) The biological implications are only briefly explored; further discussion of specific systems where interface resistance might play a functional role would enhance the impact.

      (3) Some model assumptions (e.g., symmetric labeling or idealized diffusivity profiles) could be further contextualized regarding biological variability.

    4. Reviewer #3 (Public review):

      The manuscript investigated the kinetics of molecule transport across interfaces in phase-separated mixtures. Through the development of a theoretical approach for a binary mixture in a sharp interface limit, the authors found that interface resistance leads to a slowdown in interfacial movement. Subsequently, they extended this approach to multiple molecular species (incorporating both labeled and unlabeled molecules) and continuous transport models. Finally, they proposed experimental settings in vitro and commented on the necessary optical resolution to detect signatures of interfacial kinetics associated with resistance.

      The investigation of transport kinetics across biomolecular condensate interfaces holds significant relevance for understanding cellular function and dysfunction mechanisms; thus, the topic is important and timely. However, the current manuscript presentation requires improvement. Firstly, the inclusion of numerous equations in the main text substantially compromises readability, and relocation of a part of the formulae and derivations to the Appendix would be more appropriate. Secondly, the manuscript would benefit from more comprehensive comparisons with existing theoretical studies on molecular transport kinetics. The text should also be written to be more approachable for a general readership. Modifications and sufficient responses to the specific points outlined below are recommended.

      (1) The authors introduced a theoretical framework to study the kinetics of molecules across an interface between two coexisting liquid phases and found that interface resistance leads to a slowdown in interfacial movement in a binary mixture and a decelerated molecule exchange between labeled and unlabeled molecules across the phase boundary. However, these findings appear rather expected. The work would be strengthened by a more thorough discussion of the kinetics of molecule transport across interfaces (such as the physical origin of the interface resistance and its specific impact on transport kinetics).

      (2) The formulae in the manuscript should be checked and corrected. Notably, Equation 10 contains "\phi_2\ln\phi_2" while Eq. 11b shows "n^{-1}\ln\phi_2", suggesting a missing factor of "n^{-1}". Similarly, Equation 18 is obtained from Equation 11: the logarithmic term in Eq. 11a is "n^{-1}\ln\phi_1-\ln(1-\phi)", but the pre-exponential factor in Equation 18a is just "\phi_1/(1-\phi*)"; where is "n^{-1}"? Additionally, there is a unit inconsistency in Equation 36, where the unit of \rho (s/m) does not match that of the right-hand side expression (s/m^2).

      (3) The authors stated that the numerical solutions are obtained using a custom finite difference scheme implemented in MATLAB in the Appendix. The description of numerical methods is insufficiently detailed and needs to be expanded, including specific equations or models used to obtain specific figures, the introduction of initial and boundary conditions, the choices of parameters and their reasons in terms of the biology.

      (4) The authors claimed that their framework naturally extends to multiple molecular species, but only showed the situation of labeled and unlabeled molecules across a phase boundary. How about three or more molecular species? Does this framework still work? This should be added to strengthen the manuscript and confirm the framework's general applicability.

    5. Author response:

      Reviewer #1 (Public review): 

      Summary: 

      In this manuscript, the authors theoretically address the topic of interface resistance between a phase-separated condensate and the surrounding dilute phase. In a nutshell, "interface resistance" occurs if material in the dilute phase can only slowly pass through the interface region to enter the dense phase. There is some evidence from FRAP experiments that such a resistance may exist, and if it does, it could be biologically relevant insofar as the movement of material between dense and dilute phases can be rate-limiting for biological processes, including coarsening. The current study theoretically addresses interface resistance at two levels of description: first, the authors present a simple way of formulating interface resistance for a sharp interface model. Second, they derive a formula for interface resistance for a finite-width interface and present two scenarios where the interface resistance might be substantial. 

      Strengths: 

      The topic is of broad relevance to the important field of intracellular phase separation, and the work is overall credible. 

      Weaknesses: 

      There are a few problems with the study as presented - mainly that the key formula for the latter section has already been derived and presented in Reference 6 (notably also in this journal), and that the physical basis for the proposed scenarios leading to a large interface resistance is not clearly supported. 

      (1) As noted, Equation 32 of the current study is entirely equivalent to Equation 8 of Reference 6, with a very similar derivation presented in Appendix 1 of that paper. In fact, Equation 8 in Reference 6 takes one more step by combining Equations 32 and 35 to provide a general expression for the interface resistance in an integral form. These prior results should be properly cited in the current work - the existing citations to Reference 6 do not make this overlap apparent. 

      We agree and will make the overlap explicit, acknowledging priority and clarifying what is new here. The initial version of the preprint of Zhang et al. (2022) (https://www.biorxiv.org/content/10.1101/2022.03.16.484641v1) lacked the derivation (it referenced a Supplementary Note not yet available); it was added during the eLife submission. We worked from the preprint and missed this update, which we will now correct.

      (2) The authors of the current study go on to examine cases where this shared equation (here Equation 32) might imply a large interface resistance. The examples are mathematically correct, but physically unsupported. In order to produce a substantial interface resistance, the current authors have to suppose that in the interface region between the dense and dilute phases, either there is a local minimum of the diffusion coefficient or a local minimum of the density. I am not aware of any realistic model that would produce either of these minima. Indeed, the authors do not present sufficient examples or physical arguments that would support the existence of such minima. 

      We respectfully disagree with the reviewer on the physical plausibility of these scenarios: there is both concrete experimental and theoretical evidence for the scenarios we discussed.

      Experimental: Strom et al. (2017) (our reference 11) describes a substantially reduced protein diffusion coefficient at an in vivo phase boundary, while Hahn et al. (2011a) and Hahn et al. (2011b) (our references 27 and 28) describe transient accumulation of molecules at a phase boundary, which they attribute to the Donnan potential, but conceivably a lowered mobility could play a role.

      Theoretical: Recent work (e.g., Majee et al. (2024)) shows that charged layers could form at phase boundaries, which could either repel or attract incoming molecules, depending on their charge, thus altering the local volume fraction, resulting in a trough or peak. Arguably, the model put forth by Zhang et al. (2024) could be mapped to a potential wall, where particles are reflected, unless in a certain state. We will add sentences to the corresponding results section, as well as the discussion to make this plausibility more apparent.

      In my view, these two issues limit the general interest of the latter portion of the current manuscript. While point 1 can be remedied by proper citation, point 2 is not so simple to address. The two ways the authors present to produce a substantial interface resistance seem to me to be mathematical exercises without a physical basis. The manuscript will improve if the authors can provide examples or compelling arguments for a minimum of either diffusion coefficient or density between the dense and dilute phases that would address point 2. 

      We believe we will be able to address both issues.

      Reviewer #2 (Public review): 

      Summary: 

      This work provides a general theoretical framework for understanding molecular transport across liquid-liquid phase boundaries, focusing on interfacial resistance arising from deviations from local equilibrium. By bridging sharp and continuous interface descriptions, the authors demonstrate how distinct microscopic mechanisms can yield similar effective kinetics and propose practical experimental validation strategies. 

      Strengths: 

      (1) Conceptually rich and physically insightful interface resistance formulation in sharp and continuous limits. 

      (2) Strong integration of non-equilibrium thermodynamics with biologically motivated transport scenarios. 

      (3) Thorough numerical and analytical support, with thoughtful connection to current and emerging experimental techniques. 

      (4) Relevance to various systems, including biomolecular condensates and engineered aqueous two-phase systems. 

      Weaknesses: 

      (1) The work remains mainly theoretical, with limited direct comparison to quantitative experimental data. 

      We agree with the reviewer; an experimental manuscript is in progress.

      (2) The biological implications are only briefly explored; further discussion of specific systems where interface resistance might play a functional role would enhance the impact.

      We thank the reviewer for this comment. We will add several such scenarios to the discussion, including the possibility of using interface resistance to order biochemical reactions in time, as well as its potential to exclude molecules from condensates for long periods. While not effective in the long-time limit, such exclusion could help cells respond to transient events on timescales of minutes to hours.

      (3) Some model assumptions (e.g., symmetric labeling or idealized diffusivity profiles) could be further contextualized regarding biological variability. 

      The treatment of labelled and unlabelled molecules as physically identical is well supported by our experiments. Droplets under typical experimental conditions, i.e. when bleaching is not too strong, do not markedly change size or volume fraction of molecules, which would be expected if the physical properties like molecular volume or interaction strength were significantly changed. However, we do agree that in more extreme bleaching regimes the bleach step itself will change the droplet properties, but this can be avoided by tuning the FRAP laser power and dwell times accordingly.

      Our diffusivity profiles are chosen in the simplest possible way to handle typical experimental constraints (large D outside, lower D inside, potentially lowered D at the boundary) and allow for a mean-field treatment. To the best of our knowledge, the precise make-up and concentration profiles of phase boundaries in biomolecular condensates are not currently known, due to limitations in optical resolution.

      Reviewer #3 (Public review): 

      The manuscript investigated the kinetics of molecule transport across interfaces in phase-separated mixtures. Through the development of a theoretical approach for a binary mixture in a sharp interface limit, the authors found that interface resistance leads to a slowdown in interfacial movement. Subsequently, they extended this approach to multiple molecular species (incorporating both labeled and unlabeled molecules) and continuous transport models. Finally, they proposed experimental settings in vitro and commented on the necessary optical resolution to detect signatures of interfacial kinetics associated with resistance. 

      The investigation of transport kinetics across biomolecular condensate interfaces holds significant relevance for understanding cellular function and dysfunction mechanisms; thus, the topic is important and timely. However, the current manuscript presentation requires improvement. Firstly, the inclusion of numerous equations in the main text substantially compromises readability, and relocation of a part of the formulae and derivations to the Appendix would be more appropriate. Secondly, the manuscript would benefit from more comprehensive comparisons with existing theoretical studies on molecular transport kinetics. The text should also be written to be more approachable for a general readership. Modifications and sufficient responses to the specific points outlined below are recommended. 

      (1) The authors introduced a theoretical framework to study the kinetics of molecules across an interface between two coexisting liquid phases and found that interface resistance leads to a slowdown in interfacial movement in a binary mixture and a decelerated molecule exchange between labeled and unlabeled molecules across the phase boundary. However, these findings appear rather expected. The work would be strengthened by a more thorough discussion of the kinetics of molecule transport across interfaces (such as the physical origin of the interface resistance and its specific impact on transport kinetics). 

      We thank the reviewer for this comment and will discuss possible mechanisms and how they map to our mean-field model in more detail, both in the corresponding results section and in the discussion, as also outlined in our response to Reviewer #1.

      (2) The formulae in the manuscript should be checked and corrected. Notably, Equation 10 contains "\phi_2\ln\phi_2" while Eq. 11b shows "n^{-1}\ln\phi_2", suggesting a missing factor of "n^{-1}". Similarly, Equation 18 obtained from Equation 11: the logarithmic term in Eq.11a is "n^{-1}\ln\phi_1-\ln(1-\phi)" but the pre-exponential factor in Equation 18a is just "\phi_1/(1-\phi*)", where is "n^{-1}"? Additionally, there is a unit inconsistency in Equation 36, where the unit of \rho (s/m) does not match that of the right-hand side expression (s/m^2). 

      We thank the reviewer. We identified that the error originates in the inline definition of the exchange chemical potential, before Equation 11. We inadvertently dropped a prefactor of n, which then shows up in the following equation as an exponent to (1-\phi^*). Importantly, this means that the main result, Eq. 25, still holds. In the revised manuscript we will correct the ensuing typographical mistakes.
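
      The bookkeeping behind this correction can be illustrated with a short worked step. It is a sketch under stated assumptions: only the logarithmic terms quoted by the reviewer from Eq. 11a are kept (any interaction terms are omitted), and the symbol \bar{\mu} for the exchange chemical potential in units of k_B T is a label introduced here, not the manuscript's notation.

```latex
\bar{\mu} = n^{-1}\ln\phi_1 - \ln(1-\phi)
\quad\Longrightarrow\quad
n\,\bar{\mu} = \ln\phi_1 - n\ln(1-\phi)
\quad\Longrightarrow\quad
e^{\,n\bar{\mu}} = \frac{\phi_1}{(1-\phi)^{n}} .
```

      Under that assumption, restoring the prefactor n changes the quoted pre-exponential factor from \phi_1/(1-\phi^*) to \phi_1/(1-\phi^*)^{n}, which matches the description of n reappearing as an exponent.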

      (3) The authors stated that the numerical solutions are obtained using a custom finite difference scheme implemented in MATLAB in the Appendix. The description of numerical methods is insufficiently detailed and needs to be expanded, including specific equations or models used to obtain specific figures, the introduction of initial and boundary conditions, the choices of parameters and their reasons in terms of the biology.

      We will substantially expand the Appendix for the numerical solutions and add an explanatory file to the repository to make clear how the code can be run, as well as its dependencies.
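
      For readers unfamiliar with such schemes, the following is a minimal sketch of an explicit, conservative finite-difference update for one-dimensional diffusion with a position-dependent diffusivity and no-flux boundaries. It is not the authors' MATLAB code and does not implement their full model: it assumes plain Fickian transport, d(phi)/dt = d/dx( D(x) d(phi)/dx ), and every name and parameter value is hypothetical.

```python
def diffuse_1d(phi, d_face, dx, dt, steps):
    """Advance cell-centred values `phi` by `steps` explicit Euler steps.

    d_face[i] is the diffusivity on the face between cells i and i + 1.
    Fluxes are evaluated on faces, so the total amount is conserved.
    """
    n = len(phi)
    if len(d_face) != n - 1:
        raise ValueError("need one diffusivity per interior face")
    if dt > dx * dx / (2.0 * max(d_face)):
        raise ValueError("time step violates the explicit stability limit")
    phi = list(phi)
    for _ in range(steps):
        # Fickian flux through each interior face (positive to the right)
        flux = [-d_face[i] * (phi[i + 1] - phi[i]) / dx for i in range(n - 1)]
        updated = phi[:]
        for i in range(n):
            f_right = flux[i] if i < n - 1 else 0.0   # no-flux right wall
            f_left = flux[i - 1] if i > 0 else 0.0    # no-flux left wall
            updated[i] = phi[i] - dt * (f_right - f_left) / dx
        phi = updated
    return phi


# Initial condition: labelled molecules fill the left half only
n_cells = 40
start = [1.0] * (n_cells // 2) + [0.0] * (n_cells // 2)

uniform = [1.0] * (n_cells - 1)
barrier = uniform[:]
barrier[n_cells // 2 - 1] = 0.01  # low diffusivity on the face at the interface

end_uniform = diffuse_1d(start, uniform, dx=1.0, dt=0.2, steps=200)
end_barrier = diffuse_1d(start, barrier, dx=1.0, dt=0.2, steps=200)

crossed_uniform = sum(end_uniform[n_cells // 2:])
crossed_barrier = sum(end_barrier[n_cells // 2:])
print(f"amount crossed, uniform D: {crossed_uniform:.3f}")
print(f"amount crossed, barrier:   {crossed_barrier:.3f}")
```

      Evaluating fluxes on cell faces keeps the total amount constant to rounding error, and the stability check reflects the usual explicit limit dt ≤ dx² / (2 D_max). Documenting choices of this kind (discretisation, boundary conditions, time step) is what the reviewer's request amounts to.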

      (4) The authors claimed that their framework naturally extends to multiple molecular species, but only showed the situation of labeled and unlabeled molecules across a phase boundary. How about three or more molecular species? Does this framework still work? This should be added to strengthen the manuscript and confirm the framework's general applicability. 

      We have shown in Bo et al. (2021) that the labelling approach can be carried over to multi-component systems. Each species may, for example, encounter its own interface resistance. We will discuss this in more detail in the revised manuscript.

    1. eLife Assessment

      This work uses enhanced sampling molecular dynamics methods to generate potentially useful information about a conformational change (the DFG flip) that plays a key role in regulating kinase function and inhibitor binding. The focus of the work is on the mechanism of conformational change and how mutations affect the transition. The evidence supporting the conclusions is incomplete.

    2. Reviewer #1 (Public review):

      Summary:

      The authors used weighted ensemble enhanced sampling molecular dynamics (MD) to test the hypothesis that a double mutant of Abl favors the DFG-in state relative to the WT and therefore causes the drug resistance to imatinib.

      Strengths:

      The authors employed state-of-the-art weighted ensemble MD simulations with three novel progress coordinates to explore the conformational changes of the DFG motif of Abl kinase. The hypothesis regarding the double mutant's drug resistance is novel.

      Weaknesses:

      The study contains many uncertain aspects. A major revision is needed to strengthen the support for the conclusions.

      (1) Specifically, the authors need to define the DFG conformation using criteria accepted in the field, for example, see https://klifs.net/index.php.

      (2) Convergence needs to be demonstrated for estimating the population difference between different conformational states.

      (3) The DFG flip needs to be sampled several times to establish free energy difference.

      (4) The free energy plots do not appear to show an intermediate state as claimed.

      (5) The trajectory length of 7 ns in both Figure 2 and Figure 4 needs to be verified, as it is extremely short for a DFG flip that has a high free energy barrier.

      (6) The free energy scale (100 kT) appears to be one order of magnitude too large.

      (7) Setting the DFG-Asp to the protonated state is not justified, because in the DFG-in state, the DFG-Asp is clearly deprotonated.

      (8) Finally, the authors should discuss their work in the context of the enormous progress made in theoretical studies and mechanistic understanding of the conformational landscape of protein kinases in the last two decades, particularly with regard to the DFG flip.

    3. Reviewer #2 (Public review):

      Summary:

      This is a well-written manuscript on the mechanism of the DFG flip in kinases. This conformational change is important for the toggling of kinases between active (DFG-in) and inactive (DFG-out) states. The relative probabilities of these two states are also an important determinant of the affinity of inhibitors for a kinase. However, it is an extremely slow/rare conformational change, making it difficult to capture in simulations. The authors show that weighted ensemble simulations can capture the DFG flip and then delve into the mechanism of this conformational change and the effects of mutations.

      Strengths:

      The DFG flip is very hard to capture in simulations. Showing that this can be done with relatively little simulation by using enhanced sampling is a valuable contribution. The manuscript gives a nice description of the background for non-experts.

      Weaknesses:

      I was disappointed by the anecdotal approach to presenting the results. Molecular processes are stochastic and the authors have expertise in describing such processes. However, they chose to put most statistical analysis in the SI. The main text instead describes the order of events in single "representative" trajectories. The main text makes it sound like these were mostly selected because they were continuous trajectories from the weighted ensemble simulations. I would much rather hear a description of the highest probability pathway(s) with some quantification of how probable they are. That would give the reader a clear sense of how representative the events described are.

      I appreciated the discussion of the strengths/weaknesses of weighted ensemble simulations. Am I correct that this method doesn't do anything to explicitly enhance sampling along orthogonal degrees of freedom? Maybe a point worth mentioning if so.

      I don't understand Figure 3C. Could the authors instead show structures corresponding to each of the states in 3B, and maybe also a representative structure for pathways 1 and 2?

      Why introduce S1 and DFG-inter? And why suppose that DFG-inter is what corresponds to the excited state seen by NMR?

      It would be nice to have error bars on the populations reported in Figure 3.

      I'm confused by the attempt to relate the relative probabilities of states to the 32 kcal/mol barrier previously reported between the states. The barrier height should be related to the probability of a transition. The DFG-out state could be equiprobable with the DFG-in state and still have a 32 kcal/mol barrier separating them.
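
      The distinction drawn here can be written out with two standard relations. This is only a sketch in generic notation: the symbols ΔG (free energy difference between the states), ΔG‡ (barrier height measured from the DFG-in state), and ν (an attempt-frequency prefactor) are introduced for illustration and do not come from the manuscript.

```latex
% Equilibrium populations depend only on the free energy difference between the states
\frac{p_{\mathrm{out}}}{p_{\mathrm{in}}} = \exp\!\left(-\frac{\Delta G}{k_B T}\right),
\qquad \Delta G = G_{\mathrm{out}} - G_{\mathrm{in}}

% The transition rate (transition-state-theory form) depends on the barrier height
k_{\mathrm{in}\to\mathrm{out}} \approx \nu \, \exp\!\left(-\frac{\Delta G^{\ddagger}}{k_B T}\right)
```

      Under these relations, ΔG = 0 gives equal populations regardless of the value of ΔG‡, which is the point made above: a large barrier slows the interconversion but does not by itself fix the relative probabilities.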

      How do the relative probabilities of the DFG-in/out states compare to experiments, like NMR?

      Do the staggered and concerted DFG flip pathways mentioned correspond to pathways 1 and 2 in Figure 3B, or is that a concept from previous literature?

    1. eLife Assessment

      This valuable work advances our understanding of the relation between multimodal MRI, cognition, and mental health. Convincing use of statistical learning techniques in UK Biobank data shows that 48% of the variance between an 11-task derived g-factor and imaging data can be explained. Overall, this paper contributes to the study of brain-behaviour relations and will be of interest for both its methods and its findings on how much variance in g can be explained.

      [Editorial note: a previous version was reviewed by Biological Psychiatry]

    2. Reviewer #1 (Public review):

      Summary:

      The authors aimed to examine how the covariation between cognition (represented by a g-factor based on 12 features of 11 cognitive tasks) and mental health (represented by 133 diverse features) is reflected in MR-based neural markers of cognition, as measured through multimodal neuroimaging (structural, rsfMRI, and diffusion MR). To integrate multiple neuroimaging phenotypes across MRI modalities, they used a so-called stacking approach, which employs two levels of machine learning. First, they built a predictive model from each neuroimaging phenotype to predict a target variable. Next, in the stacking level, they used predicted values (i.e., cognition predicted from each neuroimaging phenotype) from the first level as features to predict the target variable. To quantify the contribution of the neural indicators of cognition explaining the relationship between cognition and mental health, they conducted commonality analyses. Results showed that when they stacked neuroimaging phenotypes within dwMRI, rsMRI, and sMRI, they captured 25.5%, 29.8%, and 31.6% of the predictive relationship between cognition and mental health, respectively. By stacking all 72 neuroimaging phenotypes across three MRI modalities, they enhanced the explanation to 48%. Age and sex shared substantial overlapping variance with both mental health and neuroimaging in explaining cognition, accounting for 43% of the variance in the cognition-mental health relationship.

      Strengths:

      (1) A big study population (UK Biobank with 14000 subjects).

      (2) The description of the methods (including Figure 1) is helpful in understanding the approach.

      (3) This revised manuscript is much improved compared to the previous version.

      Weaknesses:

      (1) Although the background and reason for the study are better described in this version of the manuscript, the relevance of the question is, in my opinion, still questionable. The authors aimed to determine whether neural markers of cognition explain the covariance between cognition and mental health and which of the 72 MRI-based features contribute to explaining most of the covariance. I would like to invite the authors to make a stronger case for the relevance, keeping the clinical and scientific relevance in mind (what would you explain to the clinician, what would you explain to the people with lived experience, and how can this knowledge contribute to innovation in mental health care?).

      (2) The discussion on the interpretation of the positive and negative PLRS loadings is not very convincing, and the findings are partly counterintuitive. For example (1) how to explain that distress has a positive loading and anxiety/trauma has a negative loading?; (2) how to explain that mental health features like wellbeing and happiness load in the same direction as psychosis and anxiety/trauma? From both a clinical and a neuroscientific perspective, this is hard to interpret.

      (3) The analysis plan has not been preregistered (e.g. at OSF).

      Note: the computational aspects of the methods fall beyond my expertise.

    3. Reviewer #2 (Public review):

      Summary:

      The goal of this manuscript was to examine whether neural indicators explain the relationship between cognition and mental health. The authors achieved this aim by showing that the combination of MRI markers better predicted the cognition-mental health covariation.

      Strengths:

      The evidence supporting the conclusions is compelling. There is a large sample (UK biobank data) and a clear description of advanced analyses.

      Weaknesses:

      In the previous version of the paper, it was not completely clear what it means to look at the overlap between cognition and mental health. The authors have addressed this in the current version.

    4. Author response:

      Notes to Editors

      We previously received comments from three reviewers at Biological Psychiatry, which we have addressed in detail below. The following is a summary of the reviewers’ comments along with our responses.

      Reviewers 1 and 2 sought clearer justification for studying the cognition-mental health overlap (covariation) and its neuroimaging correlates. In the revised manuscripts, we expanded the Introduction and Discussion to explicitly outline the theoretical implications of investigating this overlap with machine learning. We also added nuance to the interpretation of the observed associations.

      Reviewer 1 raised concerns about the accessibility of the machine learning methodology for readers without expertise in this field. We revised the Methods section to provide a clearer, step-by-step explanation of our machine learning approach, particularly the two-level machine learning through stacking. We also enhanced the description of the overall machine learning design, including model training, validation, and testing.

      In response to Reviewer 2’s request for deeper interpretation of our findings and stronger theoretical grounding, we have expanded our discussion by incorporating a thorough interpretation of how mental health indices relate to cognition, material that was previously included only in supplementary materials due to word limit constraints. We have further strengthened the theoretical justification for our study design, with particular emphasis on the importance of examining shared variance between cognition and mental health through the derivation of neural markers of cognition. Additionally, to enhance the biological interpretation of our results, we included new analyses of feature importance across neuroimaging modalities, providing clearer insights into which neural features contribute most to the observed relationships.

      Notably, Reviewer 3 acknowledged the strength of our study, including multimodal design, robust analytical approach, and clear visualization and interpretation of results. Their comments were exclusively methodological, underscoring the manuscript’s quality.

      Reviewer 1:

      The authors try to bridge mental health characteristics, global cognition and various MRI-derived (structural, diffusion and resting state fMRI) measures using the large dataset of UK Biobank. Each MRI modality alone explained max 25% of the cognition-mental health covariance, and when combined together 48% of the variance could be explained. As a peer-reviewer not familiar with the used methods (machine learning, although familiar with imaging), the manuscript is hard to read and I wonder what the message for the field might be. In the end of the discussion the authors state '... we provide potential targets for behavioural and physiological interventions that may affect cognition', the real relevance (and impact) of the findings is unclear to me.

      Thank you for your thorough review and practical recommendations. We appreciate your constructive comments and suggestions and hope our revisions adequately address your concerns.

      Major questions

      (1) The methods are hard to follow for people not in this specific subfield, and therefore, I expect that for readers it is hard to understand how valid and how useful the approach is.

      Thank you for your comment. To enhance accessibility for readers without a machine learning background, we revised the Methods section to clarify our analyses while retaining important technical details needed to understand our approach. Recognizing that some concepts may require prior knowledge, we provide detailed explanations of each analysis step, including the machine learning pipeline in the Supplementary Methods.

      Line 188: “We employed nested cross-validation to predict cognition from mental health indices and 72 neuroimaging phenotypes (Fig. 1). Nested cross-validation is a robust method for evaluating machine-learning models while tuning their hyperparameters, ensuring that performance estimates are both accurate and unbiased. Here, we used a nested cross-validation scheme with five outer folds and ten inner folds.

      We started by dividing the entire dataset into five outer folds. Each fold took a turn being held out as the outer-fold test set (20% of the data), while the remaining four folds (80% of the data) were used as an outer-fold training set. Within each outer-fold training set, we performed a second layer of cross-validation – this time splitting the data into ten inner folds. These inner folds were used exclusively for hyperparameter tuning: models were trained on nine of the inner folds and validated on the remaining one, cycling through all ten combinations.

      We then selected the hyperparameter configuration that performed best across the inner-fold validation sets, as determined by the minimal mean squared error (MSE). The model was then retrained on the full outer-fold training set using this hyperparameter configuration and evaluated on the outer-fold test set, using four performance metrics: Pearson r, the coefficient of determination (R²), the mean absolute error (MAE), and the MSE. This entire process was repeated for each of the five outer folds, ensuring that every data point is used for both training and testing, but never at the same time. We opted for five outer folds instead of ten to reduce computational demands, particularly memory and processing time, given the substantial volume of neuroimaging data involved in model training. Five outer folds led to an outer-fold test set of at least n = 4 000, which should be sufficient for model evaluation. In contrast, we retained ten inner folds to ensure robust and stable hyperparameter tuning, maximising the reliability of model selection.

      To model the relationship between mental health and cognition, we employed Partial Least Squares Regression (PLSR) to predict the g-factor from 133 mental health variables. To model the relationship between neuroimaging data and cognition, we used a two-step stacking approach [15–17,61] to integrate information from 72 neuroimaging phenotypes across three MRI modalities. In the first step, we trained 72 base (first-level) PLSR models, each predicting the g-factor from a single neuroimaging phenotype. In the second step, we used the predicted values from these base models as input features for stacked models, which again predicted the g-factor. We constructed four stacked models based on the source of the base predictions: one each for dwMRI, rsMRI, sMRI, and a combined model incorporating all modalities (“dwMRI Stacked”, “rsMRI Stacked”, “sMRI Stacked”, and “All MRI Stacked”, respectively). Each stacked model was trained using one of four machine learning algorithms – ElasticNet, Random Forest, XGBoost, or Support Vector Regression – selected individually for each model (see Supplementary Materials, S6).

      For rsMRI phenotypes, we treated the choice of functional connectivity quantification method – full correlation, partial correlation, or tangent space parametrization – as a hyperparameter. The method yielding the highest performance on the outer-fold training set was selected for predicting the g-factor (see Supplementary Materials, S5).

      To prevent data leakage, we standardized the data using the mean and standard deviation derived from the training set and applied these parameters to the corresponding test set within each outer fold. This standardization was performed at three key stages: before g-factor derivation, before regressing out modality-specific confounds from the MRI data, and before stacking. Similarly, to maintain strict separation between training and testing data, both base and stacked models were trained exclusively on participants from the outer-fold training set and subsequently applied to the corresponding outer-fold test set.

      To evaluate model performance and assess statistical significance, we aggregated the predicted and observed g-factor values from each outer-fold test set. We then computed a bootstrap distribution of Pearson’s correlation coefficient (r) by resampling with replacement 5 000 times, generating 95% confidence intervals (CIs) (Fig. 1). Model performance was considered statistically significant if the 95% CI did not include zero, indicating that the observed associations were unlikely to have occurred by chance.”

      (2) If only 40% of the cognition-mental health covariation can be explained by the MRI variables, how to explain the other 60% of the variance? And related to this %: why do the author think that 'this provides us confidence in using MRI to derive quantitative neuromarkers of cognition'?

      Thank you for this insightful observation. Using the MRI modalities available in the UK Biobank, we were able to account for 48% of the covariation between cognition and mental health. The remaining 52% of unexplained variance may arise from several sources. One possibility is the absence of certain neuroimaging modalities in the UK Biobank dataset, such as task-based fMRI contrasts, positron emission tomography, arterial spin labeling, and magnetoencephalography/electroencephalography. Prior research from our group and others has consistently demonstrated strong predictive performance from specific task-based fMRI contrasts, particularly those derived from tasks like the n-Back working memory task and the face-name episodic memory task, none of which is available in the UK Biobank.

      Moreover, there are inherent limitations in using MRI as a proxy for brain structure and function. Measurement error and intra-individual variability, such as differences in a cognitive state between cognitive assessments and MRI acquisition, may also contribute to the unexplained variance. According to the Research Domain Criteria (RDoC) framework, brain circuits represent only one level of neurobiological analysis relevant to cognition. Other levels, including genes, molecules, cells, and physiological processes, may also play a role in the cognition-mental health relationship.

      Nonetheless, neuroimaging provides a valuable window into the biological mechanisms underlying this overlap – insights that cannot be gleaned from behavioural data alone. We have now incorporated these considerations into the Discussion section.

      Line 658: “Although recent debates [18] have challenged the predictive utility of MRI for cognition, our multimodal marker integrating 72 neuroimaging phenotypes captures nearly half of the mental health-explained variance in cognition. We demonstrate that neural markers with greater predictive accuracy for cognition also better explain cognition-mental health covariation, showing that multimodal MRI can capture both a substantial cognitive variance and nearly half of its shared variance with mental health. Finally, we show that our neuromarkers explain a substantial portion of the age- and sex-related variance in the cognition-mental health relationship, highlighting their relevance in modeling cognition across demographic strata.

      The remaining unexplained variance in the relationship between cognition and mental health likely stems from multiple sources. One possibility is the absence of certain neuroimaging modalities in the UK Biobank dataset, such as task-based fMRI contrasts, positron emission tomography, arterial spin labeling, and magnetoencephalography/electroencephalography. Prior research has consistently demonstrated strong predictive performance from specific task-based fMRI contrasts, particularly those derived from tasks like the n-Back working memory task and the face-name episodic memory task, none of which is available in the UK Biobank [15,17,61,69,114,142,151].

      Moreover, there are inherent limitations in using MRI as a proxy for brain structure and function. Measurement error and intra-individual variability, such as differences in a cognitive state between cognitive assessments and MRI acquisition, may also contribute to the unexplained variance. According to the RDoC framework, brain circuits represent only one level of neurobiological analysis relevant to cognition [14]. Other levels, including genes, molecules, cells, and physiological processes, may also play a role in the cognition-mental health relationship.

      Nonetheless, neuroimaging provides a valuable window into the biological mechanisms underlying this overlap – insights that cannot be gleaned from behavioural data alone. Ultimately, our findings validate brain-based neural markers as a fundamental neurobiological unit of analysis, advancing our understanding of mental health through the lens of cognition.”

      Regarding our confidence in using MRI to derive neural markers for cognition, we base this on the predictive performance of MRI-based models. As we note in the Discussion (Line 554: “Consistent with previous studies, we show that MRI data predict individual differences in cognition with a medium-size performance (r ≈ 0.4) [15–17, 28, 61, 67, 68].”), the medium effect size we observed (r ≈ 0.4) agrees with existing literature on brain-cognition relationships, confirming that machine learning leads to replicable results. This effect size represents a moderate yet meaningful association in neuroimaging studies of aging, consistent with reports linking brain to behaviour in adults (Krämer et al., 2024; Tetereva et al., 2022). For example, a recent meta-analysis by Vieira and colleagues (2022) reported a similar effect size (r = 0.42, 95% CI [0.35;0.50]). Our study includes over 15000 participants, comparable to or more than typical meta-analyses, allowing us to characterise our work as a “mega-analysis”. And on top of this predictive performance, we found our neural markers for cognition to capture half of the cognition-mental health covariation, boosting our confidence in our approach.

      Krämer C, Stumme J, da Costa Campos L, Dellani P, Rubbert C, Caspers J, et al. Prediction of cognitive performance differences in older age from multimodal neuroimaging data. GeroScience. 2024;46:283–308.

      Tetereva A, Li J, Deng JD, Stringaris A, Pat N. Capturing brain cognition relationship: Integrating task‐based fMRI across tasks markedly boosts prediction and test‐retest reliability. NeuroImage. 2022;263:119588.

      (3) Imagine that we can increase the explained variance using multimodal MRI measures, why is it useful? What does it teach us? What might be the implications?

      We assume that by variance, Reviewer 1 referred to the cognition-mental health covariation mentioned in point 2) above.

      If we can increase the explained cognition-mental health covariation using multimodal MRI measures, it would mean that we have developed a reasonable neuromarker that is close to RDoC’s neurobiological unit of analysis for cognition. RDoC treats cognition as one of the main basic functional domains that transdiagnostically underlie mental health. According to RDoC, mental health should be studied in relation to cognition, alongside other domains such as negative and positive valence systems, arousal and regulatory systems, social processes, and sensorimotor functions. RDoC further emphasizes that each domain, including cognition, should be investigated not only at the behavioural level but also through its neurobiological correlates. This means RDoC aims to discover neural markers of cognition that explain the covariation between cognition and mental health. For us, we approach the development of such neural markers using multimodal neuroimaging. We have now explained the motivation of our study in the first paragraph of the Introduction.

      Line 43: “Cognition and mental health are closely intertwined [1]. Cognitive dysfunction is present in various mental illnesses, including anxiety [2, 3], depression [4–6], and psychotic disorders [7–12]. National Institute of Mental Health’s Research Domain Criteria (RDoC) [13,14] treats cognition as one of the main basic functional domains that transdiagnostically underlie mental health. According to RDoC, mental health should be studied in relation to cognition, alongside other domains such as negative and positive valence systems, arousal and regulatory systems, social processes, and sensorimotor functions. RDoC further emphasizes that each domain, including cognition, should be investigated not only at the behavioural level but also through its neurobiological correlates. In this study, we aim to examine how the covariation between cognition and mental health is reflected in neural markers of cognition, as measured through multimodal neuroimaging.”

      More specific issues:

      Introduction

      (4) In the intro the sentence 'in some cases, altered cognitive functioning is directly related to psychiatric symptom severity' is in contrast to the next sentence '... are often stable and persist upon alleviation of psychiatric symptoms'.

      Thank you for pointing this out. The first sentence refers to cases where cognitive deficits fluctuate with symptom severity, while the second emphasizes that core cognitive impairments often remain stable even during symptom remission. To avoid this confusion, we have removed these sentences.

      (5) In the intro the text on the methods (various MRI modalities) is not needed for the Biol Psych readers audience.

      We appreciate your comment. While some members of our target audience may have backgrounds in neuroimaging, machine learning, or psychiatry, we recognize that not all readers will be familiar with all three areas. To ensure accessibility for those who are not familiar with neuroimaging, we included a brief overview of the MRI modalities and quantification methods used in our study to provide context for the specific neuroimaging phenotypes. Additionally, we provided background information on the machine learning techniques employed, so that readers without a strong background in machine learning can still follow our methodology.

      (6) Regarding age of the study sample: I understand that at recruitment the subjects' age ranges from 40 to 69 years. At MRI scanning the age ranges from about 46 to 82. How is that possible? And related to the age of the population: how did the authors deal with age in the analyses, since age is affecting both cognition and the brain measures?

      Thank you for noticing this. In the Methods section, we first outline the characteristics of the UK Biobank cohort, including the age at first recruitment (40-69 years). Table 1 then shows the characteristics of participant subsamples included in each analysis. Since our study used data from Instance 2 (the second in-person visit), participants were approximately 5-13 years older at scanning, resulting in the age range of 46 to 82 years. We clarified the Table 1 caption as follows:

      Line 113: “Table 1. Demographics for each subsample analysed: number, age, and sex of participants who completed all cognitive tests, mental health questionnaires, and MRI scanning”

      We acknowledge that age may influence cognitive and neuroimaging measures. In our analyses, we intentionally preserved age-related variance in brain-cognition relationships across mid and late adulthood, as regressing out age completely would artificially remove biologically meaningful associations. At the same time, we rigorously addressed the effects of age and sex through additional commonality analyses quantifying age and sex contributions to the relationship between cognition and mental health.

      As noted by Reviewer 1 and illustrated in Figure 8, age and sex shared substantial overlapping variance with both mental health and neuroimaging phenotypes in explaining cognitive outcomes. For example, in Figure 8i, age and sex together accounted for 43% of the variance in the cognition-mental health relationship:

      (2.76 + 1.03) / (2.76 + 1.03 + 3.52 + 1.45) ≈ 0.43

      Furthermore, neuromarkers from the all-MRI stacked model explained 72% of this age/sex-related variance:

      2.76 / (2.76 + 1.03) ≈ 0.72

      This indicates that our neuromarkers captured a substantial portion of the cognition-mental health covariation that varied with age and sex, highlighting their relevance in age/sex-sensitive cognitive modeling.
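
      The two ratios above can be reproduced with a few lines of arithmetic. The sketch below uses only the rounded values quoted from Figure 8i; the variable names are labels chosen for illustration, and the interpretation of the two remaining denominator terms is not spelled out here, so they are kept generic.

```python
# Rounded commonality components quoted above (Figure 8i).
age_sex_captured_by_mri = 2.76      # age/sex-related part also explained by the MRI marker
age_sex_not_captured_by_mri = 1.03  # age/sex-related part not explained by the MRI marker
remaining_components = (3.52, 1.45)  # other denominator terms (labels not given in the text)

age_sex_related = age_sex_captured_by_mri + age_sex_not_captured_by_mri
total_shared = age_sex_related + sum(remaining_components)

# Share of the cognition-mental health relationship that is age/sex-related.
prop_age_sex = age_sex_related / total_shared

# Share of that age/sex-related part captured by the MRI marker.
prop_captured = age_sex_captured_by_mri / age_sex_related

print(f"age/sex-related share: {prop_age_sex:.3f}")
print(f"captured by MRI marker: {prop_captured:.3f}")
```

      Because the inputs are rounded to two decimals, the second ratio computed this way may differ in the last digit from the 72% reported from the unrounded estimates.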

      In the Methods, Results, and Discussion, we say:

      Methods

      Line 263: “To understand how demographic factors, including age and sex, contribute to this relationship, we also conducted a separate set of commonality analyses treating age, sex, age², age×sex, and age²×sex as an additional set of explanatory variables (Fig. 1).”

      Results

      Line 445: “Age and sex shared substantial overlapping variance with both mental health and neuroimaging in explaining cognition, accounting for 43% of the variance in the cognition-mental health relationship. Multimodal neural marker of cognition based on three MRI modalities (“All MRI Stacked”) explained 72% of this age and sex-related variance (Fig. 8i–l and Table S21).”

      Discussion

      Line 660: “We demonstrate that neural markers with greater predictive accuracy for cognition also better explain cognition-mental health covariation, showing that multimodal MRI can capture both a substantial cognitive variance and nearly half of its shared variance with mental health. Finally, we show that our neuromarkers explain a substantial portion of the age- and sex-related variance in the cognition-mental health relationship, highlighting their relevance in modeling cognition across demographic strata.”

      (7) Regarding the mental health variables: were characteristics with positive value (e.g. happiness and subjective wellbeing) reverse-scored (compared to the negative items, such as anxiety, addiction, etc)?

      We appreciate you noting this. These composite scores primarily represent standard clinical measures such as the GAD-7 anxiety scale and N-12 neuroticism scale. We did not reverse the scores, so their directionality and interpretation remain consistent with the original studies from which the scores were derived (e.g., Davis et al., 2020; Dutt et al., 2022). Complete descriptive statistics for all mental health indices and detailed derivation procedures are provided in the Supplementary Materials (S2). On Page 6, Supplementary Methods, we say:

      Line 92: “Composite mental health scores included the Generalized Anxiety Disorder (GAD-7), the Posttraumatic Stress Disorder (PTSD) Checklist (PCL-6), the Alcohol Use Disorders Identification Test (AUDIT), the Patient Health Questionnaire (PHQ-9) [12], the Eysenck Neuroticism (N-12), Probable Depression Status (PDS), and the Recent Depressive Symptoms (RDS-4) scores [13, 14]. To calculate the GAD-7, PCL-6, AUDIT, and PHQ-9, we used questions introduced at the online follow-up [12]. To obtain the N-12, PDS, and RDS-4 scores [14], we used data collected during the baseline assessment [13, 14].

      We subcategorized depression and GAD based on frequency, current status (ever had depression or anxiety and current status of depression or anxiety), severity, and clinical diagnosis (depression or anxiety confirmed by a healthcare practitioner). Additionally, we differentiated between different depression statuses, such as recurrent depression, depression triggered by loss, etc. Variables related to self-harm were subdivided based on whether a person has ever self-harmed with the intent to die.

      To make response scales more intuitive, we recoded responses within the well-being domain such that the lower score corresponded to a lesser extent of satisfaction (“Extremely unhappy”) and the higher score indicated a higher level of happiness (“Extremely happy”). For all questions, we assigned the median values to “Prefer not to answer” (-818 for in-person assessment and -3 for online questionnaire) and “Do not know” (-121 for in-person assessment and -1 for online questionnaire) responses. We excluded the “Work/job satisfaction” question from the mental health derivatives list because it included a “Not employed” response option, which could not be reasonably coded.

      To calculate the risk of PTSD, we used questions from the PCL-6 questionnaire. Following Davis and colleagues [12], PCL-6 scores ranged from 6 to 29. A PCL-6 score of 12 or below corresponds to a low risk of meeting the Clinician-Administered PTSD Scale diagnostic criteria. PCL-6 scores between 13 and 16 and between 17 and 25 are indicative of an increased risk and high risk of PTSD, respectively. A score above 26 is interpreted as a very high risk of PTSD [12, 15]. PTSD status was set to positive if the PCL-6 score exceeded or was equal to 14 and encompassed stressful events instead of catastrophic trauma alone [12].

      To assess alcohol consumption, alcohol dependence, and harm associated with drinking, we calculated the sum of the ten questions from the AUDIT questionnaire [16]. We additionally subdivided the AUDIT score into the alcohol consumption score (questions 1-3, AUDIT-C) and the score reflecting problems caused by alcohol (questions 4-10, AUDIT-P) [17]. In questions 2-10 that followed the first trigger question (“Frequency of drinking alcohol”), we replaced missing values with 0 as they would correspond to a “Never” response to the first question.

      An AUDIT score cut-off of 8 suggests moderate or low-risk alcohol consumption, and scores of 8 to 15 and above 15 indicate severe/harmful and hazardous (alcohol dependence or moderate-severe alcohol use disorder) drinking, respectively [16, 18]. Subsequently, hazardous alcohol use and alcohol dependence status correspond to AUDIT scores of ≥ 8 and ≥ 15, respectively. The “Alcohol dependence ever” status was set to positive if a participant had ever been physically dependent on alcohol. To reduce skewness, we log(x + 1)-transformed the AUDIT, AUDIT-C, and AUDIT-P scores [17].”
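The scoring rules quoted above can be sketched in a few lines. This is only an illustration of the stated cut-offs, not the authors' code. The function names are hypothetical, the natural logarithm is assumed for the log(x + 1) transform, a score of exactly 26 (which the quoted bands leave unassigned) is assumed to fall in the highest band, and a missing answer to the first AUDIT question is rejected because the passage does not say how it was handled.

```python
import math


def pcl6_risk(score):
    """Map a PCL-6 total to the risk bands given in the quoted passage."""
    if score <= 12:
        return "low"
    if score <= 16:
        return "increased"
    if score <= 25:
        return "high"
    # Assumption: the passage says "above 26"; exactly 26 is treated as very high here.
    return "very high"


def ptsd_positive(score):
    """PTSD status is positive when the PCL-6 total is 14 or higher."""
    return score >= 14


def audit_scores(items):
    """Compute AUDIT, AUDIT-C and AUDIT-P from ten item responses.

    `items` holds the ten answers in questionnaire order; None marks a missing value.
    """
    if len(items) != 10:
        raise ValueError("AUDIT needs exactly ten item responses")
    if items[0] is None:
        # Assumption: handling of a missing trigger question is not described.
        raise ValueError("first (trigger) question is missing")
    # Missing answers to questions 2-10 are replaced with 0, as in the passage.
    filled = [items[0]] + [0 if v is None else v for v in items[1:]]
    total = sum(filled)
    audit_c = sum(filled[:3])   # questions 1-3: consumption
    audit_p = sum(filled[3:])   # questions 4-10: alcohol-related problems
    return {
        "audit": total,
        "audit_c": audit_c,
        "audit_p": audit_p,
        "hazardous_use": total >= 8,
        "dependence": total >= 15,
        # Assumption: natural log for the log(x + 1) transform.
        "log_audit": math.log1p(total),
        "log_audit_c": math.log1p(audit_c),
        "log_audit_p": math.log1p(audit_p),
    }


example = audit_scores([2, 1, None, None, None, None, None, None, None, None])
```

In the example, the eight missing follow-up answers count as 0, so the total equals the AUDIT-C subscore and neither the hazardous-use nor the dependence flag is set.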

      Davis KAS, Coleman JRI, Adams M, Allen N, Breen G, Cullen B, et al. Mental health in UK Biobank – development, implementation and results from an online questionnaire completed by 157 366 participants: a reanalysis. BJPsych Open. 2020;6:e18.

      Dutt RK, Hannon K, Easley TO, Griffis JC, Zhang W, Bijsterbosch JD. Mental health in the UK Biobank: A roadmap to self-report measures and neuroimaging correlates. Hum Brain Mapp. 2022;43:816–832.

      (8) In the discussion section (page 23, line 416-421), the authors refer to specific findings that are not described in the results section > I would add these findings to the main manuscript (including the discussion / interpretation).

      We appreciate your careful reading. We agree that our original Results section did not explicitly describe the factor loadings for mental health in the PLSR model, despite discussing their implications later in the paper. We needed to include this part of the discussion in the Supplementary Materials to meet the word limit of the original submission. However, in response to your suggestion, we have now added the results regarding factor loadings to the Results section. We also moved the discussion of the association between mental health features and general cognition from the Supplementary Material to the manuscript’s Discussion.

      Results

      Line 298: “On average, information about mental health predicted the g-factor at R<sup>2</sup><sub>mean</sub> = 0.10 and r<sub>mean</sub> = 0.31 (95% CI [0.291, 0.315]; Fig. 2b and 2c and Supplementary Materials, S9, Table S12). The magnitude and direction of factor loadings for mental health in the PLSR model allowed us to quantify the contribution of individual mental health indices to cognition. Overall, the scores for mental distress, alcohol and cannabis use, and self-harm behaviours relate positively, and the scores for anxiety, neurological and mental health diagnoses, unusual or psychotic experiences, happiness and subjective well-being, and negative traumatic events relate negatively to cognition.”

      Discussion

      Line 492: “Factor loadings derived from the PLSR model showed that the scores for mental distress, alcohol and cannabis use, and self-harm behaviours related positively, and the scores for anxiety, neurological and mental health diagnoses, unusual or psychotic experiences, happiness and subjective well-being, and negative traumatic events related negatively to the g-factor. Positive PLSR loadings of features related to mental distress may indicate greater susceptibility to or exaggerated perception of stressful events, psychological overexcitability, and predisposition to rumination in people with higher cognition [72]. On the other hand, these findings may be specific to the UK Biobank cohort and the way the questions for this mental health category were constructed. In particular, to evaluate mental distress, the UK Biobank questionnaire asked whether an individual sought or received medical help for or suffered from mental distress. In this regard, the estimate for mental distress may be more indicative of whether an individual experiencing mental distress had an opportunity or aspiration to visit a doctor and seek professional help [73]. Thus, people with better cognitive abilities and also with a higher socioeconomic status may indeed be more likely to seek professional help.

      Limited evidence supports a positive association between self-harm behaviours and cognitive abilities, with some studies indicating higher cognitive performance as a risk factor for non-suicidal self-harm. Research shows an inverse relationship between cognitive control of emotion and suicidal behaviours that weakens over the life course [73,74]. Some studies have found a positive correlation between cognitive abilities and the risk of nonsuicidal self-harm, suicidal thoughts, and suicidal plans that may be independent of or, conversely, affected by socioeconomic status [75,76]. In our study, the magnitude of the association between self-harm behaviours and cognition was low (Fig. 2), indicating a weak relationship.

      Positive PLSR loadings of features related to alcohol and cannabis may also indicate the influence of other factors. Overall, this relationship is believed to be largely affected by age, income, education, social status, social equality, social norms, and quality of life [79–80]. For example, education level and income correlate with cognitive ability and alcohol consumption [79,81–83]. Research also links a higher probability of having tried alcohol or recreational drugs, including cannabis, to a tendency of more intelligent individuals to approach evolutionarily novel stimuli [84,85]. This hypothesis is supported by studies showing that cannabis users perform better on some cognitive tasks [86]. Alternatively, frequent drinking can indicate higher social engagement, which is positively associated with cognition [87]. Young adults often drink alcohol as a social ritual in university settings to build connections with peers [88]. In older adults, drinking may accompany friends or family visits [89,90]. Mixed evidence on the link between alcohol and drug use and cognition makes it difficult to draw definite conclusions, leaving an open question about the nature of this relationship.

      Consistent with previous studies, we showed that anxiety and negative traumatic experiences were inversely associated with cognitive abilities [90–93]. Anxiety may be linked to poorer cognitive performance via reduced working memory capacity, increased focus on negative thoughts, and attentional bias to threatening stimuli that hinder the allocation of cognitive resources to a current task [94–96]. Individuals with PTSD consistently showed impaired verbal and working memory, visual attention, inhibitory function, task switching, cognitive flexibility, and cognitive control [97–100]. Exposure to traumatic events that did not reach the PTSD threshold was also linked to impaired cognition. For example, childhood trauma is associated with worse performance in processing speed, attention, and executive function tasks in adulthood, and age at a first traumatic event is predictive of the rate of executive function decline in midlife [101,102]. In the UK Biobank cohort, adverse life events have been linked to lower cognitive flexibility, partially via depression level [103].

      In agreement with our findings, cognitive deficits are often found in psychotic disorders [104,105]. We treated neurological and mental health symptoms as predictor variables and did not stratify or exclude people based on psychiatric status or symptom severity. Since no prior studies have examined isolated psychotic symptoms (e.g., recent unusual experiences, hearing unreal voices, or seeing unreal visions), we avoid speculating on how these symptoms relate to cognition in our sample.

      Finally, negative PLSR loadings of the features related to happiness and subjective well-being may be specific to the study cohort, as these findings do not agree with some previous research [107–109]. On the other hand, our results agree with the study linking excessive optimism or optimistic thinking to lower cognitive performance in memory, verbal fluency, fluid intelligence, and numerical reasoning tasks, and suggesting that pessimism or realism indicates better cognition [110]. The concept of realism/optimism as indicators of cognition is a plausible explanation for a negative association between the g-factor and friendship satisfaction, as well as a negative PLSR loading of feelings that life is meaningful, especially in older adults who tend to reflect more on the meaning of life [111]. The latter is supported by the study showing a negative association between cognitive function and the search for the meaning of life and a change in the pattern of this relationship after the age of 60 [112]. Finally, a UK Biobank study found a positive association of happiness with speed and visuospatial memory but a negative relationship with reasoning ability [113].”

      (9) In the discussion section (page 24, line 440-449), the authors give an explanation on why the diffusion measure have limited utility, but the arguments put forward also concern structural and rsfMRI measures.

      Thank you for this important observation. Indeed, the argument about voxel-averaged diffusion components (“… these metrics are less specific to the properties of individual white matter axons or bundles, and instead represent a composite of multiple diffusion components averaged within a voxel and across major fibre pathways”) could theoretically apply across other MRI modalities. We have therefore removed this point from the discussion to avoid overgeneralization. However, we maintain our central argument about the biological specificity of conventional tractography-derived diffusion metrics as their particular sensitivity to white matter microstructure (e.g., axonal integrity, myelin content) may make them better suited for detecting neuropathological changes than dynamic cognitive processes. This interpretation aligns with the mixed evidence linking these metrics to cognitive performance, despite their established utility in detecting white matter abnormalities in clinical populations (e.g., Bergamino et al., 2021; Silk et al., 2009). We clarify this distinction in the manuscript.

      Line 572: “The somewhat limited utility of diffusion metrics derived specifically from probabilistic tractography in serving as robust quantitative neuromarkers of cognition and its shared variance with mental health may stem from their greater sensitivity and specificity to neuronal integrity and white matter microstructure rather than to dynamic cognitive processes. Critically, probabilistic tractography may be less effective at capturing relationships between white matter microstructure and behavioural scores cross-sectionally, as this method is more sensitive to pathological changes or dynamic microstructural alterations like those occurring during maturation. While these indices can capture abnormal white matter microstructure in clinical populations such as Alzheimer’s disease, schizophrenia, or attention deficit hyperactivity disorder (ADHD) [117–119], the empirical evidence on their associations with cognitive performance is controversial [114, 120–126].”

      Bergamino M, Walsh RR, Stokes AM. Free-water diffusion tensor imaging improves the accuracy and sensitivity of white matter analysis in Alzheimer’s disease. Sci Rep. 2021;11:6990.

      Silk TJ, Vance A, Rinehart N, Bradshaw JL, Cunnington R. White-matter abnormalities in attention deficit hyperactivity disorder: a diffusion tensor imaging study. Hum Brain Mapp. 2009;30:2757–2765.

      Reviewer 2:

      This is an interesting study combining a lot of data to investigate the link between cognition and mental health. The description of the study is very clear, it's easy to read for someone like me who does not have a lot of expertise in machine learning.

      We thank you for your thorough review and constructive feedback. Your insightful comments have helped us identify conceptual and methodological aspects that required improvement in the manuscript. We have incorporated relevant changes throughout the paper, and below, we address each of your points in detail.

      Comment 1: My main concern with this manuscript is that it is not yet clear to me what it exactly means to look at the overlap between cognition and mental health. This relation is r=0.3 which is not that high, so why is it then necessary to explain this overlap with neuroimaging measures? And, could it be that the relation between cognition and mental health is explained by third variables (environment? opportunities?). In the introduction I miss an explanation of why it is important to study this and what it will tell us, and in the discussion I would like to read some kind of 'answer' to these questions.

      Thank you. It’s important to clarify why we investigated the relationship between cognition and mental health, and what we found using data from the UK Biobank.

      Conceptually, our work is grounded in the Research Domain Criteria (RDoC; Insel et al., 2010) framework. RDoC conceptualizes mental health not through traditional diagnostic categories, but through core functional domains that span the full spectrum from normal to abnormal functioning. These domains include cognition, negative and positive valence systems, arousal and regulatory systems, social processes, and sensorimotor functions. Within this framework, cognition is considered a fundamental domain that contributes to mental health across diagnostic boundaries. Meta-analytic evidence supports a link between cognitive functioning and mental health (Abramovitch, et al., 2021; East-Richard, et al., 2020). In the context of a large, population-based dataset like the UK Biobank, this implies that cognitive performance – as measured by various cognitive tasks – should be meaningfully associated with available mental health indicators.

      However, because cognition is only one of several functional domains implicated in mental health, we do not expect the covariation between cognition and mental health to be very high. Other domains, such as negative and positive valence systems, arousal and regulatory systems, or social processing, may also play significant roles. Theoretically, this places an upper bound on the strength of the cognition-mental health relationship, especially in normative, nonclinical samples.

      Our current findings from the UK Biobank reflect this. Most of the 133 mental health variables showed relatively weak individual correlations with cognition (mean r = 0.01, SD = 0.05, min r = –0.08, max r = 0.17; see Figure 2). However, using a PLS-based machine learning approach, we were able to integrate information across all mental-health variables to predict cognition, yielding an out-of-sample correlation of r = 0.31 [95% CI: 0.29, 0.32].
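To illustrate how many weakly correlated predictors can combine into a stronger out-of-sample prediction, the following is a minimal sketch on synthetic data. It is not the authors' pipeline: it fits a single-component PLS regression by hand, and the sample sizes, the number of predictors (20 rather than 133), the effect size and all function names are assumptions chosen for the example.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def simulate(n, p, rng, beta=0.2):
    """Synthetic data: p independent predictors, each weakly related to y."""
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [beta * sum(row) + rng.gauss(0, 1) for row in X]
    return X, y


def fit_pls1(X, y):
    """One-component PLS: weights proportional to X'y on centred data."""
    n, p = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Latent scores, then a simple regression of y on the scores.
    t = [sum(row[j] * w[j] for j in range(p)) for row in Xc]
    slope = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    return x_means, y_mean, w, slope


def predict(model, X):
    """Apply training-set means, weights and slope to new rows."""
    x_means, y_mean, w, slope = model
    p = len(w)
    return [
        y_mean + slope * sum((row[j] - x_means[j]) * w[j] for j in range(p))
        for row in X
    ]


rng = random.Random(0)
X_train, y_train = simulate(300, 20, rng)
X_test, y_test = simulate(200, 20, rng)

model = fit_pls1(X_train, y_train)
r_oos = pearson(predict(model, X_test), y_test)          # out-of-sample r
r_single = [pearson([row[j] for row in X_test], y_test)  # per-predictor r
            for j in range(20)]
print(f"out-of-sample r = {r_oos:.2f}, largest single-predictor r = {max(r_single):.2f}")
```

Because the model is fitted on one sample and scored on another, the reported correlation is not inflated by overfitting, which is the property the response emphasises when contrasting out-of-sample with within-sample estimates.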

      We believe this estimate approximates the true strength of the cognition-mental health relationship in normative samples, consistent with both theoretical expectations and prior empirical findings. Theoretically, this aligns with the RDoC view that cognition is one of several contributing domains. Empirically, our results are consistent with findings from our previous mega-analysis in children (Wang et al., 2025). Moreover, in the field of gerontology, an effect size of r = 0.31 is not considered small. According to Brydges (2019), it falls around the 70th percentile of effect sizes reported in gerontological studies and approaches the threshold for a large effect (r = 0.32). Given that most studies report within-sample associations, our out-of-sample results are likely more robust and generalizable (Yarkoni & Westfall, 2017).

      To answer, “why is it then necessary to explain this overlap with neuroimaging measures”, we again draw on the conceptual foundation of the RDoC framework. RDoC emphasizes that each functional domain, such as cognition, should be studied not only at the behavioural level but also across multiple neurobiological units of analysis, including genes, molecules, cells, circuits, physiology, and behaviour.

      MRI-based neural markers represent one such level of analysis. While other biological systems (e.g., genetic, molecular, or physiological) also contribute to the cognition-mental health relationship, neuroimaging provides unique insights into the brain mechanisms underlying this association – insights that cannot be obtained from behavioural data alone.

      In response to the related question, “Could the relationship between cognition and mental health be explained by third variables (e.g., environment, opportunities)?”, we note that developing a neural marker of cognition capable of capturing its relationship with mental health is the central aim of this study. Using the MRI modalities available in the UK Biobank, we were able to account for 48% of the covariation between cognition and mental health.

      The remaining 52% of unexplained variance may stem from several sources. According to the RDoC framework, neuromarkers could be further refined by incorporating additional neuroimaging modalities (e.g., task-based fMRI, PET, ASL, MEG/EEG, fNIRS) and integrating other units of analysis such as genetic, molecular, cellular, and physiological data.
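One way to read the 48% and 52% figures is as a partition of the variance in cognition that mental health explains. The sketch below assumes a commonality-style decomposition, which this response does not spell out. Only the mental health R² of 0.10 comes from the quoted Results; the other two R² values are hypothetical and were chosen so that the arithmetic reproduces the reported 48%.

```python
def shared_fraction(r2_mh, r2_brain, r2_both):
    """Split the mental-health-explained variance into shared and unique parts.

    r2_mh    : R^2 for cognition predicted from mental health alone
    r2_brain : R^2 for cognition predicted from the neuromarker alone
    r2_both  : R^2 for cognition predicted from both together
    """
    common = r2_mh + r2_brain - r2_both   # variance explained by both sources
    return common, common / r2_mh


# r2_mh = 0.10 is taken from the quoted Results.
# r2_brain and r2_both are hypothetical values for illustration only.
common, fraction = shared_fraction(r2_mh=0.10, r2_brain=0.12, r2_both=0.172)
print(f"shared: {fraction:.0%}, not captured by the neuromarker: {1 - fraction:.0%}")
```

Under this reading, adding modalities or other units of analysis would raise the shared fraction only to the extent that they capture cognition-related variance that overlaps with mental health.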

      Once more comprehensive neuromarkers are developed, capturing a greater proportion of the cognition-mental health covariation, they may also open a new research direction: investigating how environmental factors and life opportunities influence these markers. However, exploring those environmental contributions lies beyond the scope of the current study.

      We discuss these considerations and explain the motivation of our study in the revised Introduction and Discussion.

      Introduction

      Line 43: “Cognition and mental health are closely intertwined [1]. Cognitive dysfunction is present in various mental illnesses, including anxiety [2, 3], depression [4–6], and psychotic disorders [7–12]. The National Institute of Mental Health’s Research Domain Criteria (RDoC) [13,14] treats cognition as one of the main basic functional domains that transdiagnostically underlie mental health. According to RDoC, mental health should be studied in relation to cognition, alongside other domains such as negative and positive valence systems, arousal and regulatory systems, social processes, and sensorimotor functions. RDoC further emphasizes that each domain, including cognition, should be investigated not only at the behavioural level but also through its neurobiological correlates. In this study, we aim to examine how the covariation between cognition and mental health is reflected in neural markers of cognition, as measured through multimodal neuroimaging.”

      Discussion

      Line 481: “Our analysis confirmed the validity of the g-factor [31] as a quantitative measure of cognition [31], demonstrating that it captures nearly half (39%) of the variance across twelve cognitive performance scores, consistent with prior studies [63–68]. Furthermore, we were able to predict cognition from 133 mental health indices, showing a medium-sized relationship that aligns with existing literature [69,70]. Although the observed mental health-cognition association is lower than within-sample estimates in conventional regression models, it aligns with our prior mega-analysis in children [69]. Notably, this effect size is not considered small in gerontology. In fact, it falls around the 70th percentile of reported effects and approaches the threshold for a large effect at r = 0.32 [71]. While we focused specifically on cognition as an RDoC core domain, the strength of its relationship with mental health may be bounded by the influence of other functional domains, particularly in normative, non-clinical samples – a promising direction for future research.”

      Line 658: “Although recent debates [18] have challenged the predictive utility of MRI for cognition, our multimodal marker integrating 72 neuroimaging phenotypes captures nearly half of the mental health-explained variance in cognition. We demonstrate that neural markers with greater predictive accuracy for cognition also better explain cognition-mental health covariation, showing that multimodal MRI can capture both a substantial cognitive variance and nearly half of its shared variance with mental health. Finally, we show that our neuromarkers explain a substantial portion of the age- and sex-related variance in the cognition-mental health relationship, highlighting their relevance in modeling cognition across demographic strata.

      The remaining unexplained variance in the relationship between cognition and mental health likely stems from multiple sources. One possibility is the absence of certain neuroimaging modalities in the UK Biobank dataset, such as task-based fMRI contrasts, positron emission tomography, arterial spin labeling, and magnetoencephalography/electroencephalography. Prior research has consistently demonstrated strong predictive performance from specific task-based fMRI contrasts, particularly those derived from tasks like the n-Back working memory task and the face-name episodic memory task, none of which is available in the UK Biobank [15,17,61,69,114,142,151].

      Moreover, there are inherent limitations in using MRI as a proxy for brain structure and function. Measurement error and intra-individual variability, such as differences in a cognitive state between cognitive assessments and MRI acquisition, may also contribute to the unexplained variance. According to the RDoC framework, brain circuits represent only one level of neurobiological analysis relevant to cognition [14]. Other levels, including genes, molecules, cells, and physiological processes, may also play a role in the cognition-mental health relationship.

      Nonetheless, neuroimaging provides a valuable window into the biological mechanisms underlying this overlap – insights that cannot be gleaned from behavioural data alone. Ultimately, our findings validate brain-based neural markers as a fundamental neurobiological unit of analysis, advancing our understanding of mental health through the lens of cognition.”

      Insel T, Cuthbert B, Garvey M, Heinssen R, Pine DS, Quinn K, et al. Research Domain Criteria (RDoC): Toward a New Classification Framework for Research on Mental Disorders. AJP. 2010;167:748–751.

      Abramovitch, A., Short, T., & Schweiger, A. (2021). The C Factor: Cognitive dysfunction as a transdiagnostic dimension in psychopathology. Clinical Psychology Review, 86, 102007.

      East-Richard, C., R.-Mercier, A., Nadeau, D., & Cellard, C. (2020). Transdiagnostic neurocognitive deficits in psychiatry: A review of meta-analyses. Canadian Psychology / Psychologie Canadienne, 61(3), 190–214.

      Wang Y, Anney R, Pat N. The relationship between cognitive abilities and mental health as represented by cognitive abilities at the neural and genetic levels of analysis. eLife. 2025;14:RP105537.

      Brydges CR. Effect Size Guidelines, Sample Size Calculations, and Statistical Power in Gerontology. Innovation in Aging. 2019;3(4):igz036.

      Yarkoni T, Westfall J. Choosing Prediction Over Explanation in Psychology: Lessons From Machine Learning. Perspect Psychol Sci. 2017;12(6):1100-1122.

      Comment 2 Title: - Shouldn't it be "MRI markers" (plural)?

      We used the singular form (“marker”) intentionally, as it refers to the composite neuroimaging marker derived from all three MRI modalities in our stacked model. This multimodal marker represents the combined predictive power of all modalities and captures the highest proportion of the mental health-cognition relationship in our analyses.

      Comment 3: Introduction - I miss an explanation of why it is useful to look at cognition-mental health covariation

      We believe we have sufficiently addressed this comment in our response to Reviewer 2, comment 1 above.

      Comment 4: - "Demonstrating that MRI-based neural indicators of cognition capture the covariation between cognition and mental health will thereby support the utility of such indicators for understanding the etiology of mental health" (page 4, line 56-58) - how/why?

      Previous research has largely focused on developing MRI-based neural indicators that accurately predict cognitive performance (Marek et al., 2022; Vieira et al., 2020). Building on this foundation, our findings further demonstrate that the predictive performance of a neural indicator for cognition is closely tied to its ability to explain the covariation between cognition and mental health. In other words, the robustness of a neural indicator – its capacity to capture individual differences in cognition – is strongly associated with how well it reflects the shared variance between cognition and mental health.

      This insight is particularly important within the context of the RDoC framework, which seeks to understand the etiology of mental health through functional domains (such as cognition) and their underlying neurobiological units of analysis (Insel et al., 2010). According to RDoC, for a neural indicator of cognition to be informative for mental health research, it must not only predict cognitive performance but also capture its relationship with mental health.

      Furthermore, RDoC emphasizes the integration of neurobiological measures to investigate the influence of environmental and developmental factors on mental health. In line with this, our neural indicators of cognition may serve as valuable tools in future research aimed at understanding how environmental exposures and developmental trajectories shape mental health outcomes. We discuss this in more detail in the revised Discussion.

      Line 481: “Our analysis confirmed the validity of the g-factor [31] as a quantitative measure of cognition [31], demonstrating that it captures nearly half (39%) of the variance across twelve cognitive performance scores, consistent with prior studies [63–68]. Furthermore, we were able to predict cognition from 133 mental health indices, showing a medium-sized relationship that aligns with existing literature [69,70]. Although the observed mental health-cognition association is lower than within-sample estimates in conventional regression models, it aligns with our prior mega-analysis in children [69]. Notably, this effect size is not considered small in gerontology. In fact, it falls around the 70th percentile of reported effects and approaches the threshold for a large effect at r = 0.32 [71]. While we focused specifically on cognition as an RDoC core domain, the strength of its relationship with mental health may be bounded by the influence of other functional domains, particularly in normative, non-clinical samples – a promising direction for future research.”

      Line 658: “Although recent debates [18] have challenged the predictive utility of MRI for cognition, our multimodal marker integrating 72 neuroimaging phenotypes captures nearly half of the mental health-explained variance in cognition. We demonstrate that neural markers with greater predictive accuracy for cognition also better explain cognition-mental health covariation, showing that multimodal MRI can capture both a substantial cognitive variance and nearly half of its shared variance with mental health. Finally, we show that our neuromarkers explain a substantial portion of the age- and sex-related variance in the cognition-mental health relationship, highlighting their relevance in modeling cognition across demographic strata.

      The remaining unexplained variance in the relationship between cognition and mental health likely stems from multiple sources. One possibility is the absence of certain neuroimaging modalities in the UK Biobank dataset, such as task-based fMRI contrasts, positron emission tomography, arterial spin labeling, and magnetoencephalography/electroencephalography. Prior research has consistently demonstrated strong predictive performance from specific task-based fMRI contrasts, particularly those derived from tasks like the n-Back working memory task and the face-name episodic memory task, none of which is available in the UK Biobank [15,17,61,69,114,142,151].

      Moreover, there are inherent limitations in using MRI as a proxy for brain structure and function. Measurement error and intra-individual variability, such as differences in a cognitive state between cognitive assessments and MRI acquisition, may also contribute to the unexplained variance. According to the RDoC framework, brain circuits represent only one level of neurobiological analysis relevant to cognition [14]. Other levels, including genes, molecules, cells, and physiological processes, may also play a role in the cognition-mental health relationship.

      Nonetheless, neuroimaging provides a valuable window into the biological mechanisms underlying this overlap – insights that cannot be gleaned from behavioural data alone. Ultimately, our findings validate brain-based neural markers as a fundamental neurobiological unit of analysis, advancing our understanding of mental health through the lens of cognition.”

      Marek S, Tervo-Clemmens B, Calabro FJ, Montez DF, Kay BP, Hatoum AS, et al. Reproducible brain-wide association studies require thousands of individuals. Nature. 2022;603:654–660.

      Vieira S, Gong QY, Pinaya WHL, et al. Using Machine Learning and Structural Neuroimaging to Detect First Episode Psychosis: Reconsidering the Evidence. Schizophr Bull. 2020;46(1):17-26.

      Insel T, Cuthbert B, Garvey M, Heinssen R, Pine DS, Quinn K, et al. Research Domain Criteria (RDoC): Toward a New Classification Framework for Research on Mental Disorders. AJP. 2010;167:748–751.

      Comment 5: - The explanation about the stacking approach is not yet completely clear to me. I don't understand how the target variable can be the dependent variable in both step one and two. Or are those different variables? It would be helpful to also give an example of the target variable in line 88 on page 5

      Thank you for this excellent question. In our stacking approach, the same target variable, the g-factor, is indeed used across both modeling stages, but with a key distinction in how predictions are generated and integrated.

      In the first-level models, we trained separate Partial Least Squares Regression (PLSR) models for each of the 72 neuroimaging phenotypes, each predicting the g-factor independently. The predicted values from these 72 models were then used as input features for the second-level stacked model, which combined them to generate a final prediction of the g-factor. This two-stage framework enables us to integrate information across multiple imaging modalities while maintaining a consistent prediction target.

      To avoid data leakage, both modeling stages were conducted entirely within the training set for each cross-validation fold. Only after the second-level model was trained was it applied to the outer-fold test participants who were not involved in any part of the model training process.

      To improve accessibility, we have revised the Methods section (see Page 10) to clarify this approach, ensuring that the description remains technically accurate while being easier to follow.

      Line 188: “We employed nested cross-validation to predict cognition from mental health indices and 72 neuroimaging phenotypes (Fig. 1). Nested cross-validation is a robust method for evaluating machine-learning models while tuning their hyperparameters, ensuring that performance estimates are both accurate and unbiased. Here, we used a nested cross-validation scheme with five outer folds and ten inner folds.

      We started by dividing the entire dataset into five outer folds. Each fold took a turn being held out as the outer-fold test set (20% of the data), while the remaining four folds (80% of the data) were used as an outer-fold training set. Within each outer-fold training set, we performed a second layer of cross-validation – this time splitting the data into ten inner folds. These inner folds were used exclusively for hyperparameter tuning: models were trained on nine of the inner folds and validated on the remaining one, cycling through all ten combinations.

      We then selected the hyperparameter configuration that performed best across the inner-fold validation sets, as determined by the minimal mean squared error (MSE). The model was then retrained on the full outer-fold training set using this hyperparameter configuration and evaluated on the outer-fold test set, using four performance metrics: Pearson r, the coefficient of determination (R<sup>2</sup>), the mean absolute error (MAE), and the MSE. This entire process was repeated for each of the five outer folds, ensuring that every data point is used for both training and testing, but never at the same time. We opted for five outer folds instead of ten to reduce computational demands, particularly memory and processing time, given the substantial volume of neuroimaging data involved in model training. Five outer folds led to an outer-fold test set of at least n = 4 000, which should be sufficient for model evaluation. In contrast, we retained ten inner folds to ensure robust and stable hyperparameter tuning, maximising the reliability of model selection.

      To model the relationship between mental health and cognition, we employed Partial Least Squares Regression (PLSR) to predict the g-factor from 133 mental health variables. To model the relationship between neuroimaging data and cognition, we used a two-step stacking approach [15–17,61] to integrate information from 72 neuroimaging phenotypes across three MRI modalities. In the first step, we trained 72 base (first-level) PLSR models, each predicting the g-factor from a single neuroimaging phenotype. In the second step, we used the predicted values from these base models as input features for stacked models, which again predicted the g-factor. We constructed four stacked models based on the source of the base predictions: one each for dwMRI, rsMRI, sMRI, and a combined model incorporating all modalities (“dwMRI Stacked”, “rsMRI Stacked”, “sMRI Stacked”, and “All MRI Stacked”, respectively). Each stacked model was trained using one of four machine learning algorithms – ElasticNet, Random Forest, XGBoost, or Support Vector Regression – selected individually for each model (see Supplementary Materials, S6).

      For rsMRI phenotypes, we treated the choice of functional connectivity quantification method – full correlation, partial correlation, or tangent space parametrization – as a hyperparameter. The method yielding the highest performance on the outer-fold training set was selected for predicting the g-factor (see Supplementary Materials, S5).

      To prevent data leakage, we standardized the data using the mean and standard deviation derived from the training set and applied these parameters to the corresponding test set within each outer fold. This standardization was performed at three key stages: before g-factor derivation, before regressing out modality-specific confounds from the MRI data, and before stacking. Similarly, to maintain strict separation between training and testing data, both base and stacked models were trained exclusively on participants from the outer-fold training set and subsequently applied to the corresponding outer-fold test set.

      To evaluate model performance and assess statistical significance, we aggregated the predicted and observed g-factor values from each outer-fold test set. We then computed a bootstrap distribution of Pearson’s correlation coefficient (r) by resampling with replacement 5 000 times, generating 95% confidence intervals (CIs) (Fig. 1). Model performance was considered statistically significant if the 95% CI did not include zero, indicating that the observed associations were unlikely to have occurred by chance.”

      Comment 6: Methods - It's not clear from the text and Figure 1 which 12 scores from 11 tests are being used to derive the g-factor. Figure 1 shows only 8 bullet points with 10 scores in A and 13 tests under 'Cognitive tests' in B. Moreover, Supplement S1 describes 12 tests and 14 measures (Prospective Memory test is in the text but not in Supplementary Table 1).

      Thank you for identifying this discrepancy. In the original Figure 1b and in the Supplementary Methods (S1), the “Prospective Memory” test was accidentally duplicated, while it was present in the Supplementary Table 1 (Line 53, Supplementary Table 1). We have now corrected both figures for consistency. To clarify: Figure 1a presents the global mental health and cognitive domains studied, while Figure 1b now accurately lists 1) the 12 cognitive scores from 11 tests used to derive the g-factor (with the Trail Making Test contributing two measures – numeric and alphabetic trails) and 2) the three main categories of mental health indices used as machine learning features.

      We also corrected the Supplementary Materials to remove the duplicate test from the first paragraph. In Supplementary Table 1, there were 11 tests listed, and for the Trail Making test, we specified in the “Core measures” column that this test had 2 derivative scores: duration to complete the numeric path (Trail 1) and duration to complete the alphabetic path (Trail 2).

      Supplementary Materials, Line 46: “We used twelve scores from the eleven cognitive tests that represented the following cognitive domains: reaction time and processing speed (Reaction Time test), working memory (Numeric Memory test), verbal and numerical reasoning (Fluid Intelligence test), executive function (Trail Making Test), non-verbal fluid reasoning (Matrix Pattern Completion test), processing speed (Symbol Digit Substitution test), vocabulary (Picture Vocabulary test), planning abilities (Tower Rearranging test), verbal declarative memory (Paired Associate Learning test), prospective memory (Prospective Memory test), and visual memory (Pairs Matching test) [1].”

      Comment 7: - For the mental health measures: If I understand correctly, the questionnaire items were used individually, but also to create composite scores. This seems counterintuitive, because I would assume that if the raw data is used, the composite scores would not add additional information to that. When reading the Supplement, it seems like I'm not correct… It would be helpful to clarify the text on page 7 in the main text.

      You raise an excellent observation regarding the use of both individual questionnaire items and composite scores. This dual approach was methodologically justified by the properties of Partial Least Squares Regression (PLSR), our chosen first-level machine learning algorithm, which benefits from rich feature sets and can handle multicollinearity through dimensionality reduction. PLSR transforms correlated features into latent variables, meaning both individual items and composite scores can contribute unique information to the model. We elaborate on PLSR's mathematical principles in Supplementary Materials (S5).

      To directly address this concern, we conducted comparative analyses showing that the PLSR model (a single 80/20% training/test split), incorporating all 133 mental health features (both items and composites), outperformed models using either type alone. The full model achieved superior performance (MSE = 0.458, MAE = 0.537, R<sup>2</sup> = 0.112, Pearson r = 0.336, p-value = 6.936e-112) compared to using only composite scores (93 features; MSE = 0.461, MAE = 0.538, R<sup>2</sup> = 0.107, Pearson r = 0.328, p-value = 5.8e-106) or only questionnaire items (40 features; MSE = 0.499, MAE = 0.561, R<sup>2</sup> = 0.033, Pearson r = 0.184, p-value = 2.53e-33). These results confirm that including both data types provides complementary predictive value. We expand on these considerations in the revised Methods section.

      Line 123: “Mental health measures encompassed 133 variables from twelve groups: mental distress, depression, clinical diagnoses related to the nervous system and mental health, mania (including bipolar disorder), neuroticism, anxiety, addictions, alcohol and cannabis use, unusual/psychotic experiences, traumatic events, self-harm behaviours, and happiness and subjective well-being (Fig. 1 and Tables S4 and S5). We included both self-report questionnaire items from all participants and composite diagnostic scores computed following Davis et al. and Dutt et al. [35,36] as features in our first-level (for explanation, see Data analysis section) Partial Least Squares Regression (PLSR) model. This approach leverages PLSR’s ability to handle multicollinearity through dimensionality reduction, enabling simultaneous use of granular symptom-level information and robust composite measures (for mental health scoring details, see Supplementary Materials, S2). We assess the contribution of each mental health index to general cognition by examining the direction and magnitude of its PLSR-derived loadings on the identified latent variables.”

      Comment 8: - Results - The colors in Figure 4 B are a bit hard to differentiate.

      We have updated Figure 4 to enhance colour differentiation by adjusting saturation and brightness levels, improving visual distinction. For further clarity, we split the original figure into two separate figures.

      Comment 9: - Discussion - "Overall, the scores for mental distress, alcohol and cannabis use, and self-harm behaviours relate positively, and the scores for anxiety, neurological and mental health diagnoses, unusual or psychotic experiences, happiness and subjective well-being, and negative traumatic events relate negatively to cognition," - this seems counterintuitive, that some symptoms relate to better cognition and others relate to worse cognition. Could you elaborate on this finding and what it could mean?

      We appreciate you highlighting this important observation. While some associations between mental health indices and cognition may appear counterintuitive at first glance, these patterns are robust (emerging consistently across both univariate correlations and PLSR loadings) and align with previous literature (e.g., Karpinski et al., 2018; Ogueji et al., 2022). For instance, the positive relationship between cognitive ability and certain mental health indicators like help-seeking behaviour has been documented in other population studies (Karpinski et al., 2018; Ogueji et al., 2022), potentially reflecting greater health literacy and access to care among cognitively advantaged individuals. Conversely, the negative associations with conditions like psychotic experiences mirror established neurocognitive deficits in these domains.

      As was initially detailed in Supplementary Materials (S12) and now expanded in our Discussion, these findings likely reflect complex multidimensional interactions. The positive loadings for mental distress indicators may capture: (1) greater help-seeking behaviour among those with higher cognition and socioeconomic resources, and/or (2) psychological overexcitability and rumination tendencies in high-functioning individuals. These interpretations are particularly relevant to the UK Biobank's assessment methods, where mental distress items focused on medical help-seeking rather than symptom severity per se (e.g., as a measure of mental distress, the UK Biobank questionnaire asked whether an individual sought or received medical help for or suffered from mental distress).

      Line 492: “Factor loadings derived from the PLSR model showed that the scores for mental distress, alcohol and cannabis use, and self-harm behaviours related positively, and the scores for anxiety, neurological and mental health diagnoses, unusual or psychotic experiences, happiness and subjective well-being, and negative traumatic events related negatively to the g-factor. Positive PLSR loadings of features related to mental distress may indicate greater susceptibility to or exaggerated perception of stressful events, psychological overexcitability, and predisposition to rumination in people with higher cognition [72]. On the other hand, these findings may be specific to the UK Biobank cohort and the way the questions for this mental health category were constructed. In particular, to evaluate mental distress, the UK Biobank questionnaire asked whether an individual sought or received medical help for or suffered from mental distress. In this regard, the estimate for mental distress may be more indicative of whether an individual experiencing mental distress had an opportunity or aspiration to visit a doctor and seek professional help [73]. Thus, people with better cognitive abilities and also with a higher socioeconomic status may indeed be more likely to seek professional help.

      Limited evidence supports a positive association between self-harm behaviours and cognitive abilities, with some studies indicating higher cognitive performance as a risk factor for non-suicidal self-harm. Research shows an inverse relationship between cognitive control of emotion and suicidal behaviours that weakens over the life course [73,74]. Some studies have found a positive correlation between cognitive abilities and the risk of non-suicidal self-harm, suicidal thoughts, and suicidal plans that may be independent of or, conversely, affected by socioeconomic status [75,76]. In our study, the magnitude of the association between self-harm behaviours and cognition was low (Fig. 2), indicating a weak relationship.

      Positive PLSR loadings of features related to alcohol and cannabis may also indicate the influence of other factors. Overall, this relationship is believed to be largely affected by age, income, education, social status, social equality, social norms, and quality of life [79–80]. For example, education level and income correlate with cognitive ability and alcohol consumption [79,81–83]. Research also links a higher probability of having tried alcohol or recreational drugs, including cannabis, to a tendency of more intelligent individuals to approach evolutionary novel stimuli [84,85]. This hypothesis is supported by studies showing that cannabis users perform better on some cognitive tasks [86]. Alternatively, frequent drinking can indicate higher social engagement, which is positively associated with cognition [87]. Young adults often drink alcohol as a social ritual in university settings to build connections with peers [88]. In older adults, drinking may accompany friends or family visits [89,90]. Mixed evidence on the link between alcohol and drug use and cognition makes it difficult to draw definite conclusions, leaving an open question about the nature of this relationship.

      Consistent with previous studies, we showed that anxiety and negative traumatic experiences were inversely associated with cognitive abilities [90–93]. Anxiety may be linked to poorer cognitive performance via reduced working memory capacity, increased focus on negative thoughts, and attentional bias to threatening stimuli that hinder the allocation of cognitive resources to a current task [94–96]. Individuals with PTSD consistently showed impaired verbal and working memory, visual attention, inhibitory function, task switching, cognitive flexibility, and cognitive control [97–100]. Exposure to traumatic events that did not reach the PTSD threshold was also linked to impaired cognition. For example, childhood trauma is associated with worse performance in processing speed, attention, and executive function tasks in adulthood, and age at a first traumatic event is predictive of the rate of executive function decline in midlife [101,102]. In the UK Biobank cohort, adverse life events have been linked to lower cognitive flexibility, partially via depression level [103].

      In agreement with our findings, cognitive deficits are often found in psychotic disorders [104,105]. We treated neurological and mental health symptoms as predictor variables and did not stratify or exclude people based on psychiatric status or symptom severity. Since no prior studies have examined isolated psychotic symptoms (e.g., recent unusual experiences, hearing unreal voices, or seeing unreal visions), we avoid speculating on how these symptoms relate to cognition in our sample.

      Finally, negative PLSR loadings of the features related to happiness and subjective well-being may be specific to the study cohort, as these findings do not agree with some previous research [107–109]. On the other hand, our results agree with the study linking excessive optimism or optimistic thinking to lower cognitive performance in memory, verbal fluency, fluid intelligence, and numerical reasoning tasks, and suggesting that pessimism or realism indicates better cognition [110]. The concept of realism/optimism as indicators of cognition is a plausible explanation for a negative association between the g-factor and friendship satisfaction, as well as a negative PLSR loading of feelings that life is meaningful, especially in older adults who tend to reflect more on the meaning of life [111]. The latter is supported by the study showing a negative association between cognitive function and the search for the meaning of life and a change in the pattern of this relationship after the age of 60 [112]. Finally, a UK Biobank study found a positive association of happiness with speed and visuospatial memory but a negative relationship with reasoning ability [113].”

      Karpinski RI, Kinase Kolb AM, Tetreault NA, Borowski TB. High intelligence: A risk factor for psychological and physiological overexcitabilities. Intelligence. 2018;66:8–23.

      Ogueji IA, Okoloba MM. Seeking Professional Help for Mental Illness: A Mixed-Methods Study of Black Family Members in the UK and Nigeria. Psychol Stud. 2022;67:164–177.

      Comment 10: - All neuroimaging factors together explain 48% of the variance in the cognition-mental health relationship. However, this relationship is only r=0.3 - so then the effect of neuroimaging factors seems a lot smaller… What does it mean?

      Thank you for raising this critical point. We have addressed this point in our response to Reviewer 1, comment 2, Reviewer 1, comment 3 and Reviewer 2, comment 1.

      Briefly, cognition is related to mental health at around r = 0.3 and to neuroimaging phenotypes at around r = 0.4. These levels of relationship strength are consistent with what has been shown in the literature (e.g., Wang et al., 2025 and Vieira et al., 2020). We discussed the relationship between cognition and mental health in our response to Reviewer 2, comment 1 above. In short, this relationship reflects just one functional domain – mental health may also be associated with other domains such as negative and positive valence systems, arousal and regulatory systems, social processes, and sensorimotor functions. Moreover, in the context of gerontology research, this effect size is considered relatively large (Brydges et al., 2019).

      We conducted a commonality analysis to investigate the unique and shared variance of mental health and neuroimaging phenotypes in explaining cognition.  As we discussed in our response to Reviewer 1, comment 2, we were able to account for 48% of the covariation between cognition and mental health using the MRI modalities available in the UK Biobank. The remaining 52% of unexplained variance may arise from several sources.

      One possibility is the absence of certain neuroimaging modalities in the UK Biobank dataset, such as task-based fMRI contrasts, positron emission tomography, arterial spin labeling, and magnetoencephalography/electroencephalography. Prior research from our group and others has consistently demonstrated strong predictive performance from specific task-based fMRI contrasts, particularly those derived from tasks like the n-Back working memory task and the face-name episodic memory task, none of which is available in the UK Biobank (Tetereva et al., 2025).

      Moreover, there are inherent limitations in using MRI as a proxy for brain structure and function. Measurement error and intra-individual variability, such as differences in a cognitive state between cognitive assessments and MRI acquisition, may also contribute to the unexplained variance. According to the RDoC framework, brain circuits represent only one level of neurobiological analysis relevant to cognition. Other levels, including genes, molecules, cells, and physiological processes, may also play a role in the cognition-mental health relationship.

      We have now incorporated these considerations into the Discussion section.
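
      To make the 48% figure concrete, the logic of a two-predictor commonality analysis can be sketched as follows. This is a generic illustration, not the analysis code: the data are synthetic, `mh` and `mri` are hypothetical stand-ins for the mental-health-based and neuroimaging-based predictions of the g-factor, and ordinary linear regression is assumed for the R<sup>2</sup> values.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)

# Synthetic data (hypothetical): two correlated predictors of the g-factor.
n = 3000
shared = rng.normal(size=n)
mh = shared + rng.normal(size=n)   # stand-in for the mental-health-based prediction
mri = shared + rng.normal(size=n)  # stand-in for the neuroimaging-based prediction
g = 0.3 * mh + 0.4 * mri + rng.normal(size=n)


def r2(*predictors):
    """In-sample R^2 of a linear regression of g on the given predictors."""
    X = np.column_stack(predictors)
    return LinearRegression().fit(X, g).score(X, g)


r2_mh, r2_mri, r2_both = r2(mh), r2(mri), r2(mh, mri)

# Partition the variance explained by the full model.
unique_mh = r2_both - r2_mri        # explained by mental health alone
unique_mri = r2_both - r2_mh        # explained by neuroimaging alone
common = r2_mh + r2_mri - r2_both   # explained jointly by both

# Share of the cognition-mental health relationship that neuroimaging captures.
share = common / r2_mh
print(f"R2(mental health) = {r2_mh:.3f}, common = {common:.3f}, share = {share:.0%}")
```

      The share is expressed relative to the variance in cognition that mental health explains, not relative to the total variance in cognition. With a mental health-cognition correlation of about r = 0.3, that base is roughly 9% of the total variance, so a share near one half corresponds to a few percent of the total.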

      Line 481: “Our analysis confirmed the validity of the g-factor [31] as a quantitative measure of cognition [31], demonstrating that it captures nearly half (39%) of the variance across twelve cognitive performance scores, consistent with prior studies [63–68]. Furthermore, we were able to predict cognition from 133 mental health indices, showing a medium-sized relationship that aligns with existing literature [69,70]. Although the observed mental health-cognition association is lower than within-sample estimates in conventional regression models, it aligns with our prior mega-analysis in children [69]. Notably, this effect size is not considered small in gerontology. In fact, it falls around the 70th percentile of reported effects and approaches the threshold for a large effect at r = 0.32 [71]. While we focused specifically on cognition as an RDoC core domain, the strength of its relationship with mental health may be bounded by the influence of other functional domains, particularly in normative, non-clinical samples – a promising direction for future research.”

      Line 658: “Although recent debates [18] have challenged the predictive utility of MRI for cognition, our multimodal marker integrating 72 neuroimaging phenotypes captures nearly half of the mental health-explained variance in cognition. We demonstrate that neural markers with greater predictive accuracy for cognition also better explain cognition-mental health covariation, showing that multimodal MRI can capture both a substantial cognitive variance and nearly half of its shared variance with mental health. Finally, we show that our neuromarkers explain a substantial portion of the age- and sex-related variance in the cognition-mental health relationship, highlighting their relevance in modeling cognition across demographic strata.

      The remaining unexplained variance in the relationship between cognition and mental health likely stems from multiple sources. One possibility is the absence of certain neuroimaging modalities in the UK Biobank dataset, such as task-based fMRI contrasts, positron emission tomography, arterial spin labeling, and magnetoencephalography/electroencephalography. Prior research has consistently demonstrated strong predictive performance from specific task-based fMRI contrasts, particularly those derived from tasks like the n-Back working memory task and the face-name episodic memory task, none of which is available in the UK Biobank [15,17,61,69,114,142,151].

      Moreover, there are inherent limitations in using MRI as a proxy for brain structure and function. Measurement error and intra-individual variability, such as differences in a cognitive state between cognitive assessments and MRI acquisition, may also contribute to the unexplained variance. According to the RDoC framework, brain circuits represent only one level of neurobiological analysis relevant to cognition [14]. Other levels, including genes, molecules, cells, and physiological processes, may also play a role in the cognition-mental health relationship.

      Nonetheless, neuroimaging provides a valuable window into the biological mechanisms underlying this overlap – insights that cannot be gleaned from behavioural data alone. Ultimately, our findings validate brain-based neural markers as a fundamental neurobiological unit of analysis, advancing our understanding of mental health through the lens of cognition.”

      Wang Y, Anney R, Pat N. The relationship between cognitive abilities and mental health as represented by cognitive abilities at the neural and genetic levels of analysis. eLife. 2025;14:RP105537.

      Vieira S, Gong QY, Pinaya WHL, et al. Using Machine Learning and Structural Neuroimaging to Detect First Episode Psychosis: Reconsidering the Evidence. Schizophr Bull. 2020;46(1):17-26.

      Brydges CR. Effect Size Guidelines, Sample Size Calculations, and Statistical Power in Gerontology. Innovation in Aging. 2019;3(4):igz036.

      Tetereva A, Knodt AR, Melzer TR, et al. Improving Predictability, Reliability and Generalisability of Brain-Wide Associations for Cognitive Abilities via Multimodal Stacking. Preprint. bioRxiv. 2025;2024.05.03.589404.

      Reviewer 3:

      Buianova et al. present a comprehensive analysis examining the predictive value of multimodal neuroimaging data for general cognitive ability, operationalized as a derived g-factor. The study demonstrates that functional MRI holds the strongest predictive power among the modalities, while integrating multiple MRI modalities through stacking further enhances prediction performance. The inclusion of a commonality analysis provides valuable insight into the extent to which shared and unique variance across mental health features and neuroimaging modalities contributes to the observed associations with cognition. The results are clearly presented and supported by high-quality visualizations. Limitations of the sample are stated clearly.

      Thank you once more for your constructive and encouraging feedback. We appreciate your careful reading and valuable methodological insights. Your expertise has helped us clarify key methodological concepts and improve the overall rigour of our study.

      Suggestions for improvement:

      (1) The manuscript would benefit from the inclusion of permutation testing to evaluate the statistical significance of the predictive models. This is particularly important given that some of the reported performance metrics are relatively modest, and permutation testing could help ensure that results are not driven by chance.

      Thank you, this is an excellent point. We agree that evaluating the statistical significance of our predictive models is essential.

      In our original analysis, we assessed model performance by generating a bootstrap distribution of Pearson’s r, resampling the data with replacement 5,000 times (see Figure 3b). In response to your feedback, we have made the following updates:

      (1) Improved Figure 3b to explicitly display the 95% confidence intervals.

      (2) Supplemented the results by reporting the exact confidence interval values.

      (3) Clarified our significance testing procedure in the Methods section.

      We considered model performance statistically significant when the 95% confidence interval did not include zero, indicating that the observed associations are unlikely to have occurred by chance.

      We chose bootstrapping over permutation testing because, while both can assess statistical significance, bootstrapping additionally provides uncertainty estimates in the form of confidence intervals. Given the large sample size in our study, significance testing can be less informative, as even small effects may reach statistical significance. Bootstrapping offers a more nuanced understanding of model uncertainty.

      Line 233: “To evaluate model performance and assess statistical significance, we aggregated the predicted and observed g-factor values from each outer-fold test set. We then computed a bootstrap distribution of Pearson’s correlation coefficient (r) by resampling with replacement 5 000 times, generating 95% confidence intervals (CIs) (Fig. 1). Model performance was considered statistically significant if the 95% CI did not include zero, indicating that the observed associations were unlikely to have occurred by chance.”
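
      The bootstrap procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names (`pearson_r`, `bootstrap_ci`), the percentile method for the interval, and the synthetic "observed" and "predicted" values are all hypothetical and stand in for the aggregated outer-fold test-set data.

```python
import random
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def bootstrap_ci(observed, predicted, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for Pearson's r (pairs resampled with replacement)."""
    rng = random.Random(seed)
    n = len(observed)
    stats = []
    for _ in range(n_boot):
        # Resample (observed, predicted) pairs jointly so the pairing is preserved.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(pearson_r([observed[i] for i in idx],
                               [predicted[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Synthetic stand-in data (assumption): predictions are a noisy copy of the target.
_rng = random.Random(42)
observed = [_rng.gauss(0.0, 1.0) for _ in range(200)]
predicted = [o + _rng.gauss(0.0, 1.0) for o in observed]

r_point = pearson_r(observed, predicted)
ci_low, ci_high = bootstrap_ci(observed, predicted)
significant = ci_low > 0 or ci_high < 0  # 95% CI excludes zero
print(f"r = {r_point:.3f}, 95% CI [{ci_low:.3f}, {ci_high:.3f}], significant: {significant}")
```

      Resampling pairs rather than each variable separately is what distinguishes this from a permutation test: the bootstrap keeps the observed association intact and estimates its uncertainty, whereas a permutation test would break the pairing to build a null distribution.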

      (2) Applying and testing the trained models on an external validation set would increase confidence in generalisability of the model.

      We appreciate this excellent suggestion. While we considered this approach, implementing it would require identifying an appropriate external dataset with comparable neuroimaging and behavioural measures, along with careful matching of acquisition protocols and variable definitions across sites. These challenges extend beyond the scope of the current study, though we fully agree that this represents an important direction for future research.

      Our findings, obtained from one of the largest neuroimaging datasets to date with training and test samples exceeding most previous studies, align closely with existing literature: the predictive accuracy of each neuroimaging phenotype and modality for cognition matches the effect size reported in meta-analyses (r ≈ 0.4; e.g., Vieira et al., 2020). The ability of dwMRI, rsMRI and sMRI to capture the cognition-mental health relationship is, in turn, consistent with our previous work in pediatric populations (Wang et al., 2025; Pat et al., 2022).

      Vieira S, Gong QY, Pinaya WHL, et al. Using Machine Learning and Structural Neuroimaging to Detect First Episode Psychosis: Reconsidering the Evidence. Schizophr Bull. 2020;46(1):17-26.

      Wang Y, Anney R, Pat N. The relationship between cognitive abilities and mental health as represented by cognitive abilities at the neural and genetic levels of analysis. eLife. 2025;14:RP105537.

      Pat N, Wang Y, Anney R, Riglin L, Thapar A, Stringaris A. Longitudinally stable, brain-based predictive models mediate the relationships between childhood cognition and socio-demographic, psychological and genetic factors. Hum Brain Mapp. 2022;43:5520–5542.

      (3) The rationale for selecting a 5-by-10-fold cross-validation scheme is not clearly explained. Clarifying why this structure was preferred over more commonly used alternatives, such as 10-by-10 or 5-by-5 cross-validation, would strengthen the methodological transparency.

      Thank you for this important methodological question. Our choice of a 5-by-10-fold cross-validation scheme was motivated by the need to balance robust hyperparameter tuning with computational efficiency, particularly memory and processing time. Retaining five outer folds allowed us to rigorously assess model performance across multiple data partitions, yielding an outer-fold test set of at least n = 4 000 and providing a substantial amount of neuroimaging data for model training. In contrast, employing ten inner folds ensured robust and stable hyperparameter tuning that maximizes the reliability of model selection. Thus, the five outer folds, combined with our large sample, provided a sufficient out-of-sample test set size for reliable model evaluation and efficient computation, while the ten inner folds enabled robust hyperparameter tuning. We now provide additional rationale for this design decision on Page 10.

      Line 188: “We employed nested cross-validation to predict cognition from mental health indices and 72 neuroimaging phenotypes (Fig. 1). Nested cross-validation is a robust method for evaluating machine-learning models while tuning their hyperparameters, ensuring that performance estimates are both accurate and unbiased. Here, we used a nested cross-validation scheme with five outer folds and ten inner folds.

      We started by dividing the entire dataset into five outer folds. Each fold took a turn being held out as the outer-fold test set (20% of the data), while the remaining four folds (80% of the data) were used as an outer-fold training set. Within each outer-fold training set, we performed a second layer of cross-validation – this time splitting the data into ten inner folds. These inner folds were used exclusively for hyperparameter tuning: models were trained on nine of the inner folds and validated on the remaining one, cycling through all ten combinations.

      We then selected the hyperparameter configuration that performed best across the inner-fold validation sets, as determined by the minimal mean squared error (MSE). The model was then retrained on the full outer-fold training set using this hyperparameter configuration and evaluated on the outer-fold test set, using four performance metrics: Pearson r, the coefficient of determination (R<sup>2</sup>), the mean absolute error (MAE), and the MSE. This entire process was repeated for each of the five outer folds, ensuring that every data point is used for both training and testing, but never at the same time. We opted for five outer folds instead of ten to reduce computational demands, particularly memory and processing time, given the substantial volume of neuroimaging data involved in model training. Five outer folds led to an outer-fold test set of at least n = 4 000, which should be sufficient for model evaluation. In contrast, we retained ten inner folds to ensure robust and stable hyperparameter tuning, maximising the reliability of model selection.”
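
      The nested scheme described above (five outer folds for evaluation, ten inner folds for tuning, selection by minimal MSE, refit on the full outer-fold training set) can be sketched in a few lines. This is an illustrative sketch only: the one-feature ridge model, the penalty grid, the helper names and the synthetic data are assumptions chosen to keep the example self-contained, and they are not the models or data used in the study.

```python
import random

def make_folds(indices, k, rng):
    """Shuffle indices and split them into k nearly equal folds."""
    shuffled = indices[:]
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]

def fit_ridge(x, y, lam):
    """One-feature ridge regression; returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / (sxx + lam)
    return slope, my - slope * mx

def mse(model, x, y):
    slope, intercept = model
    return sum((slope * a + intercept - b) ** 2 for a, b in zip(x, y)) / len(x)

def nested_cv(x, y, lambdas, n_outer=5, n_inner=10, seed=0):
    """Return out-of-sample predictions and the penalty chosen in each outer fold."""
    rng = random.Random(seed)
    predictions, chosen = {}, []
    for test_idx in make_folds(list(range(len(x))), n_outer, rng):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(x)) if i not in held_out]
        inner_folds = make_folds(train_idx, n_inner, rng)

        def inner_score(lam):
            # Mean validation MSE across the inner folds for one penalty value.
            total = 0.0
            for val_idx in inner_folds:
                val = set(val_idx)
                fit_idx = [i for i in train_idx if i not in val]
                model = fit_ridge([x[i] for i in fit_idx], [y[i] for i in fit_idx], lam)
                total += mse(model, [x[i] for i in val_idx], [y[i] for i in val_idx])
            return total / n_inner

        best_lam = min(lambdas, key=inner_score)
        # Refit on the full outer-fold training set, then predict the held-out fold.
        slope, intercept = fit_ridge([x[i] for i in train_idx],
                                     [y[i] for i in train_idx], best_lam)
        for i in test_idx:
            predictions[i] = slope * x[i] + intercept
        chosen.append(best_lam)
    return [predictions[i] for i in range(len(x))], chosen

# Synthetic stand-in data (assumption): a noisy linear relationship.
_rng = random.Random(1)
x = [_rng.gauss(0.0, 1.0) for _ in range(200)]
y = [2.0 * a + _rng.gauss(0.0, 1.0) for a in x]
grid = [0.0, 1.0, 10.0, 100.0]

preds, chosen_lambdas = nested_cv(x, y, grid)
outer_mse = sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)
y_mean = sum(y) / len(y)
y_var = sum((t - y_mean) ** 2 for t in y) / len(y)
print(f"out-of-sample MSE = {outer_mse:.3f}, variance of y = {y_var:.3f}")
```

      Because the held-out outer fold never enters the inner loop, the aggregated predictions are out-of-sample for every data point, which is the property the quoted Methods text relies on.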

      (4) A more detailed discussion of which specific brain regions or features within each neuroimaging modality contributed most strongly to the prediction of cognition would enhance neurobiological relevance of the findings.

      Thank you for this thoughtful suggestion. To address this point, we have included feature importance plots for the top-performing neuroimaging phenotypes within each modality (Figure 5 and Figures S2–S4), demonstrating the relative contributions of individual features to the predictive models. While we maintain our primary focus on cross-modality performance comparisons in the main text, as this aligns with our central aim of evaluating multimodal MRI markers at the integrated level, we outline the contribution of neuroimaging features with the highest predictive performance for cognition in the revised Results and Discussion.

      Methods

      Line 255: “To determine which neuroimaging features contribute most to the predictive performance of top-performing phenotypes within each modality, while accounting for the potential latent components derived from neuroimaging, we assessed feature importance using the Haufe transformation [62]. Specifically, we calculated Pearson correlations between the predicted g-factor and scaled and centred neuroimaging features across five outer-fold test sets. We also examined whether the performance of neuroimaging phenotypes in predicting cognition per se is related to their ability to explain the link between cognition and mental health. Here, we computed the correlation between the predictive performance of each neuroimaging phenotype and the proportion of the cognition-mental health relationship it captures. To understand how demographic factors, including age and sex, contribute to this relationship, we also conducted a separate set of commonality analyses treating age, sex, age<sup>2</sup>, age×sex, and age<sup>2</sup>×sex as an additional set of explanatory variables (Fig. 1).”
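
      The feature-importance step quoted above, correlating each feature with the model's predicted values, can be illustrated as follows. This is a hedged sketch on synthetic data: the function names, the three toy features and the linear "predicted" score are hypothetical, and the example shows only the correlation step as described in the quoted Methods text, not the authors' full pipeline.

```python
import random
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def feature_importance(features, predicted):
    """Correlate every feature column with the predicted target.

    `features` is a list of rows (one row per participant). Pearson's r is
    unchanged by centring and scaling, so standardising the columns first
    would give the same values.
    """
    n_features = len(features[0])
    return [pearson_r([row[j] for row in features], predicted)
            for j in range(n_features)]

# Synthetic stand-in data (assumption): three independent toy features.
_rng = random.Random(7)
features = [[_rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(400)]
# Hypothetical model output: positive weight on feature 0, negative on feature 1,
# no weight on feature 2.
predicted = [1.0 * row[0] - 0.5 * row[1] for row in features]

importance = feature_importance(features, predicted)
for j, value in enumerate(importance):
    print(f"feature {j}: r with predicted score = {value:+.3f}")
```

      The sign of each correlation indicates the direction of the association between a feature and the predicted score, which is how the positive and negative associations in the Results excerpts below can be read.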

      Results

      dwMRI

      Line 331: “Overall, models based on structural connectivity metrics performed better than TBSS and probabilistic tractography (Fig. 3). TBSS, in turn, performed better than probabilistic tractography (Fig. 3 and Table S13). The number of streamlines connecting brain areas parcellated with aparc MSA-I had the best predictive performance among all dwMRI neuroimaging phenotypes (R<sup>2</sup><sub>mean</sub> = 0.052, r<sub>mean</sub> = 0.227, 95% CI [0.212, 0.235]). To identify features driving predictions, we correlated streamline counts in aparc MSA-I parcellation with the predicted g-factor values from the PLSR model. Positive associations with the predicted g-factor were strongest for left superior parietal-left caudal anterior cingulate, left caudate-right amygdala, and left putamen-left hippocampus connections. The most marked negative correlations involved left putamen-right posterior thalamus and right pars opercularis-right caudal anterior cingulate pathways (Fig. 5 and Supplementary Fig. S2).”

      rsMRI

      Line 353: “Among RSFC metrics for 55 and 21 ICs, tangent parameterization matrices yielded the highest performance in the training set compared to full and partial correlation, as indicated by the cross-validation score. Functional connections between the limbic (IC10) and dorsal attention (IC18) networks, as well as between the ventral attention (IC15) and default mode (IC11) networks, displayed the highest positive association with cognition. In contrast, functional connectivity between the limbic (IC43, the highest activation within network) and default mode (IC11) and limbic (IC45) and frontoparietal (IC40) networks, between the dorsal attention (IC18) and frontoparietal (IC25) networks, and between the ventral attention (IC15) and frontoparietal (IC40) networks, showed the highest negative association with cognition (Fig. 5 and Supplementary Fig. S3 and S4).”

      sMRI

      Line 373: “FreeSurfer subcortical volumetric subsegmentation and ASEG had the highest performance among all sMRI neuroimaging phenotypes (R<sup>2</sup><sub>mean</sub> = 0.068, r<sub>mean</sub> = 0.244, 95% CI [0.237, 0.259] and R<sup>2</sup><sub>mean</sub> = 0.059, r<sub>mean</sub> = 0.235, 95% CI [0.221, 0.243], respectively). In FreeSurfer subcortical volumetric subsegmentation, volumes of all subcortical structures, except for left and right hippocampal fissures, showed positive associations with cognition. The strongest relations were observed for the volumes of bilateral whole hippocampal head and whole hippocampus (Fig. 5 and Supplementary Fig. S5 for feature importance maps). Grey matter morphological characteristics from ex vivo Brodmann Area Maps showed the lowest predictive performance (R<sup>2</sup><sub>mean</sub> = 0.008, r<sub>mean</sub> = 0.089, 95% CI [0.075, 0.098]; Fig. 3 and Table S15).”

      Discussion

      dwMRI

      Line 562: “Among dwMRI-derived neuroimaging phenotypes, models based on structural connectivity between brain areas parcellated with aparc MSA-I (streamline count), particularly connections with bilateral caudal anterior cingulate (left superior parietal-left caudal anterior cingulate, right pars opercularis-right caudal anterior cingulate), left putamen (left putamen-left hippocampus, left putamen-right posterior thalamus), and amygdala (left caudate-right amygdala), result in a neural indicator that best reflects microstructural resources associated with cognition, as indicated by predictive modeling, and more importantly, shares the highest proportion of the variance with mental health-g, as indicated by commonality analysis.”

      rsMRI

      Line 583: “We extend findings on the superior performance of rsMRI in predicting cognition, which aligns with the literature [15, 28], by showing that it also explains almost a third of the variance in cognition that mental health captures. At the rsMRI neuroimaging phenotype level, this performance is mostly driven by RSFC patterns among 55 ICA-derived networks quantified using tangent space parameterization. At a feature level, these associations are best captured by the strength of functional connections among limbic, dorsal attention and ventral attention, frontoparietal and default mode networks. These functional networks have been consistently linked to cognitive processes in prior research [127–130].”

      sMRI

      Line 608: “Integrating information about brain anatomy by stacking sMRI neuroimaging phenotypes allowed us to explain a third of the link between cognition and mental health. Among all sMRI neuroimaging phenotypes, those that quantified the morphology of subcortical structures, particularly volumes of bilateral hippocampus and hippocampal head, explain the highest portion of the variance in cognition captured by mental health. Our findings show that, at least in older adults, volumetric properties of subcortical structures are not only more predictive of individual variations in cognition but also explain a greater portion of cognitive variance shared with mental health than structural characteristics of more distributed cortical grey and white matter. This aligns with the Scaffolding Theory that proposes stronger compensatory engagement of subcortical structures in cognitive processing in older adults [138–140].”

      (5) The formatting of some figure legends could be improved for clarity; for example, some subheadings were not formatted in bold (e.g., Figure 2c).

      Thank you for noticing this. We have updated the figures to enhance clarity, keeping subheadings plain while bolding figure numbers and MRI modality names.

    1. eLife Assessment

      This valuable paper investigates how fish avoid thermal disturbances that occur on fast timescales. The authors use a creative experimental approach that quickly creates a vertical thermal interface, which they combine with careful behavioral analyses. The evidence supporting their results is solid, but there is a potential confounding factor between temperature and vertical positioning, and characterization of the thermal interface would greatly assist in interpreting the results.

    2. Reviewer #1 (Public review):

      Summary:

      The experiment is interesting and well executed and describes in high detail fish behaviour in thermally stratified waters. The evidence is strong but the experimental design cannot distinguish between temperature and vertical position of the treatments.

      Strengths:

      High statistical power, solid quantification of behaviour.

      Weaknesses:

      A major issue with the experimental design is the vertical component of the experiment. Many thermal preference and avoidance experiments are run using horizontal division in shuttlebox systems or in annular choice flumes. These remove the vertical stratification component so that hot and cold can be compared equally, without the vertical layering as a confounding factor. The method chosen, with its vertical stratification, is inherently unable to control for this effect because warm water is always above, and cold water is always below. This complicates the interpretations.

    3. Reviewer #2 (Public review):

      The paper by Naudascher et al. investigates an interesting question: how do fish react to and avoid thermal disturbances from the optimum that occur on fast timescales? Previous work has identified potential strategies of warm avoidance in fish on short timescales while strategies for cold avoidance are far more elusive. The work combines a clever experimental paradigm with careful analysis to show that trout parr avoid cold water by limiting excursions across a warm-cold thermal interface. While direct measurements of the interface are lacking, thermal dynamics simulations suggest that trout parr avoid the warm-cold interface in the absence of gradient information.

      The authors assume that the thermal interface triggers the upward turning behavior, possibly leading to the formation of an associative memory. However, an alternative explanation is that exposure to cold water during initial excursions increases the tendency for upward turns. In other words, exposure to a cold interface changes the behavioral state, leading to increases in gravity-controlled upward turning. This could be an adaptive strategy since for temperatures > 4 °C swimming upwards is a good strategy to reach warmer water. That being said, the vertical design offers new insight and is ecologically relevant.

    4. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      Summary:

      The experiment is interesting and well executed and describes in high detail fish behaviour in thermally stratified waters. The evidence is strong but the experimental design cannot distinguish between temperature and vertical position of the treatments.

      Strengths:

      High statistical power, solid quantification of behaviour.

      Weaknesses:

      A major issue with the experimental design is the vertical component of the experiment. Many thermal preference and avoidance experiments are run using horizontal division in shuttlebox systems or in annular choice flumes. These remove the vertical stratification component so that hot and cold can be compared equally, without the vertical layering as a confounding factor. The method chosen, with its vertical stratification, is inherently unable to control for this effect because warm water is always above, and cold water is always below. This complicates the interpretations and makes firm conclusions about thermal behaviour difficult.

      We highly appreciate this evaluation and have addressed the reviewer’s specific comments below.

      The sentence "Further, the metabolic performance (and thus functions including growth, reproduction, and locomotion) of ectotherms takes the form of a bell-shaped curve as a function of temperature6, peaking within a range of optimal temperatures (the 'preferendum') and going to zero at lower and upper temperature limits7." contains several over-simplifications and misconceptions:

      (1) Thermal performance curves are never bell-shaped.

      (2) The optimum for various traits often shows different TPCs.

      (3) The preferendum rarely lines up with the thermal optimum for various trait TPCs.

      (4) Performance for various traits rarely reaches zero at upper or lower limits, instead they can reach zero at less extreme temperatures (e.g. growth) or maintain high function all the way up to and sometimes beyond thermal limits (e.g. aerobic scope, heart rate).

      We highly appreciate this input. We have replaced that sentence with: L69-71: “Because temperature influences the rates of most physiological processes, rapid warming or cooling can affect fish performance traits, including metabolic rates, swimming ability, and thermal tolerance (Jutfelt et al. 2024).”

      The use of adaptation instead of acclimation is confusing. Adaptation should be reserved for evolutionary change. This is an issue in several parts of the manuscript.

      Thanks for this input, we have replaced the word adapt with acclimate in two instances: L79 and L398.

      It is not true that "very few quantitative studies of thermotaxis have been conducted in fish". There exists an extensive literature on thermal preference and avoidance in fish that the manuscript downplays.

      Thanks a lot for this input. We understand that thermal preference is ultimately driven by mechanistic responses to thermal gradients, and that thermotaxis and thermokinesis are the two mechanisms used by fish to navigate heterothermal environments. Our study and analysis are focused on understanding these mechanisms in vertically stratified conditions, not to understand thermal preferences per se. We have modified our text to clarify this aspect. Our literature review was focused on the behavioral mechanisms and our understanding is that the establishment of thermal preferences has a different goal compared to understanding how fish respond to rapid changes in water temperature. We have deleted that sentence and replaced it by (L107-110): “While the thermal preference of fish is a well-established field of research, very few quantitative studies of the behavioral mechanisms allowing fish to seek their preferendum (i.e. thermotaxis) have been conducted in fish.”

      (Methods) It is unclear why the blue dye was used in all experiments. The fish can see the differently coloured water layer and that may have affected their choices. Five control trials without dye were run but finding no difference there could also be due to low statistical power.

      We appreciate this comment. The blue dye was used to visualize the precise location of the thermal interface and was therefore necessary in all experiments (see Methods section ‘Visualization and evolution of the thermal interface’). We acknowledge that fish can perceive the colored water layer, but since the dye concentration and resulting color intensity were consistent across all treatments, we do not see how it could have acted as a confounding variable. While we recognize the possibility of some behavioral influence from the dye, the clear behavioral differences across treatments indicate that it was not a determining factor. To emphasize this we have added the following to the manuscript (L701-703): “Furthermore, because the dye concentration and resulting color intensity were consistent across all treatments, the dye did not act as a confounding variable in our statistical comparisons.”

      Regarding statistical power, our control experiment without dye (N = 16 fish, 4 replicates; see Fig. S34 and S35) provides sufficient statistical power to assess whether the dye influenced behavior. The reviewer indicated that the high statistical power was a strength of the paper, which aligns with our view that our study design enables robust statistical comparisons. It seems contradictory that statistical power is a concern for the control trials, given that our main experiments were conducted with a similar sample size. Indeed, the number of replicates used is consistent with similar studies and balances statistical rigor with the ethical goal of reducing the number of animals used in experimentation. To emphasize this, we have added the following to the manuscript (L865-868): “The number of replicates used in this study reflects a balance between statistical rigor and the ethical imperative to minimize the use of animals in experimentation. Regarding statistical power, our design (five replicates with groups of four fish each) is consistent with similar studies and represents an adequate sample size.”

      A major issue with the experimental design is the vertical component of the experiment. Many thermal preference and avoidance experiments are run using horizontal division in shuttlebox systems or in annular choice flumes. These remove the vertical stratification component so that hot and cold can be compared equally, without the vertical layering as a confounding factor. The method chosen, with its vertical stratification, is inherently unable to control for this effect because warm water is always above, and cold water is always below. This complicates the interpretations and makes firm conclusions about thermal behaviour difficult. This issue should be thoroughly discussed.

      Thank you very much for this comment. We revised the manuscript accordingly, to clearly indicate that our goal was to assess the response of fish to vertically thermally stratified water, a scenario that occurs frequently in nature. We have added the following paragraph to the Discussion (L523-530): “However, a generalization of our observations to horizontally oriented thermal gradients remains elusive. Our results are inherently tied to the vertical stratification created in our experiments. As warm water was always positioned above and cold water below, we could not control for the effect of vertical position (i.e., we could not do cold over warm layer experiments). This limits our ability to directly compare our findings to those obtained from horizontally oriented thermal gradients. On the other hand, the case we addressed is of direct environmental relevance, as natural waters often experience vertical thermal stratification.”

      It is unclear why the authors assume an "optimal temperature" (undefined for which trait) of 12°C for brown trout parr, and why they assume the preference temperature would match that "optimal" temperature. The thermal biology for any fish species is more complex than a single perfect temperature, with various traits showing differing optima and often a mismatch with the preferred temperature. The literature suggests brown trout growth optima between 13 and 16°C, and preference temperature has even been suggested to be as high as 21°C. In light of this, the authors' conclusion that brown trout avoid cold and don't avoid warm water is possibly misguided. It is possible that the brown trout had a preference temperature higher than 12°C, which should be acknowledged and discussed.

      This is indeed a very important aspect, which was partly (but indeed not fully) already addressed in the discussion. To reflect these considerations, we have expanded the existing paragraph in the discussion (additions are in yellow). (L422 - L439): “We conclude from the behavior of fish when warmer water was available that their acute thermal preferendum exceeded 12 °C, departing from the acclimation temperature we had chosen based on the thermal preferendum for trout reported in literature[33]. Indeed, the thermal biology for any fish species is more complex than a single, static thermal preferendum: Many internal and external factors, such as hypoxia, satiation, time of day, and life stage[5], can influence the temperature preference of fish. For example, the level of satiation can have an impact because when fish are well fed, their growth rate increases with body temperature as metabolic performance increases[40]. This modifies the preferred temperature, as observed in Bear Lake sculpin (Cottus extensus) that ascend into warmer water after feeding to stimulate digestion and thereby achieve a three-fold higher growth rate[41]. In contrast, field studies with adult fish have observed movement from warm to cold water in summer[42,43], allowing fish to lower their metabolic rate, likely in effort to conserve energy[2,44]. We propose that the behavior of trout parr upon exposure to warmer water in our experiments served to achieve a higher body temperature to ultimately increase growth rate, which is critical for this life stage[45,46]. Indeed, growth experiments on brown trout populations have shown that optimal growth temperatures can range between 15 and 19 °C, depending on the stream of origin[46].”

      The figures are unnecessarily complex and introduce a long list of abbreviations and Greek characters for no apparent reason. There are many simpler ways for showing the results so unclear why they are so opaque.

      We appreciate the reviewer’s feedback and agree on the importance of clarity. However, in the absence of specific suggestions, we did not make changes to the figures or the use of Greek characters (which align with convention), as we believe they effectively convey the results. We highlight that the data themselves are very rich (multiple fish, multiple phases, multiple treatments, etc.) and we wanted to convey this richness in a compact and transparent manner.

      Reviewer #2:

      This paper investigates an interesting question: how do fish react to and avoid thermal disturbances from the optimum that occur on fast timescales? Previous work has identified potential strategies for warm avoidance in fish on short timescales while strategies for cold avoidance are far more elusive. The work combines a clever experimental paradigm with careful analysis to show that trout parr avoid cold water by limiting excursions across a warm-cold thermal interface. While I found the paper interesting and convincing overall, there are a few omissions and choices in the presentation that limit interpretability and clarity.

      A main question concerns the thermal interface itself. The authors track this interface using a blue dye that is mixed in with either colder or warmer water before a gate is opened that leads to gravitational flow overlaying the two water temperatures. The dye likely allows the identification of convective currents, which could lead to rapid mixing of water temperatures. However, it is less clear whether it accurately reflects thermal diffusion. This is problematic as the authors identify upward turning behavior around the interface which appears to be the behavioral strategy for avoiding cold water but not warm water. Without knowing the extent of the gradient across the interface, it is hard to know what the fish are sensing. The authors appear to treat the interface as essentially static, leading them to the conclusion that turning away before the interface is reached is likely related to associative learning. However, thermal diffusion could very likely create a gradient across centimeters which is used as a cue by the fish to initiate the turn. In an ideal world, the authors would use a thermal camera to track the relationship between temperature and the dye interface. Absent that, the simulation that is mentioned in passing in the methods section should be discussed in detail in the main text, and results should be displayed in Figure 1. Error metrics on the parameters used in the simulation could then be used to identify turns in subsequent figures that likely are or aren't affected by a gradient formed across the interface.

      The authors assume that the thermal interface triggers the upward-turning behavior. However, an alternative explanation, which should be discussed, is that cold water increases the tendency for upward turns. This could be an adaptive strategy since for temperatures > 4 °C swimming upwards is likely a good strategy to reach warmer water.

      The paper currently also suffers from a lack of clarity which is largely created by figure organization. Four main and 38 supplemental figures are very unusual. I give some specific recommendations below but the authors should decide which data is truly supplemental, versus supporting important points made in the paper itself. There also appear to be supplemental figures that are never referenced in the text which makes traversing the supplements unnecessarily tedious.

      The N that was used as the basis for statistical tests and plots should be identified in the figures to improve interpretability. To improve rigor, the experimental procedures should be expanded.

      Specifically, the paper uses two thermal models which are not detailed at all in the methods section.

      We appreciate these crucial comments to our paper. We have addressed these points in detail below.

      As stated above, a characterization of the thermal interface is critical. Ideally via measurement or at least by expanding on the simulation.

      We appreciate the idea of using thermal cameras and, indeed, we had initially tried to use them. However, thermal cameras generally cannot see through plexiglass or glass-like material due to the way infrared radiation interacts with these materials. While thin plastics can transmit some infrared, thicker plastics and reflective materials like glass tend to block or reflect infrared light.

      We have attempted to better characterize the thermal interface thickness, namely the spatial extent of the thermal gradient over the time period of our experiments (20 min). Indeed, our simulations in the original SI were conducted precisely to estimate the thermal interface thickness, but they were based on thermal diffusion in still water, whereas turbulence generated by the moving gravity current can smear out the interface, particularly in the initial phase. To account for this in the revised manuscript, we adopted a phenomenological approach to estimate the initial increase in thickness of the thermal interface due to turbulence and present this refined simulation in our manuscript.

      Our analysis suggests that, rather than assuming an initial interface thickness of zero (as in the original version of the manuscript), the thermal diffusion simulations should begin with an initial thickness of 2.8 mm in TR1. To incorporate this adjustment, we set the initial interface thickness to 2.8 mm and ran the simulation forward for t = 20 min, assuming diffusion. This approach resulted in a final interface thickness ranging between 4 and 6 cm (see Fig. 29 in the Supplementary Information).
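
      As a rough plausibility check of the numbers quoted above, the growth of a diffusive interface can be sketched with the textbook error-function solution for one-dimensional diffusion. This is not the authors' simulation: the thermal diffusivity of water (about 1.4e-7 m^2/s), the definition of "thickness" as the 10-90% width of the temperature profile, and all function names are assumptions made here for illustration.

```python
import math
from statistics import NormalDist

# Assumed value: approximate thermal diffusivity of water, in m^2/s.
KAPPA_WATER = 1.4e-7


def erfinv(x):
    """Inverse error function, built from the standard normal quantile."""
    return NormalDist().inv_cdf((x + 1.0) / 2.0) / math.sqrt(2.0)


# For a profile T ~ erf(z / (2*sqrt(kappa*t))), the 10-90% width equals
# 4 * erfinv(0.8) * sqrt(kappa * t). The 10-90% definition is an assumption.
WIDTH_FACTOR = 4.0 * erfinv(0.8)


def interface_thickness(t_seconds, delta0=0.0028, kappa=KAPPA_WATER):
    """Thickness (m) of a purely diffusive interface after t_seconds.

    delta0 is the initial thickness in metres (2.8 mm, as quoted for TR1).
    A finite initial thickness is equivalent to starting the erf solution
    at an earlier virtual time, which gives
    delta(t)^2 = delta0^2 + WIDTH_FACTOR^2 * kappa * t.
    """
    return math.sqrt(delta0 ** 2 + WIDTH_FACTOR ** 2 * kappa * t_seconds)


if __name__ == "__main__":
    for minutes in (0, 2, 5, 10, 20):
        thickness_cm = 100.0 * interface_thickness(60.0 * minutes)
        print(f"t = {minutes:2d} min: thickness = {thickness_cm:.1f} cm")
```

      Under these assumptions the thickness after 20 min comes out at a few centimeters, the same order as the 4 to 6 cm range reported above; a different thickness definition or diffusivity would shift the value.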

      To reflect this refinement, we have added a new paragraph (L717-758: "Characterization of the thermal gradient") to the Methods section. Additionally, we have updated Fig. S29 in the Supplementary Information and included an average (over time and across treatments) gradient thickness of 5 cm in Figs. 2 and 3 of the manuscript. The revised Figs. 2 and 3 now explicitly indicate the estimated vertical extent of the thermal gradient, with an extended caption detailing these changes.

      The simulation should be detailed in the methods so that its validity can be evaluated and ideally, it should involve curved interfaces as encountered in the experiment.

      To account for the effect of turbulence during the initial, inertia-dominated phase after the gate removal, we have provided a correction for the initial thickness of the interface (see the addition to the Methods section). Thank you for your suggestion regarding the incorporation of curved interfaces in the simulations. We believe that including curved interfaces in the simulations would not significantly affect the results. As shown in the manuscript, the interface is curved primarily during the initial phase of the process (first 2 min where the flow is inertia-dominated), which is currently not included in our data analysis (phase 1 begins 2 min after the gate removal).

      In that vein, distances from the interface rather than height above the interface should be reported for the fish.

      We acknowledge the reviewer’s suggestion to report distances from the interface rather than height above or below it. However, beyond the initial phase, we do not see a strong justification for using the orthogonal distance over the vertical distance, as the choice is inherently arbitrary (e.g., one could also measure the distance along the fish’s orientation vector). We have therefore kept our assessment based on the vertical distance.

      Absent measurements, the paragraph on associative learning should be struck from the discussion as it is purely speculative.

      We agree that the original paragraph on associative learning may have sounded overly speculative. However, after updating our manuscript with additional simulations of the thermal gradient's vertical extent, we found that fish perform upward turns not only above the thermal interface, but also before entering the thermal gradient itself. This observation makes us hesitant to attribute the response solely to thermotaxis. We believe it is essential to provide a plausible explanation, albeit speculative, for how fish initiate these turns before directly encountering the cold-water gradient. To support this, we have extended the discussion in this paragraph and added Supplementary Fig. 39. The new text now reads (additions in yellow): (L487 – 499): “Our findings show that fish were able to perform upward turns while still located above the thermal interface, that is, before actually sampling the cold water below the interface. In fact, our simulation of the vertical extent of the thermal gradient revealed that a substantial fraction of upward turns occurred before fish encountered the gradient itself, that is, prior to any sensory detection of the temperature change (Supplementary Fig. 39). This finding may be evidence of associative learning, whereby fish used information regarding the presence of colder water at depth obtained at prior times. While the current data do not provide conclusive evidence in this regard, they prompt the possibility that, rather than responding solely to immediate thermal cues, fish use spatial memory or associative learning to anticipate the location of colder water based on prior experience. Indeed, fish are able to perform associative learning based on non-visual cues[53], create mental maps of their surroundings[54] and retain memory for hours[55], days[56] and months[57,58].”

      The body-temperature simulations need to be detailed in the methods.

      Thanks for this comment. We have removed the supplementary text section and have included the paragraph “Body cooling during cold-water excursions” into the methods section of our manuscript (L804 - L829).

      Constant temperature experiments could be helpful in addressing the importance of a gradient/interface for triggering upward turning

      We agree; however, for ethical reasons we were limited in the maximum number of fish we could use in the experiments. Hence, we focused on obtaining approval to run experiments on the responses to thermal gradients. However, occupancy during the acclimation phase at 12 °C showed that fish were much more stationary and primarily occupied the lower half of the tank.

      A lot of ease of reading could be gained by labeling the conditions according to either the second temperature or perhaps even better the delta temperature (i.e. TR[-2C] instead of TR1).

      We agree that labeling conditions by the second temperature or delta temperature could in principle improve readability. However, since T_bottom and T_top are explicitly mentioned in each main figure at least once, they can be directly associated with the respective treatment. Therefore, we have opted to retain the current labeling for consistency.

      The figure legends are often short and do not accurately label all figure elements. This is especially true for supplemental figure legends which often appear rushed (e.g., the legend for Figure S2 stops mid-sentence, the legend of Figure S3 does not indicate what Ttop or Tbottom are).

      We appreciate the reviewer’s comment and have carefully revised all figure legends to ensure clarity and completeness. Specifically, we have corrected figure labels, expanded the descriptions for supplemental figures, and ensured that all elements are accurately defined. For instance, we have completed the legend for Figure S2 and clarified the definitions of T_top and T_bottom in Figure S3. Additionally, we have systematically reviewed all figure legends to prevent inconsistencies and omissions.

      For Figure S3, to improve clarity, plotting the standard deviation at different points in the tank across the phases could be more informative than the hard-to-distinguish multi-line plots in different shades of red.

      We appreciate the reviewer’s suggestion regarding Figure S3. However, the primary goal of this figure is to illustrate how the thermal interface moves over time. While plotting the standard deviation at different points in the tank could provide additional statistical insights, it would detract from the intended visualization of the interface dynamics. For this reason, we have opted to retain the current multi-line representation. Nevertheless, we have ensured that the figure is as clear as possible by refining the color contrast and improving the legend for better readability.

      There is an inconsistency in in-text citation styles (mixture of superscript and numbers in brackets).

      Thank you for pointing this out. We have carefully reviewed the manuscript and corrected any inconsistencies in the in-text citation style to ensure uniform formatting throughout.

      While the statement in the introduction, that increases in movement frequency could be purely metabolic in nature is correct, at least for larval zebrafish it has been shown that sensory neural activity is predictive of motor neuron activity and swim rates (Haesemeyer, 2018, cited by the authors).

      This is an interesting finding. It is however unclear to us why this information is crucial in our context of brown trout parr.

      Examples of summary results from Supplementary Figures 8-10 should be bundled in a main text figure since this appears to be important information supporting the conclusions.

      We agree that Supplementary Figures 8–10 contain important information (i.e., boxplots) on vertical occupancy and the time individuals spent in different water temperatures. However, this information is already integrated into Figure 2C, D, F, and G, which display the vertical distributions of fish across treatments and over time. Given the current length of the manuscript, adding another main-text figure could dilute rather than enhance clarity. For this reason, we have opted to keep these details in the Supplementary Materials while ensuring they are appropriately referenced in the main text.

      The distributions of excursion length for all treatments should be graphed in a main figure to support the point made in the third paragraph of the "Trout parr... do not avoid warm water" section of the results.

      We appreciate the reviewer’s suggestion. However, we do not believe that plotting excursion length is necessary to support this statement, as the key finding is already well represented in the manuscript. Specifically, the transition to bimodal depth occupancy, with fish spending comparable time above and below the interface in warm-water treatments (TR6–TR9), is clearly conveyed in Figure 2F and Supplementary Figure 8B. Additionally, this information is explicitly stated in the results section (L235): "Fish did not avoid warmer water in any of the warm-water treatments (TR6–TR9). Instead, fish transitioned to a bimodal depth occupancy, with comparable time spent above and below the interface (Fig. 2F; Supplementary Fig. 8B)." Given this, we believe that adding an additional figure would not enhance clarity but may instead introduce redundancy.

      There should be a main figure panel that statistically compares the turn biases around the interface for the different conditions and the +/- 5cm interface line mentioned in the text should be visualized in the appropriate figures - incidentally, this length scale is on par with the diffusion seen in simulations further suggesting that fish in fact sense a gradient here rather than remembering an interface.

      To address the reviewer’s comment, we have made the following updates:

      • Extended and incorporated simulations of the thermal interface thickness (see Methods and Supplementary Fig. 29).

      • Plotted the vertical locations of up-turning events relative to the phase-averaged position of the thermal interface (see Supplementary Fig. 39), which includes the estimated 5 cm vertical extent of the thermal gradient.

      • Added the thermal interface thickness to the main figures (Fig. 3F,G and Fig. 2E,H) where applicable.

      While we do not claim that memory alone explains cold-water avoidance, our data still suggest that it may contribute to the observed behavior, particularly since a substantial number of upturns occurred before the fish entered the thermal gradient (see also Author response image 1 below). Our aim is not to statistically disentangle the relative contribution of thermotaxis versus associative learning, but to propose a plausible interpretation of this observed anticipatory behavior, while making clear that it is only a possibility.

      Given that the thermal gradient is now visualized and characterized in detail, we respectfully suggest that an additional statistical comparison of turn biases would not add further clarity. We believe there is evidence that vertical turning, away from the cold, occurred within and above the thermal gradient. However, we welcome the reviewer’s perspective, and to demonstrate that turning points occur outside and above the thermal interface, we have plotted them against gradient growth over time (see Author response image 1 below).

      Author response image 1.

      The colored area indicates the temporal growth of thermal interface thickness.

      Reviewer #3:

      In this study, the authors measured the behavioural responses of brown trout to the sudden availability of a choice between thermal environments. The data clearly show that these fish avoid colder temperatures than the acclimation condition, but generally have no preference between the acclimation condition or warmer water (though I think the speculation that the fish are slowly warming up is interesting). Further, the evidence is compelling that avoidance of cold water is a combination of thermotaxis and thermokinesis. This is a clever experimental approach and the results are novel, interesting, and have clear biological implications as the authors discuss. I also commend the team for an extremely robust, transparent, and clear explanation of the experimental design and analytical decisions. The supplemental material is very helpful for understanding many of the methodological nuances, though I admit that I found it overwhelming at times and wonder if it could be pruned slightly to increase readability. Overall, I think the conclusions are generally well-supported by the data, and I have no major concerns.

      Minor comments

      P2 intro paragraphs 1/3 - it is not clear that thermal preference generally reflects the thermal optimum, partly because it is not clear what trait is being optimized (fitness?). Some nuance here would be helpful, and would also link nicely to the discussion on p10.

      Thank you for this comment. We have now refined this section as follows (L67–71): "As most fish species are ectotherms, their body temperature fluctuates with the surrounding water temperature. Because temperature influences the rates of most physiological processes, rapid warming or cooling can affect fish performance traits, including metabolic rates, swimming ability, and thermal tolerance[6]."

      To further clarify how thermal preference relates to thermal optimum and what trait is being optimized, we have incorporated additional nuance in this section. Specifically, we now acknowledge that thermal preference may not always align with the thermal optimum for performance or fitness.

      P2 intro paragraph 2 - "adapt physiologically" implies evolution, but here you are referring to plasticity. Suggest saving the word "adapt/adaptation" for evolutionary changes (see also p9).

      Thank you for this comment. We have revised the wording to "acclimate physiologically" (L79) to more accurately reflect plastic responses rather than evolutionary adaptation.

      P7 - "This difference in probabilities (ρup - ρdown) was particularly large in the region immediately above and below the interface (-5 cm < D < 5 cm; Fig. 3F) and is a hallmark of a thermotactic behavior." I agree that the result provides compelling evidence for thermotaxis, but would it be possible to bolster this case by statistically testing for a difference in probabilities among the treatment groups here?

      In addition to Fig. 3F, we are presenting statistical evidence that for colder water temperatures, fish penetrate less deeply into the cold lower water. The decreasing trend was statistically significant (Mann–Kendall test: p < 0.001; Supplementary Table 6) and is presented in Fig. 4C. The depth reached during each cold-water excursion is determined by the location of the vertical turning point, which redirects the fish upward toward the surface. We think this is sufficient evidence for thermotaxis.
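
      For readers unfamiliar with the Mann–Kendall trend test mentioned above, a minimal sketch follows. It uses the standard normal approximation without a correction for tied values, and the excursion depths are hypothetical numbers invented for illustration, not data from the study; the function name is likewise an assumption.

```python
import math
from statistics import NormalDist


def mann_kendall(values):
    """Two-sided Mann-Kendall trend test (normal approximation, no ties).

    Returns (S, Z, p), where S is the sum of the signs of all pairwise
    later-minus-earlier differences, Z is the continuity-corrected
    standard score, and p is the two-sided p-value.
    """
    n = len(values)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)
    # Variance of S under the null hypothesis of no trend (no tie correction).
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return s, z, p


if __name__ == "__main__":
    # Hypothetical excursion depths (cm), ordered from the mildest to the
    # coldest treatment; these values are illustrative only.
    depths = [20, 18, 17, 15, 14, 12, 11, 9, 8, 6]
    s, z, p = mann_kendall(depths)
    print(f"S = {s}, Z = {z:.2f}, p = {p:.2g}")
```

      A negative S with a small p-value indicates a monotonic decrease, which is the kind of trend the response describes for excursion depth across colder treatments.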

      P9 paragraph 3 = "recent studies suggest that fish may instead respond to temporal changes of their internal body temperature." It seems like a citation is missing here. Would be useful to briefly summarize the evidence for internal temperature sensing that is the basis of this modelling exercise.

      Thanks, we have added that citation (L385).

      P10 "Our findings provide the first experimental evidence for this mode of behavioral thermoregulation in which fish navigate their heterothermal environment to achieve gradual body warming."

      I think this statement overreaches given the presented data. While there may be a trend towards fish in the warm treatment spending increasing amounts of time in the upper half of the tank, I do not see this pattern supported statistically. There is also no evidence of gradual body warming, and even if there was I disagree that this would constitute experimental evidence that this was happening "intentionally". By this reasoning, any shuttlebox experiment in which fish actively shuttle between relatively warm and cool sides to end up with a preference that is above the starting condition would also constitute evidence for gradual warming. Overall, this is an interesting pattern, but I do not think there is sufficient evidence to conclude that fish are strategically warming.

      We appreciate the reviewer’s comment and acknowledge that our original wording may have overstated the evidence. We have revised the sentence to better reflect the evidence presented (L411-415): “Our observations resemble this mode of behavioral thermoregulation, in which fish progressively favor warmer regions within a heterothermal environment. However, additional experimental evidence is required to determine the mechanisms underlying this behavior.”

      P11 "Despite the avoidance response of cold water, fish engaged in repeated cold-water excursions..."

      This is an interesting speculation, but I think it would be helpful to also point out that these fish are biased towards the bottom of the tank (based on control measurements) and this pattern may therefore simply reflect a desire to be lower in the water column.

      Thank you for this helpful comment. We have now added this point to the revised text, which reads (L475-477): “Despite the avoidance response to cold water, fish engaged in repeated cold-water excursions, potentially reflecting a behavioral strategy to map the thermal environment. This pattern may also reflect an inherent tendency to occupy the lower part of the tank, as observed during homogeneous temperature of 12 °C during the acclimation phase.”

      P13 - why was the dye always added to the right side of the tank, instead of being assigned to a side randomly? I think the control experiment is good evidence that the dye did not substantially affect behaviour, but it seems like it would have been nice to separate dye and novel temperature exposure.

      We agree that randomizing the side of dye application would have been ideal. The dye was consistently added to the right side to maintain procedural consistency, ensuring that the “incoming” or “novel” temperature was always dyed. That said, our control experiment provides strong evidence that the dye itself did not influence behavior (as discussed above and in the manuscript).

    1. eLife Assessment

      This important study uses the delay line axon model in the chick brainstem auditory circuit to examine the interactions between oligodendrocytes and axons in the formation of internodal distances. This is a significant and actively studied topic, and the authors have used this preparation to support the hypothesis that regional heterogeneity in oligodendrocytes underlies the observed variation in internodal length. In a solid series of experiments, the authors have used enhanced tetanus neurotoxin light chains, a genetically encoded silencing tool, to inhibit vesicular release from axons and support the hypothesis that regional heterogeneity among oligodendrocytes may underlie the biased nodal spacing pattern in the sound localization circuit.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #2 (Public review):

      Summary:

      Egawa et al describe the developmental timeline of the assembly of nodes of Ranvier in the chick brainstem auditory circuit. In this unique system, the spacing between nodes varies significantly in different regions of the same axon from early stages, which the authors suggest is critical for accurate sound localization. Egawa et al set out to determine which factors regulate this differential node spacing. They do this by using immunohistological analyses to test the correlation of node spacing with morphological properties of the axons, and properties of oligodendrocytes, glial cells that wrap axons with the myelin sheaths that flank the nodes of Ranvier. They find that axonal structure does not vary significantly, but that oligodendrocyte density and morphology varies in the different regions traversed by these axons, which suggests this is a key determinant of the region-specific differences in node density and myelin sheath length. They also find that differential oligodendrocyte density is partly determined by secreted neuronal signals, as (presumed) blockage of vesicle fusion with tetanus toxin reduced oligodendrocyte density in the region where it is normally higher. Based on these findings, the authors propose that oligodendrocyte morphology, myelin sheath length, and consequently nodal distribution are primarily determined by intrinsic oligodendrocyte properties rather than neuronal factors such as activity.

      Major comments:

      (1) The authors should test the efficiency of TeNT to validate that vesicular release is indeed inhibited from expressing neurons. Additionally, the authors should clarify if their TeNT expression system results in the whole tract being silenced, or results in sparse vesicular release inhibition in only a few neurons.

      (2) The authors should revise their statistical analyses throughout, and supply additional information to explain the rationale for the statistical tests used, including e.g. data normality, paired sampling, number of samples/independent biological replicates.

      (3) The main finding of the study is that the density of nodes differs between two regions of the chicken auditory circuit, probably due to morphological differences in the respective oligodendrocytes. Can the authors discuss if this finding is likely to be specific to the avian auditory circuit?

      (4) The study shows a correlation between node spacing and oligodendrocyte density, but the authors did not manipulate oligodendrocyte density per se (i.e. cell-autonomously). The authors should either include such experiments, or discuss their value in supporting the interpretation of their results.

      (5) The authors should discuss very pertinent prior studies, in particular to contextualize their findings with (a) known neuron-autonomous modes of node formation prior to myelination, (b) known effects of vesicular fusion directly on myelinating capacity and oligodendrogenesis, (c) known correlation of myelin length and thickness with axonal diameter, (d) regional heterogeneity in the oligodendrocyte transcriptome.

      Significance:

      In our view the study tackles a fundamental question likely to be of interest to a specialized audience of cellular neuroscientists. This descriptive study is suggestive that in the studied system, oligodendrocyte density determines the spacing between nodes of Ranvier, but further manipulations of oligodendrocyte density per se are needed to test this convincingly.

    3. Reviewer #3 (Public review):

      Summary:

      The authors have investigated the myelination pattern along the axons of chick avian cochlear nucleus. It has already been shown that there are regional differences in the internodal length of axons in the nucleus magnocellularis. In the tract region across the midline, internodes are longer than in the nucleus laminaris region. Here the authors suggest that the difference in internodal length is attributed to heterogeneity of oligodendrocytes. In the tract region oligodendrocytes would contribute longer myelin internodes, while oligodendrocytes in the nucleus laminaris region would synthesize shorter myelin internodes. Not only length of myelin internodes differs, but also along the same axon unmyelinated areas between two internodes may vary. This is an interesting contribution since all these differences contribute to differential conduction velocity regulating ipsilateral and contralateral innervation of coincidence detector neurons. However, the demonstration falls rather short of being convincing.

      Major comments:

      (1) The authors neglect the possibility that nodal clusters may be formed prior to myelin deposition. They have investigated stages E12 (no nodal clusters) and E15 (nodal cluster plus MAG+ myelin). Fig. 1D is of dubious quality. It would be important to investigate stages between E12 and E15 to observe the formation of pre-nodes, i.e., clustering of nodal components prior to myelin deposition.

      (2) The claim that axonal diameter is constant along the axonal length needs to be demonstrated at the EM level. This would also allow measurement of possible regional differences in the thickness of the myelin sheath and number of myelin wraps.

      (3) The claim that the difference in internodal length is explained by heterogeneity in the sources of oligodendrocytes is not convincing. Oligodendrocytes that are a priori of the same origin form shorter internodes when they remyelinate after a demyelination event.

      Significance:

      The authors suggest that the difference in internodal length is attributed to heterogeneity of oligodendrocytes. In the tract region oligodendrocytes would contribute longer myelin internodes, while oligodendrocytes in the nucleus laminaris region would synthesize shorter myelin internodes. Not only length of myelin internodes differs, but also along the same axon unmyelinated areas between two internodes may vary. This is an interesting contribution since all these differences contribute to differential conduction velocity regulating ipsilateral and contralateral innervation of coincidence detector neurons.

      Comments on revised version:

      This revised version is largely improved and the responses to reviewers' comments are generally relevant. However, the response regarding pre-nodes is not satisfactory. I understand that the authors prefer to avoid further experimentation, but I think this is an important point that needs to be clarified. Exploring stages between E12 and E15 is therefore important. When carefully examining some of the figures (Fig. 1E or 2D), I think that at E15 there may well be pre-node formation prior to myelin deposition, on structures the authors considered to be heminodes. To be convincing they should use double or triple labeling with, in addition to the nodal proteins (ankG and/or Nav pan), a good myelin marker such as anti-PLP. The rat monoclonal developed by the late Prof. Ikenaka would give a sharper staining than the anti-MAG they used. (I assume the clone must still be available in Okazaki.)

    4. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      Evidence, reproducibility and clarity

      The manuscript by Egawa and colleagues investigates differences in nodal spacing in an avian auditory brain stem circuit. The results are clearly presented and data are of very high quality. The authors make two main conclusions:

      (1) Node spacing, i.e. internodal length, is intrinsically specified by the oligodendrocytes in the region they are found in, rather than axonal properties (branching or diameter).

      (2) Activity is necessary (we don't know what kind of signaling) for normal numbers of oligodendrocytes and therefore the extent of myelination.

      These are interesting observations, albeit phenomenological. I have only a few criticisms that should be addressed:

      (1) The use of the term 'distribution' when describing the location of nodes is confusing. I think the authors mean rather than the patterns of nodal distribution, the pattern of nodal spacing. They have investigated spacing along the axon. I encourage the authors to substitute node spacing or internodal length for node distribution.

      Thanks for your suggestion to avoid confusion. We used the phrase "nodal spacing" instead of "nodal distribution" throughout the revised manuscript.

      (2) In Seidl et al. (J Neurosci 2010) it was reported that axon diameter and internodal length (nodal spacing) were different for regions of the circuit. Can the authors help me better understand the difference between the Seidl results and those presented here?

      As a key distinction, our study focuses specifically on the main trunk of the contralateral projection of NM axons. This projection features a sequential branching structure known as the delay line, where collateral branches form terminal arbors and connect to the ventral dendritic layer of NL neurons. This structural organization plays a critical role in influencing the dynamic range of ITD detection by regulating conduction delays along the NM axon trunk.

      The study by Seidl et al. (2010) is a pioneering work that measured the diameter of NM axons using electron microscopy, providing highly reliable data. However, due to the technical limitations of electron microscopy, which does not allow for the continuous tracing of individual axons, it is not entirely clear whether the axons measured in the ventral NL region correspond to terminal arbors of collateral branches or the main trunk of NM axons (see Figure 9E, F in their paper). Instead, they categorized axon diameters based on their distance from the NL cell layer, showing that axon diameter increases distally (see Figure 9G in their paper). Notably, the diameters of ventral axons located more than 120 μm away from the NL cell layer are almost identical to those in the midline.

      As illustrated in our Figure 4D and Supplementary Video 2, the main trunk of the contralateral NM projection is predominantly located in these distal regions. Therefore, our findings complement those of Seidl et al. (2010) rather than contradicting them. We made this point as clear as possible in the text (page 7, line 3).

      (3) The authors looked only in very young animals - are the results reported here applicable only to development, or does additional refinement take place with aging?

      In this study, we examined chick embryos from E9 to just before hatching (E21) and post-hatch chicks up to P9. Chickens begin to perceive sound around E12 and possess sound localization abilities at the time of hatching (Grier et al., 1967) (added to page 4, line 9). Therefore, by E21, the sound localization circuit is largely established.

      On the other hand, additional refinement of the circuit with aging is certainly possible. A key cue for sound localization, interaural time difference (ITD), depends on the distance between the two ears, which increases as the animal grows. As shown in Figure 2G, internodal length increased by approximately 20% between E18 and P9 while maintaining regional differences. Given that NM axons are nearly fully myelinated by E21 (Figure 4D, 6C), this suggests that myelin extends in proportion to the overall growth of the head and brain volume. We described this possibility in the text (page 5, line 21).

      Thus, our study covers not only the early stages of myelination but also the post-functional maturation in the sound localization circuit.

      (4) The fact that internodal length is specified by the oligodendrocyte suggests that activity may not modify the location of nodes of Ranvier - although again, the authors have only looked during early development. This is quite different than this reviewer's original thoughts - that activity altered internodal length and axon diameter. Thus, the results here argue against node plasticity. The authors may choose to highlight this point or argue for or against it based on results in adult birds?

      In this study, we demonstrated that although vesicular release did not affect internodal length, it selectively promoted oligodendrogenesis, thereby supporting the full myelination and hence the pattern of nodal spacing along the NM axons. We believe that this finding falls within the broader scope of 'activity-dependent plasticity' involving oligodendrocytes and nodes.

      As summarized in the excellent review by Bonetto et al. (2021), activity-dependent plasticity in oligodendrocytes encompasses a wide range of phenomena, not limited to changes in internodal length but also including oligodendrogenesis. Moreover, the effects of neuronal activity are not uniform but likely depend on the diversity of both neurons and oligodendrocytes. For example, in the mouse visual cortex, activity-dependent myelination occurs in interneurons but not in excitatory neurons (Yang et al., 2020). Additionally, expression of TeNT in axons affected myelination heterogeneously in zebrafish; some axons were impaired in myelination and the others were not affected at all (Koudelka et al., 2016). In the mouse corpus callosum, neuronal activity influences oligodendrogenesis, which in turn facilitates adaptive myelination (Gibson et al., 2014).

      Thus, rather than refuting the role of activity-dependent plasticity in nodal spacing, our findings emphasize the diversity of underlying regulatory mechanisms. We described these explicitly in text (page 10, line 18).

      Significance

      This paper may argue against node plasticity as a mechanism for tuning of neural circuits. Myelin plasticity is a very hot topic right now and node plasticity reflects myelin plasticity. This seems to be a circuit where perhaps plasticity is NOT occurring. That would be interesting to test directly. One limitation is that this is limited to development.

      This paper does not argue against node plasticity, but rather demonstrates that oligodendrocytes in the NL region exhibit a form of plasticity; they proliferate in response to vesicular release from NM axons, yet do not undergo morphological changes, ensuring adequate oligodendrocyte density for the full myelination of the auditory circuit. Thus, activity-dependent plasticity involving oligodendrocytes would contribute in various ways to each neural circuit, which is presumably attributable to the fact that myelination is driven by complex multicellular interactions between diverse axons and oligodendrocytes. Oligodendrocytes are known to exhibit heterogeneity in morphology, function, responsiveness, and gene profiles (Foerster et al., 2019; Sherafat et al., 2021; Osanai et al., 2022; Valihrach et al., 2022), but the functional significance of this heterogeneity remains largely unclear. This paper also provides insight into how oligodendrocyte heterogeneity may contribute to the fine-tuning of neural circuit function, adding further value to our findings. Importantly, our study covers a wide range of development in the sound localization circuit, from pre-myelination (E9) to post-functional maturation (P9), revealing how the nodal spacing pattern along the axon in this circuit emerges and matures.

      Reviewer #2:

      Evidence, reproducibility and clarity

      Egawa et al describe the developmental timeline of the assembly of nodes of Ranvier in the chick brainstem auditory circuit. In this unique system, the spacing between nodes varies significantly in different regions of the same axon from early stages, which the authors suggest is critical for accurate sound localization. Egawa et al set out to determine which factors regulate this differential node spacing. They do this by using immunohistological analyses to test the correlation of node spacing with morphological properties of the axons, and properties of oligodendrocytes, glial cells that wrap axons with the myelin sheaths that flank the nodes of Ranvier. They find that axonal structure does not vary significantly, but that oligodendrocyte density and morphology varies in the different regions traversed by these axons, which suggests this is a key determinant of the region-specific differences in node density and myelin sheath length. They also find that differential oligodendrocyte density is partly determined by secreted neuronal signals, as (presumed) blockage of vesicle fusion with tetanus toxin reduced oligodendrocyte density in the region where it is normally higher. Based on these findings, the authors propose that oligodendrocyte morphology, myelin sheath length, and consequently nodal distribution are primarily determined by intrinsic oligodendrocyte properties rather than neuronal factors such as activity.

      Major points, detailed below, need to be addressed to overcome some limitations of the study.

      Major comments:

      (1) It is essential that the authors validate the efficiency of TeNT to prove that vesicular release is indeed inhibited, to be able to make any claims about the effect of vesicular release on oligodendrogenesis/myelination.

      eTeNT is a widely used genetically encoded silencing tool and constructs similar to the one used in this study have been successfully applied in primates and rodents to suppress target behaviors via genetic dissection of specific pathways (Kinoshita et al., 2012; Sooksawate et al., 2013). However, precisely quantifying the extent of vesicular release inhibition from NM axons in the brainstem auditory circuit is technically problematic.

      One major limitation is that while A3V efficiently infects NM neurons, its transduction efficiency does not reach 100%. In electrophysiological evaluations, NL neurons receive inputs from multiple NM axons, meaning that responses may still include input from uninfected axons. Additionally, failure to evoke synaptic responses could either indicate successful silencing or failure to stimulate NM axons, making a clear distinction difficult. Furthermore, unlike in motor circuits, we cannot assess the effect of silencing by observing behavioral outputs.

      Thus, we instead opted to quantify the precise expression efficiency of GFP-tagged eTeNT in the cell bodies of NM neurons. The proportion of NM neurons expressing GFP-tagged eTeNT was 89.7 ± 1.6% (N = 6 chicks), which is consistent with previous reports evaluating A3V transduction efficiency in the brainstem auditory circuit (Matsui et al., 2012). These results strongly suggest that synaptic transmission from NM axons was globally silenced by eTeNT at the NL region. We described these explicitly in text (page 8, line 2).

      (2) Related to 1, can the authors clarify if their TeNT expression system results in the whole tract being silenced? It appears from Fig. 6 that their approach leads to sparse expression of TeNT in individual neurons, which enables them to measure myelination parameters. Can the authors discuss how silencing a single axon can lead to a regional effect in oligodendrocyte number?

      Figure 6D depicts a representative axon selected from a dense population of GFP-positive axons in a 200-μm-thick slice after A3V-eTeNT infection to bilateral NM. As shown in Supplementary Video 1 and 2, densely labeled GFP-positive axons can be traced along the main trunk. To prevent any misinterpretation, we have revised the description of Figure 6 in the main text and Figure legend (page 31, line 9), and stated that the A3V-eTeNT infection efficiency was 89.7 ± 1.6% in NM neurons, as mentioned above. Based on this efficiency, we interpreted that the global occlusion of vesicular release from most of the NM axons altered the pericellular microenvironment of the NL region, which led to the regional effect on the oligodendrocyte density.

      On the other hand, your question regarding whether sparse expression of eTeNT still has an effect is highly relevant. As we also discussed in our reply to comment 4 by Reviewer #1, the relationship between neuronal activity and oligodendrocytes is highly diverse. In some types of axons, vesicular release is essential for normal myelination, and this process was disrupted by TeNT (Koudelka et al., 2016), suggesting that direct interaction with oligodendrocytes via vesicle release may actively promote myelination in these types of axons.

      To clarify whether the phenotype observed in Figure 6 arises from changes in the pericellular microenvironment at the NL region or from the direct suppression of axon-oligodendrocyte interactions, we included a new Supplementary Figure (Figure 6—figure supplement 1). In this figure, we evaluated node formation on axons sparsely expressing eTeNT by electroporation into the unilateral NM. The results showed that sparse eTeNT expression did not increase the percentages of heminodes or unmyelinated segments. This finding supports our conclusion that the increase in unmyelinated segments caused by A3V-eTeNT resulted from impaired synaptic transmission at NM terminals and subsequent alterations of the pericellular microenvironment at the NL region.

      (3) The authors need to fully revise their statistical analyses throughout and supply additional information that is needed to assess if their analyses are adequate:

      Thank you for your valuable suggestions to improve the rigor of our statistical analyses. We have reanalyzed all statistical tests using R software. In the revised Methods section and Figure Legends, we have clarified the rationale for selecting each statistical test, specified which test was used for each figure, and explicitly defined both n and N. After reevaluation with the Shapiro-Wilk test, we adjusted some analyses to non-parametric tests where appropriate. However, these adjustments did not alter the statistical significance of our results compared to the original analyses.

      (3.1) the authors use a variety of statistical tests and it is not always obvious why they chose a particular test. For example, in Fig. 2G they chose a Kruskal-Wallis test instead of a two-way ANOVA or Mann-Whitney U test, which are much more common in the field. What is the rationale for the test choice?

      We have revised the explanation of our statistical test choices to provide greater clarity and precision. For example, in Figure 2G, we first assessed the normality of the data in each of the four groups using the Shapiro-Wilk test, which revealed that some datasets did not follow a normal distribution. Given this, we selected the Kruskal-Wallis test, a commonly used non-parametric test for comparisons across three or more groups. Since the Kruskal-Wallis test indicated a significant difference, we conducted a post hoc Steel-Dwass test to determine which specific group comparisons were statistically significant.
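
The rank-based comparison described above can be illustrated with a minimal sketch of the Kruskal-Wallis H statistic. This is not the authors' R analysis: the function names and the numbers are hypothetical, the tie correction is omitted for brevity, and the Shapiro-Wilk and Steel-Dwass steps are not reproduced here.

```python
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    return [
        sum(v < x for v in values) + (sum(v == x for v in values) + 1) / 2
        for x in values
    ]


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic for a list of samples (no tie correction)."""
    pooled = list(chain.from_iterable(groups))
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for group in groups:
        # Sum the pooled ranks belonging to this group.
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    return 12.0 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)


# Hypothetical internodal lengths (in µm) for three groups, illustration only.
example_groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h_statistic = kruskal_wallis_h(example_groups)
```

With k groups, H is compared against a chi-square distribution with k - 1 degrees of freedom; a post hoc test is then needed to identify which pairs of groups differ.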

      (3.2) in some cases, the choice of test appears wholly inappropriate. For example, in Fig. 3H-K, an unpaired t-test is inappropriate if the two regions were analysed in the same samples. In Fig. 5, was a t-test used for comparisons between multiple groups in the same dataset? If so, an ANOVA may be more appropriate.

      In the case of Figures 3H-K, we compared oligodendrocyte morphology between regions. However, since the number of sparsely labeled oligodendrocytes differs both between regions and across individual samples, there is no strict correspondence between paired measurements. On the other hand, in Figures 5B, C, and E, we compared the density of labeled cells between regions within the same slice, establishing a direct correspondence between paired data points. For these comparisons, we appropriately used a paired t-test.
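
To make the distinction between paired and unpaired comparisons concrete, the sketch below computes both t statistics on the same hypothetical numbers. The data, the function names, and the choice of the Welch form for the unpaired statistic are assumptions made for illustration, not the authors' analysis.

```python
import math
import statistics


def paired_t(a, b):
    """t statistic for paired samples: mean of differences over its standard error."""
    diffs = [x - y for x, y in zip(a, b)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


def welch_t(a, b):
    """t statistic for two independent samples (Welch form, unequal variances)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


# Hypothetical cell densities measured in two regions of the same four slices.
tract_density = [10, 20, 30, 40]
nl_density = [12, 23, 33, 44]

t_paired = paired_t(nl_density, tract_density)    # uses the slice-by-slice pairing
t_unpaired = welch_t(nl_density, tract_density)   # ignores the pairing
```

In this toy example the between-slice spread is large but the within-slice difference is consistent, so the paired statistic is far larger than the unpaired one. That is why the correspondence between data points matters for the choice of test.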

      (3.3) in some cases, the authors do not mention which test was used (Fig 3: E-G no test indicated, despite asterisks; G/L/M - which regression test that was used? What does r indicate?)

      We have specified the statistical tests used for each figure in the Methods section and Figure Legends for better clarity. Additionally, we have revised the descriptions for Figure 4G, L, and M and their corresponding Figure Legends to explicitly indicate that Spearman’s rank correlation coefficient (rₛ) was used for evaluation.
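
For readers unfamiliar with the coefficient, Spearman's rₛ is the Pearson correlation computed on ranks. The following minimal sketch uses hypothetical function names and data and is not the authors' code.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    return [
        sum(v < x for v in values) + (sum(v == x for v in values) + 1) / 2
        for x in values
    ]


def spearman_rs(x, y):
    """Spearman's rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical paired measurements, for illustration only.
monotonic = spearman_rs([1, 2, 3, 4, 5], [1, 4, 9, 16, 25])
partial = spearman_rs([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
```

Because only ranks enter the calculation, any strictly increasing relationship gives rₛ = 1 even when it is nonlinear, which makes the coefficient suitable for data that are not normally distributed.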

      (3.4) more concerningly, throughout the results, data may have been pseudo-replicated. t-tests and ANOVAs assume that each observation in a dataset is independent of the other observations. In figures 1-4 and 6 there is a very large "n" number, but the authors do not indicate what this corresponds to. This leaves it open to interpretation, and the large values suggest that the number of nodes, internodal segments, or cells may have been used. These are not independent experimental units, and should be averaged per independent biological replicate - i.e. per animal (N).

      We have now clarified what “n” represents in each figure, as well as the number of animals (N) used in each experiment, in the Figure Legends.

      In this study, developmental stages of chick embryos were defined by HH stage (Hamburger and Hamilton, 1951), minimizing individual variability. Additionally, since our study focuses on the distribution of morphological characteristics of individual cells, averaging measurements per animal would obscure important cellular-level variability and potentially mislead interpretation of data. Furthermore, we employed a strategy of sparse genetic labeling in many experiments, which naturally results in variability in the number of measurable cells per animal. Given the clear distinctions in our data distributions, we believe that averaging per biological replicate is not essential in this case.
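
For reference, the per-animal summary that the reviewer describes in comment 3.4 can be sketched as follows. The animal identifiers and values are hypothetical, and the sketch only shows the aggregation step without taking a position on whether it should be applied.

```python
from collections import defaultdict
from statistics import mean


def per_animal_means(measurements):
    """Collapse (animal_id, value) pairs into one mean value per animal."""
    by_animal = defaultdict(list)
    for animal_id, value in measurements:
        by_animal[animal_id].append(value)
    return {animal_id: mean(values) for animal_id, values in by_animal.items()}


# Hypothetical internodal lengths (in µm): n = 5 segments from N = 2 animals.
segments = [
    ("chick1", 30.0), ("chick1", 34.0),
    ("chick2", 40.0), ("chick2", 44.0), ("chick2", 42.0),
]
animal_means = per_animal_means(segments)
```

After this step the sample size for a test is N (animals) rather than n (segments), which is the difference at issue in the exchange above.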

      To further ensure the robustness of our statistical analysis, data presented as boxplots were preliminarily assessed using PlotsOfDifferences, a web-based application that calculates and visualizes effect sizes and 95% confidence intervals based on bootstrapping (https://huygens.science.uva.nl/PlotsOfDifferences/; https://doi.org/10.1101/578575). Effect sizes can serve as a valuable alternative to p-values (Ho, 2018; https://www.nature.com/articles/s41592-019-0470-3). The significant differences reported in our study are also supported by clear differences in effect sizes, ensuring that our conclusions remain robust regardless of the statistical approach used.
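
The idea of a bootstrapped effect size with a 95% confidence interval can be sketched as below. This is a generic percentile bootstrap and not the PlotsOfDifferences implementation; the choice of the median as the summary statistic, the function name, and the data are all assumptions.

```python
import random
from statistics import median


def bootstrap_median_difference(a, b, n_boot=2000, alpha=0.05, seed=0):
    """Return (observed difference in medians b - a, CI lower bound, CI upper bound)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diffs = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping its original size.
        resample_a = [rng.choice(a) for _ in a]
        resample_b = [rng.choice(b) for _ in b]
        diffs.append(median(resample_b) - median(resample_a))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return median(b) - median(a), lower, upper


# Hypothetical internodal lengths (in µm) for two regions, illustration only.
nl_region = [28, 30, 31, 33, 35]
tract_region = [88, 90, 95, 99, 102]
effect, ci_low, ci_high = bootstrap_median_difference(nl_region, tract_region)
```

An interval that excludes zero indicates a difference that is robust to resampling, and its width conveys the precision of the estimate, which a p-value alone does not.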

      If requested, we would be happy to provide PlotsOfDifferences outputs as supplementary source data files, similar to those used in eLife publications, for each figure.

      (3.5) related to the pseudo-replication issue, can the authors include individual datapoints in graphs for full transparency, per biological replicates, in addition or in alternative to bar-graphs (e.g. Fig. 5 and 6).

      We have now incorporated individual data points into the bar graphs in Figures 5 and 6.

      (4) The main finding of the study is that the density of nodes differs between two regions of the chicken auditory circuit, probably due to morphological differences in the respective oligodendrocytes. Can the authors discuss if this finding is likely to be specific to the bird auditory circuit?

      The morphological differences of oligodendrocytes between white and gray matter are well established (i.e., shorter myelin in gray matter), but their correspondence with the nodal spacing pattern along the long axonal projections of cortical neurons is not well understood. Future research may find similarities with our findings. Additionally, as mentioned in the final section of the Discussion, the mammalian brainstem auditory circuit is functionally analogous to the avian ITD circuit. Regional differences in nodal spacing along axons have also been observed in the mammalian system, raising the important question of whether these differences are supported by regional heterogeneity in oligodendrocytes. Investigating this possibility will facilitate our understanding of the underlying logic and mechanisms for determining node spacing patterns along axons, as well as provide valuable insights into evolutionary convergence in auditory processing mechanisms. We described these explicitly in the text (page 11, line 34).

      (5) Provided the authors amend their statistical analyses, and assuming significant differences remain as shown, the study shows a correlation (but not causation) between node spacing and oligodendrocyte density, but the authors did not manipulate oligodendrocyte density per se (i.e. cell-autonomously). Therefore, the authors should either include such experiments, or revise some of their phrasing to soften their claims and conclusions. For example, the word "determine" in the title could be replaced by "correlate with" for a more accurate representation of the work. Similar sentences throughout the main text should be amended.

      As you summarized in your comment, our results demonstrated that A3V-eTeNT suppressed oligodendrogenesis in the NL region, leading to a reduction in oligodendrocyte density (Figures 6L, M), which caused the emergence of unmyelinated segments. While this is an indirect manipulation of oligodendrocyte density, it nonetheless provides evidence supporting a causal relationship between oligodendrocyte density and nodal spacing.

      The emergence of unmyelinated segments at the NL region further suggests that the myelin extension capacity of oligodendrocytes differs between regions, highlighting regional differences in the intrinsic properties of oligodendrocytes as the most prominent determinant of nodal spacing variation. However, as you correctly pointed out, our findings do not establish direct causation.

      In the future, developing methods to artificially manipulate myelin length could provide a more definitive demonstration of causality. Given these considerations, we have modified the title to replace "determine" with "underlie", ensuring that our conclusions are presented with appropriate nuance.

      (6) The authors fail to introduce, or discuss, very pertinent prior studies, in particular to contextualize their findings with:

      (6.1) known neuron-autonomous modes of node formation prior to myelination, e.g. Zonta et al (PMID 18573915); Vagionitis et al (PMID 35172135); Freeman et al (PMID 25561543)

      (6.2) known effects of vesicular fusion directly on myelinating capacity and oligodendrogenesis, e.g. Mensch et al (PMID 25849985)

      (6.3) known correlation of myelin length and thickness with axonal diameter, e.g. Murray & Blakemore (PMID 7012280); Ibrahim et al (PMID 8583214); Hildebrand et al (PMID 8441812).

      (6.4) regional heterogeneity in the oligodendrocyte transcriptome (page 9, studies summarized in PMID 36313617)

      Thank you for your insightful suggestions. We have incorporated the relevant references you provided and revised the manuscript accordingly to contextualize our findings within the existing literature.

      Minor comments:

      (7) Can the authors amend Fig. 1G with the correct units of measurement, not millimetres.

      Response: 

      Thank you for your suggestion. We have corrected the units in Figure 1G to µm.

      (8) The Olig2 staining in Fig 2C does not appear to be nuclear, as would be expected of a transcription factor and as is well established for Olig2, but rather appears to be excluded from the nucleus, as it is in a ring or donut shape. Can the authors comment on this?

      Oligodendrocytes and OPCs have small cell bodies, often comparable in size to their nuclei. The central void in the ring-like Olig2 staining pattern appears too small to represent the nucleus. Additionally, a similar ring-like appearance is observed in BrdU labeling (Figure 5G), suggesting that this staining pattern may reflect nuclear morphology or other structural features.

      Significance

      In our view the study tackles a fundamental question likely to be of interest to a specialized audience of cellular neuroscientists. This descriptive study is suggestive that in the studied system, oligodendrocyte density determines the spacing between nodes of Ranvier, but further manipulations of oligodendrocyte density per se are needed to test this convincingly.

      The main finding of our study is that the primary determinant of the biased nodal spacing pattern in the sound localization circuit is the regional heterogeneity in the morphology of oligodendrocytes due to their intrinsic properties (e.g., their ability to produce and extend myelin sheaths) rather than the density of the cells. This was based on our observations that a reduction of oligodendrocyte density by A3V-eTeNT expression caused unmyelinated segments but did not increase internodal length (Figure 6), further revealing the importance of oligodendrocyte density in ensuring full myelination for the axons with short internodes. Thus, we believe that our study, by experimentally manipulating oligodendrocyte density, highlights the significance of oligodendrocyte heterogeneity for circuit function as well as for nodal spacing.

      Reviewer #3:

      Evidence, reproducibility and clarity

      The authors have investigated the myelination pattern along the axons of chick avian cochlear nucleus. It has already been shown that there are regional differences in the internodal length of axons in the nucleus magnocellularis. In the tract region across the midline, internodes are longer than in the nucleus laminaris region. Here the authors suggest that the difference in internodal length is attributed to heterogeneity of oligodendrocytes. In the tract region oligodendrocytes would contribute longer myelin internodes, while oligodendrocytes in the nucleus laminaris region would synthesize shorter myelin internodes. Not only length of myelin internodes differs, but also along the same axon unmyelinated areas between two internodes may vary. This is an interesting contribution since all these differences contribute to differential conduction velocity regulating ipsilateral and contralateral innervation of coincidence detector neurons. However, the demonstration falls rather short of being convincing. I have some major concerns:

      (1) The authors neglect the possibility that nodal cluster may be formed prior to myelin deposition. They have investigated stages E12 (no nodal clusters) and E15 (nodal cluster plus MAG+ myelin). Fig. 1D is of dubious quality. It would be important to investigate stages between E12 and E15 to observe the formation of pre-nodes, i.e., clustering of nodal components prior to myelin deposition.

      Thank you for your insightful comment regarding the potential role of pre-nodal clusters in determining internodal length. Indeed, studies in zebrafish have suggested that pre-nodal clustering of node components prior to myelination may prefigure internodal length (Vagionitis et al., 2022). We have incorporated a discussion on whether such pre-nodal clusters could contribute to regional differences in nodal spacing in our manuscript (page 9, line 35).

      Whether pre-nodal clusters are detectable before myelination appears to depend on neuronal subpopulation (Freeman et al., 2015). To investigate the presence of pre-nodal clusters along NM axons in the brainstem auditory circuit, we previously attempted to visualize AnkG signals at E13 and E14. However, we did not observe clear structures indicative of pre-nodal clusters; instead, we only detected sparse fibrous AnkG signals with weak Nav clustering at their ends, consistent with hemi-node features. This result does not exclude the possibility of pre-nodal clusters on NM axons, as they may fall below the detection limit of immunostaining. In brainstem slices, where axons are densely packed, nodal molecules are expressed at low levels across a wide area, leading to a high background signal in immunostaining, which may mask weak pre-nodal cluster signals prior to myelination. Regarding the comment on Figure 1D, we assume you are referring to Figure 2D based on the context. The lack of clarity in the high-magnification images in Figure 2D results from both the high background signal and the limited penetration of the MAG antibody. Furthermore, we are unable to verify Neurofascin accumulation at pre-nodal clusters, as there is currently no commercially available antibody suitable for use in chickens, despite our efforts over more than 20 years to identify one for AIS research. Therefore, current methodologies pose significant challenges in visualizing pre-nodal clusters in our model. Future advancements, such as exogenous expression of fluorescently tagged Neurofascin at appropriate densities or knock-in tagging of endogenous molecules, may help overcome these limitations.

      However, a key issue to be discussed in this study is not merely the presence or absence of pre-nodal clusters, but rather whether pre-nodal clusters, if present, would determine regional differences in internodal length. To address this possibility, we have added new data in Figure 6I, measuring the length of unmyelinated segments that emerged following A3V-eTeNT expression.

      If pre-nodal clusters were fixed before myelination and predetermined internodal length, then the length of unmyelinated segments should be equal to or a multiple of the typical internodal length. However, our data showed that unmyelinated segments in the NL region were less than half the length of the typical NL internodal length, contradicting the hypothesis that fixed pre-nodal clusters determine internodal length along NM axons in this region.

      (2) The claim that axonal diameter is constant along the axonal length needs to be demonstrated at the EM level. This would also allow measurement of possible regional differences in the thickness of the myelin sheath and the number of myelin wraps.

      As mentioned in our reply to comment 2 by Reviewer #1, the diameter of NM axons was already evaluated using electron microscopy (EM) in the pioneering study by Seidl et al. (2010). Additionally, EM-based analysis makes it difficult to clearly distinguish between the main trunk of NM axons and thin collateral branches at the NL region. Accordingly, we did not perform EM analysis in this revision.

      In Figure 4, we used palGFP, which is targeted to the cell membrane, allowing us to measure axon diameter by evaluating the distance between two membrane signal peaks. This approach minimizes the influence of the blurring of fluorescence signals on diameter measurements. Thus, we believe that our method is sufficient to evaluate the relative difference in axon diameters between regions and hence to show that axon diameter is not the primary determinant of the 3-fold difference in internodal length between regions. 
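
The peak-to-peak measurement described above can be sketched as follows, assuming a one-dimensional intensity profile sampled across the axon. The function name, the simple local-maximum rule, the profile values, and the pixel size are all hypothetical and are not taken from the authors' image analysis.

```python
def membrane_peak_distance(profile, pixel_size_um):
    """Distance between the two brightest local maxima of an intensity profile."""
    # A local maximum is brighter than its left neighbor and at least as bright
    # as its right neighbor (so a flat-topped peak is counted once).
    peaks = [
        i for i in range(1, len(profile) - 1)
        if profile[i] > profile[i - 1] and profile[i] >= profile[i + 1]
    ]
    if len(peaks) < 2:
        raise ValueError("profile needs at least two local maxima")
    brightest = sorted(peaks, key=lambda i: profile[i], reverse=True)[:2]
    left, right = sorted(brightest)
    return (right - left) * pixel_size_um


# Hypothetical line profile across a membrane-labeled axon (arbitrary units).
line_profile = [1, 2, 9, 4, 3, 4, 10, 2, 1]
diameter_um = membrane_peak_distance(line_profile, pixel_size_um=0.1)
```

Measuring between the two membrane peaks, rather than across the full width of a cytoplasmic signal, is what makes the estimate less sensitive to blur at the edges of the fluorescence signal.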

      (3) The claim that the difference in internodal length is explained by heterogeneity in the sources of oligodendrocytes is not convincing. Oligodendrocytes a priori from the same origin remyelinate with shorter internodes after a demyelination event.

      The heterogeneity in oligodendrocyte morphology would reflect differences in gene profiles, which, in turn, may arise from differences in their developmental origin and/or pericellular microenvironment of OPCs. We made this point as clear as possible in Discussion (page 9, line 21).

      Significance

      The authors suggest that the difference in internodal length is attributed to heterogeneity of oligodendrocytes. In the tract region oligodendrocytes would contribute longer myelin internodes, while oligodendrocytes in the nucleus laminaris region would synthesize shorter myelin internodes. Not only length of myelin internodes differs, but also along the same axon unmyelinated areas between two internodes may vary. This is an interesting contribution since all these differences contribute to differential conduction velocity regulating ipsilateral and contralateral innervation of coincidence detector neurons.

    1. eLife Assessment

      This important study combines electrocardiographic (ECG) and heart/torso anatomy data from subjects included in the UK Biobank to analyze sex-specific differences in relationships between those two characteristics. The study has several compelling strengths, including the development of an open-source pipeline for reconstruction and analysis of heart/torso geometry from a large cohort. Nevertheless, technical analysis of the data as presented is incomplete, specifically as it pertains to assessment of co-linearity between regressed parameters, interpretation of regression coefficients for sex and/or presence of myocardial infarction, and discussion of potential roles played by underlying electrophysiological derangements. With improvements to these aspects of the analysis, the paper would be of interest to the cardiovascular research community, especially those studying highly relevant health and treatment disparities arising from sex differences.

    2. Reviewer #1 (Public review):

      Summary:

      The electrocardiogram (ECG) is routinely used to diagnose and assess cardiovascular risk. However, its interpretation can be complicated by sex-based and anatomical variations in heart and torso structure. To quantify these relationships, Dr. Smith and colleagues developed computational tools to automatically reconstruct 3D heart and torso anatomies from UK Biobank data. Their regression analysis identified key sex differences in anatomical parameters and their associations with ECG features, particularly post-myocardial infarction (MI). This work provides valuable quantitative insights into how sex and anatomy influence ECG metrics, potentially improving future ECG interpretation protocols by accounting for these factors.

      Strengths:

      (1) The study introduces an automated pipeline to reconstruct heart and torso anatomies from a large cohort (1,476 subjects, including healthy and post-MI individuals).

      (2) The 3-stage reconstruction achieved high accuracy (validated via Dice coefficient and error distances).

      (3) Extracted anatomical features enabled novel analyses of disease-dependent relationships between sex, anatomy, and ECG metrics.

      (4) Open-source code for the pipeline and analyses enhances reproducibility.
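
For context on strength (2), the Dice coefficient measures the overlap between a reconstructed region and a reference region. The sketch below is a generic definition over sets of voxel coordinates; the function name and masks are hypothetical and it is not the pipeline's own code.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice overlap 2|A ∩ B| / (|A| + |B|) for two sets of voxel coordinates."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        return 1.0  # two empty masks are treated as identical
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical 2D masks given as (row, column) coordinates.
reconstructed = {(0, 0), (0, 1), (1, 0), (1, 1)}
reference = {(0, 1), (1, 1), (2, 1)}
overlap = dice_coefficient(reconstructed, reference)
```

A value of 1 means perfect overlap and 0 means none, so the coefficient complements distance-based error measures, which instead capture how far apart mismatched surfaces are.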

      Weaknesses:

      (1) The linear regression approach, while useful, may not fully address collinearity among parameters (e.g., cardiac size, torso volume, heart position). Although left ventricular mass or cavity volume was selected to mitigate collinearity, other parameters (e.g., heart center coordinates) could still introduce bias.

      (2) The study attributes residual ECG differences to sex/MI status after controlling for anatomical variables. However, regression model errors could distort these estimates. A rigorous evaluation of potential deviations (e.g., variance inflation factors or alternative methods like ridge regression) would strengthen the conclusions.

      (3) The manuscript's highly quantitative presentation may hinder readability. Simplifying technical descriptions and improving figure clarity (e.g., separating superimposed bar plots in Figures 2-4) would aid comprehension.

      (4) Given established sex differences in QTc intervals, applying the same analytical framework to explore QTc's dependence on sex and anatomy could have provided additional clinically relevant insights.
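
The variance inflation factor mentioned in weakness (2) can be illustrated for the simplest case. With exactly two predictors, the VIF of each reduces to 1 / (1 - r²), where r is their Pearson correlation; models with more predictors require regressing each predictor on all the others. The function names and data below are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def vif_two_predictors(x1, x2):
    """Variance inflation factor when a model has exactly two predictors."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Hypothetical standardized anatomical measures, for illustration only.
torso_volume = [1, 2, 3, 4, 5]
cavity_volume = [2, 1, 4, 3, 5]
vif = vif_two_predictors(torso_volume, cavity_volume)
```

A VIF of 1 indicates no collinearity, and larger values indicate that the variance of a regression coefficient is inflated by correlation with the other predictor.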

    3. Reviewer #2 (Public review):

      Summary:

      Missed diagnosis of myocardial ischemia (MI) is more common in women, and treatment is typically less aggressive. This diagnosis stems from the fact that women's ECGs commonly exhibit 12-lead ECG biomarkers that are less likely to fall within the traditional diagnostic criteria. Namely, women have shorter QRS durations and lower ST junction and T wave amplitudes, but longer QT intervals, than men. To study the impact, this study aims to quantify sex differences in heart-torso anatomy and ECG biomarkers, as well as their relative associations, in both pre- and post-MI populations. A novel computational pipeline was constructed to generate torso-ventricular geometries from cardiac magnetic resonance imaging. The pipeline was used to build models for 425 post-myocardial infarction subjects and 1051 healthy controls from UK Biobank clinical images to generate the population.

      Strengths:

      A strength of this study is its use of a large patient population from the UK Biobank (425 post-MI and 1051 healthy controls) to analyze sex-based differences. The computational pipeline is state-of-the-art for constructing torso-ventricular geometries from cardiac MR and is clinically viable. It draws on novel machine learning techniques for segmentation, contour extraction, and shape modeling. This pipeline is publicly available and can help in the large-scale generation of anatomies for other studies. This allows computation of various anatomical factors (torso volume, cavity volume, etc.), and subsequent regression analysis on how these factors are altered before and after MI from the 12-lead ECG.

      Weaknesses:

      Major weaknesses stem from the fact that, while electrophysiological factors appear to play a role across many leads, both post-MI and healthy, the electrophysiological factors are not stated or discussed. The computational modeling pipeline is validated for reconstructing torso contours; however, potential registration errors stemming from ventricular-torso construction are not addressed within the context of anatomical factors, such as the tilt and rotation of the heart. This should be discussed as the paper's claims are based on these results. Further analysis and explanation are needed to understand how these sex-specific results impact the ECG-based diagnosis of MI in men and women, as stated as the primary reason for the study at the beginning of the paper. This would provide a broader impact within the clinical community. Claims about demographics do not appear to be supported within the main manuscript but are provided in the supplements. Reformatting the paper's structure is required to efficiently and effectively present and support the findings and outcomes of this work.

    1. eLife Assessment

      This valuable study investigates the self-assembly activity of death-fold domains. The data collected using advanced microscopy and distributed amphifluoric FRET-based flow cytometry methods provide solid evidence for the conclusions, although the interpretations based on these conclusions remain speculative in some cases. This paper is of broad interest to those studying a variety of biological pathways involved in inflammatory responses and various forms of cell death.

    2. Reviewer #1 (Public review):

      Summary:

      This is a high-quality and extensive study that reveals differences in the self-assembly properties of the full set of 109 human death fold domains (DFDs). Distributed amphifluoric FRET (DAmFRET) is a powerful tool that reveals the self-assembly behaviour of the DFDs, in non-seeded and seeded contexts, and allows comparison of the nature and extent of self-assembly. The nature of the barriers to nucleation is revealed in the transition from low to high AmFRET. Alongside analysis of the saturation concentration and protein concentration in the absence of seed, the subset of proteins that exhibited discontinuous transitions to higher-order assemblies was observed to have higher concentrations than DFDs that exhibited continuous transitions. The experiments probing the ~20% of DFDs that exhibit discontinuous transition to polymeric form suggest that they populate a metastable, supersaturated form in the absence of cognate signal. This is suggestive of a high intrinsic barrier to nucleation.

      Strengths:

      The differences in self-assembly behaviour are significant and likely identify mechanistic differences across this large family of signalling adapter domains. The work is of high quality, and the evidence for a range of behaviours is strong. This is an important and useful starting point since the different assembly mechanisms point towards specific cellular roles. However, understanding the molecular basis for these differences will require further analysis.

      An impressive optogenetic approach was engineered and applied to initiate self-assembly of CASP1 and CASP9 DFDs, as a model for apoptosome initiation in these two DFDs with differing continuous or discontinuous assembly properties. This comparison revealed clear differences in the stability and reversibility of the assemblies, supporting the hypothesis that supersaturation-mediated DFD assembly underlies signal amplification in at least some of the DFDs.

      The study reveals interesting correlations between supersaturation of DFD adapters in short- and long-lived cells, suggestive of a relationship between the mechanism of assembly and cellular context. Additionally, the comprehensive nature of the study provides strong evidence that the interactions are almost all homomeric or limited to members of the same DFD subfamily or interaction network. Similar approaches with bacterial proteins from innate immunity operons suggest that their polymerisation may be driven by similar mechanisms.

      Weaknesses:

      Only a limited investigation of assembly morphology was conducted by microscopy. There was a tendency for discontinuous structures to form fibrillar structures and continuous to populate diffuse or punctate structures, but there was overlap across all categories, which is not fully explored. The methodology used to probe oligomeric assembly and stability (SDD-AGE) does not justify the conclusions drawn regarding stability and native structure within the assemblies.

      The work identifies important differences between DFDs and clearly different patterns of association. However, most of the detailed analysis is of the DFDs that exhibit a discontinuous transition, and important questions remain about the majority of other DFDs and why some assemblies should be reversible and others not, and about the nature of signalling arising from a continuous transition to polymeric form.

      Some key examples of well-studied DFDs, such as MyD88 and RIPK1, deserve more discussion, since they display somewhat surprising results. More detailed exploration of these candidates, where much is known about their structures and the nature of the assemblies from other work, could substantiate the conclusions here and transform some of the conclusions from speculative to convincing.

      The study concludes with general statements about the relationship between stochastic nucleation and mortality, which provide food for thought and discussion but which, as they concede, are highly speculative. The analogies that are drawn with batteries and privatisation will likely not be clearly understood by all readers. The authors do not discuss limitations of the study or elaborate on further experiments that could interrogate the model.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript from Rodriguez Gama et al. proposes several interesting conclusions based on different oligomerization properties of Death-Fold Domains (DFDs) in cells, their natural abundance, and supersaturation properties. These ideas are:

      (1) DFDs broadly store the cell's energy by remaining in a supersaturated state;

      (2) Cells are constantly in a vulnerable state that could lead to cell death;

      (3) The cell's lifespan depends on the supersaturation levels of certain DFDs.

      Overall, the evidence supporting these claims is not completely solid. Some concerns were noted.

      Strengths:

      Systematic analysis of DFD self-assembly and its relationship with protein abundance, supersaturation, cell longevity, and evolution.

      Weaknesses:

      (1) On page 2, it is stated, "Nucleation barriers increase with the entropic cost of assembly. Assemblies with large barriers, therefore, tend to be more ordered than those without. Ordered assembly often manifests as long filaments in cells," as a way to explain the observation that DFD assemblies that transitioned discontinuously form fibrils, whereas those that transitioned continuously (low-to-high) formed spherical or amorphous puncta. Conventional confocal microscopy is unlikely to differentiate between amorphous and structured puncta. Some DFDs self-assemble into structured puncta formed by intertwined fibrils. Such fibril nets are more structured and thus should be associated with a higher entropic cost. Therefore, the results in Figure 1B do not seem to agree with the reasoning described.

      (2) Errors for the data shown in Figure 1B would have been very useful to determine whether the population differences between diffuse, punctate, and fibrillar for the continuous (low-to-high) transition are meaningful.

      (3) A main concern in the data shown in Figure 1B and F is that the number of counts for discontinuous compared to continuous is small. Thus, the significance of the results is difficult to evaluate in the context of the broad function of DFDs as batteries, as stated at the beginning of the manuscript.

      (4) The proteins or domains that are self-seeded (Figure 1F) should be listed such that the reader has a better understanding of whether domains or full-length proteins are considered, whether other domains have an effect on self-seeding (which is not discussed), and whether there is repetition.

      (5) The authors indicate an anticorrelation between transcript abundance and Csat based on the data shown in Figure 2B; however, the data are scattered. It is not clear why an anticorrelation is inferred.

      (6) It would be useful to indicate the expected range of degree centrality. The differences observed are very small. This is specifically the case for the BC values. The lack of context and the small differences cast doubts on their significance. It would be beneficial to describe these data in the context of the centrality values of other proteins.

      (7) Page 3 section title: "Nucleation barriers are a characteristic feature of inflammatory signalosome adaptors." This title seems to contradict the results shown in Figure 2D, where full-length CARD9 and CARD11 are classified as sensors, but it has been reported that they are adaptor proteins with key roles in the inflammatory response. Please see the following references as examples: The adaptor protein CARD9 is essential for the activation of myeloid cells through ITAM-associated and Toll-like receptors. Nat Immunol 8, 619-629 (2007), and Mechanisms of Regulated and Dysregulated CARD11 Signaling in Adaptive Immunity and Disease. Front Immunol. 2018 Sep 19;9:2105.

      However, both CARD9 and CARD11 show discontinuous to continuous behavior for the individual DFDs versus full-length proteins, respectively, in contrast to the results obtained for ASC, FADD, etc. FADD plays a key role in apoptosis but shows the same behavior as BCL10 and ASC. However, the manuscript indicates that this behavior is characteristic of inflammatory signalosomes. What is the explanation for adaptor proteins behaving in different ways? This casts doubts about the possibility of deriving general conclusions on the significance of these observations, or the subtitles in the results section seem to be oversimplifications.

      (8) IFI16-PYD displays discontinuous behavior according to Figure S1H; however, it is not included in Figure 2D, whereas AIM2 is.

      (9) To demonstrate that "Nucleation barriers facilitate signal amplification in human cells," constructs using APAF1 CARD, NLRC4 CARD, caspase-9 CARD, and a chimera of the latter are used to create what the authors refer to as apoptosomes. Even though puncta are observed, referring to these assemblies as apoptosomes seems somewhat misleading. In addition, it is not clear why the activity of caspase-9 was not measured directly, instead of that of caspase-3 and 7, which could be activated by other means. The polymerization of caspase-1 CARD with NLRC4 CARD, leading to irreversible puncta, could just mean that the polymers are more stable. In fact, not all DFDs form equally stable or identical complexes, which does not necessarily imply that a nucleation barrier facilitates signal amplification. Could this conclusion be an overstatement?

      (10) To demonstrate that "Innate immune adaptors are endogenously supersaturated," it is stated on page 5 that ASC clusters continue to grow for the full duration of the time course and that AIM2-PYD stops growing after 5 min. The data shown in Figure 4F indicate that AIM2-PYD continues to grow after 5 min, although slowly, and that ASC starts to slow down at ~13 min. Because ASC has two DFDs, assemblies can grow faster and become bigger. How is this related to supersaturation?
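
      For points (2) and (3), one way to attach uncertainty to category proportions such as those plotted in Figure 1B is a Wilson score interval, which behaves reasonably for small counts. The sketch below is illustrative only: the counts are hypothetical, and the function name and the choice of z = 1.96 (an approximate 95% interval) are assumptions rather than anything taken from the manuscript.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return centre - half, centre + half


# Hypothetical counts: 5 of 20 domains in a small group versus 50 of 200 in a large group.
small = wilson_interval(5, 20)
large = wilson_interval(50, 200)
print("small group: %.3f to %.3f" % small)
print("large group: %.3f to %.3f" % large)
```

      With the same observed fraction, the interval for the smaller group is much wider, which is the concern raised about comparing a small number of discontinuous DFDs with a larger number of continuous ones.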

    4. Author response:

      We appreciate constructive feedback from both reviewers. Reviewer 1 provided a very positive assessment and helpful suggestions for clarity, which we will incorporate.

      We also thank Reviewer 2 for their detailed comments. In some instances, their public review raised concerns about specific data or interpretations that are, in fact, already presented and justified in the original manuscript. This feedback has highlighted a need to improve the clarity of our presentation. 

      In our revised manuscript, we will make key information more prominent to prevent further misunderstandings. We will also provide additional statistical validation for our conclusions, additional data from the optogenetic experiments and high throughput imaging, and further elaborate on the behaviors of specific proteins (FADD, MyD88, and RIPK1). We are confident that these revisions will make our findings more transparent and accessible to readers, and we look forward to submitting our revised manuscript.

    1. eLife Assessment

      During the development of the unicellular eukaryote Dictyostelium discoideum, cells aggregate into mounds, which then form protrusions called tips, and the tips then become the front of migrating slugs and the top of fruiting bodies. This valuable study identifies a protein called adenosine deaminase-related growth factor (ADGF) as a key regulator of tip formation, and the authors convincingly show that ADGF catalyses the formation of ammonia from adenosine, allowing ammonia to initiate tip formation, and they then elucidate pathways upstream and downstream from ADGF. The authors discuss the intriguing possibility that mammalian ADGF may also regulate development in a similar manner.

    2. Reviewer #1 (Public review):

      Summary:

      This work shows that a specific adenosine deaminase protein in Dictyostelium generates the ammonia that is required for tip formation during Dictyostelium development. Cells with an insertion in the adgf gene aggregate but do not form tips. A remarkable result, shown in several different ways, is that the adgf mutant can be rescued by exposing the mutant to ammonia gas. The authors also describe other phenotypes of the adgf mutant such as increased mound size, altered cAMP signaling, and abnormal cell type differentiation. It appears that the adgf mutant has defects in the expression of a large number of genes, resulting in not only the tip defect but also the mound size, cAMP signaling, and differentiation phenotypes.

      Strengths:

      The data and statistics are excellent.

      Weaknesses:

      The key weakness is understanding why the cells bother to use a diffusible gas like ammonia as a signal to form a tip and continue development. The rescue of the mutant by adding ammonia gas to the entire culture indicates that ammonia conveys no positional information within the mound. By the time the cells have formed a mound, the cells have been starving for several hours, and desperately need to form a fruiting body to disperse some of themselves as spores, and thus need to form a tip no matter what. One can envision that the local ammonia concentration is possibly informing the mound that some minimal number of cells are present (assuming that the ammonia concentration is proportional to the number of cells), but probably even a minuscule fruiting body would be preferable to the cells compared to a mound. This latter idea could be easily explored by examining the fate of the adgf cells in the mound - do they all form spores? Do some form spores? Or perhaps the ADGF is secreted by only one cell type, and the resulting ammonia tells the mound that for some reason that cell type is not present in the mound, allowing some of the cells to transdifferentiate into the needed cell type. Thus, elucidating if all or some cells produce ADGF would greatly strengthen this puzzling story.

      Comments on revisions:

      Looks better, but I think you answered my questions (listed as weaknesses in the public review) in the reply to the reviewer but not in the paper. I'd suggest carefully thinking about my questions and addressing them in the Discussion. You did, however, do all of the things in the paper that were listed as "Recommendations for the authors".

    3. Reviewer #2 (Public review):

      Summary:

      The paper describes new insights into the role of adenosine deaminase-related growth factor (adgf), an enzyme that catalyses the breakdown of adenosine into ammonia and inosine, in tip formation during Dictyostelium development. The adgf null mutant has a pre-tip mound arrest phenotype, which can be rescued by external addition of ammonia. Analysis suggests that the phenotype involves changes in cAMP signaling possibly involving a histidine kinase dhkD, but details remain to be resolved.

      Strengths:

      The generation of an adgf mutant showed a strong mound arrest phenotype and successful rescue by external ammonia. Characterisation of significant changes in cAMP signaling components, suggesting low cAMP signaling in the mutant and identification of the histidine kinase dhkD as a possible component of the transduction pathway. Identification of a change in cell-type differentiation towards prestalk fate.

      Weaknesses:

      Lack of details on the developmental time course of adgf activity and cell-type-specific differences in adgf expression. Absence of measurements to show that ammonia addition to the null mutant can rescue the proposed defects in cAMP signaling. No direct measurements in the dhkD mutant to show that it acts upstream of adgf in the control of changes in cAMP signaling and tip formation.

      Comments on revisions:

      The revised version of the paper has improved significantly in terms of structure and clarity. The additional data on rescue of total cAMP production by ammonia (Fig. 7C) in the adgf- mutant and the 5-fold increased prespore expression of adgf RNA compared to prestalk cells (Fig 9) are useful data additions.

      The link between changes in cAMP signaling (lower aca expression) and wave geometry (concentric waves rather than spiral waves) remains speculative.

      I noted that Fig 6 contains different images than the previous version (Fig 7).

      The statement "Interestingly, Klebsiella pneumoniae physically separated from the Dictyostelium adgf mutants in a partitioned dish, also rescues the mound arrest phenotype suggesting a cross-kingdom interaction that drives development" in the summary is rather overdone. All experiments were performed with axenic strains (no bacteria).

      The same applies to the sentence "Remarkably, in higher vertebrates, adgf expression is elevated during gastrulation and thus adenosine deamination may be a conserved process driving organizer development in different organisms".

      The data supporting this in the supplementary information are hardly legible and poorly presented. What is shown is ADA expression in different tissues, not at different stages. I would suggest taking these figures out and concentrating the summary on the key mechanistic findings of the paper.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This work shows that a specific adenosine deaminase protein in Dictyostelium generates the ammonia that is required for tip formation during Dictyostelium development. Cells with an insertion in the ADGF gene aggregate but do not form tips. A remarkable result, shown in several different ways, is that the ADGF mutant can be rescued by exposing the mutant to ammonia gas. The authors also describe other phenotypes of the ADGF mutant such as increased mound size, altered cAMP signalling, and abnormal cell type differentiation. It appears that the ADGF mutant has defects in the expression of a large number of genes, resulting in not only the tip defect but also the mound size, cAMP signalling, and differentiation phenotypes.

      Strengths:

      The data and statistics are excellent.

      Weaknesses:

      (1) The key weakness is understanding why the cells bother to use a diffusible gas like ammonia as a signal to form a tip and continue development.

      Ammonia can come from a variety of sources both within and outside the cells, including dead cells. By increasing cAMP levels, ammonia triggers collective cell movement, thereby establishing a tip in Dictyostelium. A gaseous signal can act over long distances in a short time; for instance, ammonia promotes synchronous development in a colony of yeast cells (Palkova et al., 1997; Palkova and Forstova, 2000). The slug tip is known to release ammonia, probably favouring synchronized development of the entire colony of Dictyostelium. However, after the tips are established, ammonia exerts negative chemotaxis, probably helping the slugs to move away from each other and ensuring equal spacing of the fruiting bodies (Feit and Sollitto, 1987).

      It is well known that ammonia serves as a signalling molecule influencing both multicellular organization and differentiation in Dictyostelium (Francis, 1964; Bonner et al., 1989; Bradbury and Gross, 1989). By raising the pH of the intracellular acidic vesicles of prestalk cells (Poole and Ohkuma, 1981; Gross et al., 1983) and of the cytoplasm, ammonia is known to increase the speed of chemotaxing amoebae (Siegert and Weijer, 1989; Van Duijn and Inouye, 1991) and to induce collective cell movement (Bonner et al., 1988, 1989), favoring tipped mound development.

      Ammonia produced in millimolar concentrations during tip formation (Schindler and Sussman, 1977) could ward off other predators in soil. For instance, ammonia released by Streptomyces symbionts of leaf-cutting ants is known to inhibit fungal pathogens (Dhodary and Spiteller, 2021). Additionally, ammonia may be recycled back into amino acids, as observed during breast cancer proliferation (Spinelli et al., 2017). Such a process may also occur in starving Dictyostelium cells, supporting survival and differentiation. These findings suggest that ammonia acts as both a local and long-range regulatory signal, integrating environmental and cellular cues to coordinate multicellular development.

      (2) The rescue of the mutant by adding ammonia gas to the entire culture indicates that ammonia conveys no positional information within the mound.

      Ammonia reinforces or maintains the positional information by elevating cAMP levels, favoring prespore differentiation (Bradbury and Gross, 1989; Riley and Barclay, 1990; Hopper et al., 1993). Ammonia is known to influence rapid patterning of Dictyostelium cells confined in a restricted environment (Sawai et al., 2002). In adgf mutants that have low ammonia levels, both neutral red staining (a marker for prestalk and ALCs) (Figure. S3) and the prestalk marker ecmA/ ecmB expression (Figure. 7D) are higher than the WT and the mound arrest phenotype can be reversed by exposing the adgf mutant mounds to ammonia.

      Prestalk cells are enriched in acidic vesicles, and ammonia, by raising the pH of these vesicles and the cytoplasm (Davies et al 1993; Van Duijn and Inouye 1991), plays an active role in collective cell movement during tip formation (Bonner et al., 1989).

      (3) By the time the cells have formed a mound, the cells have been starving for several hours, and desperately need to form a fruiting body to disperse some of themselves as spores, and thus need to form a tip no matter what.

      Exposure of adgf mounds to ammonia led to tip development within 4 h (Figure. 5). In contrast, adgf controls remained at the mound stage for at least 30 h. This demonstrates that starvation alone is not the trigger for tip development and that ammonia promotes the transition from mound to tipped mound formation.

      Many mound arrest mutants are blocked in development and do not proceed to form fruiting bodies (Carrin et al., 1994). Further, not all the mound arrest mutants tested in this study were rescued by ADA enzyme (Figure. S4A), and they continue to stay as mounds.

      (4) One can envision that the local ammonia concentration is possibly informing the mound that some minimal number of cells are present (assuming that the ammonia concentration is proportional to the number of cells), but probably even a minuscule fruiting body would be preferable to the cells compared to a mound. This latter idea could be easily explored by examining the fate of the ADGF cells in the mound - do they all form spores? Do some form spores?

      Or perhaps the ADGF is secreted by only one cell type, and the resulting ammonia tells the mound that for some reason that cell type is not present in the mound, allowing some of the cells to transdifferentiate into the needed cell type. Thus, elucidating if all or some cells produce ADGF would greatly strengthen this puzzling story.

      A fraction of adgf mounds form bulkier spore heads by the end of 36 h as shown in Figure. 2H. This late recovery may be due to the expression of other ADA isoforms. Mixing WT and adgf mutant cell lines results in a chimeric slug with mutants occupying the prestalk region (Figure. 8) and suggests that WT ADGF favours prespore differentiation. However, it is not clear if ADGF is secreted by a particular cell type, as adenosine can be produced by both cell types, and the activity of three other intracellular ADAs may vary between the cell types. To address whether adgf expression is cell type-specific, prestalk and prespore cells will be separated by fluorescence activated cell sorter (FACS), and thereafter, adgf expression will be examined in each population.

      Reviewer #1 (Recommendations for the authors):

      (1) Lines: 47,48 - "The gradient of these morphogens along the slug axis determines the cell fate, either as prestalk (pst) or as prespore (psp) cells." - many workers have shown that this is not true - intrinsic factors such as cell cycle phase drive cell fate.

      Thank you for pointing this out. We have removed the line and rephrased as “Based on cell cycle phases, there exists a dichotomy of cell types, that biases cell fate as prestalk or prespore (Weeks and Weijer, 1994; Jang and Gomer, 2011).”

      (2) Line 48 - PKA - please explain acronyms at first use.

      Corrected

      (3) Line 56 - The relationship between adenosine deaminase and ADGF is a bit unclear, please clarify this more.

      Adenosine deaminase (ADA) is intracellular, whereas adenosine deaminase related growth factor (ADGF) is an extracellular ADA and has a growth factor activity (Li and Aksoy, 2000; Iijima et al., 2008).

      (4) Figure 1 - where are these primers, and the bsr cassette, located with respect to the coding region start and stop sites?

      The primer sequences are mentioned in the supplementary table S2. The figure legend is updated to provide a detailed description.

      (5) Line 104 - 37.47% may be too many significant figures.

      Corrected

      (6) Line 123 - 1.003 Å may be too many significant figures.

      Corrected

      (7) Line 128 - Since the data are in the figure, you don't need to give the numbers, also too many significant figures.

      Corrected

      (8) Figure 3G - did the DCF also increase mound size? It sort of looks like it did.

      Yes, the addition of DCF increases the mound size (now Figure. 2G).

      (9) Figure 3I - the spore mass shown here for ADGF - looks like there are 3 stalks protruding from it; this can happen if a plate is handled roughly and the spore masses bang into each other and then merge

      Thank you for pointing this out. The figure 3I (now Figure. 2I) is replaced.

      (10) Lines 160-162 - since the data are in the figure, you don't need to give the numbers, also too many significant figures.

      Corrected.

      (11) Line 165 - ' ... that are involved in adenosine formation' needs a reference.

      Reference is included.

      (12) Line 205 - 'Addition of ADA to the CM of the mutant in one compartment.' - might clarify that the mutant is the ADGF mutant

      Yes, revised to 'Addition of ADA to the CM of the adgf mutant in one compartment.'

      (13) Lines 222-223 need a reference for caffeine acting as an adenosine antagonist.

      Reference is included.

      (14) Figure 8B - left - use a 0-4 or so scale so the bars are more visible.

      Thank you for the suggestion. The scale of the y-axis is adjusted to 0-4 in Figure. 7B to enhance the visibility of the bars.

      Reviewer #2 (Public review):

      Summary:

      The paper describes new insights into the role of adenosine deaminase-related growth factor (ADGF), an enzyme that catalyses the breakdown of adenosine into ammonia and inosine, in tip formation during Dictyostelium development. The ADGF null mutant has a pre-tip mound arrest phenotype, which can be rescued by the external addition of ammonia. Analysis suggests that the phenotype involves changes in cAMP signalling possibly involving a histidine kinase dhkD, but details remain to be resolved.

      Strengths:

      The generation of an ADGF mutant showed a strong mound arrest phenotype and successful rescue by external ammonia. Characterization of significant changes in cAMP signalling components, suggesting low cAMP signalling in the mutant and identification of the histidine kinase dhkD as a possible component of the transduction pathway. Identification of a change in cell type differentiation towards prestalk fate.

      Weaknesses:

      (1) Lack of details on the developmental time course of ADGF activity and cell type-specific differences in ADGF expression.

      adgf expression was examined at 0, 8, 12, and 16 h (Figure. 1), and the total ADA activity was assayed at 12 and 16 h (Figure. 3). Previously, the 12 h data was not included; it has now been added (Figure. 3A). The adgf expression was found to be highest at 16 h and hence, the ADA assay was carried out at that time point. Since the ADA assay will also report the activity of the other three isoforms, it will not exclusively reflect ADGF activity.

      Mixing WT and adgf mutant cell lines results in a chimeric slug with mutants occupying the prestalk region (Figure. 8) suggesting that WT adgf favours prespore differentiation. To address whether adgf expression is cell type-specific, prestalk and prespore cells will be separated by fluorescence activated cell sorter (FACS), and thereafter, adgf expression will be examined in each population.

      (2) The absence of measurements to show that ammonia addition to the null mutant can rescue the proposed defects in cAMP signalling.

      The adgf mutant in comparison to WT has diminished acaA expression (Fig. 6B) and reduced cAMP levels (Fig. 6A) both at 12 and 16 h of development. The cAMP levels were measured at 8 h and 12 h in the mutant.

      We would like to add that ammonia is known to increase cAMP levels (Riley and Barclay, 1990; Feit et al., 2001) in Dictyostelium. Exposure to ammonia increases acaA expression in WT (Figure. 7B) and is likely to increase acaA expression/ cAMP levels in the mutant also (Riley and Barclay, 1990; Feit et al., 2001) thereby rescuing the defects in cAMP signalling. Based on the comments, cAMP levels will also be measured in the mutant after the rescue with ammonia.

      (3) No direct measurements in the dhkD mutant to show that it acts upstream of adgf in the control of changes in cAMP signalling and tip formation.

      cAMP levels will be quantified in the dhkD mutant after treatment with ammonia. The histidine kinases dhkD and dhkC are reported to modulate phosphodiesterase RegA activity, thereby maintaining cAMP levels (Singleton et al., 1998; Singleton and Xiong, 2013). By activating RegA, dhkD ensures proper cAMP distribution within the mound, which is essential for the patterning of prestalk and prespore cells, as well as for tip formation (Singleton and Xiong, 2013). Therefore, ammonia exposure to dhkD mutants is likely to regulate cAMP signalling and thereby tip formation.

      Reviewer #2 (Recommendations for the authors):

      The paper describes new insights into the role of ADGF, an enzyme that catalyses the breakdown of adenosine in ammonia and inosine, in tip formation in Dictyostelium development.

      A knockout of the gene results in a tipless mound stage arrest and the mounds formed are somewhat larger in size. Synergy experiments show that the effect of the mutation is non-cell autonomous and further experiments show that the mound arrest phenotype can be rescued by the provision of ammonia vapour. These observations are well documented. Furthermore, the paper contains a wide variety of experiments attempting to place the observed effects in known signalling pathways. It is suggested that ADGF may function downstream of DhkD, a histidine kinase previously implicated in ammonia signalling. Ammonia has long been described to affect different aspects, including differentiation of slug and culmination stages of Dictyostelium development, possibly through modulating cAMP signalling, but the exact mechanisms of action have not yet been resolved. The experiments reported here to resolve the mechanistic basis of the mutant phenotype need focusing and further work.

      (1) The paper needs streamlining and editing to concentrate on the main findings and implications.

      The manuscript will be revised extensively.

      Below is a list of some more specific comments and suggestions.

      (2) Introduction: Focus on what is relevant to understanding tip formation and the role of nucleotide metabolism and ammonia (see https://doi.org/10.1016/j.gde.2016.05.014). This could lead to the rationale for investigating ADGF.

      The manuscript will be revised extensively

      (3) Lines 36-38 are not relevant. Lines 55-63 need shortening and to focus on ADGF, cellular localization, and substrate specificity.

      The manuscript will be revised accordingly. Lines 36-38 will be removed, and the lines 55-63 will be shortened.

      In humans, two isoforms of ADA are known including ADA1 and ADA2, and the Dictyostelium homolog of ADA2 is adenosine deaminase-related growth factor (ADGF). Unlike ADA that is intracellular, ADGF is extracellular and also has a growth factor activity (Li and Aksoy, 2000; Iijima et al., 2008). Loss-of-function mutations in ada2 are linked to lymphopenia, severe combined immunodeficiency (SCID) (Gaspar, 2010), and vascular inflammation due to accumulation of toxic metabolites like dATP (Notarangelo, 2016; Zhou et al., 2014).

      (4) Results: This section would benefit from better streamlining by a separation of results that provide more mechanistic insight from more peripheral observations.

      The manuscript will be revised and the peripheral observations (Figure. 2) will be shifted to the supplementary information.

      (5) Line 84 needs to start with a description of the goal, to produce a knockout.

      Details on the knockout will be elaborated in the revised manuscript. Line number 84 (now 75). Dictyostelium cell lines carrying mutations in the gene adgf were obtained from the genome wide Dictyostelium insertion (GWDI) bank and were subjected to further analysis to know the role of adgf during Dictyostelium development.

      (6) Knockout data (Figure 1) can be simplified and combined with a description of the expression profile and phenotype Figure 3 F, G, and Figure 5. Higher magnification and better resolution photographs of the mutants would be desirable.

      Thank you. As suggested, the data will be simplified (section E will be removed) and combined with a description of the expression profile, and the phenotype images of Figure 3 F, G, and Figure 5 (now Figure. 2 F, G, and Figure. 4) will be replaced with higher-resolution images.

      (7) It would also be relevant to know which cells actually express ADGF during development, using in-situ hybridisation or promoter-reporter constructs.

      To address whether adgf expression is cell type-specific, prestalk and prespore cells will be separated by fluorescence activated cell sorter (FACS), and thereafter, adgf expression will be examined in each population.

      (8) Figure 2 - Information is less directly relevant to the topic of the paper and can be omitted (or possibly in Supplementary Materials).

      Figure. 2 will be moved to supplementary materials.

      (9) Figures 4A, B - It is shown that as could be expected ada activity is somewhat reduced and adenosine levels are slightly elevated. However, the fact that ada levels are low at 16hrs could just imply that differentiation of the ADGF- cells is blocked/delayed at an earlier time point. To interpret these data, it would be necessary to see an ada activity and adenosine time course comparison of wt and mutant, or to see that expression is regulated in a celltype specific manner that could explain this (see above). It would be good to combine this with the observation that ammonia levels are lower in the ADGF- mutant than wildtype and that the mutant phenotype, mound arrest can be rescued by an external supply of ammonia (Figure 6).

      In Dictyostelium four isoforms of ADA including ADGF are present, and thus the time course of total ADA activity will also report the function of other isoforms. Further, a number of pathways, generate adenosine (Dunwiddie et al., 1997; Boison and Yegutkin, 2019). ADGF expression was examined at 0, 8, 12 and 16 h (Fig 1) and the ADA activity was assayed at 12 h, the time point where the expression gradually increases and reaches a peak at 16 h. Earlier, we have not shown the 12 h activity data which will be included in the revised version. ADGF expression was found to be highly elevated at 16 h and adenosine/ammonia levels were measured at the two points indicated in the mutant.

      (10) Panel 4C could be combined with other measurements trying to arrive at more insight in the mechanisms by which ammonia controls tip formation.

      Panel 4C (now 3C) illustrates the genes involved in the conversion of cAMP to adenosine. Since Figure. 3 focuses on adenosine levels and ADA activity in both WT and adgf mutants, we have retained Panel 3C in Figure. 3, for its relevance to the experiment.

      (11) There is a large variety of experiments attempting to link the mutant phenotype and its rescue by ammonia to cAMP signalling, however, the data do not yet provide a clear answer.

      It is well known that ammonia increases cAMP levels (Riley and Barclay, 1990; Feit et al., 2001) and adenylate cyclase activity (Cotter et al., 1999) in D. discoideum, and exposure to ammonia increases acaA expression (Fig 7B) suggesting that ammonia regulates cAMP signaling. To address the concerns, cAMP levels will be quantified in the mutant after ammonia treatment.

      (12) The mutant is shown to have lower cAMP levels at the mound stage which ties in with low levels of acaA expression (Figures 7A and B), also various phosphodiesterases, the extracellular phosphodiesterase pdsa and the intracellular phosphodiesterase regA show increased expression. Suggesting a functional role for cAMP signalling is that the addition of di cGMP, a known activator of acaA, can also rescue the mound phenotype (Figure 7E). There appears to be a partial rescue of the mound arrest phenotype level by the addition of 8Br-cAMP (fig 7C), suggesting that intracellular cAMP levels rather than extracellular cAMP signalling can rescue some of the defects in the ADGF- mutant. Better images and a time course would be helpful.

      The relevant images will be replaced and a developmental time course after 8-Br-cAMP treatment will be included in the revised manuscript (Figure. 6D).

      (13) There is also the somewhat surprising observation that low levels of caffeine, an inhibitor of acaA activation also rescues the phenotype (Figure 7F).

      With respect to caffeine action on cAMP levels, the reports are contradictory. Caffeine has been reported to increase adenylate cyclase expression, thereby increasing cAMP levels (Hagmann, 1986), whereas Alvarez-Curto et al. (2007) found that caffeine reduced intracellular cAMP levels in Dictyostelium. Caffeine, although a known inhibitor of ACA, is also known to inhibit PDEs (Nehlig et al., 1992; Rosenfeld et al., 2014). Therefore, if caffeine differentially affects ADA and PDE activity, it may potentially counterbalance the effects and rescue the phenotype.

      (14) The data attempting to assess cAMP wave propagation in mounds (Fig 7H) are of low quality and inconclusive in the absence of further analysis. It remains unresolved how this links to the rescue of the ADGF- phenotype by ammonia. There are no experiments that measure any of the effects in the mutant stimulated with ammonia or di-cGMP.

      The relevant images will be replaced (now Figure. 6H). Ammonia by increasing acaA expression (Figure. 7B), and cAMP levels (Figure. 7C) may restore spiral wave propagation, thereby rescuing the mutant.

      (15) A possible way forward could also come from the observation that ammonia can rescue the wobbling mound arrest phenotype from the histidine kinase mutant dhkD null mutant, which has regA as its direct target, linking ammonia and cAMP signalling. This is in line with other work that had suggested that another histidine kinase, dhkC transduces an ammonia signal sensor to regA activation. A dhkC null mutant was reported to have a rapid development phenotype and skip slug migration (Dev. Biol. (1998) 203, 345). There is no direct evidence to show that dhkD acts upstream of ADGF and changes in cAMP signalling, for instance, measurements of changes in ADA activity in the mutant.

      cAMP levels will be quantified in the dhkD mutant after ammonia treatment and accordingly, the results will be revised.

      (16) The paper makes several further observations on the mutant. After 16 hrs of development the adgf- mutant shows increased expression of the prestalk cell markers ecmA and ecmB and reduced expression of the prespore marker pspA. In synergy experiments with a majority of wildtype, these cells will sort to the tip of the forming slug, showing that the differentiation defect is cell autonomous (Fig 9). This is interesting but needs further work to obtain more mechanistic insight into why a mutant with a strong tip/stalk differentiation tendency fails to make a tip. Here again, knowing which cells express ADGF would be helpful.

      The adgf mutant shows increased prestalk marker expression in the mound but does not form a tip. It is well known that several mound arrest mutants form differentiated cells but are blocked in development with no tips (Carrin et al., 1994). This is addressed in the discussions (539). To address whether adgf expression is cell type-specific, prestalk and prespore cells will be separated by fluorescence activated cell sorter (FACS), and thereafter, adgf expression will be examined in each population.

      (17) The observed large mound phenotype could as suggested possibly be explained by the low ctn, smlA, and high cadA and csA expression observed in the mutant (Figure 3). The expression of some of these genes (csA) is known to require extracellular cAMP signalling. The reported low level of acaA expression and high level of pdsA expression could suggest low levels of cAMP signalling, but there are no actual measurements of the dynamics of cAMP signalling in this mutant to confirm this.

      The acaA expression was examined at 8 and 12 h (Figure. 6B) and cAMP levels were measured at 12 and 16 h in the adgf mutants (Figure. 6A). Both acaA expression and cAMP levels were reduced, suggesting that cells expressing adgf regulate acaA expression and cAMP levels. This regulation, in turn, is likely to influence cAMP signaling and collective cell movement within mounds, ultimately driving tip development. Exposure to ammonia led to increased acaA expression (Figure. 7B) in WT. Based on the comments above, cAMP levels will be measured in the mutant before and after rescue with ammonia.

      (18) Furthermore, it would be useful to quantify whether ammonia addition to the mutant reverses mound size and restores any of the gene expression defects observed.

      Ammonia treatment soon after plating or six hours after plating, had no effect on the mound size (Figure. 5G).

      (19) There are many experimental data in the supplementary data that appear less relevant and could be omitted Figure S1, S3, S4, S7, S8, S9, S10.

      Figure S8, S9, S10 are omitted. We would like to retain the other figures

      Figure S1 (now Figure. S2): It is widely believed that ammonia comes from protein (White and Sussman, 1961; Hames and Ashworth, 1974; Schindler and Sussman, 1977) and RNA (Walsh and Wright, 1978) catabolism. Figure. S2 shows no significant difference in protein and RNA levels between WT and adgf mutant strains, suggesting that adenosine deaminase-related growth factor (ADGF) activity serves as a major source of ammonia and plays a crucial role in tip organizer development in Dictyostelium. Thus, it is important to retain this figure.

      Figure S3 (now Figure. S4): The figure shows the treatment of various mound arrest mutants and multiple tip mutants with ADA enzyme and DCF, respectively, to investigate the pathway through which adgf functions. Additionally, it includes the rescue of the histidine kinase mutant dhkD with ammonia, indicating that dhkD acts upstream of adgf via ammonia signalling. Therefore, it is important to retain this figure.

      Figure S4 (now Figure. S5): This figure represents the developmental phenotype of other deaminase mutants. Unlike adgf mutants, mutations in other deaminases do not result in complete mound arrest, despite some of these genes exhibiting strong expression during development. This underscores the critical role of adenosine deamination in tip formation. Therefore, let this figure be retained.

      Figure S7 (now Figure. S8): Figure S8 presents the transcriptomic profile of ADGF during gastrulation and pre-gastrulation stages across different organisms, indicating that ADA/ADGF is consistently expressed during gastrulation in several vertebrates (Pijuan-Sala et al., 2019; Tyser et al., 2021). Notably, the process of gastrulation in higher organisms shares remarkable similarities with collective cell movement within the Dictyostelium mound (Weijer, 2009), suggesting a previously overlooked role of ammonia in organizer development. This implies that ADA may play a fundamental role in regulating morphogenesis across species, including Dictyostelium and vertebrates. Therefore, we would like to retain this figure.

      (20) Given the current state of knowledge, speculation about the possible role of ADGF in organiser function in amniotes seems far-fetched. It is worth noting that the streak is not equivalent to the organiser. The discussion would benefit from limiting itself to the key results and implications.

      The discussion is revised accordingly by removing the speculative role of ADGF in organizer function in amniotes. The lines “It is likely that ADA plays a conserved, fundamental role in regulating morphogenesis in Dictyostelium and other organisms including vertebrates” have been removed.

    1. eLife Assessment

      This study provides a valuable examination of the social discrimination abilities of a jumping spider, Phidippus regius, based on visual cues. Behavioral assays yielded solid evidence that these spiders discriminate between familiar and unfamiliar individuals on the basis of visual cues; however, the experimental support for individual recognition and long-term memory is incomplete. While the results supply evidence of discrimination, additional experiments would be needed to verify the evidence of individual recognition.

    2. Reviewer #1 (Public review):

      Summary:

      The paper sets out to examine the social recognition abilities of a 'solitary' jumping spider species. It demonstrates that based on vision alone spiders can habituate and dishabituate to the presence of conspecifics. The data support the interpretation that these spiders can distinguish between conspecifics on the basis of their appearance.

      Strengths:

      The study presents two experiments. The second set of data recapitulates the findings of the first experiment with an independent set of spiders, highlighting the strength of the results. The study also uses a highly quantitative approach to measuring relative interest between pairs of spiders based on their distance.

      Weaknesses:

      The study design is overly complicated, while missing key controls, and the data presented in the figures are not clearly connected to the study. The discussion is challenging to understand and appears to make unsupported conclusions.

      (1) Study design: The study design is rather complicated and as a result it is difficult to interpret the results. The spiders are presented with the same individual twice in a row, called a habituation trial. Then a new individual is presented twice in a row. The first of these is a dishabituation trial and the second another habituation trial (but now habituating to a second individual). This is done with three pairings and then this entire structure is repeated over three sessions. The data appear to show the strong effects of differences between habituation and dishabituation trials in the first session. The decrease in differential behavior between the so-called habituation and dishabituation trials in sessions 2 and 3 is explained as a consequence of the spiders beginning to habituate in general to all of the individuals. The claim that the spiders remember specific individuals is somewhat undercut because all of the 'dishabituation' trials in session 2 are toward spiders they already met for 14 minutes previously but seemingly do not remember in session 2. In session 3 it is ambiguous what is happening because the spiders no longer differentiate between the trial types. This could be due to fatigue or familiarity. A second experiment is done to show that introducing a totally novel individual recovers a large dishabituation response, suggesting that the lack of differences between 'habituation' and 'dishabituation' trials in session 3 is the result of general habituation to all of the spiders in the session rather than fatigue. As mentioned before, these data do support the claim that the spiders differentiate among individuals.

      The data from session 1 are easy to interpret. The data from sessions 2 and 3 are harder to understand, but these are the trials in which they meet an individual again after a substantial period of separation. Other studies looking at recognition in ants and wasps (cited by the authors) have done a 4 trial design in which focal animal A meets B in the first trial, then meets C in the second trial, meets B again in the third trial, and then meets D in the last trial. In that scenario trials 1, 2 and 4 are between unfamiliar individuals and trial 3 is between potentially familiar individuals. In both the ants and wasps, high aggression is seen in species with and without recognition on trial 1, with low aggression specifically for trials with familiar individuals in species with recognition. Across different tests, species or populations that lack recognition have shown a general reduction in aggression towards all individuals that becomes progressively less aggressive over time (reminiscent of the session 2 and 3 data) while others have maintained modest levels of aggression across all individuals. The 4 trial design used in those other studies provides an unambiguous interpretation of the data, while controlling for 'fatigue'. That all trials in sessions 2 and 3 are always with familiar individuals makes it challenging to understand how much the spiders are habituating to each other versus having some kind of associative learning of individual identity and behavior.
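      To make the four-trial logic concrete, the following is a minimal illustrative sketch, not taken from the cited studies. It labels each encounter of a focal animal A as unfamiliar or familiar, purely from whether the partner has been met before. The function name and the partner labels B, C, D are placeholders for illustration.

```python
def label_familiarity(partners):
    """Label each encounter as 'unfamiliar' (first meeting) or 'familiar' (re-encounter)."""
    seen = set()
    labels = []
    for partner in partners:
        labels.append("familiar" if partner in seen else "unfamiliar")
        seen.add(partner)
    return labels


# Focal animal A meets B, then C, then B again, then D.
four_trial_design = ["B", "C", "B", "D"]
labels = label_familiarity(four_trial_design)

for trial, (partner, label) in enumerate(zip(four_trial_design, labels), start=1):
    print(f"trial {trial}: A meets {partner} ({label})")
```

      Under this labelling only trial 3 is a re-encounter, so a response specific to trial 3 cannot be attributed to a general decline across trials, since trial 4 follows it with an unfamiliar partner.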

      The data presentation is also very complicated. How is it the case that a negative proportion of time is spent? The methods reveal that this metric is derived by comparing the time individuals spent in each region relative to the previous time they saw that individual. At the very least, data showing the distribution of distances from the wall would be much easier to interpret for the reader.
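      As a hedged illustration of how such a metric can take negative values, the sketch below computes, for each distance region, the proportion of time spent in the current trial minus the proportion in the comparison trial. The region names, the durations, and the exact form of the subtraction are assumptions made for illustration and are not taken from the manuscript's methods.

```python
def proportion_change(prev_seconds, curr_seconds):
    """Per-region change in the proportion of trial time (current minus previous)."""
    prev_total = sum(prev_seconds)
    curr_total = sum(curr_seconds)
    return [
        curr / curr_total - prev / prev_total
        for prev, curr in zip(prev_seconds, curr_seconds)
    ]


# Hypothetical seconds spent in three regions: near, middle, far from the other spider.
previous_trial = [60, 30, 30]
current_trial = [30, 30, 60]

for region, change in zip(["near", "middle", "far"], proportion_change(previous_trial, current_trial)):
    print(f"{region}: {change:+.2f}")
```

      Because each trial's proportions sum to one, the per-region changes sum to zero, so any increase in one region must appear as a negative value in another.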

      (2) "Long-term social memory": It is not entirely clear what is meant by the authors when they say 'long-term social memory', though typically long-term memory refers to a form of memory that requires protein synthesis. While the precise timing of memory formation varies across species and contexts, a general rule is that long term memory should last for > 24 hours (e.g., Dreier et al 2007 Biol Letters). The longest time that spiders are apart in this trial setup is something like an hour. There is no basis to claim that spiders have long term social memory as they are never asked to remember anyone after a long time apart. The odd phrasing of the 'long-term dishabituation' trial makes it seem that it is testing a long-term memory, but it is not. The spiders have never met. The fact that they are very habituated to one set of stimuli and then respond to a new stimulus is not evidence of long-term memory. To clearly test memory (which is the part really lacking from the design), the authors would need to show that spiders, upon the first instance of re-encountering a previously encountered individual, are already 'habituated' to them but not to some other individuals. The current data suggest this may be the case, but it is just very hard to interpret given the design does not directly test memory of individuals in a clear and unambiguous manner.

      (3) Lack of a functional explanation and the emphasis on 'asociality': It is entirely plausible that recognition is a pleiotropic byproduct of the overall visual cognition abilities in the spiders. However, the discussion that discounts territoriality as a potential explanation is not well laid out. First, many species that are 'asocial' nevertheless defend territories. It is perhaps best to say such species are not group living, but they have social lives because they encounter conspecifics and need to interact with them. Indeed, there are many examples of solitary living species that show the dear enemy effect, a form of individual recognition, towards familiar territorial neighbors. The authors in this case note that territorial competition is mediated by the size or color of the chelicerae (seemingly a trait that could be used to distinguish among individuals). Apparently, previous work suggesting that territorial disputes can be mediated by a trait in the absence of familiarity has led them to discount the possibility that keeping track of the local neighbors in a potentially cannibalistic species could be a sufficient functional reason. In any event, the current evidence presented certainly does not warrant discounting that hypothesis.

      Comments on Revision:

      The authors have not actually addressed my points and their comments conflate discrimination with recognition. The extensive discussion about how babies are tested for discrimination tasks in their rebuttal misses the point. I believe that the data do show that the spiders discriminate between individuals but whether individuals are recognized (i.e., remembered) is less clear. The authors defend their convoluted study design, but it is overly complex and challenging to interpret the data as a result.

      The main issue with the design is that they do not actually test for any kind of memory of specific individuals after a substantial time of separation. Instead they show that a new individual is still surprising/dishabituating. That is nice evidence for discrimination but does not show memory in a clear and unambiguous way.

      My comments and critique are unchanged since they didn't really change the paper. New experiments were needed and they didn't do any. Perhaps it is hard to get the spiders where they are? I don't really understand why they didn't do additional experiments as part of this revision.

    3. Reviewer #3 (Public review):

      Summary:

      Jumping spiders (family Salticidae) have extraordinarily good eyesight, but little is known about how sensitive these small animals might be to the identity of other individuals that they see. Here, experiments were carried out using Phidippus regius, a salticid spider from North America. There were three steps in the experiments; first, a spider could see another spider; then its view of the other spider was blocked; and then either the same or a different individual spider came into view. Whether it was the same or a different individual that came into view in the third step had a significant effect on how close together or far apart the spiders positioned themselves. It has been demonstrated before that salticids can discriminate between familiar and unfamiliar individuals while relying on chemical cues, but this new research on P. regius provides the first experimental evidence that a spider can discriminate by sight between familiar and unfamiliar individuals.

      Clark RJ, Jackson RR (1995) Araneophagic jumping spiders discriminate between the draglines of familiar and unfamiliar conspecifics. Ethology, Ecology and Evolution 7:185-190

      Strengths:

      This work is a useful step toward a fuller understanding of the perceptual and cognitive capacities of spiders and other animals with small nervous systems. By providing experimental evidence for a conclusion that a spider can, by sight, discriminate between familiar and unfamiliar individuals, this research will be an important milestone. We can anticipate a substantial influence on future research.

      Weaknesses:

      (1) The conclusions should be stated more carefully.

      (2) It is not clearly the case that the experimental methods are based on 'habituation' (learning to ignore; learning not to respond). Saying 'habituation' seems to imply that certain distances are instances of responding and other distances are instances of not responding but, as a reasonable alternative, we might call distance in all instances a response. However, whether all distances are responses or not is a distracting issue because being based on habituation is not a necessity.

      (3) Besides data related to distances, other data might have been useful. For example, salticids are especially well known for the way they communicate using distinctive visual displays and, unlike distance, displaying is a discrete, unambiguous response.

      (4) Methods more aligned with salticids having extraordinarily good eyesight would have been useful. For example, with salticids, standardising and manipulating stimuli in experiments can be achieved by using mounts, video playback and computer-generated animation.

      (5) An asocial-versus-social distinction is too imprecise, and it may have been emphasised too much. With P. regius, irrespective of whether we use the label asocial or social, the important question pertains to the frequency of encounters between the same individuals and the consequences of these encounters.

      (6) Hypotheses related to not-so-strictly adaptive factors are discussed and these hypotheses are interesting, but these considerations are not necessarily incompatible with more strictly adaptive influences being relevant as well.

      Comments on Revision:

      The authors have responded reasonably to the comments I made. There is nothing else that I wish to add.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The paper sets out to examine the social recognition abilities of a 'solitary' jumping spider species. It demonstrates that based on vision alone spiders can habituate and dishabituate to the presence of conspecifics. The data support the interpretation that these spiders can distinguish between conspecifics on the basis of their appearance.

      We appreciate the reviewer’s summary. We indeed aimed at investigating the social recognition abilities of the solitary jumping spider (Phidippus regius), using visual cues alone. By employing a habituation-dishabituation paradigm, well-established in developmental psychology, we found support for the interpretation that these spiders can distinguish between conspecifics based on their appearance, as the reviewer noted.

      Strengths:

      The study presents two experiments. The second set of data recapitulates the findings of the first experiment with an independent set of spiders, highlighting the strength of the results. The study also uses a highly quantitative approach to measuring relative interest between pairs of spiders based on their distance.

      We appreciate the reviewer's acknowledgement of the strengths of our study. The second set of data underscores the robustness and reliability of the results. Additionally, however, the second experiment served the purpose of disentangling whether the habituation effect observed over sessions was caused by ‘physical’ or ‘cognitive’ fatigue by employing ‘long-term’ dishabituation trials at the end of Session 3. These trials are critical in our study as they help to differentiate between recognition of individual identities versus recognition of familiar individuals (as opposed to unfamiliar ones) and to determine if the observed effects are due to ‘general habituation’ or ‘specific recognition’. We will elaborate on this further below in this revision.

      As stated by the reviewer, we employed a highly quantitative approach to measure relative interest between pairs of spiders based on their distance, providing precise and objective data to support our conclusions.

      Weaknesses:

      The study design is overly complicated, missing key controls, and the data presented in the figures are not clearly connected to the study. The discussion is challenging to understand and appears to make unsupported conclusions.

      While we acknowledge that the study design is indeed complex, this complexity is essential for conducting a well-controlled and balanced experiment regarding the experimental conditions.  

      The habituation-dishabituation paradigm is a well-established paradigm in developmental psychology with non-verbal infants. It is understood that during the habituation phase, an individual's attention to a repeated stimulus decreases as they engage in information processing and form a mental representation of it. As the stimulus becomes familiar, it loses its novelty and interest. When a new stimulus is introduced, a recovery of attention suggests that the individual has compared this new stimulus to the stored memory of the habituation stimulus and detected a difference. This process suggests that the individual not only remembered the original stimulus but also recognized the new one as distinct (for a review Kavšek & Bornstein, 2010).

      This paradigm has also been extensively applied in animal research, where, like infants, nonverbal subjects rely on recognition and discrimination processes to demonstrate their cognitive abilities. The use of this paradigm dates back to seminal studies such as Humphrey (1974), which explored the perceptual world of monkeys, illustrating how species and individuals are perceived and recognized. In another previous study (Dahl, Logothetis, and Hoffman, 2007), we utilized an even more complex experimental design that incorporated dedicated baseline trials for both habituation and dishabituation phases, which was well-received despite its complexity. In the current study, we contrast dishabituation and habituation trials directly, creating a sequential cascade where each trial is evaluated against the preceding one as its baseline.

      On the basis of these arguments, we respectfully reject the claim that this paradigm is inappropriate or lacks key controls. Our study design, though complex, is rigorously grounded in established methodologies and offers a robust framework for exploring individual recognition in Phidippus regius.

      However, we take the reviewer’s comments seriously and are committed to identifying and addressing the aspects in our manuscript that may have led to misunderstandings. We clarify these areas in our revision of the manuscript. Modifications were made in the Introduction, Methods, and Discussion sections.

      Dahl, C. D., Logothetis, N. K., & Hoffman, K. L. (2007). Individuation and holistic processing of faces in rhesus monkeys. Proceedings of the Royal Society B: Biological Sciences, 274(1622), 2069-2076.

      Humphrey, N. K. (1974). Species and individuals in the perceptual world of monkeys. Perception, 3(1), 105-114.

      Kavšek, M., & Bornstein, M. H. (2010). Visual habituation and dishabituation in preterm infants: A review and meta-analysis. Research in developmental disabilities, 31(5), 951-975.

      (1) Study design: The study design is rather complicated and as a result, it is difficult to interpret the results. The spiders are presented with the same individual twice in a row, called a habituation trial. Then a new individual is presented twice in a row. The first of these is a dishabituation trial and the second is another habituation trial (but now habituating to a second individual). This is done with three pairings and then this entire structure is repeated over three sessions. 

      While we acknowledge that the design is complex, this complexity is essential for conducting a well-controlled experiment, as described earlier. As the reviewer noted, our design involves presenting the same individual to the focal spider twice in a row (habituation trial), followed by a new individual (dishabituation trial), and then repeating this structure. This approach is fundamental to the habituation-dishabituation paradigm, which allows us to systematically compare the responses to a familiar individual with those elicited by a novel one. If the spiders exhibit different behaviours in terms of the distance they maintain when encountering the same individual versus a new one, it indicates that they are processing the stimuli differently, consistent with recognition memory. This differential response is a key indicator that the spiders can distinguish between familiar and unfamiliar individuals, demonstrating not only a decrease in interest or engagement due to repeated exposure but also a cognitive process where the lack of a matching memory template triggers a distinct behavioural response when confronted with novel stimuli.

      By repeating this sequence two more times (Sessions 2 and 3), we aim to assess the consistency of this recognition process over time. If the focal spider does not remember the individuals from the previous session (one hour ago), we expect consistent behavioural responses across sessions. Conversely, if there is a decrease in response magnitude but the overall response patterns are maintained, we can infer that the focal spider recognizes the previously presented individuals and exhibits habituation, reflected in reduced response intensity. In other words, over sessions and repeated exposure to the same individuals, the memory traces become more firmly established, so that a dishabituation trial introduces less novelty. The spider's recognition of previously encountered individuals becomes more robust and consistent, to the point where “habituation” and “dishabituation” trials become indistinguishable, as observed in Session 3. This method allows us to assess the duration of identity recognition in these spiders, indicating how long the memory of specific individuals persists.
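
      The trial cascade described above can be sketched in a few lines of Python. This is only an illustration, not the authors' analysis code: the partner names are hypothetical, and labelling the first trial of Sessions 2 and 3 as a dishabituation trial (because its partner differs from the last partner of the preceding session) is our assumption, based on the statement that each trial is evaluated against the preceding one.

```python
def build_schedule(partners=("B", "C", "D"), n_sessions=3):
    """Return a list of (session, trial, partner, label) tuples.

    Each partner is shown twice in a row, and the three pairings are
    repeated in every session. A trial is labelled by comparing its
    partner with the partner of the immediately preceding trial.
    Partner names are hypothetical placeholders.
    """
    schedule = []
    previous = None
    for session in range(1, n_sessions + 1):
        trial = 0
        for partner in partners:
            for _ in range(2):  # same partner twice in a row
                trial += 1
                if previous is None:
                    label = "baseline"        # very first encounter
                elif partner == previous:
                    label = "habituation"     # same individual as before
                else:
                    label = "dishabituation"  # different individual (assumed rule)
                schedule.append((session, trial, partner, label))
                previous = partner
    return schedule


for row in build_schedule():
    print(row)
```

      Under these assumptions, each session contains six trials, and every habituation trial is immediately preceded by a trial with the same partner.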

      All of these outcomes were anticipated before we began Experiment 1. Given that the results aligned with our predictions, we then sought to determine whether the observed reduction in the magnitude of the effect (i.e., the difference between habituation and dishabituation trials) was due to a physical fatigue effect, where the spiders might simply be getting tired, or a cognitive fatigue effect, where the spiders recognized the individuals and as a result did not exhibit any novelty response. To address this, we replicated the experiment with a new group of spiders and introduced special (long-term dishabituation) trials at the end, where the focal spider was presented with a novel spider. 

      These extra trials allowed us to disentangle the nature of the diminishing response across repeated sessions: a lack of dishabituation (remaining distant) would suggest general physical fatigue, whereas a strong dishabituation response (approaching closely) to the novel spider would indicate cognitive fatigue, thereby confirming that the spiders were indeed recognizing the familiar individuals throughout the experiment. 

      In light of these considerations, we believe that the complexity of our design is not only justified but absolutely necessary to rigorously test the cognitive capabilities of the spiders. Nonetheless, we understand the need for clarity in presenting our findings and are committed to refining our manuscript to better communicate the rationale and results of our study.

      The data appear to show the strong effects of differences between habituation and dishabituation trials in the first session. The decrease in differential behavior between the so-called habituation and dishabituation trials in sessions 2 and 3 is explained as a consequence of the spiders beginning to habituate in general to all of the individuals. 

      The key question, as mentioned above, is to determine the underlying cause of this general habituation across sessions. Specifically, we aim to differentiate between two potential causes: physical fatigue, where the spiders may simply become less responsive due to the demands of the three-hour testing period, or cognitive fatigue, where the repeated exposure to the same individuals leads to a decreased response because the spiders have started to recognize these individuals over multiple repetitions.

      To address this, we replicated the experiment and introduced each focal spider to a new individual in what we termed "long-term dishabituation" trials. By comparing the spiders' responses to these novel individuals with their responses in earlier trials, we sought to better understand the underlying mechanisms of habituation and the duration of individual recognition. The strong dishabituation response observed in these trials is indicative of cognitive fatigue, supporting the presence of recognition memory rather than a general physical fatigue effect.

      The claim that the spiders remember specific individuals is somewhat undercut because all of the 'dishabituation' trials in session 2 are toward spiders they already met for 14 minutes previously but seemingly do not remember in session 2. 

      We appreciate the reviewer’s comment regarding the claim that spiders do not remember specific individuals. This assessment does not align with the rationale of our experiment. The reviewer noted that the dishabituation trials in session 2 involved spiders previously encountered and suggested that the lack of a clear memory response might undercut the claim of specific individual recognition. 

      However, as we explained earlier, we expect habituation in Session 2 relative to Session 1 precisely because spiders recognize each other in Session 2. If there were no such habituation in Sessions 2 or 3, it would suggest that the spiders’ recognition memory does not persist beyond one hour. 

      Additionally, it is important to correct the timing noted by the reviewer: each spider re-encounters the same individual exactly one hour later, not 14 minutes. This is detailed in Table 2 of the manuscript, which specifies that each trial lasts 7 minutes, with a 3-minute visual separation between trials. With six trials per session, this totals 1 hour per session. Thus, every pair of spiders meets again exactly 1 hour after their last interaction.
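
      The timing can be checked with a few lines of arithmetic. The sketch below is only illustrative: the function and variable names are ours, and it assumes, as stated above, six 7-minute trials per session, each followed by a 3-minute visual separation.

```python
# Timing parameters as stated in the response (Table 2 of the manuscript).
TRIAL_MIN = 7           # duration of one trial, in minutes
SEPARATION_MIN = 3      # visual separation after each trial, in minutes
TRIALS_PER_SESSION = 6


def session_length_min():
    """Total length of one session, in minutes."""
    return TRIALS_PER_SESSION * (TRIAL_MIN + SEPARATION_MIN)


def trial_onset_min(session, trial):
    """Onset of a trial in minutes from the start; both indices are 0-based."""
    slot = TRIAL_MIN + SEPARATION_MIN
    return session * session_length_min() + trial * slot


# Interval between the same trial slot in two consecutive sessions.
gap = trial_onset_min(1, 0) - trial_onset_min(0, 0)
print(session_length_min(), gap)
```

      Note that this computes the interval between the same trial slot in consecutive sessions, which is how the one-hour figure arises from the stated trial and separation durations.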

      Again, it is important to clarify that the observed decrease in differential behaviour is not indicative of a failure to remember specific individuals. Rather, it reflects a systematic pattern of habituation, which is a common and expected outcome in such paradigms. This systematic decrease in response strength suggests that the spiders recognize the previously encountered individuals and become less responsive over repeated exposures, consistent with the process of habituation. In other words, repeated exposure to the same individuals leads to more firmly established memory traces, so that a dishabituation trial introduces less novelty as the spider's recognition of previously encountered individuals becomes more robust and consistent.

      Based on the explanations provided above, we respectfully reject the statement that “the claim that the spiders remember specific individuals is somewhat undercut […]”. On the contrary, the exact opposite is true. The very strength of our study lies in demonstrating that spiders possess robust recognition memory, as evidenced by a clear dissociation of habituation and dishabituation trials in Session 1, followed by a gradually diminishing effect over Sessions 2 and 3 as the spiders are increasingly exposed to the same individuals. This is further supported by the strong rebound from habituation observed in the long-term dishabituation trials, in which the spiders were exposed to novel individuals.

      This misunderstanding suggests that we should take additional care in the revised manuscript to clarify our explanations and provide more detail, ensuring that the rationale behind our experimental design and findings are communicated effectively.

      In session 3 it is ambiguous what is happening because the spiders no longer differentiate between the trial types. This could be due to fatigue or familiarity. 

      The reviewer proposes that the absence of differentiation between 'habituation' and 'dishabituation' trials in Session 3 might be attributed to either fatigue or familiarity. We interpret "fatigue" as what we have termed the “physical fatigue effect” and "familiarity" as “cognitive fatigue effect.” In this context, we concur with the reviewer’s observation, and this very line of reasoning prompted us to conduct a further experiment following the outcome of Experiment 1.

      A second experiment is done to show that introducing a totally novel individual recovers a large dishabituation response, suggesting that the lack of differences between 'habituation' and 'dishabituation' trials in session 3 is the result of general habituation to all of the spiders in the session rather than fatigue. As mentioned before, these data do support the claim that spiders differentiate among individuals.

      As the reviewer rightly noted, we addressed these possibilities in our second experiment by introducing a completely novel individual to the spiders, which resulted in a strong dishabituation response. This outcome suggests that the lack of differentiation in Session 3 is more likely due to cognitive habituation rather than physical fatigue. The robust response to novel individuals demonstrates that the spiders are capable of distinguishing between familiar and unfamiliar individuals, suggesting that the reduced differentiation is a consequence of habituation from repeated encounters with the same individuals. 

      We appreciate the reviewer's recognition that these findings support the conclusion that spiders are capable of differentiating between individual conspecifics.

      Additionally, it is important to clarify the structure of our sessions. Each of the 6 trials lasts 7 minutes with a 3-minute visual separation, resulting in a total of 1 hour per session. This ensures that each pair of spiders is encountered exactly one hour later, which controls for the timing and allows us to evaluate the spiders' recognition memory over repeated sessions.

      In summary, while the data show a decrease in differential behaviour between habituation and dishabituation trials in Sessions 2 and 3, the results from our second experiment support the interpretation that this is due to ‘cognitive habituation’ (familiarization) rather than ‘physical fatigue’ (general habituation). This habituation effect underscores the spiders' ability to recognize and become familiar with specific individuals over time, reinforcing our conclusion that they can differentiate among individuals.

      The data from session 1 are easy to interpret. The data from sessions 2 and 3 are harder to understand, but these are the trials in which they meet an individual again after a substantial period of separation. 

      The data from Session 1 are straightforward to interpret, showing clear differences between habituation and dishabituation trials. However, the data from Sessions 2 and 3 are more complex, as these sessions involve the spiders re-encountering individuals after a 1-hour period of separation. Importantly, the outcome is not an artefact of our experiment, but the consequence of a deliberate choice in the experimental design to assess whether spiders can recognise each other after this duration. We believe that this complexity aligns with our expectations, based on the assumption that spiders can recognise each other after one hour. The observed pattern of habituation in Sessions 2 and 3 suggests that the spiders retain memory of the individuals, leading to decreased responsiveness upon repeated encounters. This interpretation is further supported by Experiment 2, which introduced a novel individual and elicited a strong dishabituation response. This finding confirms that the reduced differentiation in later sessions is due to cognitive habituation rather than physical fatigue, supporting the conclusion that recognition memory lasts at least one hour.

      We hope this explanation clarifies our findings and the rationale behind our relatively complex experimental design choice. 

      Other studies looking at recognition in ants and wasps (cited by the authors) have done a 4 trial design in which focal animal A meets B in the first trial, then meets C in the second trial, meets B again in the third trial, and then meets D in the last trial. In that scenario trials 1, 2, and 4 are between unfamiliar individuals and trial 3 is between potentially familiar individuals. In both the ants and wasps, high aggression is seen in species with and without recognition on trial 1, with low aggression specifically for trials with familiar individuals in species with recognition. Across different tests, species or populations that lack recognition have shown a general reduction in aggression towards all individuals that become progressively less aggressive over time (reminiscent of the session 2 and 3 data) while others have maintained modest levels of aggression across all individuals. The 4 session design used in those other studies provides an unambiguous interpretation of the data while controlling for 'fatigue'. 

      We acknowledge that there are multiple ways to design experiments to test recognition memory. In fact, we considered using the paradigm similar to the one proposed by the reviewer and used in studies like Dreier et al., which involves a series of trials with unfamiliar and familiar individuals over extended intervals. We then, however, opted for a more complex design to rigorously assess how habituation and recognition memory develop over repeated sessions with shorter intervals.

      In the following, we would like to describe the advantages and disadvantages of both paradigms and outline how we ended up using the more complex version:

      Advantages of our paradigm: 

      As pointed out, by repeating the sequence in exactly the same manner (each pair of spiders meets again after exactly 1 and 2 hours), we can comprehensively evaluate the effect of habituation over multiple exposures. This allows us to assess the extent of the spiders’ memory, for example when a spider shows stronger habituation to individuals that were novel in Session 1 but “familiar” by the time it encounters them again in Session 2. To achieve this, each trial and visual separation must be precisely timed, ensuring consistent intervals between encounters. As a consequence, each individual spider undergoes the exact same experimental protocol. Most critical, however, are the novel individuals presented after Session 3 (long-term dishabituation trials), which help differentiate between cognitive habituation and physical fatigue.

      Disadvantages of our paradigm:

      The sequences of habituation and dishabituation trials may make the design more complex, as pointed out by the reviewer. As a consequence, the interpretation will become more difficult. However, the data perfectly align with our predictions, and the outcomes were as anticipated in two independently run experiments with two groups of spiders. This highlights the reliability of our experimental design and robustness of our findings.

      Advantages of the 4-trial paradigm proposed by the reviewer:

      Clearly, the structure of the proposed design is simpler, making interpretation easier. The paradigm also accommodates longer intervals between trials (e.g., 24 hours). Longer intervals could theoretically have been applied in our study. (However, we chose not to leave the spiders in the experimental box longer than necessary, opting instead to return them to their home containers for the night to ensure their well-being. Moreover, a 24-hour interval targets a different phase in the process of long-term memory; more on this topic below.)

      Disadvantages of the 4-trial paradigm proposed by the reviewer:

      Strictly replicating the 4-trial design would result in one familiar encounter versus three unfamiliar ones. This imbalance might introduce bias and limit the robustness of the measurements. Additionally, the design provides less data overall, as the focal individual will be confronted with three other individuals, who will then be excluded from further testing as focal subjects themselves. In contrast, our design ensures a balanced number of familiar (habituation) and novel (dishabituation) encounters for each focal individual, allowing for more efficient and comprehensive data collection without excluding individuals from further testing.

      Given the aforementioned considerations, we determined that the advantages of our experimental design, in particular the assessment of a cognitive fatigue effect when encountering the same individuals again, outweigh those of the proposed 4-trial design. The mentioned limitations of the 4-trial design, such as the potential for bias and less comprehensive data collection, do not justify re-running the study, especially when the best-case scenario is fewer insights than our existing findings already provide. Our current paradigm yielded results that align perfectly with our predictions, offering a thorough and reliable understanding of recognition memory and habituation in spiders. Therefore, we believe our approach provides a more complete and robust answer to our research questions.

      However, we acknowledge that there might be insufficient information in the manuscript addressing the rationale behind our design choices, and we will revise the manuscript to provide a clearer explanation of why our approach is well suited to answering the research questions at hand.

      That all trials in sessions 2 and 3 are always with familiar individuals makes it challenging to understand how much the spiders are habituating to each other versus having some kind of associative learning of individual identity and behavior.

      We understand the reviewer's concern that having all trials in Sessions 2 and 3 involve familiar individuals could make it challenging to distinguish between general habituation and associative learning of individual identities. In our study, we contrast habituation and dishabituation trials: If general habituation were occurring, we would expect uniformly reduced responses (around the zero line) to all individuals over time, indicating that the spiders are getting used to any individual regardless of their specific identity. However, this is not the case. Our data show that while the responses in Session 2 are reduced in effect size compared to Session 1, they are not flat (around the zero line). This indicates that the spiders still differentiate between a repetition of a spider identity (habituation trials) and two different spider identities (dishabituation trials), albeit with a reduced response strength. The systematicity in the data suggests that the spiders are not merely habituating to any individual, but are instead retaining some level of recognition between specific individuals.

      Only by Session 3 do the spiders fully habituate to the point where the responses to habituation and dishabituation trials converge, indicating a complete habituation effect. The introduction of novel individuals in our long-term dishabituation trials further supports the idea that the spiders are recognizing specific individuals rather than exhibiting general habituation. If the spiders were experiencing general habituation, we would not expect the strong dishabituation response observed in our study.

      The data presentation is also very complicated. How is it the case that a negative proportion of time is spent? The methods reveal that this metric is derived by comparing the time individuals spent in each region relative to the previous time they saw that individual. 

      We understand the reviewer's concern regarding the complexity of the data presentation and the calculation of the negative proportion of time. Regarding the complexity of the design, we have already justified our choice of a more intricate experimental setup. This complexity is necessary for accurately assessing recognition memory and habituation over repeated sessions. 

      The metric is derived by comparing the time individuals spent in each region (relative to the transparent front panel) in the current trial (n) with that in the previous trial (n-1). With multiple trials, this results in a cascade of trials and conditions. This method was established in Humphrey’s study and our own previous study (Humphrey, 1974; Dahl, Logothetis, & Hoffman, 2007), where we demonstrated its effectiveness in assessing individuation of faces in macaque monkeys.

      Also in our current experimental design, each current trial is contrasted with the preceding one, allowing us to compare distributions of distances taken in two trials. In this context, every preceding trial serves as baseline for every current trial. 

      Figure 1 of the manuscript illustrates the structure and analysis of the trials:

      Panel a depicts the baseline, habituation, and dishabituation trials, where spiders are exposed to different conspecifics.

      Baseline (left panel, red): When two spiders are visually exposed to each other for the first time, it is expected that they will explore each other closely, exhibiting high levels of proximity (initial exploratory behaviour).

      Habituation (centre panel, green): When the same spiders are reintroduced in a subsequent round of exposure, it is anticipated that they will exhibit reduced exploratory behaviour and maintain a greater distance compared to the baseline trial, if they recognize each other from the previous encounter (indicative of habituation).

      Panel b (upper and middle panels; red and green): Demonstrates the theoretical assumptions and expected changes in behaviour:

      By subtracting the distribution of distances in the baseline trial from that in the habituation trial, we generate a delta distribution. This delta distribution reveals negative values near the transparent panel (indicating reduced proximity in the habituation trial) and positive values at mid to far distances (indicating increased distancing behaviour). This delta distribution is also what is reported in Figure 2. 

      Dishabituation: In this trial, a new spider (different from the one in the habituation trial) is introduced. The dishabituation trial will be considered in contrast to the habituation trial described above. If the spider recognizes the new individual as different, it is expected to show increased exploratory behaviour and reduced distance, similar to the initial baseline trial.

      By subtracting the distribution of distances in the habituation trial from that in the dishabituation trial, we obtain another delta distribution. This delta distribution should reveal positive values near the transparent panel (indicating increased proximity in the dishabituation trial) and negative values at mid to far distances (indicating reduced distancing compared to the habituation trial).
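
      The subtraction described above, which is also why the reported proportions can be negative, can be sketched as follows. This is a minimal illustration under our own assumptions rather than the analysis code used in the study: the bin edges, the distance samples, and the function names are hypothetical.

```python
def proportion_per_bin(distances, bin_edges):
    """Proportion of samples in each distance bin [lower, upper)."""
    counts = [0] * (len(bin_edges) - 1)
    for d in distances:
        for i in range(len(counts)):
            if bin_edges[i] <= d < bin_edges[i + 1]:
                counts[i] += 1
                break  # samples outside all bins are ignored
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


def delta_distribution(current, previous, bin_edges):
    """Current-trial proportions minus previous-trial proportions, per bin."""
    cur = proportion_per_bin(current, bin_edges)
    prev = proportion_per_bin(previous, bin_edges)
    return [c - p for c, p in zip(cur, prev)]


# Hypothetical distances from the front panel (arbitrary units).
edges = [0, 2, 4, 6]                 # near, mid, far bins
baseline = [0.5, 1.0, 1.5, 3.0]      # mostly near the panel
habituation = [1.0, 3.0, 5.0, 5.5]   # shifted towards far distances

print(delta_distribution(habituation, baseline, edges))
```

      With these made-up samples, the near bin comes out negative and the far bin positive, which is the pattern expected for a habituation trial. Because both inputs are proportions that sum to one, the deltas sum to zero.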

      We hope this clarifies the rationale behind our data presentation and the methodological approach we employed. We have revised the figure to enhance its clarity and make it more intuitive for the reader.

      Dahl, C. D., Logothetis, N. K., & Hoffman, K. L. (2007). Individuation and holistic processing of faces in rhesus monkeys. Proceedings of the Royal Society B: Biological Sciences, 274(1622), 2069-2076.

      Humphrey, N. K. (1974). Species and individuals in the perceptual world of monkeys. Perception, 3(1), 105-114.

      At the very least, data showing the distribution of distances from the wall would be much easier to interpret for the reader.

      We understand the reviewer's concern that data showing the distribution of distances from the wall would be much easier to interpret for the reader. We initially considered this but concluded that the approach is not straightforward. For instance, if both spiders are positioned at the very front but in different corners, the distance to the panel would be very small, but the distance between the spiders would be large. Thus, using distances from the wall could misrepresent the actual spatial distribution between the spiders.
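
      The corner case mentioned above can be made concrete with a small numerical sketch. The geometry here is entirely our assumption for illustration: a two-dimensional layout with the transparent panel along the line y = 0, the two spiders on opposite sides of it, and hypothetical coordinates in arbitrary units.

```python
import math


def distance_to_panel(pos):
    """Distance from the panel, assumed here to lie along the line y = 0."""
    return abs(pos[1])


def distance_between(a, b):
    """Euclidean distance between two (x, y) positions."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


# Hypothetical positions: both spiders at the front, in opposite corners.
focal = (0.5, 0.5)
other = (9.5, -0.5)

print(distance_to_panel(focal), distance_to_panel(other))
print(distance_between(focal, other))
```

      In this made-up configuration, both spiders sit very close to the panel while the distance between them is large, which is the mismatch described above.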

      (2) "Long-term social memory": It is not entirely clear what is meant by the authors when they say 'long-term social memory', though typically long-term memory refers to a form of a memory that requires protein synthesis.  

      To address this conceptually, we used the term "long-term social memory" to describe the spiders' ability to recognize and remember individual conspecifics over multiple experimental sessions. While social memory refers to the ability of an individual to recognize other individuals within a social context, long-term memory typically involves the retention of information over extended periods. Recognizing that the term “long-term social memory” is not commonly used, we have revised the manuscript to use the more standard term “long-term memory.”

      While the precise timing of memory formation varies across species and contexts, a general rule is that long-term memory should last for > 24 hours (e.g., Dreier et al 2007 Biol Letters). The longest time that spiders are apart in this trial setup is something like an hour. There is no basis to claim that spiders have long-term social memory as they are never asked to remember anyone after a long time apart.

      We appreciate the reviewer’s feedback regarding the term "long-term social memory." The statement "long-term memory should last for > 24 hours" is a generalisation in discussions about memory. It oversimplifies a more complex topic. That is, long-term memory is typically distinguished from short-term memory by its persistence over time, often lasting from hours to a lifetime. However, the exact duration that qualifies memory as "long-term" varies depending on the context, model species, and type of memory. In studies of synaptic plasticity (LTP), the objective might indeed be to examine memory that persists for at least 24 hours, using this as a criterion for long-term memory. In studies of cellular and/or molecular mechanisms, where the stabilization and consolidation of memory traces over time are key areas of interest, this 24-hour interval is very common. However, defining long-term memory strictly by a 24-hour duration is by no means universally accepted, nor does it apply across all fields of study.

      To clarify, long-term memory is a process involving consolidation starting within minutes to hours after learning. Clearly, full consolidation can take longer, and memory persisting for 24 hours is considered fully consolidated. But this does not mean that memories lasting less than 24 hours are not part of long-term memory.

      In fact, Shiffrin and Atkinson (1969) proposed that information entering short-term memory remains there for about 20 to 30 seconds before being displaced due to space limitations. During this brief interval, initial encoding processes begin transferring information to long-term memory, establishing an initial memory trace. This transfer is not indicative of full consolidation but represents the initial "laying down" of the memory trace (encoding). In our study, the focal spider’s brain forms initial memory traces of the individuals it encounters. This process continues during the period of visual separation. Upon re-encountering the same individual a few minutes later, the spider accesses the initial memory trace stored in long-term memory. This trace is fragile and not fully consolidated. The re-encounter acts as a rehearsal, reactivating specific memory traces and potentially strengthening them through additional encoding processes, allowing the spider to recognize the individual even an hour later.

      According to Markowitsch (2013), initial encoding in long-term memory begins within seconds to minutes. It is also important to note that we argue for identity recognition rather than identity recall. Recognition involves correctly identifying a stimulus when it is presented again, while recall requires the volitional generation of information without an external stimulus. Thus, recall may rely on deeper forms of memory consolidation than recognition.

      Is protein synthesis required for long-term memory? 

      The role of protein synthesis in long-term memory has been extensively studied. According to Castellucci et al. (1978), explicit memory comprises a short-term phase that does not require protein synthesis and a long-term phase that does. Hebbian learning in its initial phase (early LTP) does not necessarily require protein synthesis. This phase involves the rapid strengthening of synapses through existing proteins and signaling pathways, such as the activation of NMDA receptors and the influx of Ca2+ ions. For the changes to persist (late LTP), protein synthesis is important. This phase involves the production of new proteins that contribute to long-term structural changes at the synapse, such as the growth of new synaptic connections or the stabilization of existing ones.

      This differentiation between the early and late phases of LTP highlights that long-term memory can begin forming without immediate protein synthesis. Our study focuses on this early phase of memory encoding, which involves the initial formation of memory traces that do not yet depend on protein synthesis. 

      It is however worth noting that recent research suggests that there is an early phase of protein synthesis (within minutes to hours) through the activation of immediate early genes (IEGs) and transcription factors. In this context, protein synthesis supports initial synaptic modifications. What the reviewer refers to is the consolidation phase (late phase), where continued synthesis of proteins induces structural changes at synapses, leading to the formation of new synaptic connections. In our study, it is plausible to assume that an early form of protein synthesis may contribute to stabilizing the initial memory traces during the encoding phase. However, whether or not protein synthesis occurred in our spiders is beyond the scope of this investigation and was not specifically addressed.

      The critical aspect of our study is that the information transitioned from short-term memory to long-term memory during an early encoding phase, allowing recall after an hour. Due to the inherent limitations and transient nature of the short-term memory, it is implausible for spiders to retain these memory representations solely within the short-term memory for such durations. Our findings suggest that the initial encoding processes were robust enough to transfer these experiences into long-term memory, where they were stabilized and could be accessed later. 

      In sum, it is important to note that long-term memory is a dynamic process, and while testing after 24 hours is a convention in some studies, this timing is arbitrary and not universally applicable to all contexts or species. The more critical consideration here is that we are dealing with a species where no prior evidence of long-term memory exists. Debating a 24-hour delay or the specifics of protein synthesis, while potentially interesting for future studies, detracts from the true significance of our findings. Our study is the first to show something akin to long-term memory representations in this species, and this should remain our focus.

      Shiffrin, R. M., & Atkinson, R. C. (1969). Storage and retrieval processes in long-term memory. Psychological review, 76(2), 179. 

      Markowitsch, H. J. (2013). Memory and self–Neuroscientific landscapes. International Scholarly Research Notices, 2013(1), 176027.

      Castellucci, V. F., Carew, T. J., & Kandel, E. R., 1978. Cellular analysis of long-term habituation of the gill-withdrawal reflex of Aplysia californica. Science, 202(4374), 1306-1308.

      The odd phrasing of the 'long-term dishabituation' trial makes it seem that it is testing a long-term memory, but it is not. The spiders have never met. The fact that they are very habituated to one set of stimuli and then respond to a new stimulus is not evidence of long-term memory. To clearly test memory (which is the part really lacking from the design), the authors would need to show that spiders, upon the first instance of re-encountering a previously encountered individual, are already 'habituated' to them but not to some other individuals. The current data suggest this may be the case, but it is just very hard to interpret given the design does not directly test the memory of individuals in a clear and unambiguous manner.

      While we appreciate the reviewer's feedback, we believe there may have been some misunderstanding regarding the term “long-term dishabituation.” The introduction of novel individuals at the end of Session 3 was not intended to test long-term memory by having spiders recognize these novel individuals. Instead, it aimed to investigate the nature of the habituation observed over the three sessions.

      The novel individuals introduced at the end of Session 3 serve to differentiate between general habituation (a decline in response due to repeated exposure to any stimulus) and specific habituation (recognition and reduced response to previously encountered individuals). The novel spiders have never been encountered before, so the focal spiders cannot have prior representations of them. Thus, the strong dishabituation response to these novel individuals indicates that the habituation observed earlier is not due to a general fatigue effect or loss of interest but rather a specific habituation effect to the familiar individuals. By showing such a strong and increased response to novel individuals, the study demonstrates that the spiders' increasingly reduced responses in Sessions 2 and 3 are not merely due to a general decrease in responsiveness but suggest cognitive habituation. This cognitive habituation implies that the spiders remember the familiar individuals (as each of them occurred three times across the three sessions), a process that relies on long-term memory. Therefore, while the novel spiders themselves are not a direct test of long-term memory, the use of these novel spiders helps us infer that the habituation observed over the three sessions is indeed due to the formation of long-term memory traces.

      In other words, the organism detects and processes the novel stimulus as different from the habituated one. In our study, if a spider showed a strong dishabituation response to a novel individual introduced at the end of Session 3, it would indicate that the spider had formed specific representations of the individuals they encountered during the three sessions. These representations allow the spiders to recognise the novel individuals as different, leading to renewed interest and a stronger behavioural response. It is the absence of a prior representation for the novel spiders that triggers this dishabituation response. Since the novel spider does not match any stored representations of the previously encountered spiders, the focal spider responds more strongly.

      The introduction of novel individuals at the end of Session 3 helps clarify that the increasing habituation observed in Sessions 2 and 3 is specific to familiar individuals, indicating cognitive habituation. This supports the presence of long-term memory processes in the spiders, as they can distinguish between previously encountered individuals and new ones. The habituation-dishabituation paradigm thus effectively demonstrates the spiders' ability to form and reactivate encoded memory traces, providing clear evidence of recognition memory.

      For these reasons, we are convinced that our interpretation is accurate and hope this clarification renders the additional request for an entirely new experiment unnecessary.

      (3) Lack of a functional explanation and the emphasis on 'asociality': It is entirely plausible that recognition is a pleiotropic byproduct of the overall visual cognition abilities in the spiders.

      We agree with the reviewer that it is essential to consider the broader context of individual recognition and its potential adaptive significance. The possibility that recognition in jumping spiders could be a pleiotropic byproduct of their advanced visual cognition abilities is indeed a plausible explanation and has been discussed in our manuscript.

      However, the discussion that discounts territoriality as a potential explanation is not well laid out. First, many species that are 'asocial' nevertheless defend territories. It is perhaps best to say such species are not group living, but they have social lives because they encounter conspecifics and need to interact with them.

      The reviewer also correctly points out that many 'asocial' species still defend territories and have social interactions. Our use of the term 'asocial' was meant to indicate that jumping spiders do not live in cohesive social groups, but we acknowledge that they do have social lives in terms of interactions with conspecifics. It is more accurate to describe these spiders as non-group-living, yet socially interactive species. A better term is “non-social”, referring to the jumping spider as a species that does not live in stable social groups and does not exhibit associated behaviours, such as cooperative behaviours. This also implies that individuals still interact with conspecifics, especially in contexts like mating, territorial disputes or aggression. We have thus changed the term from “asocial” to “non-social” in the manuscript.

      Indeed, there are many examples of solitary living species that show the dear enemy effect, a form of individual recognition, towards familiar territorial neighbors. The authors in this case note that territorial competition is mediated by the size or color of the chelicerae (seemingly a trait that could be used to distinguish among individuals). Apparently, the fact that previous work has suggested that territorial disputes can be mediated by a trait in the absence of familiarity has led them to discount the possibility that keeping track of the local neighbors in a potentially cannibalistic species could be a sufficient functional reason. In any event, the current evidence presented certainly does not warrant discounting that hypothesis.

      The “dear enemy effect”, where solitary living species recognize and show reduced aggression towards familiar territorial neighbors, is a relevant consideration. This effect demonstrates that individual recognition can have significant functional implications even in species that are not group-living. We will elaborate on this effect in the revised manuscript to provide a more comprehensive discussion.

      The reviewer mentioned that territorial disputes can be mediated by the size or color of the chelicerae, potentially serving as a feature for individual recognition. Our intention was not to discount the role of such traits but to highlight that the level of identity recognition we observed represents subordinate classification. This is different from the basic-level classification, such as distinguishing between male and female based on chelicerae colour. While we acknowledge that colour can be an important feature for identity discrimination, our findings suggest that individual recognition in jumping spiders goes beyond simple colour differentiation. 

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors investigated whether a salticid spider, Phidippus regius, recognizes other individuals of the same species. The authors placed each spider inside a container from which it could see another spider for 7 minutes, before having its view of the other spider occluded by an opaque barrier for 3 minutes. The spider was then either presented with the same individual again (habituation trial) or a different individual (dishabituation trial). The authors recorded the distance between the two spiders during each trial. In habituation trials, the spiders were predicted to spend more time further away from each other and, in dishabituation trials, the spiders were predicted to spend more time closer to each other. The results followed these predictions, and the authors then considered whether the spiders in habituation trials were generally fatigued instead of being habituated to the appearance of the other spider, which may have explained why they spent less time near the other individual. The authors presented the spiders with a different (novel) individual after a longer period of time (which they considered to be a long-term dishabituation trial), and found that the spiders switched to spending more time closer to the other individual again during this trial. This suggested that the spiders had recognized and had habituated to the individual that they had seen before and that they became dishabituated when they encountered a different individual.

      We appreciate the reviewer's detailed summary of our study. The reviewer's summary accurately captures the essence of our experimental design, predictions, and findings.

      Strengths:

      It is interesting to consider individual recognition by Phidippus regius. Other work on individual recognition by an invertebrate has been, for instance, known for a species of social wasp, but Phidippus regius is a different animal. Importantly and more specifically, P. regius is a salticid spider, and these spiders are known to have exceptional eyesight for animals of their size, potentially making them especially suitable for studies on individual recognition. In the current study, the results from experiments were consistent with the authors' predictions, suggesting that the spiders were recognizing each other by being habituated to individuals they had encountered before and by being dishabituated to individuals they had not encountered before. This is a good start in considering individual recognition by this species.

      We appreciate the reviewer's positive summary and acknowledgment of the strengths of our study. We would like to point out some more details: 

      While the exceptional eyesight of salticid spiders is indeed a significant factor, our study reaches deeper in terms of processing. We argue not at the level of sensation but at the level of perception. Moreover, identity recognition is a higher-level perceptual process. This distinction is crucial: we are not merely examining the spiders' sensory capabilities (such as good eyesight), but rather how their brains interpret and represent what they “see”. This involves a cognitive process where the sensory input (sensation) is processed and integrated into meaningful constructs (perception) and memorised in the form of representations.

      Our study also suggests that P. regius engages in “higher-level” perceptual processes. This most likely involves complex representations of individual conspecifics, which in mammalian brains are associated with regions such as the central inferior temporal (cIT) and anterior inferior temporal (aIT) areas. We provide evidence that these spiders do not just sense visual stimuli but interpret and recognize individual identities, indicating sophisticated perceptual and cognitive abilities. In other words, the spiders do not merely respond to visual stimuli in a reflexive manner, but rather engage in sophisticated perceptual and cognitive processes that allow them to recognize and distinguish between individual identities. This indicates that the spiders are not simple Braitenberg vehicles reacting to stimuli, but are thinking organisms capable of complex mental representations. This resonates with current trends in animal cognition research, which increasingly recognize some level of consciousness and advanced cognitive abilities across a wide range of animal species. Moreover, this aligns with the growing interest in and recognition of spider cognition, where research is beginning to provide evidence for the cognitive complexity and perceptual capabilities of these often underestimated creatures (Jackson and Cross, 2011).

      Jackson, R. R., & Cross, F. R. (2011). Spider cognition. Advances in insect physiology, 41, 115-174.

      Weaknesses:

      The experiments in this manuscript (habituation/dishabituation trials) are a good start for considering whether individuals of a salticid species recognize each other. I am left wondering, however, what features the spiders were specifically paying attention to when recognizing each other. The authors cited Sheehan and Tibbetts (2010) who stated that "Individual recognition requires individuals to uniquely identify their social partners based on phenotypic variation." Also, recognition was considered in a paper on another salticid by Tedore and Johnsen (2013).

      Tedore, C., & Johnsen, S. (2013). Pheromones exert top-down effects on visual recognition in the jumping spider Lyssomanes viridis. The Journal of Experimental Biology, 216, 1744-1756. doi: 10.1242/jeb.071118 

      In this elegant study, the authors presented spiders with manipulated images to find out what features matter to these spiders when recognizing individuals.

      The reviewer raises an important point regarding the specific features that Phidippus regius might be paying attention to when recognizing individual conspecifics. Our study indeed cited Sheehan and Tibbetts (2010) to highlight the importance of phenotypic variation in individual recognition. Additionally, we referenced the work by Tedore and Johnsen (2013) on visual recognition in another salticid species, which suggests that multiple sensory modalities, including visual and pheromonal cues, may be involved in the recognition process. While our current study focused on demonstrating that Phidippus regius can recognize individual conspecifics, we acknowledge that it does not specifically identify the phenotypic features involved in this recognition. 

      Part of the problem with using two living individuals in experiments is that the behavior of one individual can influence the behavior of the other, and this can bias the results.  

      We appreciate the reviewer's observation regarding the potential bias introduced by using two living individuals in experiments, as the behaviour of one individual can indeed influence the behaviour of the other. We shared this concern initially; however, the consistency of the data with our hypotheses suggests that this potential bias did not adversely affect the validity of our findings, rendering the concern largely illusory at least in the context of our study.

      We opted for the living-individual paradigm for the following reasons:

      There is a growing trend in ethological as well as animal cognition research towards more ecologically valid and biologically relevant settings, while simultaneously advancing the precision and quantification of the data collected. This is referred to as computational ethology.

      This approach advocates for assessing behaviour in environments that more closely resemble natural conditions, rather than relying solely on sterile and artificial experimental setups. The rationale is that such naturalistic arenas allow animals to exhibit a broader range of behaviours and interactions, providing a more accurate reflection of their cognitive and social abilities. The challenge, however, lies in navigating the inherent trade-off between the strict control offered by standardized procedures and the ecological validity of more naturalistic interactions.

      By allowing two spiders to confront each other, we aimed to capture authentic behavioural responses while maintaining a degree of experimental standardization through the use of a controlled setup. Our approach ensures that the behaviours observed are not merely artifacts of an artificial environment but are representative of genuine social interactions. Also, to minimize potential biases arising from mutual behavioural influences, we employed a controlled and repeatable experimental environment. 

      We believe that the chosen approach provides a meaningful balance (in the above-mentioned trade-off) between ecological validity and experimental rigour. By combining a standardized environment with the naturalistic interaction of real spiders, we ensured that our findings are both scientifically robust and biologically relevant.

      However, this issue can be readily avoided because salticids are well known, for example, to be highly responsive to lures (e.g. dead prey glued in lifelike posture onto cork disks) and to computer animation. 

      While it is true that salticid spiders are responsive to lures and computer animations, we carefully considered the most appropriate and ecologically valid approach for our study. Our aim was to capture genuine behavioural patterns in a context that closely mimics the natural encounters these spiders experience.

      Additionally, creating comparable video stimuli of spiders presents its own set of challenges: Video recordings or computer animations may not fully capture the nuanced behaviours and subtle variations that occur during real-life interactions. There is also a risk that such stimuli could be perceived differently by the spiders, potentially introducing new biases or confounding factors.

      Scientific progress is not made by merely relying on previously established paradigms, especially when they may not be suitable for the specific context of a study. While alternative methods like lures or computer animations can be valuable in certain situations, our approach was deliberately chosen to best capture the naturalistic and interactive aspects of spider behaviour.

      These methods have already been successful and helpful for standardizing the different stimuli presented during many different experiments for many different salticid spiders, and they would be helpful for better understanding how Phidippus regius might recognize another individual on the basis of phenotypic variation. There are all sorts of ways in which a salticid might recognize another individual. Differences in face or body structure, or body size, or all of these, might have an important role in recognition, but we won't know what these are using the current methods alone. Also, I didn't see any details about whether body size was standardized in the current manuscript.

      As mentioned previously, the goal of our study was to demonstrate that identity recognition occurs in spiders. This alone is of significant importance, as it challenges existing assumptions about the cognitive capabilities of small-brained animals. We did not aim at providing a proximate explanation (mechanism) for identity recognition in spiders.

      The problem with what the reviewer suggested is this: As long as we do not have conclusive evidence that spiders recognize individual conspecifics, any attempt to design and manipulate stimuli would lack a solid foundation. Without understanding whether spiders have this capability, we cannot make informed decisions about which features or characteristics to manipulate in stimuli. In other words, this uncertainty means we lack a starting point for our assumptions, making it nearly impossible to create stimuli that would be useful or relevant in testing identity recognition.

      Additionally, it is nearly impossible to artificially generate a stimulus set that encompasses the natural variance in features that spiders use for visual individuation. There is no guarantee that artificial stimuli, such as lures or computer animations, would capture the relevant features that spiders use in natural interactions.

      In other words, the question of how Phidippus regius recognizes another individual will be the subject of further investigation. In this study, we focus on whether or not they individuate others.

      For another perspective, my thoughts turn to a paper by Cross et al.

      Cross, F. R., Jackson, R. R., & Taylor, L. A. (2020). Influence of seeing a red face during the male-male encounters of mosquito-specialist spiders. Learning & Behavior, 48, 104-112. doi: 10.3758/s13420-020-00411-y

      These authors found that males of Evarcha culicivora, another salticid species that is known to have a red face, become less responsive to their own mirror images after having their faces painted with black eyeliner than if their faces remained red. In all instances, the spiders only saw their own mirror images and never another spider, and these results cannot be interpreted on the basis of habituation/dishabituation because the spiders were not responding differently when they simply saw their mirror image again. Instead, it was specifically the change to the spider's face which resulted in a change of behavior. The findings from this paper and from Tedore and Johnsen can help give us additional perspectives that the authors might like to consider. On the whole, I would like the authors to further consider the features that P. regius might use to discern and recognize another individual.

      We acknowledge that identifying the specific features used by P. regius for identity recognition is a valuable direction for future research. However, we must emphasise that without first establishing whether spiders are capable of individuating each other, it would be premature and challenging to determine the specific features they rely on for this process. A lack of response to certain features could either suggest that those features are not relevant or, more critically, that the spider does not recognize individual identities at all. Thus, our initial focus on demonstrating identity recognition is essential before delving into the specific cues or characteristics involved.

      While the call for addressing the proximate causation of identity recognition in jumping spiders is valid, we need to also reiterate the significance of our findings and why they stand on their own merit:

      Our study demonstrates for the first time that Phidippus regius can systematically individuate conspecifics, showing habituation within short intervals (10 minutes) and over longer intervals (1 hour). This behaviour is not due to general habituation or physical fatigue but is a result of cognitive habituation, as illustrated by the spiders' response to novel individuals introduced after repeated encounters with familiarized ones. 

      What are the implications of this? Our findings indicate that these spiders possess long-term memory and form representations that can be reactivated after an hour. While this is most likely not fully consolidated memory formation (see our reply to Reviewer 1), it represents an encoded long-term memory. This implies that small-brained animals can remember, represent, and potentially build internal mental images, which are crucial for sophisticated cognitive processing.

      Reviewer #3 (Public Review):

      Summary:

      Jumping spiders (family Salticidae) have extraordinarily good eyesight, but little is known about how sensitive these small animals might be to the identity of other individuals that they see. Here, experiments were carried out using Phidippus regius, a salticid spider from North America. There were three steps in the experiments; first, a spider could see another spider; then its view of the other spider was blocked; and then either the same or a different individual spider came into view. Whether it was the same or a different individual that came into view in the third step had a significant effect on how close together or far apart the spiders positioned themselves. It has been demonstrated before that salticids can discriminate between familiar and unfamiliar individuals while relying on chemical cues, but this new research on P. regius provides the first experimental evidence that a spider can discriminate by sight between familiar and unfamiliar individuals.

      Clark RJ, Jackson RR (1995) Araneophagic jumping spiders discriminate between the draglines of familiar and unfamiliar conspecifics. Ethology, Ecology and Evolution 7:185-190

      We appreciate the reviewer's comprehensive summary and acknowledgment of the significance of our findings.

      Strengths:

      This work is a useful step toward a fuller understanding of the perceptual and cognitive capacities of spiders and other animals with small nervous systems. By providing experimental evidence for a conclusion that a spider can, by sight, discriminate between familiar and unfamiliar individuals, this research will be an important milestone. We can anticipate a substantial influence on future research.

      We appreciate the reviewer’s recognition of the strengths and significance of our study. We are pleased that the reviewer considers our research an important milestone. Our findings indeed suggest that even animals with relatively simple nervous systems can perform complex cognitive tasks, which has substantial implications for the broader study of animal cognition.

      As pointed out by the reviewer, we also hope that our study will have a substantial influence on future research. By establishing a methodology and providing clear evidence of visual discrimination, we aim to encourage further investigations into the cognitive abilities of jumping spiders and other arthropods. Future research can build on our findings to explore the specific visual cues and mechanisms involved in individual recognition (as Reviewer 2 pointed out), as well as the ecological and evolutionary implications of these abilities.

      Weaknesses:

      (1) The conclusions should be stated more carefully.

      We agree that clarity in our conclusions is paramount. We will revise the manuscript to ensure that our conclusions are presented with precision and appropriately reflect the data. Specifically, we will emphasize the evidence supporting our findings of visual individual recognition and clarify the limitations and scope of our conclusions to avoid any potential overstatements.

      (2) It is not clearly the case that the experimental methods are based on 'habituation' (learning to ignore; learning not to respond). Saying 'habituation' seems to imply that certain distances are instances of responding and other distances are instances of not responding but, as a reasonable alternative, we might call distance in all instances a response. However, whether all distances are responses or not is a distracting issue because being based on habituation is not a necessity.

      We appreciate the reviewer's feedback and understand the concern regarding the use of the term 'habituation.' We agree that all distances maintained by the spiders are active responses and reflect their behavioral decisions based on perception and recognition of the other individual. We recognize that all distances are responses and interpret these as the spiders’ “active decisions”, modulated by their recognition of the same or different individuals. 

      The terms 'habituation' and 'dishabituation' are used to label trial types for ease of discussion and to describe the expected behavioural modulation.

      (3) Besides data related to distances, other data might have been useful. For example, salticids are especially well known for the way they communicate using distinctive visual displays and, unlike distance, displaying is a discrete, unambiguous response.

      We appreciate the reviewer’s suggestion to incorporate data on visual displays, which are indeed well-known communication methods among salticids. We agree that visual displays are discrete and unambiguous responses that could provide additional insights into the spiders' recognition abilities.

      Our primary focus on distance measurements was driven by the need to quantify behaviour in a continuous and scalable manner, that is, how spiders modulate their proximity based on familiarity with other individuals.

      We acknowledge the potential value of including visual display measurements; however, in our study, we aimed to establish a foundational understanding of recognition behaviour through proximity measures first. Also, capturing displays requires a different experimental paradigm, where the displays are clearly visible and analyzable.

      (4) Methods more aligned with salticids having extraordinarily good eyesight would be useful. For example, with salticids, standardising and manipulating stimuli in experiments can be achieved by using mounts, video playback, and computer-generated animation.

      There is no doubt that salticids have excellent eyesight. However, our study focuses on higher-level perceptual processes that require complex brain analysis, not just visual acuity. The goal was to investigate whether spiders can individuate and recognize conspecifics, which involves interpreting visual information and forming long-term representations.

      Clearly, methods like video playback and computer animations are useful in controlled settings, where the spider is mounted, but they pose challenges for our specific research question. At this stage of research, we lack precise knowledge of which visual features are critical for individual recognition in spiders, making it difficult to design effective artificial stimuli. 

      Our primary objective was to determine if spiders can individuate others. Before exploring the proximate mechanisms of how they individuate others, it was essential to establish that they have this capability. This foundational question needed to be addressed before delving into more detailed mechanistic studies.

      (5) An asocial-versus-social distinction is too imprecise, and it may have been emphasised too much. With P. regius, irrespective of whether we use the label asocial or social, the important question pertains to the frequency of encounters between the same individuals and the consequences of these encounters.

      Our intent was to convey that P. regius does not live in cohesive social groups but does engage in individual interactions that can have significant behavioral consequences. We will revise the manuscript to reduce the emphasis on the asocial-versus-social distinction. As discussed above, we also will change the term “asocial” to “non-social” in the manuscript.

      (6) Hypotheses related to not-so-strictly adaptive factors are discussed and these hypotheses are interesting, but these considerations are not necessarily incompatible with more strictly adaptive influences being relevant as well.

      We appreciate the reviewer's observation regarding the discussion of hypotheses related to not-so-strictly adaptive factors. We agree that our considerations of these factors do not preclude the relevance of more strictly adaptive influences.

      We will revise the manuscript to explicitly discuss how our findings can be interpreted in the context of adaptive hypotheses. This will provide a more comprehensive understanding of the evolutionary significance of individual recognition in P. regius. Modifications were made in the Discussion section.

      In the following, we comment on issues not mentioned in the “public reviews” section.

      Reviewer #1 (Recommendations For The Authors):

      (1) I would suggest conducting experiments that actually test for recognition memory, as this seems to be a claim that the authors make. Following the ant studies by Dreier cited in this manuscript would be sufficient to test for memory. Given the relative simplicity of the measures being taken (location of spiders), this would seem like a very simple addition that would provide a much stronger and more readily interpreted dataset.

      As previously explained in our detailed responses (public reviews), we believe that the current design effectively addresses the questions at hand. Our approach, using a habituation-dishabituation paradigm, provides robust evidence for recognition memory within the framework of early long-term memory.

      Additionally, we have explained why using the distance to the panel as a measure is not appropriate in this context. Specifically, using such a measure can misrepresent the actual interests of the spiders in each other.

      While we acknowledge the merits of the ant studies by Dreier, our current design allows for a detailed understanding of the spiders' recognition capabilities over short (10 min) and slightly longer intervals (up to one hour). This is sufficient to demonstrate the presence of recognition memory without the necessity of further experiments. The observed patterns of habituation and dishabituation responses in our study clearly indicate that the spiders can distinguish between familiar and novel individuals, which supports our claims.

      Given these points, we respectfully maintain that the current data and experimental design are adequate to support our findings and provide a comprehensive understanding of recognition memory in Phidippus regius.

      (2) The writing is rather impenetrable. The results explain the basic finding in terms of statistical variables rather than simply stating the results. A clear and straightforward statement such as 'the spiders showed reduced interest upon habituation trials, indicating xyz' (and then citing the stats) is preferable to the introduction of results as a statistical model. The statistical model is a means of assessing the results. It is not the result. Describe the data.

      We tried to improve that in the current version.

      (3) Showing more straightforward data such as distance from the joint barrier would make the paper much easier to understand.

      This paper has been on bioRxiv for some time and my guess is that it has ended up here because it is having trouble in review. Collecting new data that more directly test the question at hand, presenting the data in a more direct manner, and more critically evaluating your own claims will improve the paper.

      While it is true that the paper has been on bioRxiv for a while, this submission marks the first instance where it has undergone peer review. Prior to this, the manuscript was submitted to other journals but was not reviewed.

      We hope the explanations provided in the “public reviews” section, along with the revised manuscript, sufficiently clarify our study and its conclusions. We believe the current data robustly address the research questions, and as outlined in our detailed responses, we have critically evaluated our claims and presented the data clearly. Given these clarifications, we do not see the necessity for new experiments as the existing data adequately support our findings. We trust that these revisions and explanations will clarify any misunderstandings.

      I am totally sold that the spiders are paying attention to identity at some level. The key now is to understand what that actually means in terms of recognition (i.e. memory of individuals) not just habituation.

      We appreciate the reviewer’s emphasis on the distinction between habituation and memory-based individual recognition. As detailed in the preceding discussion, we have taken great care to clarify how our paradigm distinguishes simple habituation effects from true memory for individual identity. We trust that the preceding sections make clear how our findings go beyond simple habituation to establish genuine individual recognition.

      Reviewer #2 (Recommendations For The Authors):

      Aside from the comments in the public review, I have some additional comments that the authors may wish to consider.

      Numerous times in the manuscript, the authors mentioned that recognizing individuals requires recognition memory. This seems rather obvious, and I wonder if the authors could instead be more precise about what they mean by 'recognition memory'?

      Recognition memory refers to the cognitive ability to identify a previously encountered stimulus, individual, or event as familiar. It involves both encoding and retrieval processes, allowing an organism to distinguish between novel and familiar stimuli. This form of memory is a fundamental component of cognitive functioning and is supported by neural mechanisms that, in the mammalian brain, involve the hippocampus and other brain regions associated with memory processing.

      In our study, we aimed to test whether Phidippus regius recognizes conspecifics, or, in other words, utilizes recognition memory to distinguish between familiar and unfamiliar conspecifics. With the habituation-dishabituation paradigm, we assessed the spiders' ability to recognize previously encountered individuals and demonstrate memory retention over short (10 min) and extended periods (1 hour).

      Encoding: In the initial trial, when a spider encounters an individual for the first time (Figure 1A, “Baseline” or “Dishabituation” for every following trial), it encodes the visual information related to that specific individual. This encoding process involves creating a memory trace of the individual's phenotypic characteristics.

      Storage: During the visual separation period, this encoded information is stored in the spider's memory system. The memory trace, though initially fragile, starts to stabilize over the separation period. Whether or not this leads to some form of consolidated memory remains unaddressed. This aspect was highlighted by the first reviewer, but our focus is on the early process rather than on late processes, such as consolidation. 

      Retrieval: In the subsequent trial, when the same individual is presented again, the spider retrieves the stored memory trace. If the spider recognizes the individual, its behaviour reflects habituation, indicating memory retrieval. Conversely, when a novel individual is introduced, the lack of stored memory trace triggers a different behavioural response, indicating dishabituation. This differential response demonstrates the spider's ability to distinguish between familiar and unfamiliar individuals. This differential response is also key to understanding the nature of habituation over the three sessions, as introducing novel spiders leads to a significant dishabituation response after the three sessions in Experiment 2.

      In Line 39, the authors state that they used "a naturalistic experimental procedure". I would like to know how this experiment is 'naturalistic'. The authors' use of an arena does not appear naturalistic, or something the spiders would encounter in the wild.

      We appreciate the reviewer's comment regarding our use of the term 'naturalistic'. We acknowledge that the experimental arena itself does not replicate the conditions found in the wild. Our approach aimed to incorporate elements of natural behaviour by allowing two spiders to freely move and interact within the controlled environment. This approach aligns with principles from computational ethology, which seeks to balance the trade-off between repeatability/standardization and observing free, naturalistic behaviour. By using this paradigm, we aimed to capture behaviours that closely resemble those exhibited in their natural habitat. This setup was chosen to balance the need for ecological validity with the requirements for standardized data collection. 

      Also, and this point has been raised above, by observing the spiders' natural interactions without restraining them or using artificial stimuli like computer animations, we aimed to capture behaviours that closely resemble their natural responses to conspecifics. In contrast, we would not have any clear expectations regarding responses to arbitrarily designed artificial stimuli. This method provides a more ecologically valid assessment of the spiders' recognition abilities.

      There are a few details wrong in Line 41. 'Salticidae' is a family name and shouldn't be italicized. Also, the sentence suggests that there is a spider called a 'jumping spider' in the family Salticidae, which is technically called Phidippus regius. To clarify, all spiders in the family Salticidae are known as jumping spiders, and one species of jumping spiders is called Phidippus regius.

      We will correct this in the manuscript to accurately reflect the classification and terminology. Thank you for pointing out these inaccuracies.

      A manuscript on individual recognition by a salticid should include citations to earlier papers that have already considered individual recognition by salticids. As well as the paper by Tedore and Johnsen (2013), the authors should be aware of the following papers.

      Clark, R. J., & Jackson, R. R. (1994). Portia labiata, a cannibalistic jumping spider, discriminates between its own and foreign egg sacs. International Journal of Comparative Psychology, 7, 38-43.

      Clark, R. J., & Jackson, R. R. (1994). Self-recognition in a jumping spider: Portia labiata females discriminate between their own draglines and those of conspecifics. Ethology, Ecology & Evolution, 6, 371-375.

      Clark, R. J., & Jackson, R. R. (1995). Araneophagic jumping spiders discriminate between the draglines of familiar and unfamiliar conspecifics. Ethology, Ecology & Evolution, 7, 185-190.

      We appreciate the reviewer's suggestion to include citations to these earlier papers. We will add the recommended references to provide a comprehensive background.

      In Line 203, I would not consider "interaction with human caretakers and experimenters" to be a form of behavioral enrichment. This kind of interaction has the potential to be stressful for the spiders, rather than enriching. I suggest deleting that part of the sentence.

      We appreciate the reviewer's feedback and agree that interactions with human caretakers and experimenters might not always be enriching and could potentially be stressful for the spiders. We will remove that part of the sentence to better reflect the intended meaning.

      Reviewer #3 (Recommendations For The Authors):

      This manuscript is useful and interesting, and I predict that it will be influential, but more attention should be given to stating the objective and conclusion accurately and clearly. As I understand it, the objective was to investigate a specific hypothesis: that Phidippus regius has a capacity to identify conspecific individuals as particular individuals (i.e., individual identification). Strong evidence supporting this hypothesis being true would be especially remarkable because I am unaware of any published work having shown evidence of a spider expressing this specific perceptual capacity.

      Thank you for recognizing the significance and potential influence of our manuscript. We agree that clearly stating the objective and conclusions is essential for conveying the importance of our findings. Our results provide robust evidence supporting the hypothesis that Phidippus regius can recognize and remember individual conspecifics. We will revise the manuscript to more clearly highlight the objective and our conclusions, emphasizing the novel evidence for individual identification in these spiders.

      Based on reading this manuscript and based on my understanding of the meaning of 'individual identification', it seems to me that the hypothesis that P. regius has a capacity for individual identification might or might not be true, and the experiments in this manuscript cannot tell us which is the case. 

      We respectfully disagree with the reviewer's assessment. Our experiments were carefully designed to test whether P. regius has the capacity for individual identification, and our results provide clear evidence supporting this hypothesis. The systematic differences in the spiders' behaviour when encountering familiar versus novel individuals indicate that they can recognize and remember specific conspecifics. We will revise the manuscript to ensure that the evidence and conclusions are stated more clearly to address any potential misunderstandings.

      Determining which is the case would have required research that made better use of the literature, displayed more critical thinking, addressed credible alternative hypotheses, and adopted experimental methods that focused more strictly on individual identification.

      The question of whether P. regius has a capacity for individual identification is not ambiguous in our study. Our findings clearly demonstrate this capacity through systematic behavioural responses to familiar versus novel individuals. As pointed out above, the experimental procedure might be complex, but the results are systematic despite this complexity. The experiments were designed to directly address the hypothesis of individual identification, and the data robustly support our conclusions. While considering alternative hypotheses is important, the results we present provide a coherent and compelling case for individual identification in P. regius. We will ensure our manuscript clearly articulates this narrative and the supporting evidence.

      At the same time, I also appreciate that asking for all of that at once would be asking for too much. As I see it, this manuscript tells us about research that moves us closer to a clear focus on the details and questions that will matter in the context of considering a hypothesis that is strictly about individual identification. More importantly, I think this research reveals a perceptual capacity that is remarkable even if it is not strictly a capacity for individual identification.

      We understand the desire for a more focused exploration of individual identification with paradigms more familiar to the reviewers and we acknowledge that further detailed studies could enhance our understanding of this capacity. However, our findings do indeed suggest that Phidippus regius exhibits a remarkable perceptual capacity for recognizing and remembering individual conspecifics. The systematic behavioural responses observed in our experiments strongly indicate that these spiders possess the ability for individual recognition. While our study may not have explored every potential detail (e.g. which features are most crucial for the memory matching processes), the evidence we present robustly supports the conclusion of individual identification.

      We acknowledge that it is indeed valuable to follow established paradigms and build upon the frameworks that have been used successfully in similar species and studies. These paradigms provide a solid foundation for scientific inquiry and allow for comparability across different research efforts. However, it is equally important to acknowledge and explore alternative approaches. Scientific progress is driven not only by replication but also by innovation. By employing new paradigms, researchers can uncover novel insights and push the boundaries of current understanding. The paradigm we used in our study, while different from those traditionally applied to similar research, is not an invention but a well-established method in various domains. It represents an innovative application in the context of our specific research questions, offering a fresh perspective and contributing to the advancement of the field.

      As I understand it, 'individual identification' means identifying another individual as being a particular individual instead of a member of a larger set (or 'class') of individuals. An 'individual' is a set containing a single individual. Interesting examples of identifying members of larger sets include discriminating between familiar and unfamiliar individuals. In the context of the specific experiments in this manuscript, familiar-unfamiliar discrimination means discriminating between recently-seen and not-so-recently-seen individuals. My impression is that the experiments in this manuscript have given us a basis for concluding that P. regius has a capacity for familiar-unfamiliar (recently seen versus not so recently seen) discrimination. If this is the case, then I think this is the conclusion that should be emphasised. This would be an important conclusion.

      I appreciate that, depending on how we use the words, familiar-unfamiliar discrimination might be construed as being 'individual identification'. An individual is identified as 'the individual recently seen'. As a casual way of speaking, it can be reasonable to call this 'individual identification'. The difficulty comes from the way calling this 'individual identification' can suggest something more than has been demonstrated. To navigate through this difficulty, we need an expression to use for a capacity that goes beyond familiar-unfamiliar discrimination. In the context of this manuscript about P. regius, we need expressions that will make it easy to consider two things. One of these things is a capacity for familiar-unfamiliar discrimination. The other is the capacity to identify another individual as being a particular individual.

      We appreciate the reviewer's insightful comments on the distinction between familiar-unfamiliar discrimination and individual identity recognition. Our study indeed focuses on demonstrating that Phidippus regius can recognize and remember individual conspecifics, providing evidence for individual identity recognition.

      Two specific behavioural hallmarks speak against mere familiarity recognition:

      First, the significant dishabituation response to novel individuals introduced after multiple sessions underscores the specificity of the recognition. This shows that the spiders' habituation is not general but specific to familiar individuals. 

      Second, the pattern of habituation over the sessions provides further evidence: We observed the strongest systematic modulation in Session 1, a reduced modulation in Session 2, and a further diminished effect in Session 3. If the spiders were only responding based on familiarity, we would expect a more drastic decrease, resulting in a washed-out non-effect by Session 2. However, the continued, though diminishing, differentiation between habituation and dishabituation trials across sessions indicates that the spiders are not merely responding to a general sense of familiarity but are engaging in individual recognition. In other words, the spiders' ability to distinguish between familiar and novel individuals even after repeated exposures suggests that they are not just recognizing a familiar status but are identifying specific individuals.

      Things people do might help clarify what this means. People have an extraordinary capacity for identifying other individuals as particular individuals. Often this is based on giving each other names. Imagine we are letting somebody see photographs and asking them to identify who they see. The answer might be, 'somebody familiar' or 'somebody I saw recently' (familiar-unfamiliar discrimination); or the question might be answered by naming a particular individual (individual identification).

      We appreciate the reviewer's efforts to clarify the distinction between familiar-unfamiliar discrimination and individual recognition using human examples. However, we believe this comparison might not fully capture the complexity of individual recognition in non-human animals. 

      Familiarity recognition refers to recognizing someone as having been seen or encountered before without necessarily distinguishing them from others in the same category. On the other hand, identity recognition involves recognizing a specific individual based on unique characteristics (or features). In humans, this often involves naming, but more critically, like in most animals, it involves recognizing visual, auditory, chemical or other sensory cues. In animals, including spiders, individual recognition does not involve, let alone rely on, naming; it relies on the ability to distinguish between individuals based on sensory cues and learnt associations. This is a valid and well-documented form of individual recognition across many species.

      Individual recognition does not require naming or the assignment of a referential label. Animals can distinguish between specific individuals based on previously perceived and stored features and characteristics. Naming is the exception rather than the rule in the animal kingdom. Only a few species, such as humans and maybe certain cetaceans, use naming for identity recognition. This is an evolutionary rarity and not the standard mechanism for individual recognition, which primarily relies on sensory cues and learnt associations. Furthermore, the mechanism of recognition in both humans and animals involves a complex process of matching incoming sensory and perceptual information with stored memory representations. Naming is merely a tool for communication, allowing us to convey which individual we are referring to. It is not the mechanism by which recognition occurs. The core of individual recognition is this matching process, where sensory cues (visual, auditory, chemical, etc.) are compared to memory traces of previously encountered individuals. Therefore, the suggestion that individual identification necessitates naming misrepresents the actual cognitive processes involved. 

      We can think of individual identification being based on more fine-grained discrimination (with this, set size = one), with familiar-unfamiliar discrimination being more coarse-grained discrimination (with this, set size can be more than one). Restricting the expression 'individual identification' to instances of having the capacity to identify another individual as being a particular individual (set size = one) is better aligned with normal usage of this expression.

      Absolutely, the distinction between fine-grained and coarse-grained discrimination aligns with the concept of different category levels, such as basic and subordinate levels, put forward by Eleanor Rosch (e.g. Rosch, 1973). In the context of individual recognition, fine-grained discrimination (where set size = one) refers to the ability to identify a specific individual based on unique characteristics. This is referred to as subordinate level categorization. Coarse-grained discrimination (where set size can be more than one) refers to recognizing someone as familiar without distinguishing them from others in the same category, more similar to basic level categorization. 

      Rosch, E.H. (1973). "Natural categories". Cognitive Psychology. 4 (3): 328–50. doi:10.1016/0010-0285(73)90017-0

      There is a strong emphasis on an asocial-social distinction in this manuscript. It seems to me that this needs to be focused more clearly on the specific factors that would make a capacity for individual identification beneficial. In the context of this manuscript, the term 'social' may suggest too much. It seems to me that the issue that matters the most is whether individuals live in situations where important encounters occur frequently between the same individuals. Irrespective of whether other notions of the meaning of 'social' also apply, there are salticids that live in aggregated situations where they frequently have important encounters with each other. This is the case with Phidippus regius in the field in Florida, but I realize that there may not be much published information about the natural history of this salticid. Even so, there are salticids to which the word 'social' has been applied in published literature.

      We appreciate the reviewer's comments on the asocial-social distinction and we agree that this terminology might need refinement. Our intent was not to categorize Phidippus regius rigidly but to explore the contextual factors influencing the benefits of individual identification. The critical factor in our study is indeed the frequency and importance of encounters between individuals, rather than a broader social structure. We will revise the manuscript to reflect this more nuanced perspective, focusing on the ecological validity of our experimental design and the adaptive significance of individual recognition in environments where repeated encounters can occur.

    1. eLife Assessment

      This important study uses data on over 56 million articles to examine the dynamics of interdisciplinarity and international collaborations in research journals. The data analytics used to quantify disciplinary and national diversity are convincing, and support the claims that journals have become more diverse in both aspects.

    2. Reviewer #1 (Public review):

      Summary:

      The authors aim to explore how interdisciplinarity and internationalization, two increasingly prominent characteristics of scientific publishing, have evolved over the past century. By constructing entropy-based indices from a large-scale bibliometric dataset (OpenAlex), they examine both long-term trends and recent dynamics in these two dimensions across a selection of leading disciplinary and multidisciplinary journals. Their goal is to identify field-specific patterns and structural shifts that can inform our understanding of how science has become more globally collaborative and intellectually integrated.

      Strengths and Weaknesses:

      The paper's primary strength lies in its comprehensive temporal scope and use of a rich, openly available dataset covering over 56 million articles. The interdisciplinary and internationalization indices are well-founded and allow meaningful comparisons across fields and time. Moreover, the distinction between disciplinary and multidisciplinary journals adds valuable nuance. However, some methodological choices, such as the use of a 5-year sliding window to compute trend values, are insufficiently justified and under-explained. The paper also does not fully address disparities in data coverage across disciplines and time, which may affect the reliability of historical comparisons. Finally, minor issues in grammar and clarity reduce the overall polish of the manuscript.

      Evaluation of Findings:

      Overall, the authors have largely succeeded in achieving their stated aims. The findings, such as the sharp rise in internationalization in fields like Physics and the divergence in interdisciplinarity trends across disciplines, are clearly presented and generally well-supported by the data. The authors effectively demonstrate that scientific journals have not followed a uniform trajectory in terms of structural evolution. However, greater clarity in trend estimation methods and better acknowledgment of dataset limitations would help to further substantiate the conclusions and enhance their generalizability.

      Impact and Relevance:

      This study makes a timely and meaningful contribution to the fields of scientometrics, sociology of science, and science policy. Its combination of scale, historical depth, and field-level comparison offers a useful framework for understanding changes in scientific publishing practices. The entropy-based indicators are simple yet flexible, and the use of open bibliometric data enhances reproducibility and accessibility for future research. Policymakers, journal editors, and researchers interested in publication dynamics will likely find this work informative, and its methods could be applied or extended to other structural dimensions of scholarly communication.

    3. Reviewer #2 (Public review):

      Summary:

      This paper uses large-scale publication data to examine the dynamics of interdisciplinarity and international collaborations in research journals. The main finding is that interdisciplinarity and internationalism have been increasing over the past decades, especially in prestigious general science journals.

      Strengths:

      The paper uses a state-of-the-art large-scale publication database to examine the dynamics of interdisciplinarity and internationalism. The analyses span over a century and cover major scientific fields in the natural sciences, engineering, and social sciences. The study is well designed and provides a range of robustness tests to support the main findings. The writing is clear and well organized.

      Weaknesses:

      While the research provides interesting perspectives for the reader to learn about the trends of journal preferences, I have a few points for the authors to consider that might help strengthen their work.

      The first thing that comes to mind is the epistemic mechanism of the study. Why should there be a joint discussion combining internationalism and interdisciplinarity? While internationalism is the tendency to form multinational research teams to work on research projects, interdisciplinarity refers to the scope and focus of papers that draw inspiration from multiple fields. These concepts may both fall into the realm of diversity, but it remains unclear if there is any conceptual interplay that underlies the dynamics of their increase in research journals.

      It is also unclear why internationalization is increasing. Although the authors have provided a few prominent examples in physics, such as CERN and LIGO, which are complex and expensive experimental facilities that demand collective efforts and investments from the global scientific community, whether some similar concerns or factors drive the growth of internationalism in other fields remains unknown. I can imagine that these concerns do not always apply in many fields, and the authors need to come up with some case studies in diverse fields with some sociological theory to support their empirical findings.

      The authors use Shannon entropy as a measure of diversity for both internationalism and interdisciplinarity. However, entropy may fail to account for the uneven correlations between fields, and the range of its values changes when the number of categories changes. The science of science and scientometrics community has proposed a range of diversity indicators, such as the Rao-Stirling index and its derivatives. One obvious advantage of the RS index is that it explicitly accounts for the heterogeneous connections between fields, and the value ranges from 0 to 1. Using more state-of-the-art metrics to quantify interdisciplinarity may help strengthen the data analytics.
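
      To make the contrast concrete, the following minimal Python sketch computes both measures for one journal's distribution of papers over fields. The field shares and the pairwise distance matrices are hypothetical illustration values, not data from the study, and the function names are chosen here for illustration. It assumes the pairwise form of the Rao-Stirling index, the sum over i ≠ j of p_i p_j d_ij, with distances scaled to the interval [0, 1].

```python
import math


def shannon_entropy(shares):
    """Shannon entropy (natural log) of proportions that sum to 1."""
    return -sum(p * math.log(p) for p in shares if p > 0)


def rao_stirling(shares, distance):
    """Pairwise Rao-Stirling diversity: sum over i != j of p_i * p_j * d_ij."""
    n = len(shares)
    return sum(
        shares[i] * shares[j] * distance[i][j]
        for i in range(n)
        for j in range(n)
        if i != j
    )


def uniform_distance(n, d):
    """Hypothetical distance matrix: every pair of distinct fields is d apart."""
    return [[0.0 if i == j else d for j in range(n)] for i in range(n)]


# Hypothetical shares of a journal's papers across three fields.
shares = [0.5, 0.25, 0.25]

# Entropy depends only on the shares, not on how related the fields are.
entropy = shannon_entropy(shares)
# Rao-Stirling weights each pair of fields by an assumed distance.
rs_close = rao_stirling(shares, uniform_distance(3, 0.1))  # closely related fields
rs_far = rao_stirling(shares, uniform_distance(3, 0.9))    # distant fields

print(f"entropy={entropy:.3f}  RS(close)={rs_close:.4f}  RS(far)={rs_far:.4f}")
```

      With identical shares, the entropy is the same in both scenarios, whereas the Rao-Stirling value is larger when the fields are farther apart. The maximum possible entropy, log(n), also grows with the number of categories n, which is the range issue noted above.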

    1. eLife Assessment

      This important study combines imaginative and innovative experiments with finite element modelling to demonstrate the relevance of poroelasticity in the mechanical properties of cells across physiologically relevant time and length scales. The authors present convincing evidence that cytosolic flows and pressure gradients can persist in cells with permeable membranes, generating spatially segregated influx and outflux zones. These findings are of interest to the cell biology and biophysics communities.

    2. Reviewer #1 (Public review):

      Summary:

      This work investigated whether cytoplasmic poroelastic properties play an important role in cellular mechanical response over length scales and time scales relevant to cell physiology. Overall, the manuscript concludes that intracellular cytosolic flows and pressure gradients are important for cell physiology and that they act on time- and length-scales relevant to mechanotransduction and cell migration.

      Strengths:

      Their approach integrates both computational and experimental methods. The AFM deformation experiments combined with measuring the z-position of beads is a challenging yet compelling method to determine poroelastic contributions to mechanical relaxation.

      The work is quite interesting and will be of high value to the field of cell mechanics and mechanotransduction.

      Weaknesses:

      The weaknesses I noted earlier were adequately addressed in the revised version.

    3. Reviewer #2 (Public review):

      Summary:

      Malboubi et al. present an experimental framework to investigate the rheological properties of the cell cytoplasm. Their findings support a model where the cytoplasm behaves as a poroelastic material governed by Darcy's law. They demonstrate that this poroelastic behavior delays the equilibration of hydrostatic pressure gradients within the cytoplasm over timescales of 1 to 10 seconds following a perturbation, likely due to fluid-solid friction within the cytoplasmic matrix. Furthermore, under sustained perturbations such as depressurization, they reveal that pressure gradients can persist for minutes, which they propose might potentially influence physiological processes like mechanotransduction or cell migration typically happening on these timescales.

      Strengths:

      This article holds significant value within the ongoing efforts of the cell biology and biophysics communities to quantitatively characterize the mechanical properties of cells. The experiments are innovative and thoughtfully contextualized with quantitative estimates and a finite element model that supports the authors' hypotheses.

      Comments & Questions:

      The authors have successfully addressed the questions and comments raised in my previous review, significantly improving the manuscript's depth. Regarding my last question on the predicted saturation of the time lag, the authors propose the interesting hypothesis that the cell cortex becomes dominant at distances beyond 30 microns and plan to test this hypothesis at a later stage.

    4. Reviewer #3 (Public review):

      Summary:

      In this delightful study, the authors use local indentation of the cell surface combined with out-of-focus microscopy to measure the rates of pressure spread in the cell and to argue that the results can be explained with the poroelastic model. Osmotic shock that decreases cytoskeletal mesh size supports this notion. Experiments with water injection and water suction further support it, and also, together with a mechanical model and elegant measurements of decreasing fluorescence in the cell 'flashed' by external flow, demonstrate that the membrane is permeable, and that steady flow and pressure gradient can exist in a cell with water source/sink in different locations. Use of blebs as indicators of the internal pressure further supports the notion of differential cytoplasmic pressure.

      Strengths:

      The study is very imaginative, interesting, novel and important.

      Weaknesses: I have two broad critical comments:

      (1) I sense that the authors are correct that the best explanation of their results is the passive poroelastic model. Yet, to be thorough, they have to try to explain the experiments with other models and show why their explanation is parsimonious. For example, one potential explanation could be some mechanosensitive mechanism that does not involve cytoplasmic flow; another could be viscoelastic cytoskeletal mesh, again not involving poroelasticity. I can imagine more possibilities. Basically, be more thorough in the critical evaluation of your results. Besides, discuss potential effect of significant heterogeneity of the cell.

      (2) The study is rich in biophysics but a bit light on chemical/genetic perturbations. It could be good to use low levels of chemical inhibitors for, for example, Arp2/3, PI3K, myosin etc, and see the effect and try to interpret it. Another interesting question - how adhesive strength affects the results. A different interesting avenue - one can perturb aquaporins. Etc. At least one perturbation experiment would be good.

      Comments on revisions: I am satisfied with the revisions

    5. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      (1) Some details are not described for experimental procedures. For example, what were the pharmacological drugs dissolved in, and what vehicle control was used in experiments? How long were pharmacological drugs added to cells?

      We apologise for the oversight. These details have now been added to the methods section of the manuscript as well as to the relevant figure legends.

      Briefly, latrunculin was used at a final concentration of 250 nM and Y27632 at a final concentration of 50 μM. Both drugs were dissolved in DMSO. Vehicle controls were performed with the higher of the two final DMSO concentrations used for the drugs.

      The details of the drug treatments and their duration were added to the methods and to figures 6, S10, and S12.

      (2) Details are missing from the Methods section and Figure captions about the number of biological and technical replicates performed for experiments. Figure 1C states the data are from 12 beads on 7 cells. Are those same 12 beads used in Figure 2C? If so, that information is missing from the Figure 2C caption. Similarly, this information should be provided in every figure caption so the reader can assess the rigor of the experiments. Furthermore, how heterogenous would the bead displacements be across different cells? The low number of beads and cells assessed makes this information difficult to determine.

      We apologise for the oversight. We have now added this data to the relevant figure panels.

      To gain a further understanding of the heterogeneity of bead displacements across cells, we have replotted the relevant graphs using different colours to indicate different cells. This reveals that different cells appear to behave similarly and that the behaviour appears controlled by distance to the indentation or the pipette tip rather than cell identity.

      We agree with the reviewer that the number of cells examined is low. This is due to the challenging nature of the experiments, which means that many attempts are necessary to obtain a successful measurement.

      The experiments in Fig 1C are a verification of a behaviour documented in a previous publication [1]. Here, we just confirm the same behaviour and therefore we decided that only a small number of cells was needed.

      The experiments in Fig 2C (that allow for a direct estimation of the cytoplasm’s hydraulic permeability) require formation of a tight seal between the glass micropipette and the cell, something known as a gigaseal in electrophysiology. The success rate of this first step is 10-30% of attempts for an experienced experimenter. The second step is forming a whole cell configuration, in which a hydraulic link is formed between the cell and the micropipette. This step has a success rate of ~ 50%. Whole cell links are very sensitive to any disturbance. After reaching the whole cell configuration, we applied relatively high pressures that occasionally resulted in loss of link between the cell and the micropipette. In summary, for the 12 successful measurements, hundreds of unsuccessful attempts were carried out.
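      As a rough check on the numbers above, the following sketch estimates how many attempts 12 successful measurements would require. It assumes the two steps succeed independently and ignores the occasional loss of the whole-cell link at high pressure, so it gives a lower bound. The function name is illustrative.

```python
def expected_attempts(successes, p_gigaseal, p_whole_cell):
    """Expected number of attempts needed to reach `successes` measurements,
    assuming each attempt independently passes both steps."""
    return successes / (p_gigaseal * p_whole_cell)

# Success rates quoted in the response: 10-30% for the gigaseal, ~50% for whole cell.
best_case = expected_attempts(12, 0.30, 0.50)
worst_case = expected_attempts(12, 0.10, 0.50)

print(f"Expected attempts: about {best_case:.0f} to {worst_case:.0f}")
```

      Even before accounting for lost links, this gives on the order of one to a few hundred attempts, in line with the statement that hundreds of unsuccessful attempts were carried out.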

      (3) The full equation for displacement vs. time for a poroelastic material is not provided. Scaling laws are shown, but the full equation derived from the stress response of an elastic solid and viscous fluid is not shown or described.

      We thank the reviewer for this comment. Based on our experiments, we found that the cytoplasm behaves as a poroelastic material. However, to understand the displacements of the cell surface in response to localised indentation, we show that we also need to take the tension of the submembranous cortex into account. In summary, the interplay between cell surface tension generated by the cortex and the poroelastic cytoplasm controls the cell behaviour. To our knowledge, no simple analytical solutions to this type of problem exist.

      In Fig 1, we show that the response of the cell to local indentation is biphasic with a short time-scale displacement followed by a longer time-scale one. In Figs 2 and 3, we directly characterise the kinetics of cell surface displacement in response to microinjection of fluid. These kinetics are consistent with the long time-scale displacement but not the short time-scale one. Scaling considerations led us to propose that tension in the cortex may play a role in mediating the short time-scale displacement. To verify this hypothesis, we have now added new data showing that the length-scale of an indentation created by an AFM probe depends on tension in the cortex (Fig S5).  

      In a previous publication [2], we derived the temporal dynamics of cell surface displacement for a homogenous poroelastic material in response to a change in osmolarity. In the current manuscript, the composite nature of the cell (membrane, cortex, cytoplasm) needs to be taken into account as well as a realistic cell shape. Therefore, we did not attempt to provide an analytical solution for the displacement of the cell surface versus time in the current work. Instead, we turned to finite element modelling to show that our observations are qualitatively consistent with a cell that comprises a tensed submembranous actin cortex and a poroelastic cytoplasm (Fig 4). We have now added text to make this clearer for the reader.

      Reviewer #2 (Public review):

      Comments & Questions:

      The authors state, "Next, we sought to quantitatively understand how the global cellular response to local indentation might arise from cellular poroelasticity." However, the evidence presented in the following paragraph appears more qualitative than strictly quantitative. For instance, the length scale estimate of ~7 μm is only qualitatively consistent with the observed ~10 μm, and the timescale 𝜏𝑧 ≈ 500 ms is similarly described as "qualitatively consistent" with experimental observations. Strengthening this point would benefit from more direct evidence linking the short timescale to cell surface tension. Have you tried perturbing surface tension and examining its impact on this short-timescale relaxation by modulating acto-myosin contractility with Y-27632, depolymerizing actin with Latrunculin, or applying hypo/hyperosmotic shocks?

      Upon rereading our manuscript, we agree with the reviewer that some of our statements are too strong. We have now moderated these and clarified the goal of that section of the text.

      The reviewer asks if we have examined the effect of various perturbations on the short time-scale displacements. In our experimental conditions, we cannot precisely measure the time-scale of the fast relaxation because its duration is comparable to the frame rate of our image acquisition. However, we examined the amplitude of the displacement of the first phase in response to sucrose treatment and we have carried out new experiments in which we treat cells with 250nM Latrunculin to partially depolymerise cellular F-actin. Neither of these treatments had an impact on the amplitude of vertical displacements (Fig. S3).

      The absence of change in response to Latrunculin may be because the treatment decreases both the elasticity of the cytoplasm $E$ and the cortical tension $\gamma$. As the length-scale $l$ of the deformation of the surface scales as $l \sim \gamma/E$, the two effects of latrunculin treatment may therefore compensate one another and result in only small changes in $l$. We have now added these data to the supplementary information and comment on this in the text.

      The reviewer’s comment also made us want to determine how cortical tension affects the length-scale of the cell surface deformation created by localised microindentation. To isolate the role of the cortex from that of cell shape, we decided to examine rounded mitotic cells. In our experiments, we indented a mitotic cell expressing a membrane targeted GFP with a sharp AFM tip (Fig. S5).

      In our experiments, we adjusted force to generate a 2μm depth indentation and we imaged the cell profile with confocal microscopy before and during indentation. Segmentation of this data allowed us to determine the cell surface displacement resulting from indentation and measure a length scale of deformation. In control conditions, the length scale created by deformation is on the order of 1.2μm. When we inhibited myosin contractility with blebbistatin, the length-scale of deformation decreased significantly to 0.8 μm, as expected if we decrease the surface tension γ without affecting the cytoplasmic elasticity. We have now added this data to our manuscript.
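      As a rough consistency check, and assuming that the deformation length-scale is set by the ratio of cortical tension $\gamma$ to cytoplasmic elasticity $E$ (the symbol $\ell$ and the exact form of the scaling are assumptions made here for illustration), the measured change can be read as follows:

```latex
\ell \sim \frac{\gamma}{E}
\quad\Longrightarrow\quad
\frac{\ell_{\text{blebbistatin}}}{\ell_{\text{control}}}
= \frac{0.8\,\mu\text{m}}{1.2\,\mu\text{m}}
\approx 0.67
\approx \frac{\gamma_{\text{blebbistatin}}}{\gamma_{\text{control}}}
\qquad (E \text{ unchanged}).
```

      Under this assumption, the shorter length-scale would correspond to a reduction in cortical tension of roughly one third.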

      The authors demonstrate that the second relaxation timescale increases (Figure 1, Panel D) following a hyperosmotic shock, consistent with cytoplasmic matrix shrinkage, increased friction, and consequently a longer relaxation timescale. While this result aligns with expectations, is a seven-fold increase in the relaxation timescale realistic based on quantitative estimates given the extent of volume loss?

      We thank the reviewer for this interesting question. Upon re-examining our data, we realised that the numerical values in the text related to the average rather than the median of our measurements. The median of the poroelastic time constant increases from ~0.4s in control conditions to 1.4s in sucrose, representing approximately a 3.5 fold increase.

      Previous work showed that HeLa cell volume decreases by ~40% in response to hyperosmotic shock [3]. The fluid volume fraction in cells is ~65-75%. If we assume that the water is contained in N pores of volume $\xi^3$, we can express the cell volume as $V = N\xi^3 + V_s$, with $V_s$ the volume of the solid fraction. We can rewrite $V = N\xi^3/(1-\varnothing)$, with $\varnothing = V_s/V$ the solid volume fraction.

      After the shock, $\varnothing' = 0.42$-$0.6$. As $N$ does not change in response to osmotic shock, we can rewrite the volume change to obtain the change in pore size $\xi'/\xi = \left[ V'(1-\varnothing') / \left( V(1-\varnothing) \right) \right]^{1/3}$.

      The poroelastic diffusion constant scales as $D_p \propto \xi^2$ and the poroelastic timescale scales as $\tau_p \propto 1/D_p \propto \xi^{-2}$. Therefore, the measured change in volume leads to a predicted increase in poroelastic diffusion time of 1.7-1.9 fold, smaller than observed in our experiments. This suggests that some intuition can be gained in a straightforward manner assuming that the cytoplasm is a homogenous porous material.
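      The estimate above can be reproduced with a short calculation. The sketch below assumes that the solid volume and the number of pores stay constant during the shock, that all of the lost volume is water, that pore volume scales as the cube of the pore size, and that the poroelastic timescale scales as the inverse square of the pore size. The function name is illustrative.

```python
def timescale_fold_change(fluid_fraction, volume_loss):
    """Predicted fold increase in poroelastic timescale after a volume loss.

    Assumes constant solid volume and pore number, pore volume ~ xi**3,
    and timescale ~ 1 / xi**2.
    """
    solid = 1.0 - fluid_fraction              # solid volume (initial cell volume = 1)
    fluid_after = (1.0 - volume_loss) - solid  # water left after the shock
    pore_volume_ratio = fluid_after / fluid_fraction  # (xi' / xi)**3
    return pore_volume_ratio ** (-2.0 / 3.0)   # tau' / tau = (xi' / xi)**-2

# Fluid fractions of 65-75% and a 40% volume loss, as quoted above.
fold_low = timescale_fold_change(0.75, 0.40)
fold_high = timescale_fold_change(0.65, 0.40)

print(f"Predicted fold increase: {fold_low:.2f} to {fold_high:.2f}")
```

      Under these assumptions the predicted increase is just under two-fold, which is smaller than the roughly 3.5-fold change in the measured median.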

      However, the reality is more complex and the hydraulic pore size is distinct from the entanglement length of the cytoskeleton mesh, as we discussed in a previous publication [4]. When the fluid fraction becomes sufficiently small, macromolecular crowding will impact diffusion further and non-linearities will arise. We have now added some of these considerations to the discussion.

      If the authors' hypothesis is correct, an essential physiological parameter for the cytoplasm could be the permeability k and how it is modulated by perturbations, such as volume loss or gain. Have you explored whether the data supports the expected square dependency of permeability on hydraulic pore size, as predicted by simple homogeneity assumptions?

      We thank the reviewer for this comment. As discussed above, we have explored such considerations in a previous publication (see discussion in [4]). Briefly, we find that the entanglement length of the F-actin cytoskeleton does play a role in controlling the hydraulic pore size but is distinct from it. Membrane bounded organelles could also contribute to setting the pore size. In our previous publication, we derived a scaling relationship that indicates that four different length-scales contribute to setting cellular rheology: the average filament bundle length, the size distribution of particles in the cytosol, the entanglement length of the cytoskeleton, and the hydraulic pore size. Many of these length-scales can be dynamically controlled by the cell, which gives rise to complex rheology. We have now added these considerations to our discussion.

      Additionally, do you think that the observed decrease in k in mitotic cells compared to interphase cells is significant? I would have expected the opposite naively as mitotic cells tend to swell by 10-20 percent due to the mitotic overshoot at mitotic entry (see Son Journal of Cell Biology 2015 or Zlotek Journal of Cell Biology 2015).

      We thank the reviewer for this interesting question. Based on the same scaling arguments as above, we would expect that a 10-20% increase in cell volume would give rise to 10-20% increase in diffusion constant. However, we also note that metaphase leads to a dramatic reorganisation of the cell interior and in particular membrane-bounded organelles. In summary, we do not know why such a decrease could take place. We now highlight this as an interesting question for further research.

      Based on your results, can you estimate the pore size of the poroelastic cytoplasmic matrix? Is this estimate realistic? I wonder whether this pore size might define a threshold above which the diffusion of freely diffusing species is significantly reduced. Is your estimate consistent with nanobead diffusion experiments reported in the literature? Do you have any insights into the polymer structures that define this pore size? For example, have you investigated whether depolymerizing actin or other cytoskeletal components significantly alters the relaxation timescale?

      We thank the reviewer for this comment. We cannot directly estimate the hydraulic pore size from the measurements performed in the manuscript. Indeed, while we understand the general scaling laws, the prefactors of such relationships are unknown.

      We carried out experiments aiming at estimating the hydraulic pore size in previous publications [3,4] and others have shown spatial heterogeneity of the cytoplasmic pore size [5]. In our previous experiments, we examined the diffusion of PEGylated quantum dots (14nm in hydrodynamic radius). In isosmotic conditions, these diffused freely through the cell but when the cell volume was decreased by a hyperosmotic shock, they no longer moved [3,4]. This gave an estimate of the pore radius of ~15nm.

      Previous work has suggested that F-actin plays a role in dictating this pore size but microtubules and intermediate filaments do not [4].

      There are no quantifications in Figure 6, nor is there a direct comparison with the model. Based on your model, would you expect the velocity of bleb growth to vary depending on the distance of the bleb from the pipette due to the local depressurization? Specifically, do blebs closer to the pipette grow more slowly?

      We apologise for the oversight. The quantifications are presented in Fig S10 and Fig S12. We have now modified the figure legends accordingly.

      Blebs are very heterogenous in size and growth velocity within a cell and across cells in the population in normal conditions [6]. Other work has shown that bleb size is controlled by a competition between pressure driving growth and actin polymerisation arresting it [7]. Therefore, we did not attempt to determine the impact of depressurisation on bleb growth velocity or size.

      In experiments in which we suddenly increased pressure in blebbing cells, we did notice a change in the rate of growth of blebs that occurred after we increased pressure (Author response image 1). However, the experiments are technically challenging and we decided not to perform more.

      Author response image 1.

      A. A hydraulic link is established between a blebbing cell and a pipette. At time t>0, a step increase in pressure is applied. B. Kymograph of bleb growth in a control cell (top) and in a cell subjected to a pressure increase at t=0s (bottom). Top: In control blebs, the rate of growth is slow and approximately constant over time. The black arrow shows the start of blebbing. Bottom: The black arrow shows the start of blebbing. The dashed line shows the timing of pressure application and the red arrow shows the increase in growth rate of the bleb when the pressure increase reaches the bleb. This occurs with a delay δt.

      I find it interesting that during depressurization of the interphase cells, there is no observed volume change, whereas in pressurization of metaphase cells, there is a volume increase. I assume this might be a matter of timescale, as the microinjection experiments occur on short timescales, not allowing sufficient time for water to escape the cell. Do you observe the radius of the metaphase cells decreasing later on? This relaxation could potentially be used to characterize the permeability of the cell surface.

      We thank the reviewer for this comment.

      First, we would like to clarify that both metaphase and interphase cells increase their volume in response to microinjection. The effect is easier to quantify in metaphase cells because we assume spherical symmetry and just monitor the evolution of the radius (Fig 3). However, the displacement of the beads in interphase cells (Fig 2) clearly shows that the cell volume increases in response to microinjection. For both interphase and metaphase cells, when the injection is prolonged, the membrane eventually detaches from the cortex and large blebs form until cell lysis. In contrast to the reviewer’s intuition, we never observe a relaxation in cell volume, probably because we inject fluid faster than the cell can compensate volume change through regulatory mechanisms involving ion channels.

      When we depressurise metaphase cells, we do not observe any change in volume (Fig S10). This contrasts with the increase that we observe upon pressurisation. The main difference between these two experiments is the pressure differential. During depressurisation experiments, this is the hydraulic pressure within the cell ~500Pa (Fig 6A); whereas during pressurisation experiments, this is the pressure in the micropipette, ranging from 1.4-10 kPa (Fig 3). We note in particular that, when we used the lowest pressures in our experiments, the increase in volume was very slow (see Fig 3C). Therefore, we agree with the reviewer that it is likely the magnitude of the pressure differential that explains these differences.

      I am curious about the saturation of the time lag at 30 microns from the pipette in Figure 4, Panel E for the model's prediction, a saturation that is not clearly observed in the experimental data. Could you comment on the origin of this saturation and the observed discrepancy with the experiments (Figure E panel 2)? Naively, I would have expected the time lag to scale quadratically with the distance from the pipette, as predicted by a poroelastic model and the diffusion of displacement. It seems odd to me that the beads start to move together at some distance from the pipette; otherwise I would expect that they simply stop moving. What model parameters influence this saturation? Does membrane permeability contribute to this saturation?

      We thank the reviewer for pointing this out. In our opinion, the saturation occurring at 30 microns arises from the geometry of the model. At the largest distance away from the micropipette, the cortex becomes dominant in the mechanical response of the cell because it represents an increasing proportion of the cellular material.

      To test this hypothesis, we will rerun our finite element models with a range of cell sizes. This will be added to the manuscript at a later date.
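      For reference, the quadratic scaling the reviewer has in mind can be sketched as below. The diffusion constant used here is a hypothetical placeholder, not a value taken from this work, and the function ignores the cortex and cell geometry that the finite element model includes.

```python
def poroelastic_lag(distance_um, diffusion_um2_per_s):
    """Time lag (s) for a pressure change to diffuse over a distance,
    using the purely poroelastic scaling t ~ L**2 / D_p."""
    return distance_um ** 2 / diffusion_um2_per_s

D_P = 40.0  # hypothetical poroelastic diffusion constant, in um^2/s

lag_10 = poroelastic_lag(10.0, D_P)
lag_30 = poroelastic_lag(30.0, D_P)
ratio = lag_30 / lag_10

print(f"Lag at 10 um: {lag_10:.2f} s, at 30 um: {lag_30:.2f} s, ratio: {ratio:.1f}")
```

      In this idealised picture, tripling the distance increases the lag nine-fold regardless of the value chosen for the diffusion constant, so any plateau in the lag must come from ingredients outside the pure scaling, such as the cortex or the cell boundary.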

      Reviewer #3 (Public review):

      Weaknesses: I have two broad critical comments:

      (1) I sense that the authors are correct that the best explanation of their results is the passive poroelastic model. Yet, to be thorough, they have to try to explain the experiments with other models and show why their explanation is parsimonious. For example, one potential explanation could be some mechanosensitive mechanism that does not involve cytoplasmic flow; another could be viscoelastic cytoskeletal mesh, again not involving poroelasticity. I can imagine more possibilities. Basically, be more thorough in the critical evaluation of your results. Besides, discuss the potential effect of significant heterogeneity of the cell.

      We thank the reviewer for these comments and we agree with their general premise.

      Some observations could qualitatively be explained in other ways. For example, if we considered the cell as a viscoelastic material, we could define a time constant $\tau = \eta/E$ with η the viscosity and E the elasticity of the material. The increase in relaxation time with sucrose treatment could then be explained by an increase in viscosity. However, work by others has previously shown that, in the exact same conditions as our experiment, viscoelasticity cannot account for the observations [1]. In its discussion, this study proposed poroelasticity as an alternative mechanism but did not investigate that possibility. This was consistent with our work that showed that the cytoplasm behaves as a poroelastic material and not as a viscoelastic material [4]. Therefore, we decided not to consider viscoelasticity as a possibility. We now explain this reasoning better and have added a sentence about a potential role for mechanotransductory processes in the discussion.

      (2) The study is rich in biophysics but a bit light on chemical/genetic perturbations. It could be good to use low levels of chemical inhibitors for, for example, Arp2/3, PI3K, myosin etc, and see the effect and try to interpret it. Another interesting question - how adhesive strength affects the results. A different interesting avenue - one can perturb aquaporins. Etc. At least one perturbation experiment would be good.

      We agree with the reviewer. In our previous studies, we already examined what biological structures affect the poroelastic properties of cells [2,4]. Therefore, the most interesting aspect to examine in our current work would be perturbations to the phenomenon described in Fig 6G and, in particular, to investigate what volume regulation mechanisms enable sustained intracellular pressure gradients. However, these experiments are particularly challenging and with very low throughput. Therefore, we feel that these are out of the scope of the present report and we mention these as promising future directions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Please add more information to Materials and methods and figure captions to more clearly share how many different cells and trials the data are coming from.

      This has been done.

      Please add the full equation for displacement vs. time for the poroelastic model and describe appropriately.

      This cannot be done but we explain why.

      Overall, the clarity of the writing in the manuscript could be improved.

      This has been done.

      Please increase text size in some of the figures.

      This has been done.

      Reviewer #2 (Recommendations for the authors):

      Figure 1 would benefit from some revisions for clarity. In Panel D, for the control experiment with 7 cells, why are only 3 data points shown?

      This was due to the use of Excel for generating the box plot. Some data points overlap. We have now used different software.

      In Panel E, there is no legend explaining the red dots in the whisker plots.

      This has now been added.

      Additionally, the inset in Panel D lacks a legend, and it is unclear how k was computed.

      This inset panel has been removed.

      Moreover, I find Figure 1, Panel C somewhat pixelated, which makes it challenging to interpret. As I am colorblind, I need to zoom in significantly to distinguish the colors, and the current resolution makes this difficult. Improving the image resolution would be helpful.

      Apologies for this. We have now verified the quality of images on our submission.  

      I am unsure about the method used to compute the relaxation timescale in Figure S2. If an exponential relaxation is assumed, I would expect a function of the form:

      $d(t) = d_1 + \Delta d \left( 1 - e^{-(t - t_1)/\tau_p} \right)$

      which implies that for t=t1+tau_p, the result should be d1+0.6*Delta d which does not correspond to the formula given. Have you tried fitting the data with an exponential function or using the model to extract tau_p without assuming a specific functional form?

      We thank the reviewer for pointing this out. We have now added further explanation of the fitting to the figure legend.
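      To illustrate the reviewer's suggestion, the sketch below extracts a time constant from an exponential relaxation by log-linear least squares. It uses synthetic, noise-free data with hypothetical parameter values and assumes that the starting displacement and the plateau amplitude are known. It is not the fitting procedure used for Figure S2.

```python
import math

def relaxation(t, d1, delta_d, tau):
    """Exponential relaxation starting at d1 and approaching d1 + delta_d."""
    return d1 + delta_d * (1.0 - math.exp(-t / tau))

def fit_tau(times, displacements, d1, delta_d):
    """Estimate tau from ln(1 - (d - d1)/delta_d) = -t/tau using a
    least-squares line through the origin."""
    ys = [math.log(1.0 - (d - d1) / delta_d) for d in displacements]
    slope = sum(t * y for t, y in zip(times, ys)) / sum(t * t for t in times)
    return -1.0 / slope

# Synthetic example with hypothetical values (time in s, displacement arbitrary).
TRUE_TAU, D1, DELTA_D = 0.4, 0.2, 1.5
times = [0.05 * i for i in range(1, 41)]            # 0.05 s to 2.0 s
data = [relaxation(t, D1, DELTA_D, TRUE_TAU) for t in times]

tau_est = fit_tau(times, data, D1, DELTA_D)
fraction_at_tau = (relaxation(TRUE_TAU, D1, DELTA_D, TRUE_TAU) - D1) / DELTA_D

print(f"Estimated tau: {tau_est:.3f} s")
print(f"Fraction of total change reached at t = tau: {fraction_at_tau:.3f}")
```

      The second printed value is 1 - 1/e, which is the origin of the reviewer's "0.6*Delta d" remark. With real, noisy data a nonlinear fit that also estimates the plateau would be more robust than this linearised version.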

      References:

      (1) Rosenbluth, M. J., Crow, A., Shaevitz, J. W. & Fletcher, D. A. Slow stress propagation in adherent cells. Biophys J 95, 6052-6059 (2008). https://doi.org/10.1529/biophysj.108.139139

      (2) Esteki, M. H. et al. Poroelastic osmoregulation of living cell volume. iScience 24, 103482 (2021). https://doi.org/10.1016/j.isci.2021.103482

      (3) Charras, G. T., Mitchison, T. J. & Mahadevan, L. Animal cell hydraulics. J Cell Sci 122, 3233-3241 (2009). https://doi.org/10.1242/jcs.049262

      (4) Moeendarbary, E. et al. The cytoplasm of living cells behaves as a poroelastic material. Nat Mater 12, 253-261 (2013). https://doi.org/10.1038/nmat3517

      (5) Luby-Phelps, K., Castle, P. E., Taylor, D. L. & Lanni, F. Hindered diffusion of inert tracer particles in the cytoplasm of mouse 3T3 cells. Proc Natl Acad Sci U S A 84, 4910-4913 (1987). https://doi.org/10.1073/pnas.84.14.4910

      (6) Charras, G. T., Coughlin, M., Mitchison, T. J. & Mahadevan, L. Life and times of a cellular bleb. Biophys J 94, 1836-1853 (2008). https://doi.org/10.1529/biophysj.107.113605

      (7) Tinevez, J. Y. et al. Role of cortical tension in bleb growth. Proc Natl Acad Sci U S A 106, 18581-18586 (2009). https://doi.org/10.1073/pnas.0903353106

    1. eLife Assessment

      In flies defective for axonal transport of mitochondria, the authors report the upregulation of one subunit, the beta subunit, of the heterotrimeric eIF2 complex via mass spectroscopy proteomics. Neuronal overexpression of eIF2β phenocopied aspects of neuronal dysfunction observed when axonal transport of mitochondria was compromised. Conversely, lowering eIF2β expression suppressed aspects of neuronal dysfunction. While these are intriguing and useful observations, technical weaknesses limit the interpretation. On balance, the evidence supporting the current claims is suggestive but incomplete, especially concerning the characterization of the eIF2 heterotrimer and the data regarding translational regulation.

    2. Reviewer #1 (Public review):

      The study presents significant findings on the role of mitochondrial depletion in axons and its impact on neuronal proteostasis. It effectively demonstrates how the loss of axonal mitochondria and elevated levels of eIF2β contribute to autophagy collapse and neuronal dysfunction. The use of Drosophila as a model organism and comprehensive proteome analysis adds robustness to the findings.

      In this revision, the authors have responded thoughtfully to previous concerns. In particular, they have addressed the need for a quantitative analysis of age-dependent changes in eIF2β and eIF2α. By adding western blot data from multiple time points (7 to 63 days), they show that eIF2β levels gradually increase until middle age, then decline. In milton knockdown flies, this pattern appears shifted, supporting the idea that mitochondrial defects may accelerate aging-related molecular changes. These additions clarify the temporal dynamics of eIF2β and improve the overall interpretation.

      Other updates include appropriate corrections to figures and quantification methods. The authors have also revised some of their earlier mechanistic claims, presenting a more cautious interpretation of their findings.

      Overall, this work provides new insights into how mitochondrial transport defects may influence aging-related proteostasis through eIF2β. The manuscript is now more convincing, and the revisions address the main points raised earlier. I find the updated version much improved.

    3. Reviewer #2 (Public review):

      In the manuscript, the authors aimed to elucidate the molecular mechanism that explains neurodegeneration caused by the depletion of axonal mitochondria. In Drosophila, starting with siRNA depletion of milton and Miro, the authors attempted to demonstrate that the depletion of axonal mitochondria induces the defect in autophagy. From proteome analyses, the authors hypothesized that autophagy is impacted by the abundance of eIF2β and the phosphorylation of eIF2α. The authors followed up the proteome analyses by testing the effects of eIF2β overexpression and depletion on autophagy. With the results from those experiments, the authors proposed a novel role of eIF2β in proteostasis that underlies neurodegeneration derived from the depletion of axonal mitochondria, which they suggest accelerates age-dependent changes rather than increasing their magnitude.

      Strong caution is necessary regarding the interpretation of translational regulation resulting from the milton KD. The effect of milton KD on translation appears subtle, if present at all, in the puromycin incorporation experiments in both the initial and revised versions. Additionally, the polysome profiling data in the revised manuscript lack the clear resolution for ribosomal subunits, monosomes, and polysomes that is typically expected in publications.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      The authors observed a decline in autophagy and proteasome activity in the context of Milton knockdown. Through proteomic analysis, they identified an increase in the protein levels of eIF2β, subsequently pinpointing a novel interaction within eIF subunits where eIF2β contributes to the reduction of eIF2α phosphorylation levels. Furthermore, they demonstrated that overexpression of eIF2β suppresses autophagy and leads to diminished motor function. It was also shown that in a heterozygous mutant background of eIF2β, Milton knockdown could be rescued. This work represents a novel and significant contribution to the field, revealing for the first time that the loss of mitochondria from axons can lead to impaired autophagy function via eIF2β, potentially influencing the acceleration of aging.

      Thank you so much for your review and comments.

      Reviewer #2 (Public Review):

      In the manuscript, the authors aimed to elucidate the molecular mechanism that explains neurodegeneration caused by the depletion of axonal mitochondria. In Drosophila, starting with siRNA depletion of Milton and Miro, the authors attempted to demonstrate that the depletion of axonal mitochondria induces the defect in autophagy. From proteome analyses, the authors hypothesized that autophagy is impacted by the abundance of eIF2β and the phosphorylation of eIF2α. The authors followed up the proteome analyses by testing the effects of eIF2β overexpression and depletion on autophagy. With the results from those experiments, the authors proposed a novel role of eIF2β in proteostasis that underlies neurodegeneration derived from the depletion of axonal mitochondria.

      The manuscript has several weaknesses. The reader should take extra care while reading this manuscript and when acknowledging the findings and the model in this manuscript.

      The defect in autophagy by the depletion of axonal mitochondria is one of the main claims in the paper. The authors should work more on describing their results of LC3-II/LC3-I ratio, as there are multiple ways to interpret the LC3 blotting for the autophagy assessment. Lysosomal defects result in the accumulation of LC3-II thus the LC3-II/LC3-I ratio gets higher. On the other hand, the defect in the early steps of autophagosome formation could result in a lower LC3-II/LC3-I ratio. From the results of the actual blotting, the LC3-I abundance is the source of the major difference for all conditions (Milton RNAi and eIF2β overexpression and depletion).

      Thank you so much for your review and comments. As the reviewer pointed out, changes in the LC3-II/LC3-I ratio do not necessarily indicate autophagy defects. However, since p62 also accumulated (Figure 2B, 2E, 3E, Figure 8C, Figure 9C), these results collectively suggest that autophagy is lowered.

      As the reviewer pointed out and as we described in v2, milton knockdown, eIF2β overexpression and heterozygosity increase LC3-I abundance. We do not know at this moment how these conditions increase LC3-I. We will investigate the cause of the increase in LC3-I by milton knockdown and how it contributes to impaired autophagy. We added this discussion as:

      Lines 388-393; ‘Our results also suggest that milton knockdown and overexpression of eIF2β affect autophagy via increased LC3-I abundance (Figures 2 and 7), suggesting an unconventional mechanism of autophagy suppression. To our knowledge, the roles of eIF2β in aging and autophagy independent of ISR have not been reported. Our results revealed a novel function of eIF2β to maintain proteostasis during aging, while further investigation is required to elucidate underlying mechanisms.’

      Another main point of the paper is that the up-regulation of eIF2β by depleting the axonal mitochondria leads to the proteostasis crisis. This claim is formed by the findings from the proteome analyses. The authors should have presented their proteomic data with a much more thorough presentation and explanation. As in the experiment scheme shown in Figure 4A, the authors performed two proteome analyses: one from the 7-day-old sample and the other from the 21-day-old sample. The manuscript only shows a plot of the result from the 7-day-old sample, but not that of the result from the 21-day-old sample. For the 21-day-old sample, the authors only provided data in the supplemental table, in which the abundance ratio of eIF2β from the 21-day-old sample is 0.753, meaning eIF2β is depleted in the 21-day-old sample. The authors should have explained the impact of the eIF2β depletion in the 21-day-old sample, so the reader could fully understand the authors' interpretation of the role of eIF2β on proteostasis.

      Thank you for pointing it out. Plots of the 21-day-old proteome results were included in the main figure (Figure 4C) in v2. In this revision, we further analyzed age-dependent changes of eIF2β levels by western blotting (Figure 4G). We found that eIF2β levels increased during aging until 49-day-old, then reduced at 63-day-old (Figure 4G in the revised manuscript). At the young age, eIF2β levels were higher in milton knockdown brains than in the control, and at 21-day-old, eIF2β levels were lower in milton knockdown brains than in the control. These results suggest that milton knockdown accelerates age-dependent changes in eIF2β. We added these results and discussion in the revised manuscript.

      Lines 240-243: ‘We also investigated age-dependent changes in eIF2β by western blotting of control flies at 7-, 21-, 35-, and 49-, and 63-day-old. eIF2β levels increased during aging until 49-day-old (Figure 4G). These results suggest that upregulation of eIF2β in milton knockdown fly brain reflects early an onset of age-dependent increase of eIF2β levels.’

      Lines 363-368: ‘We also found that eIF2β protein levels increase in an age-dependent manner until 49-day-old and reduces after that (Figure 4G). In the brains with neuronal knockdown of milton, eIF2β levels were higher at 7-day-old than those in control and lower at the 21-day-old (Figure 4D and Supplementary table). These results suggest that milton knockdown is likely accelerating age-dependent changes rather than increasing their magnitude.’

      Our new data indicate that eIF2β levels increase during aging in control flies until 49-day-old, then reduce at 63-day-old (included as Figure 4G in the revised manuscript). These age-dependent changes might explain the reduction in eIF2β levels in milton knockdown compared to the control in middle age: higher eIF2β levels in milton knockdown flies at a young age than control and lower eIF2β levels in the middle-aged flies may reflect premature aging.

      We included these sentences in the discussion section:

      Lines 240-243:‘We also investigated age-dependent changes in eIF2β by western blotting of control flies at 7-, 21-, 35-, and 49-, and 63-day-old. eIF2β levels increased during aging until 49-day-old (Figure 4G). These results suggest that upregulation of eIF2β in milton knockdown fly brain reflects early an onset of age-dependent increase of eIF2β levels.’

      Lines 359-371: ‘Our results suggest that the loss of axonal mitochondria is an event upstream of proteostasis collapse during aging. The number of puncta of ubiquitinated proteins was higher in milton knockdown at 14-day-old, but there was no significant difference at 30-day-old (Figure 1). Proteome analyses also showed that age-related pathways, such as immune responses, are enhanced in young flies with milton knockdown (Table 2). We also found that eIF2β protein levels increase in an age-dependent manner until 49-day-old and reduces after that (Figure 4G). In the brains with neuronal knockdown of milton, eIF2β levels were higher at 7-day-old than those in control and lower at the 21-day-old (Figure 4D and Supplementary table). These results suggest that milton knockdown is likely accelerating age-dependent changes rather than increasing their magnitude. Disruption of proteostasis is expected to contribute neurodegeneration38 , and it would be interesting to analyze the sequence of protein accumulation and axonal degeneration in milton knockdown (24,29 and Figure 1) in detail with higher time resolution.’


      With our new data, we revised some of our responses to the first round of reviewer’s comments.

      Reviewer #1 (Public Review):

      The authors observed a decline in autophagy and proteasome activity in the context of Milton knockdown. Through proteomic analysis, they identified an increase in the protein levels of eIF2β, subsequently pinpointing a novel interaction within eIF subunits where eIF2β contributes to the reduction of eIF2α phosphorylation levels. Furthermore, they demonstrated that overexpression of eIF2β suppresses autophagy and leads to diminished motor function. It was also shown that in a heterozygous mutant background of eIF2β, Milton knockdown could be rescued. This work represents a novel and significant contribution to the field, revealing for the first time that the loss of mitochondria from axons can lead to impaired autophagy function via eIF2β, potentially influencing the acceleration of aging. To further support the authors' claims, several improvements are necessary, particularly in the methods of quantification and the points that should be demonstrated quantitatively. It is crucial to investigate the correlation between aging and the proteins eIF2β and eIF2α.

      Thank you so much for your review and comments. We included analyses of protein levels of eIF2α, eIF2β, and eIF2γ at 7 days and 21 days (Figure 4D). The manuscript was revised as below;

      Lines 246-249 ‘As for the other subunits of eIF2 complex, proteome analysis did not detect a significant difference in the protein levels of eIF2α and eIF2γ between milton knockdown and control flies at 7 and 21 days (Figure 4D).’

      NEW TEXT: We analyzed age-dependent changes of eIF2β levels in more detail by western blotting (Figure 4G). We found that eIF2β levels increased during aging until 49-day-old, then reduced at 63-day-old (Figure 4G in the revised manuscript). At the young age, eIF2β levels were higher in milton knockdown brains than in the control, and at 21-day-old, eIF2β levels were lower in milton knockdown brains than in the control. These results suggest that milton knockdown accelerates age-dependent changes in eIF2β. We added these results and discussion in the revised manuscript.

      NEW TEXT: Lines 240-243: ‘We also investigated age-dependent changes in eIF2β by western blotting of control flies at 7-, 21-, 35-, and 49-, and 63-day-old. eIF2β levels increased during aging until 49-day-old (Figure 4G). These results suggest that upregulation of eIF2β in milton knockdown fly brain reflects early an onset of age-dependent increase of eIF2β levels.’

      NEW TEXT: Lines 363-368: ‘We also found that eIF2β protein levels increase in an age-dependent manner until 49-day-old and reduces after that (Figure 4G). In the brains with neuronal knockdown of milton, eIF2β levels were higher at 7-day-old than those in control and lower at the 21-day-old (Figure 4D and Supplementary table). These results suggest that milton knockdown is likely accelerating age-dependent changes rather than increasing their magnitude.’

      Reviewer #2 (Public Review):

      In the manuscript, the authors aimed to elucidate the molecular mechanism that explains neurodegeneration caused by the depletion of axonal mitochondria. In Drosophila, starting with siRNA depletion of Milton and Miro, the authors attempted to demonstrate that the depletion of axonal mitochondria induces the defect in autophagy. From proteome analyses, the authors hypothesized that autophagy is impacted by the abundance of eIF2β and the phosphorylation of eIF2α. The authors followed up the proteome analyses by testing the effects of eIF2β overexpression and depletion on autophagy. With the results from those experiments, the authors proposed a novel role of eIF2β in proteostasis that underlies neurodegeneration derived from the depletion of axonal mitochondria.

      The manuscript has several weaknesses. The reader should take extra care while reading this manuscript and when acknowledging the findings and the model in this manuscript.

      The defect in autophagy by the depletion of axonal mitochondria is one of the main claims in the paper. The authors should work more on describing their results of LC3-II/LC3-I ratio, as there are multiple ways to interpret the LC3 blotting for the autophagy assessment. Lysosomal defects result in the accumulation of LC3-II thus the LC3-II/LC3-I ratio gets higher. On the other hand, the defect in the early steps of autophagosome formation could result in a lower LC3-II/LC3-I ratio. From the results of the actual blotting, the LC3-I abundance is the source of the major difference for all conditions (Milton RNAi and eIF2β overexpression and depletion). In the text, the authors simply state the observation of their LC3 blotting. The manuscript lacks an explanation of how to evaluate the LC3-II/LC3-I ratio. Also, the manuscript lacks an elaboration on what the results of the LC3 blotting indicate about the state of autophagy by the depletion of axonal mitochondria.

      Thank you for pointing it out, and we apologize for an insufficient description of the result. We included quantitation of the levels of LC3-I and LC3-II in Figures 2A, 2D, 3D, 7B (Figure 6B in the previous version), and 8B (Figure 7B in the previous version). As the reviewer pointed out, changes in the LC3-II/LC3-I ratio do not necessarily indicate autophagy defects. However, since p62 also accumulated (Figure 2B, 2E, 3E, 7C (Figure 6C in the previous version), 8C (Figure 7C in the previous version)), these results collectively suggest that autophagy is lowered. We revised the manuscript to include this discussion as below:

      Lines 174-186: ‘During autophagy progression, LC3 is conjugated with phosphatidylethanolamine to form LC3-II, which localizes to isolation membranes and autophagosomes. LC3-I accumulation occurs when autophagosome formation is impaired, and LC3-II accumulation is associated with lysosomal defects31,32. p62 is an autophagy substrate, and its accumulation suggests autophagic defects31,32. We found that milton knockdown increased LC3-I, and the LC3-II/LC3-I ratio was lower in milton knockdown flies than in control flies at 14-day-old (Figure 2A). We also analyzed p62 levels in head lysates sequentially extracted using detergents with different stringencies (1% Triton X-100 and 2% SDS). Western blotting revealed that p62 levels were increased in the brains of 14-day-old of milton knockdown flies (Figure 2B). The increase in the p62 level was significant in the Triton X-100-soluble fraction but not in the SDS-soluble fraction (Figure 2B), suggesting that depletion of axonal mitochondria impairs the degradation of less-aggregated proteins.’

      Line 189-190: ‘At 30 day-old, LC3-I was still higher, and the LC3-II/LC3-I ratio was lower, in milton knockdown compared to the control (Figure 2D).’

      Line 202-203: ‘However, in contrast with milton knockdown, Pfk knockdown did not affect the levels of LC3-I, LC3-II or the LC3-II/LC3-I ratio (Figure 3D).’

      Line 279-285: ‘Neuronal overexpression of eIF2β increased LC3-II, while the LC3-II/LC3-I ratio was not significantly different (Figure 7A and B). Overexpression of eIF2β significantly increased the p62 level in the Triton X-100-soluble fraction (Figure 7C, 4-fold vs. control, p < 0.005 (1% Triton X-100)) but not in the SDS-soluble fraction (Figure 7C, 2-fold vs. control, p = 0.062 (2% SDS)), as observed in brains of milton knockdown flies (Figure 2B). These data suggest that neuronal overexpression of eIF2β accumulates autophagic substrates.’

      Line 311-319: ‘Neuronal knockdown of milton causes accumulation of autophagic substrate p62 in the Triton X-100-soluble fraction (Figure 2B), and we tested if lowering eIF2β ameliorates it. We found that eIF2β heterozygosity caused a mild increase in LC3-I levels and decreases in LC3-II levels, resulting in a significantly lower LC3-II/LC3-I ratio in milton knockdown flies (Figure 8B). eIF2β heterozygosity decreased the p62 level in the Triton X-100-soluble fraction in the brains of milton knockdown flies (Figure 8C). The p62 level in the SDS-soluble fraction, which is not sensitive to milton knockdown (Figure 2B), was not affected (Figure 8C). These results suggest that suppression of eIF2β ameliorates the impairment of autophagy caused by milton knockdown.’

      Another main point of the paper is that the up-regulation of eIF2β by depleting the axonal mitochondria leads to the proteostasis crisis. This claim is formed by the findings from the proteome analyses. The authors should have presented their proteomic data with a much more thorough presentation and explanation. As in the experiment scheme shown in Figure 4A, the authors performed two proteome analyses: one from the 7-day-old sample and the other from the 21-day-old sample. The manuscript only shows a plot of the result from the 7-day-old sample, but not that of the result from the 21-day-old sample. For the 21-day-old sample, the authors only provided data in the supplemental table, in which the abundance ratio of eIF2β from the 21-day-old sample is 0.753, meaning eIF2β is depleted in the 21-day-old sample. The authors should have explained the impact of the eIF2β depletion in the 21-day-old sample, so the reader could fully understand the authors' interpretation of the role of eIF2β on proteostasis.

      NEW TEXT: Thank you for pointing it out. We included plots of the 21-day-old proteome results as a part of the main figure (Figure 4C). As the reviewer pointed out, eIF2β protein levels are lower in milton knockdown background at the 21-day-old compared to the control. Since a reduction in eIF2β ameliorated milton knockdown-induced locomotor defects in aged flies (Figure 7D), the reduction in eIF2β observed in the 21-day-old milton knockdown flies is not likely to negatively contribute to milton knockdown-induced defects. Our new data indicate that eIF2β levels increase during aging in control flies until 49-day-old, then reduce at 63-day-old (included as Figure 4G in the revised manuscript). These age-dependent changes might explain the reduction in eIF2β levels in milton knockdown compared to the control in middle age: higher eIF2β levels in milton knockdown flies at a young age than control and lower eIF2β levels in the middle-aged flies may reflect premature aging.

      NEW TEXT: We included these sentences in the discussion section:

      NEW TEXT: Lines 240-243:‘We also investigated age-dependent changes in eIF2β by western blotting of control flies at 7-, 21-, 35-, and 49-, and 63-day-old. eIF2β levels increased during aging until 49-day-old (Figure 4G). These results suggest that upregulation of eIF2β in milton knockdown fly brain reflects early an onset of age-dependent increase of eIF2β levels.’

      NEW TEXT: Lines 359-371: ‘Our results suggest that the loss of axonal mitochondria is an event upstream of proteostasis collapse during aging. The number of puncta of ubiquitinated proteins was higher in milton knockdown at 14-day-old, but there was no significant difference at 30-day-old (Figure 1). Proteome analyses also showed that age-related pathways, such as immune responses, are enhanced in young flies with milton knockdown (Table 2). We also found that eIF2β protein levels increase in an age-dependent manner until 49-day-old and reduces after that (Figure 4G). In the brains with neuronal knockdown of milton, eIF2β levels were higher at 7-day-old than those in control and lower at the 21-day-old (Figure 4D and Supplementary table). These results suggest that milton knockdown is likely accelerating age-dependent changes rather than increasing their magnitude. Disruption of proteostasis is expected to contribute neurodegeneration38 , and it would be interesting to analyze the sequence of protein accumulation and axonal degeneration in milton knockdown (24,29 and Figure 1) in detail with higher time resolution.’

      The manuscript consists of several weaknesses in its data and explanation regarding translation.

      (1) The authors are likely misunderstanding the effect of phosphorylation of eIF2α on translation. The P-eIF2α is inhibitory for translation initiation. However, the authors seem to be mistaken that the down-regulation of P-eIF2α inhibits translation.

      We are sorry for our insufficient explanation in the previous version. As the reviewer pointed out, it is well known that the phosphorylated form of eIF2α inhibits translation initiation. Neuronal knockdown of milton caused a reduction in p-eIF2α (Figure 5D and E (Figure 4J and K in the previous version)), and it also lowered translation (Figure 6 (Figure 5 in the previous version)); the relationship between these two events is currently unclear. We do not think that the reduction in p-eIF2α suppressed translation; rather, we propose that an imbalance in the expression levels of the components of the eIF2 complex negatively affects translation. We revised the discussion section to describe our interpretation in more detail as below:

      Line 374-384: ‘eIF2β is a component of eIF2, which meditates translational regulation and ISR initiation. When ISR is activated, phosphorylated eIF2α suppresses global translation and induces translation of ATF4, which mediates transcription of autophagy-related genes39,40. Since ISR can positively regulate autophagy, we suspected that suppression of ISR underlies a reduction in autophagic protein degradation. We found neuronal knockdown of milton reduced phosphorylated eIF2α, suggesting that ISR is reduced (Figure 5). However, we also found that global translation was reduced (Figure 6). Increased levels of eIF2β might disrupt the eIF2 complex or alter its functions. The stoichiometric mismatch caused by an imbalance of eIF2 components may inhibit ISR induction. Supporting this model, we found that eIF2β upregulation reduced the levels of p-eIF2α (Figure 7).’

      We have revised the graphical abstract and removed the eIF2 complex since its role in the loss of proteostasis caused by milton knockdown has not been elucidated yet.

      (2) The result of polysome profiling in Figure 4H is implausible. By 10%-25% sucrose density gradient, polysomes are not expected to be observed. The authors should have used a gradient with much denser sucrose, such as 10-50%.

      Thank you for pointing it out. The stated range was an error for 10-50%, and we apologize for the oversight. It has been corrected (Figure 6 (Figure 5 in the previous version)).

      (3) Also on the polysome profiling, as in the method section, the authors seemed to fractionate ultra-centrifuged samples from top to bottom and then measured A260 by a plate reader. In that case, the authors should have provided a line plot with individual data points, not the smoothly connected ones in the manuscript.

      Thank you for pointing it out. We revised the graph (Figure 6 (Figure 5 in the previous version)).

      (4) For both the results from polysome profiling and puromycin incorporation (Figure 4H and I), the difference between control siRNA and Milton siRNA are subtle, if not nonexistent. This might arise from the lack of spatial resolution in their experiment as the authors used head lysate for these data but the ratio of Phospho-eIF2α/eIF2α only changes in the axons, based on their results in Figure 4E-G. The authors could have attempted to capture the spatial resolution for the axonal translation to see the difference between control siRNA and Milton siRNA.

      Thank you for your comment. We agree that it would be an interesting experiment, but it will take a considerable amount of time to analyze axonal translation with spatial resolution. We will try to include such analyses in the future. For this manuscript, we revised the discussion section to include the reviewer's suggestion as below;

      Lines 355-357: ‘Further analyses to dissect the effects of milton knockdown on proteostasis and translation in the cell body and axon by experiments with spatial resolution would be needed.’

      Recommendations for the authors:

      From the Reviewing Editor:

      As the Reviewing Editor, I have read your manuscript and the associated peer reviews. I have concerns about publishing this work in its current form. I think that your manuscript cannot claim to have found a novel function of eIF2beta because of technical uncertainties and conceptual problems that should be addressed.

      Thank you so much for your review and comments. We addressed all the concerns raised by the reviewers. Point-by-point responses are listed below.

      First, your manuscript is based partly on what appears to be a mistaken understanding of the mechanistic basis of the ISR. Specifically, eIF2 is a heterotrimeric complex of alpha, beta, and gamma subunits. When eIF2a is phosphorylated, the heterotrimer adopts a new conformation. This conformation directly binds and inhibits eIF2B, the decameric GEF that exchanges the GDP bound to the gamma subunit of the eIF2 complex for GTP. Unless I misunderstood your paper, you seem to propose that decreasing levels of phospho-eIF2a will inhibit translation, but this is backward from what we know about the ISR.

      Thank you for your insightful comment, and we are sorry for the confusion. We did not mean to propose that decreasing levels of phospho-eIF2α inhibit translation. We apologize for our insufficient explanation, which might have caused a misunderstanding (Lines 312-318 in the original version). We agree with the reviewer that ‘mismatch due to elevated eIF2-beta could change the behavior of the ISR’. We revised the text in the result section as follows:

      Lines 263-268 (in the Result section) ‘Phosphorylation of eIF2α induces conformational changes in the eIF2 complex and inhibits global translation36. To analyze the effects of milton knockdown on translation, we performed polysome gradient centrifugation to examine the level of ribosome binding to mRNA. Since p-eIF2α was downregulated, we hypothesized that milton knockdown would enhance translation. However, unexpectedly, we found that milton knockdown significantly reduced the level of mRNAs associated with polysomes (Figure 6A and B).’

      Lines 374-384 (in the Discussion section): ‘eIF2β is a component of eIF2, which meditates translational regulation and ISR initiation. When ISR is activated, phosphorylated eIF2α suppresses global translation and induces translation of ATF4, which mediates transcription of autophagy-related genes39,40. Since ISR can positively regulate autophagy, we suspected that suppression of ISR underlies a reduction in autophagic protein degradation. We found neuronal knockdown of milton reduced phosphorylated eIF2α, suggesting that ISR is reduced (Figure 5). However, we also found that global translation was reduced (Figure 6). Increased levels of eIF2β might disrupt the eIF2 complex or alter its functions. The stoichiometric mismatch caused by an imbalance of eIF2 components may inhibit ISR induction. Supporting this model, we found that eIF2β upregulation reduced the levels of p-eIF2α (Figure 7).’

      It may be possible that a stoichiometric mismatch due to elevated eIF2-beta could change the behavior of the ISR, but your paper doesn't adequately address the expression levels of all three eIF2 subunits: alpha, beta, and gamma. The proteomic data shown in Fig 4B is unconvincing on its own because the changes in the beta subunit are subtle. The Western blot in Figure 4C suggests that the KD changes the mass or mobility of the beta subunit, and most importantly, there are no Western blots measuring the levels of eIF2a, eIF2a-phospho, or eIF2-gamma.

      We appreciate the reviewer’s comment and agree that the stoichiometric mismatch due to elevated eIF2β may interfere with ISR. We found that overexpression of eIF2β lowered p-eIF2α (Figure S2 in V1), which supports this model. We included these data in the main figure in the revised manuscript (Figure 7D) and revised the text as below:

      Lines 286-289: ‘Since milton knockdown reduced the p-eIF2α level (Figure 5E), we asked whether an increase in eIF2β affects p-eIF2α. Neuronal overexpression of eIF2β did not affect the eIF2α level but significantly decreased the p-eIF2α level (Figure 7D and E).’

      Expression data for eIF2α and eIF2γ have been extracted from the proteome analyses and included as a table (Figure 4D). Western blots of phospho-eIF2α (Figure S1 in V1) are now included in the main figure (Figure 5B). The result section was revised as below:

      Lines 246-249: ‘As for the other subunits of eIF2 complex, proteome analysis did not detect a significant difference in the protein levels of eIF2α and eIF2γ between milton knockdown and control flies at 7 and 21 days (Figure 4D).’

      NEW TEXT: We also analyzed age-dependent changes of eIF2β by western blotting and found that eIF2β increased during aging until 49-day-old. We included this result as Figure 4G and added these sentences in the result section:

      NEW TEXT: Line 240-243: ‘We also investigated age-dependent changes in eIF2β by western blotting of control flies at 7-, 21-, 35-, and 49-, and 63-day-old. eIF2β levels increased during aging until 49-day-old (Figure 4G). These results suggest that upregulation of eIF2β in milton knockdown fly brain reflects early an onset of age-dependent increase of eIF2β levels.’

      Reviewer #1 (Recommendations For The Authors):

      L125-128: In this section, while the efficiency of Milton knockdown is referenced from a previous publication, it is necessary to also mention that the Miro knockdown has been similarly reported in the literature. Additionally, the Methods section lacks details on the Miro RNAi line used, and Table 2 does not include the genotype for Miro RNAi. This information should be included for clarity and completeness.

      Thank you for pointing it out. Knockdown efficiency with this strain has been reported (Iijima-Ando et al., PLoS Genet, 2012). We revised the text to include the citation and knockdown efficiency as follows:

      Lines 136-147: ‘There was no significant increase in ubiquitinated proteins in milton knockdown flies at 1-day old, suggesting that the accumulation of ubiquitinated proteins caused by milton knockdown is age-dependent (Figure S1). We also analyzed the effect of the neuronal knockdown of Miro, a partner of milton, on the accumulation of ubiquitin-positive proteins. Since severe knockdown of Miro in neurons causes lethality, we used UAS-Miro RNAi strain with low knockdown efficiency, whose expression driven by elav-GAL4 caused 30% reduction of Miro mRNA in head extract24. Although there was a tendency for increased ubiquitin-positive puncta in Miro knockdown brains, the difference was not significant (Figure 1B, p>0.05 between control RNAi and Miro RNAi). These data suggest that the depletion of axonal mitochondria induced by milton knockdown leads to the accumulation of ubiquitinated proteins before neurodegeneration occurs.’

      L132-L136: The current phrasing in this section suggests an increase in ubiquitinated proteins for both Milton and Miro knockdowns. However, since there is no significant difference noted for Miro, it is incorrect to state an increase in ubiquitin-positive puncta. Furthermore, combining the results of Milton knockdown to claim an increase in ubiquitinated proteins prior to neurodegeneration is misleading. At the very least, the expression here needs to be moderated to accurately reflect the findings.

      Thank you for pointing it out. We revised the text as above.

      L137-L141: Results in Figure 1 indicate that Milton knockdown leads to an increase in ubiquitinated proteins at 14 days, while Miro knockdown shows no difference from the control at either 14 or 30 days. Conversely, both the control and Miro exhibit an increase in ubiquitinated proteins with aging, but this trend does not seem to apply to Milton knockdown. This observation suggests that Milton KD may not affect the changes in protein quality control associated with aging. It implies that Milton's function might be more related to protein homeostasis in younger cells, or that changes due to aging might overshadow the effects of Milton knockdown. These interpretations should be included in the Results or Discussion sections for a more comprehensive analysis.

      NEW TEXT: Thank you for your insightful comment. As you mentioned, the accumulation of ubiquitinated proteins significantly increases only in young flies. Age-related pathways, such as immune responses, are highlighted in young milton knockdown flies but not in the aged flies. Our new result indicates that eIF2β increases during aging in control flies (included as Figure 4G in the revised manuscript), and upregulation of eIF2β in milton knockdown is only observed at a young age. These results suggest that milton knockdown does not increase the magnitude of age-dependent changes but accelerates their onset. We revised the text to include those points as follows:

      NEW TEXT: Lines 152-153: ‘These results suggest that depletion of axonal mitochondria may have more impact on proteostasis in young neurons than in old neurons.’

      NEW TEXT: Lines 359-371: ‘Our results suggest that the loss of axonal mitochondria is an event upstream of proteostasis collapse during aging. The number of puncta of ubiquitinated proteins was higher in milton knockdown at 14-day-old, but there was no significant difference at 30-day-old (Figure 1). Proteome analyses also showed that age-related pathways, such as immune responses, are enhanced in young flies with milton knockdown (Table 2). We also found that eIF2β protein levels increase in an age-dependent manner until 49-day-old and reduces after that (Figure 4G). In the brains with neuronal knockdown of milton, eIF2β levels were higher at 7-day-old than those in control and lower at the 21-day-old (Figure 4 and Supplementary table). These results suggest that milton knockdown is likely accelerating age-dependent changes rather than increasing their magnitude. Disruption of proteostasis is expected to contribute neurodegeneration38, and it would be interesting to analyze the sequence of protein accumulation and axonal degeneration in milton knockdown (24,29 and Figure 1) in detail with higher time resolution.’

      L143 : Please remove the erroneously included quotation mark.

      Thank you for pointing it out. We corrected it.

      L145-L147:

      While it is understood that Milton knockdown results in a reduction of mitochondria in axons, as reported previously and seemingly indicated in Figure 1E, this paper repeatedly refers to axonal depletion of mitochondria. Therefore, it would be beneficial to quantitatively assess the number of mitochondria in the axonal terminals located in the lamina via electron microscopy. Such quantification would robustly reinforce the argument that mitochondrial absence in axons is a consequence of Milton knockdown.

      Thank you for pointing it out. We included quantitation of the number of mitochondria in the synaptic terminals (Figure 1E).

      The text and figure legend were revised accordingly:

      Lines 156-157: ‘As previously reported24, the number of mitochondria in presynaptic terminals decreased in milton knockdown (Figure 1E).’

      The knockdown of Milton is known to reduce mitochondrial transport from an early stage, but what about swelling? By observing swelling at 1 day and 14 days, it may be possible to confirm the onset of swelling and discuss its correlation with the accumulation of ubiquitinated proteins.

      Quantitation of axonal swelling has also been included (Figure 1F).

      We appreciate the reviewer's comments on the correlation between the accumulation of ubiquitinated proteins and axonal swelling. Axonal swelling was not observed at 3-days-old (Iijima-Ando et al., PLoS Genetics, 2012), indicating that axonal swelling is an age-dependent event. Dense materials are found in swollen axons more often than in normal axons, suggesting a positive correlation between disruption of proteostasis and axonal damage. It would be interesting to analyze the time course of events further; however, we feel it is beyond the scope of this manuscript. We revised the text to include this discussion as:

      Lines 157-160: ‘The swelling of presynaptic terminals, characterized by the enlargement and roundness, was not reported at 3-day-old24 but observed at this age with about 4% of total presynaptic terminals (Figure 1F, asterisks).’

      Lines 162-167: ‘Dense materials are rarely found in age-matched control neurons, indicating that milton knockdown induces abnormal protein accumulation in the presynaptic terminals (Figure 1G and H). In milton knockdown neurons, dense materials are found in swollen presynaptic terminals more often than in presynaptic terminals without swelling, suggesting a positive correlation between the disruption of proteostasis and axonal damage (Figure 1G).’

      Lines 369-371: ‘Disruption of proteostasis is expected to contribute neurodegeneration38, and it would be interesting to analyze the sequence of protein accumulation and axonal degeneration in milton knockdown (24,29 and Figure 1) in detail with higher time resolution.’

      L147-L151: Though Figures 1F and 1G provide qualitative representations, it is advisable to quantitatively assess whether dense materials significantly accumulate. Such quantitative analysis would be required to verify the accumulation of dense materials in the context of the study.

      Thank you for pointing it out. We included quantitation of the number of neurons with dense material (Figure 1G). We revised the manuscript as follows:

      Line 162-164: ‘Dense materials are rarely found in age-matched control neurons, indicating that milton knockdown induces abnormal protein accumulation in the presynaptic terminals (Figure 1G and H).’

      Regarding Figure 1B, C:

      Even though the count of puncta in the whole brain appears to be fewer than 400, the magnification of the optic lobe suggests a substantial presence of puncta. Please clarify in the Methods section what constitutes a puncta and whether the quantification in the whole brain is based on a 2D or 3D analysis. Detail the methodology used for quantification.

      Thank you for your comment. We revised the method section to include more details as below:

      Lines 440-443: ‘Quantitative analysis was performed using ImageJ (National Institutes of Health) with maximum projection images derived from Z-stack images acquired with same settings. Puncta was identified with mean intensity and area using ImageJ.’
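The quantification described in the quoted Methods text can be illustrated with a minimal sketch. It is not the authors' ImageJ procedure: the per-pixel intensity cutoff, the area window, and the function names `max_projection` and `count_puncta` are all assumptions made for illustration, and the response does not state the actual thresholds used.

```python
from collections import deque


def max_projection(stack):
    """Collapse a Z-stack (list of 2D slices) into one image by per-pixel maximum."""
    rows, cols = len(stack[0]), len(stack[0][0])
    return [[max(s[r][c] for s in stack) for c in range(cols)] for r in range(rows)]


def count_puncta(image, min_intensity, min_area, max_area):
    """Count 4-connected regions of pixels >= min_intensity whose area is in range.

    Hypothetical criteria: the real analysis used mean intensity and area in
    ImageJ, with cutoffs that are not given in the text.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] < min_intensity:
                continue
            # Flood-fill one connected region of above-threshold pixels.
            queue = deque([(r, c)])
            seen[r][c] = True
            area = 0
            while queue:
                y, x = queue.popleft()
                area += 1
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and image[ny][nx] >= min_intensity):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if min_area <= area <= max_area:
                count += 1
    return count


# Toy Z-stack: one two-pixel punctum in slice 0, one single bright pixel in slice 1.
stack = [
    [[0, 0, 0, 0, 0], [0, 9, 9, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]],
    [[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 8], [0, 0, 0, 0, 0]],
]
projection = max_projection(stack)
n_puncta = count_puncta(projection, min_intensity=5, min_area=2, max_area=10)
```

Because the count is taken on a 2D projection, puncta that overlap along the Z axis merge into one object, which is one reason a whole-brain count can look lower than a magnified view suggests.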

      What about 1-day-old specimens? Does Milton knockdown already show an increase in ubiquitinated protein accumulation at this early stage? Investigating whether ubiquitin-protein accumulation is involved in aging promotion or is already prevalent during developmental stages is a necessary experiment.

      Thank you for your comment. We carried out immunostaining with an anti-ubiquitin antibody in the brains at 1-day-old. No significant difference was detected between the control and milton knockdown. This result has been included as Figure S1 in the revised manuscript. The result section was revised as below:

      Line 136-139 ‘There was no significant increase in ubiquitinated proteins in milton knockdown flies at 1-day old, suggesting that the accumulation of ubiquitinated proteins caused by milton knockdown is age-dependent (Figure S1).’

      For Figure 1E: In the Electron Microscopy section of the Methods, define how swollen axons were identified and describe the quantification methodology used.

      Thank you for your comment. Swollen axons, unlike normal axons, are enlarged and round in shape. We revised the text as below:

      Lines 157-160: ‘The swelling of presynaptic terminals, characterized by the enlargement and roundness, was not reported at 3-day-old24 but observed at this age with about 4% of total presynaptic terminals (Figure 1F, asterisks).’

      Lines 689-691, Figure 1 legend: ‘Swollen presynaptic terminals (asterisks in (F)), characterized by the enlargement and higher circularity, were found more frequently in milton knockdown neurons.’

      L218-L219: Throughout the text, the expression 'eIF2β is "upregulated" in response to Milton knockdown' is frequently used. However, considering the presented results, it might be more accurate to interpret that under the condition of Milton knockdown, eIF2β is not undergoing degradation but rather remains stable.

      Thank you for pointing it out. We replaced ‘upregulated’ with ‘increased’ throughout the text.

      L234-L235: On what basis is the conclusion drawn that there is a reduction? Given that three experiments have been conducted, it would be possible and more convincing to quantify the results to determine if there is a significant decrease.

      Thank you for pointing it out. We quantified the AUC of polysome fraction and carried out a statistical analysis. There is a significant decrease in polysome in milton knockdown, and this result has been included in Figure 5B. We revised the figure and the legend accordingly.
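As a rough illustration of the area-under-the-curve quantification mentioned here, the sketch below integrates a polysome profile with the trapezoidal rule and reports the share of the area that falls in the polysome region. The profile values, the boundary position, and the normalization to total area are hypothetical choices; the response does not say how the authors defined or normalized the polysome fraction.

```python
def trapezoid_auc(positions, absorbance):
    """Area under a sampled profile using the trapezoidal rule."""
    return sum(
        (absorbance[i] + absorbance[i + 1]) / 2.0 * (positions[i + 1] - positions[i])
        for i in range(len(positions) - 1)
    )


def polysome_share(positions, absorbance, polysome_start):
    """Fraction of total profile area at or beyond polysome_start.

    Assumes polysome_start coincides with a sampled gradient position.
    """
    kept = [(x, a) for x, a in zip(positions, absorbance) if x >= polysome_start]
    xs = [x for x, _ in kept]
    ys = [a for _, a in kept]
    return trapezoid_auc(xs, ys) / trapezoid_auc(positions, absorbance)


# Hypothetical absorbance trace along the gradient (arbitrary units).
positions = [0, 1, 2, 3, 4]
absorbance = [1.0, 3.0, 2.0, 2.0, 1.0]
share = polysome_share(positions, absorbance, polysome_start=2)
```

Computing one such value per biological replicate gives the per-sample numbers that a two-group statistical test would then compare.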

      L236: 5H-> 4H

      Thank you for pointing it out, and we are sorry for the confusion. We corrected it.

      L238-L239: Since there is no significant difference observed, it may not be accurate to interpret a reduction in puromycin incorporation.

      Thank you for pointing it out. As described above, quantification of polysome fractions showed that milton knockdown significantly reduced polysomes (Figure 6B (Figure 5B in the previous version)). We revised the manuscript as below:

      Lines 267-268: ‘However, unexpectedly, we found that milton knockdown significantly reduced the level of mRNAs associated with polysomes (Figure 6A and B).’

      Figure 5D and Figure 6D: Climbing assays have been conducted, but I believe experiments should also be performed to examine whether overexpression or heterozygous mutants of eIF2β induce or suppress degeneration.

      Thank you for pointing it out. We analyzed the eyes with eIF2β overexpression for neurodegeneration. Although there was a tendency of elevated neurodegeneration in the retina with eIF2β overexpression, the difference between control and eIF2β overexpression did not reach statistical significance (Figure S2). This result has been included as Figure S2 in the revised manuscript, and the following sentences have been included in the text:

      Lines 292-297: ‘We asked if eIF2β overexpression causes neurodegeneration, as depletion of axonal mitochondria in the photoreceptor neurons causes axon degeneration in an age-dependent manner24. eIF2β overexpression in photoreceptor neurons tends to increase neurodegeneration in aged flies, while it was not statistically significant (p>0.05, Figure S2).’

      L271-L272: The results in Figure 6B are surprising. I anticipated a greater increase compared to the Milton knockdown alone. While p62 appears to be reduced, it is not clear why these results lead to the conclusion that lowering eIF2β rescues autophagic impairment. Please add a discussion section to address this point.

      Thank you for pointing it out. We apologize for the unclear description of the result. Milton knockdown flies show p62 accumulation (Figure 2), and deleting one copy of eIF2beta in milton knockdown background reduced p62 accumulation (Figure 8C (Figure 7C in the previous version)). We revised the text as below:

      Lines 311-319: ‘Neuronal knockdown of milton causes accumulation of autophagic substrate p62 in the Triton X-100-soluble fraction (Figure 2B), and we tested if lowering eIF2β ameliorates it. We found that eIF2β heterozygosity caused a mild increase in LC3-I levels and decreases in LC3-II levels, resulting in a significantly lower LC3-II/LC3-I ratio in milton knockdown flies (Figure 8B). eIF2β heterozygosity decreased the p62 level in the Triton X-100-soluble fraction in the brains of milton knockdown flies (Figure 8C). The p62 level in the SDS-soluble fraction, which is not sensitive to milton knockdown (Figure 2B), was not affected (Figure 8C). These results suggest that suppression of eIF2β ameliorates the impairment of autophagy caused by milton knockdown.’

      L369: Please specify the source of the anti-ubiquitin antibody used.

      Thank you for pointing it out. We included the antibody information in the method section.

      Figure 7: While the relationship between Milton knockdown and the eIF2β and eIF2α proteins has been elucidated through the authors' efforts, I would like to see an investigation into whether eIF2β is upregulated and eIF2α phosphorylation is reduced in simply aged Drosophila. This would help us understand the correlation between aging and eIF2 protein dynamics.

      Thank you for your comment. We agree that it is an important question, and we are working on it. However, we feel that it is beyond the scope of the current manuscript.

      L645-L646: If the mushroom body is identified using mito-GFP, then include mito-GFP in the genotype listed in Supplementary Table 2.

      We are sorry for the oversight. We corrected it in Supplementary Table 2.

      Additionally, while it is presumed that the mito-GFP signal decreases in axons with Milton RNAi, how was the lobe tips area accurately selected for analysis? Please include these details along with a comprehensive description of the quantification methodology in the Methods section.

      Thank you for your comment. Although the mito-GFP signal in the axon is weak in the milton knockdown neurons, it is sufficient to distinguish the mushroom body structure from the background. We revised the method section to include this information:

      Line 443-447: ‘For eIF2α and p-eIF2α immunostaining, the mushroom body was detected by mitoGFP expression.’

    1. eLife Assessment

      This study provides valuable results on how entorhinal and hippocampal activity may support human thinking in perceptual spaces. It replicates the hexagonal symmetry of fMRI activity in the entorhinal cortex, reports novel findings on 3-fold symmetry in both behavioral performance and hippocampal fMRI activity, and links these results within a computational model. However, the methods while potentially creative and interesting are not fully justified or explained, and the conclusions remain incomplete. With further explanation, justification, and interpretation, this work could represent a significant step forward in understanding how cognitive maps are utilized.

    2. Reviewer #1 (Public review):

      Summary:

      Zhang and colleagues examine neural representations underlying abstract navigation in the entorhinal cortex (EC) and hippocampus (HC) using fMRI. This paper replicates a previously identified hexagonal modulation of abstract navigation vectors in abstract space in EC in a novel task involving navigating in a conceptual Greeble space. In HC, the authors claim to identify a three-fold signal of the navigation angle. They also use a novel analysis technique (spectral analysis) to look at spatial patterns in these two areas and identify phase coupling between HC and EC. Finally, the authors propose an EC-HPC PhaseSync Model to understand how the EC and HC construct cognitive maps. While the wide array of techniques used is impressive and their creativity in analysis is admirable, overall, I found the paper a bit confusing and unconvincing. I recommend a significant rewrite of their paper to motivate their methods and clarify what they actually did and why. The claim of three-fold modulation in HC, while potentially highly interesting to the community, needs more background to motivate why they did the analysis in the first place, more interpretation as to why this would emerge in biology, and more care taken to consider alternative hypotheses steeped in existing models of HC function. I think this paper does have potential to be interesting and impactful, but I would like to see these issues improved first.

      General comments:

      (1) Some of the terminology used does not match the terminology used in previous relevant literature (e.g., sinusoidal analysis, 1D directional domain).

      (2) Throughout the paper, novel methods and ideas are introduced without adequate explanation (e.g., the spectral analysis and three-fold periodicity of HC).

    3. Reviewer #2 (Public review):

      The authors report results from behavioral data, fMRI recordings, and computer simulations during a conceptual navigation task. They report 3-fold symmetry in behavioral and simulated model performance, 3-fold symmetry in hippocampal activity, and 6-fold symmetry in entorhinal activity (all as a function of movement directions in conceptual space). The analyses are thoroughly done, and the results and simulations are very interesting.

    4. Author response:

      Reviewer #1, Comment (1): Terminology

      We fully acknowledge the importance of terminological consistency and will align our usage with established literature. Specifically, we will revise as follows:

      (1) Replace “sinusoidal analysis” with either “sinusoidal modulation” (Doeller et al., 2010; Bao et al., 2019; Raithel et al., 2023) or “GLM with sinusoidal (cos/sin) regressors” (Constantinescu et al., 2016). 

      (2) Replace “1D directional domain” with either “angular domain of movement directions (0–360°)” or “directional modulation analysis”.
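To make the term "GLM with sinusoidal (cos/sin) regressors" concrete, here is a minimal sketch of a k-fold directional modulation fit. It assumes movement directions are sampled uniformly over 0-360°, so the intercept, cosine, and sine regressors are orthogonal and each coefficient reduces to a simple projection. The function name and the synthetic signal are illustrative and are not taken from the manuscript.

```python
import math


def fit_kfold_modulation(angles_deg, signal, k):
    """Fit signal ~ b0 + b_cos*cos(k*theta) + b_sin*sin(k*theta).

    Valid as a least-squares solution only under the stated assumption of
    uniformly spaced directions covering the full circle.
    Returns (baseline, modulation amplitude, preferred phase in degrees).
    """
    n = len(signal)
    thetas = [math.radians(a) for a in angles_deg]
    b0 = sum(signal) / n
    b_cos = 2.0 / n * sum(y * math.cos(k * t) for y, t in zip(signal, thetas))
    b_sin = 2.0 / n * sum(y * math.sin(k * t) for y, t in zip(signal, thetas))
    amplitude = math.hypot(b_cos, b_sin)
    phase_deg = math.degrees(math.atan2(b_sin, b_cos)) / k
    return b0, amplitude, phase_deg


# Synthetic 6-fold signal: baseline 1.0, amplitude 0.5, grid phase 15 degrees.
angles = list(range(0, 360, 10))
signal = [1.0 + 0.5 * math.cos(6 * math.radians(a - 15)) for a in angles]
baseline, amp6, phase6 = fit_kfold_modulation(angles, signal, k=6)
_, amp3, _ = fit_kfold_modulation(angles, signal, k=3)
```

With non-uniform direction sampling the regressors are no longer orthogonal, and a full least-squares solve would be needed in place of the projections.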

      Reviewer #1, Comment (2): Spectral analysis and 3-fold periodicity

      We agree that the presentation of our spectral analysis and the theoretical motivation underlying our expectation of a three-fold periodicity within hippocampal data requires further clarification.

      In our revised manuscript, we will:

      (1) Clearly articulate the theoretical motivation for anticipating a three-fold signal, explicitly linking it to the known hexagonal grid structure encoded by the entorhinal cortex.

      (2) Clarify our methodological rationale for using Fourier analysis (FFT).

      a) FFT allows unbiased exploration of multiple candidate periodicities (e.g., 3–7-fold) without predefined assumptions.

      b) FFT results cross-validate our sinusoidal modulation results, providing complementary evidence supporting the 6-fold periodicity in EC and 3-fold periodicity in HPC.

      c) FFT uniquely facilitates analysis of periodicities in behavioral performance data, which is not feasible via standard sinusoidal GLM approaches. This consistency allows us to directly compare periodicities across neural and behavioral data.

      (3) Further, we will expand our discussion to provide:

      a) A deeper interpretation of potential biological bases for the observed hippocampal three-fold periodicity.

      b) A careful examination of alternative explanations within existing hippocampal modeling frameworks.
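The spectral approach outlined in point (2) can be sketched as follows. The example computes discrete Fourier magnitudes directly at each candidate fold (3 to 7) instead of calling an FFT routine, which is equivalent at those frequencies. It assumes one value per uniformly spaced direction bin; the function name and the synthetic "behavioral" series are hypothetical.

```python
import cmath
import math


def fold_spectrum(values, folds=range(3, 8)):
    """Amplitude of the Fourier component at each candidate fold.

    values: one measurement per direction bin, bins uniformly spaced over 360 degrees.
    """
    n = len(values)
    mean = sum(values) / n
    spectrum = {}
    for k in folds:
        coef = sum(
            (v - mean) * cmath.exp(-2j * math.pi * k * i / n)
            for i, v in enumerate(values)
        )
        spectrum[k] = 2.0 * abs(coef) / n
    return spectrum


# Synthetic series over 36 direction bins with a 3-fold component.
n_bins = 36
series = [0.8 + 0.3 * math.cos(3 * 2 * math.pi * i / n_bins) for i in range(n_bins)]
spectrum = fold_spectrum(series)
dominant_fold = max(spectrum, key=spectrum.get)
```

Because the same function applies to any direction-binned series, it can be run on both neural and behavioral data, which is the comparability argument made in point (2c).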

      Reference:

      Doeller, C. F., Barry, C., & Burgess, N. (2010). Evidence for grid cells in a human memory network. Nature, 463(7281), 657-661.

      Constantinescu, A. O., O'Reilly, J. X., & Behrens, T. E. J. (2016). Organizing conceptual knowledge in humans with a gridlike code. Science, 352(6292), 1464-1468.

      Bao, X., Gjorgieva, E., Shanahan, L. K., Howard, J. D., Kahnt, T., & Gottfried, J. A. (2019). Grid-like neural representations support olfactory navigation of a two-dimensional odor space. Neuron, 102(5), 1066-1075.

      Raithel, C. U., Miller, A. J., Epstein, R. A., Kahnt, T., & Gottfried, J. A. (2023). Recruitment of grid-like responses in human entorhinal and piriform cortices by odor landmark-based navigation. Current Biology, 33(17), 3561-3570

    1. eLife Assessment

      By combining the 'pinging' technique with fMRI-based multivariate pattern analysis, this important study provides compelling evidence for a dual-format representation of attention during the preparatory period. The findings help reconcile the debate between sensory-like and non-sensory accounts of attentional templates and shed light on how the brain flexibly deploys different forms of templates to guide attention. This work will be of broad interest to researchers in psychology, vision science, and cognitive neuroscience.

    2. Reviewer #1 (Public review):

      Summary:

      The aim of the experiment reported in this paper is to examine the nature of the representation of a template of an upcoming target. To this end, participants were presented with compound gratings (consisting of lines tilted to the right and lines tilted to the left) and were cued to a particular orientation - red left tilt or blue right tilt (counterbalanced across participants). There were two directly compared conditions: (i) no ping: where there was a cue, followed by a 5.5-7.5s delay, then followed by a target grating in which the cued orientation deviated from the standard 45 degrees; and (ii) ping: in which all aspects were the same, the only difference being that a ping (a visual impulse presented for 100ms) was presented 2.5 seconds after the cue. There was also a perception task in which only the lines tilted 45 degrees to the right or to the left were presented. It was observed that during the delay, only in the ping condition were the authors able to decode the orientation of the to-be-reported target using cross-task generalization. Attention decoding, on the other hand, succeeded in both ping and no-ping conditions. It is concluded that the visual system has two different functional states associated with a template during preparation: a predominantly non-sensory representation for guidance and a latent sensory-like representation for prospective stimulus processing.

      Strengths:

      There is so much to be impressed with in this report. The writing of the manuscript is incredibly clear. The experimental design is clever and innovative. The analysis is sophisticated and also innovative -the cross-task decoding, the use of Mahalanobis distance as a function of representational similarity, the fact that the question is theoretically interesting, the excellent figures.

      Comments on revisions:

      I have no further comments.

    3. Reviewer #3 (Public review):

      This paper discusses how non-sensory and latent, sensory-like attentional templates are represented during attentional preparation. Using multivariate pattern analysis, they found that visual impulses can enhance the decoding generalization from perception to attention tasks in the preparatory stage in the visual cortex. Furthermore, the emergence of the sensory-like template coincided with enhanced information connectivity between V1 and frontoparietal areas and was associated with improved behavioral performance. It is an interesting paper with supporting evidence for the latent, sensory-like attentional template.

      Comments on revisions:

      I appreciate the authors' thoughtful revisions, which have addressed my earlier concerns. I have no further comments.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      I am impressed with the thoroughness with which the authors addressed my concerns. I don't have any further concerns and think that this paper makes an interesting and significant contribution to our understanding of VWM. I would only suggest adding citations to the newly added paragraph where the authors state "It could be argued that preparatory attention relies on the same mechanisms as working memory maintenance." They could cite work by Bettencourt and Xu, 2016; and Sheremata, Somers, and Shomstein (2018).

      We thank the reviewer for the positive feedback. We have now cited the referenced work in the manuscript (Page 19, Line 371).

      Reviewer #2 (Public review):

      Overall, I think that the authors' revision has addressed most, if not all, of my major concerns noted in my previous comments. The results appear convincing and I do not have additional comments.

      We thank the reviewer for the positive feedback and are pleased that the revision addressed the major concerns.

      Reviewer #3 (Public review):

      (1) The authors addressed most of my previous concerns and provided additional data analysis. They conducted further analyses to demonstrate that the observed changes in network communication are associated with behavioral RTs, supporting the idea that the impulse-driven sensory-like template enhances informational connectivity between sensory and frontoparietal areas, and relates to behavior.

      We are pleased that the revision addressed the major concerns.

      (2) I would like to further clarify my previous points regarding the definition of the two types of templates and the evidence for their coexistence. The authors stated that the sensory-like template likely existed in a latent state and was reactivated by visual pings, proposing that sensory and non-sensory templates coexist. However, it remains unclear whether this reflects a dynamic switch between formats or true coexistence. If the templates are non-sensory in nature, what exactly do they represent? Are they meant to be abstract or conceptual representations, or, put simply, just "top-down attentional information"? If so, why did the generalization analyses (training classifiers on activity during the stimulus selection period and testing on preparatory activity) fail to yield significant results? While the stimulus selection period necessarily encodes both target and distractor information, it should still contain attentional information. I would appreciate more discussion from this perspective.

      We thank the reviewer for the helpful clarification of previous comments. Since we addressed similar comments from Reviewer 2 (Point 2) in the previous round, our response below may appear somewhat repetitive. First, regarding whether our findings reflect a dynamic switch between non-sensory and sensory-like templates, or the ‘coexistence’ of two template formats, we acknowledge that the temporal limitations of fMRI prevent us from directly testing dynamic representations. However, several aspects of our data favor the latter interpretation: (1) our key findings remained consistent in the subset of participants (N=14) who completed both No-Ping and Ping sessions in counterbalanced order. This makes it unlikely that participants systematically switched cognitive strategies (e.g., using non-sensory templates in the No-Ping session versus sensory-like templates in the Ping session) in response to the task-irrelevant, uninformative visual impulse; (2) while we agree that the temporal dynamics between the two templates remain unclear, it is difficult to imagine that orientation-specific templates observed in the Ping session emerged de novo from purely non-sensory templates and an exogenous ping. In other words, if there is no orientation information at all to begin with, how does it come into being from an orientation-less external ping? A more parsimonious explanation is that orientation information was already present in a latent format and was activated by the ping, in line with the models of “activity-silent” working memory. However, since the detailed circuit-level mechanisms underlying such reactivation remain unclear, we acknowledge that this interpretation warrants direct investigation in future studies. This point is discussed in the main text (Pages 19-20, Lines 389-402).

      Second, while our data cannot definitively determine the nature of the non-sensory template, we consider categorical coding a plausible candidate based on prior visual search studies. For instance, categorical attributes (e.g., left-tilted vs. right-tilted) have been shown to effectively guide attention in orientation search tasks (Wolfe et al., 1992), similar to our paradigm. Further, categorical templates are more tolerant of stimulus variability, making them well-suited to our task, which involved trial-by-trial variations in target orientation around a reference (see Page 21, Lines 427-437 for more detailed discussions).

      Third, the lack of generalization from stimulus selection to preparatory attention in the Ping session may relate to the limited overlap in shared information between these two periods. Neural activity during stimulus selection encodes sensory information about both orientations, along with sensory-like attentional signals (as indicated by the attention decoding and cross-task generalization from perception task to the stimulus-selection period). In contrast, preparatory activity likely involves a dominant non-sensory template, a latent sensory-like template, and residual sensory effects from the impulse stimulus. The limited overlap in sensory-like attentional signals may therefore be insufficient to support generalization across the two periods.

      Reviewer #2 (Recommendations for the authors):

      I think the central prediction of greater pattern similarity between 'attend leftward' and 'perceived leftward' in the ping session in comparison to the no-ping session (the same also holds for 'attend rightward' and 'perceived rightward') could be directly examined by a two-way ANOVA (session × whether the attended orientation is the same as or different from the perceived orientation) for each ROI (V1 and EVC). A three-way ANOVA might complicate readers' intuitive understanding of the implications of the statistical results.

      We thank the reviewer for the suggestion. Following it, we defined a new condition label based on orientation consistency between attended and perceived orientations: (1) same orientation: averaging “attend leftward/perceive leftward” and “attend rightward/perceive rightward”; and (2) different orientation: averaging “attend leftward/perceive rightward” and “attend rightward/perceive leftward”. A two-way mixed ANOVA (session × orientation consistency) on Mahalanobis distance revealed a main effect of orientation consistency in V1 (F(1,38) = 4.21, p = 0.047, η<sub>p</sub><sup>2</sup> = 0.100), indicating that activity patterns were more similar when attended and perceived orientations matched. No significant main effect of session was found (p = 0.923). Importantly, a significant interaction was found in V1 (F(1,38) = 5.00, p = 0.031, η<sub>p</sub><sup>2</sup> = 0.116), suggesting that the visual impulse enhanced the similarity between the preparatory attentional template and the perception of the corresponding orientation. In EVC, the same analysis revealed only a main effect of orientation consistency (F(1,38) = 5.87, p = 0.020, η<sub>p</sub><sup>2</sup> = 0.134), with no other significant effects (ps > 0.240). The interaction results were consistent with those reported in the original three-way ANOVA. We have now replaced the previous analysis with the new one in the main text (Page 11-12, Line 231-242).
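      For readers who want to retrace the logic of this two-way analysis, the following is a minimal Python sketch, not the analysis code used in the study. It assumes a hypothetical array layout in which each participant contributes a 2 × 2 table of Mahalanobis distances indexed by attended and perceived orientation; the function names and the data layout are illustrative assumptions. It relies on two standard identities: partial eta squared can be recovered from an F statistic and its degrees of freedom, and in a mixed design with a two-level within-subject factor the interaction F equals the squared pooled-variance t statistic computed on the within-subject difference scores of the two groups.

```python
import numpy as np


def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def consistency_scores(dist):
    """Collapse a (n_subjects, 2, 2) distance array into same/different averages.

    Assumed (hypothetical) layout: dist[subject, attended, perceived],
    with index 0 = leftward and 1 = rightward.
    """
    dist = np.asarray(dist, dtype=float)
    same = (dist[:, 0, 0] + dist[:, 1, 1]) / 2.0       # attended == perceived
    different = (dist[:, 0, 1] + dist[:, 1, 0]) / 2.0  # attended != perceived
    return same, different


def interaction_f(diff_a, diff_b):
    """Session x consistency interaction for a 2 x 2 mixed design.

    diff_a, diff_b: per-subject difference scores (e.g. different - same)
    for the two sessions, treated as independent groups. Returns the
    interaction F (the squared pooled-variance t) and its error df.
    """
    diff_a = np.asarray(diff_a, dtype=float)
    diff_b = np.asarray(diff_b, dtype=float)
    n_a, n_b = len(diff_a), len(diff_b)
    df_error = n_a + n_b - 2
    pooled_var = ((n_a - 1) * np.var(diff_a, ddof=1)
                  + (n_b - 1) * np.var(diff_b, ddof=1)) / df_error
    t = (diff_a.mean() - diff_b.mean()) / np.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t ** 2, df_error


if __name__ == "__main__":
    # Effect sizes implied by the F values and degrees of freedom quoted above.
    for f_value in (4.21, 5.00, 5.87):
        print(f_value, round(partial_eta_squared(f_value, 1, 38), 3))
```

      Applied to the F values quoted above with 1 and 38 degrees of freedom, `partial_eta_squared` returns 0.100, 0.116, and 0.134 after rounding, matching the reported effect sizes.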

    1. eLife Assessment

      The article presents important findings on the impact of climate change on odonates, integrating phenological and range shifts to broaden our understanding of biodiversity change. The study leverages extensive natural history data, offering a convincing analysis of temporal trends in phenology and range limit and their potential drivers.

    2. Reviewer #1 (Public review):

      Summary:

      This study evaluates whether species can shift geographically, temporally, or both ways in response to climate change. It also teases out the relative importance of geographic context, temperature variability, and functional traits in predicting the shifts. The study system is large occurrence datasets for dragonflies and damselflies split between two time periods and two continents. Results indicate that more species exhibited both shifts than one or the other (or neither), and that geographic context and temperature variability were more influential than traits. The results have implications for future analyses (e.g. incorporating habitat availability) and for choosing winner and loser species under climate change. The results also seem to support climate vulnerability assessments for species that rely on geographic range size and geospatial climate data layers rather than more detailed information (like demographic rates, abundances, or traits) that may not be so readily available. The methodology would be useful for other taxa and study regions with strong participatory ("citizen") science and extensive occurrence data.

      Strengths:

      This is an organized and well-written paper that builds on a popular topic and moves it forward. It has the right idea and approach, and the results provide useful answers to the predictions and for conservation planning (i.e. identifying climate winners and losers). There is technical proficiency and analytical rigor driven by an understanding of the data and its limitations.

    3. Reviewer #2 (Public review):

      Summary:

      This paper explores a highly interesting question regarding how species migration success relates to phenology shifts, and it finds a positive relationship. The findings are significant, and the strength of the evidence is solid. However, there are substantial issues with the writing, presentation, and analyses that need to be addressed. First, I disagree with the conclusion that species that don't migrate are "losers" - some species might not migrate simply because they have broad climatic niches and are less sensitive to climate change. Second, the results concerning species' southern range limits could provide valuable insights. These could be used to assess whether sampling bias has influenced the results. If species are truly migrating, we should observe northward shifts in their southern range limits. However, if this is an artifact of increased sampling over time, we would expect broader distributions both north and south. Finally, Figure 1 is missing panel B, which needs to be addressed.

      Comments on revised version:

      The revision has substantially improved the paper.

    4. Reviewer #3 (Public review):

      Summary:

      In their article "Range geography and temperature variability explain cross-continental convergence in range and phenology shifts in a model insect taxon" the authors rigorously investigate the spatial and temporal trends in the occurrence of odonate species and their potential drivers. Specifically, they examine whether species shift their geographic ranges poleward or alter their phenology to cope with changing conditions. Leveraging opportunistic observations of European and North American odonates, they find that species showing significant range shifts also exhibited shifts to earlier emergence. Considering a broad range of potential predictors, their results reveal that geographical factors, but not functional traits, are associated with these shifts.

      Strengths:

      The article addresses an important topic in ecology and conservation that is particularly timely in the face of reports of substantial insect declines in North America and Europe over the past decades. Through data integration, the authors leverage the rich natural history record for odonates, broadening the taxonomic scope of analyses of temporal trends in phenology and distribution. The combination of phenological and range shifts in one framework presents an elegant way to reconcile previous findings and informs about the drivers of biodiversity loss.

      Weaknesses:

      Understanding whether species that shift both their ranges and phenology are more successful (or, as stated here, 'clear winners'), and hence whether those that do neither are more vulnerable, would require integrating population trends alongside the discussed responses. The ~10% of species that have not shifted their distribution or phenology might not have declined in abundance if they have rapidly adapted to local changes in climatic conditions (i.e. they might show a plastic response). These species might be the real 'winners', while species that have recently shifted their ranges or phenology may eventually reach hard limits. The authors discuss this limitation but might want to adapt their wording, given the potential for misinterpretation. The finding that species with more northern ranges showed smaller northward shifts suggests that some species have already reached such a geographical range limit.

      Achievements and impact:

      The results support broad differences in the response of odonate species to climate change, and the prediction that range geography and temperature seasonality are more important predictors of these changes than functional traits. Simultaneously addressing range and phenological shifts highlights that most species exhibit coupled responses but also identifies a significant portion of species that do not respond in these ways and are of critical conservation concern. These results are important for improving forecasts of species' responses to climate change and identifying species of particular conservation concern. Although not exhaustive regarding abundance trends, the study presents an important step towards a general framework for investigating the drivers of multifaceted species responses.