  1. Jan 2025
    1. Reviewer #2 (Public review):

      Summary:

      The authors introduce a computational framework, DyNoPy, that integrates residue coevolution analysis with molecular dynamics (MD) simulations to identify functionally important residues in proteins. DyNoPy identifies key residues and residue-residue couplings to generate an interaction graph, and the authors attempt to validate it using two clinically relevant β-lactamases (SHV-1 and PDC-3).

      Strengths:

      DyNoPy not only shows the clinical relevance of known mutations but also predicts potential new evolutionary mutations. The authors provide biologically relevant insights into protein dynamics, which may have applications in drug discovery and in understanding molecular evolution.

      Weaknesses:

      Although DyNoPy highlights key residues both within and outside the active site, no experiments have been performed to validate these predictions. In addition, the authors should compare their method with conventional techniques and show how it differs.

      An explanation of the "communities" defined in the work, and of how these communities are relevant to the article, should be provided. In addition, the choice of collective variables and their relevance to coupled residue movements is not well explained. A dynamic cross-correlation map is also a good method for understanding residue movements and can explain residue-residue coupling; it is not explained how DyNoPy differs from such conventional methods or whether it performs better.
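      For readers unfamiliar with the conventional baseline mentioned here, the following is a minimal sketch of a dynamic cross-correlation map, C_ij = <Δr_i · Δr_j> / sqrt(<Δr_i²> <Δr_j²>). It is not part of DyNoPy or of the manuscript under review. The function name `dccm`, the toy trajectory, and the assumption that frames have already been aligned to remove overall rotation and translation are illustrative only.

```python
import math

def dccm(traj):
    """Dynamic cross-correlation matrix for a pre-aligned trajectory.

    traj: list of frames; each frame is a list of (x, y, z) tuples,
          one per residue (e.g. C-alpha positions).
    Returns an N x N nested list with values in [-1, 1].
    """
    n_frames = len(traj)
    n_res = len(traj[0])
    # Mean position of each residue over all frames.
    mean = [[sum(frame[i][k] for frame in traj) / n_frames for k in range(3)]
            for i in range(n_res)]
    # Per-frame displacement of each residue from its mean position.
    disp = [[[frame[i][k] - mean[i][k] for k in range(3)] for i in range(n_res)]
            for frame in traj]

    def cov(i, j):
        # Time average of the dot product of the two displacement vectors.
        return sum(sum(d[i][k] * d[j][k] for k in range(3)) for d in disp) / n_frames

    var = [cov(i, i) for i in range(n_res)]
    c = [[0.0] * n_res for _ in range(n_res)]
    for i in range(n_res):
        for j in range(n_res):
            denom = math.sqrt(var[i] * var[j])
            c[i][j] = cov(i, j) / denom if denom > 0 else 0.0
    return c

# Toy trajectory (hypothetical): residues 0 and 1 move together along x,
# residue 2 moves in the opposite direction.
traj = [[(t, 0.0, 0.0), (t + 5.0, 0.0, 0.0), (-t, 1.0, 0.0)]
        for t in (0.0, 1.0, 2.0, 3.0)]
c = dccm(traj)
```

      In this toy case the first two residues are fully correlated and the third is fully anti-correlated with them. A map of this kind captures only linear positional coupling, which is one reason a direct comparison with DyNoPy would be informative.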

      In the sentence "DyNoPy identified eight significant communities of strongly coupled residues within SHV-1 (Supporting Fig. S4A)" I could not find a clear description of eight significant communities.

      Again the description of communities is not clear to me in the following sentence "Detailed description of the other three communities is provided in the supporting information (Fig. S6)."

      In the sentence "N170 acts as an intermediary between N136 and E166", kindly cite the figure that shows N170 as the intermediate residue.

      Please be careful with the numbers. In the sentence "These residues not only interact with each other directly but are also indirectly coupled via 21 other residues." I could count 22 other residues and not 21.

      In the sentence "Unlike other substitution sites that are adjacent to the active site, R205 is situated more than 16 Å away from catalytic serine S70". Please add this label somewhere in the figure.

      Please cite a reference in the sentence "This indicates that mutations on G238 would result in an alteration on protein catalytic function, as well as an increased flexibility of the protein, which strongly aligns with previous finding."

    2. Reviewer #3 (Public review):

      Summary:

      In this paper, Xu, Dantu and coworkers report a protocol for analyzing coevolutionary and dynamical information to identify a subset of communities that capture functionally relevant sites in beta-lactamases.

      Strengths:

      The combination of coevolutionary information and metrics from MD simulations is interesting for capturing functionally relevant sites, which can have implications in the fields of drug discovery but also in protein design.

      Weaknesses:

      The combination of coevolutionary information and metrics from MD simulations is not new, as other protocols have been proposed over the years (the current version of the paper neglects some of them, see below), and there are a few parameters of the protocol that, in my opinion, should be better analyzed and discussed.

      (1) As mentioned, the introduction lacks some important publications on using graph theory to represent important interaction networks extracted from MD simulations (DOI: 10.1002/pro.4911), and on combining MD data with MSA to identify functionally relevant sites for enzyme design (DOI: 10.1021/acscatal.4c04587, 10.1093/protein/gzae005).

      (2) The matrix used to apply graph theory (J_ij) is built by summing the scaled coevolution and degree-of-correlation values. The alpha and beta weights are defined, and the authors mention that alpha is set to 0.5, and thus beta as well, to satisfy alpha + beta = 1. Why was a value of 0.5 selected? How does this choice affect the overall results and the conclusions drawn? The finding that many catalytically relevant residues are identified in the communities is not surprising given that such sites usually present a high conservation score.

      (3) Another important point that needs further explanation is the selection of the relevant descriptor of protein dynamics. In this study two different strategies have been used (one more global, the other more local), but more details should be provided regarding their choice. What is the best strategy according to the authors? Why not use the same strategy for both related systems? The results obtained with one methodology or the other will have a large impact on the dynamical score. A related point: what is the impact of the MD simulation length, of how the MSA is generated, and of the number of sequences used for MSA construction?
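      The sensitivity question raised in point (2) can be illustrated with a small sketch. It assumes the combined score is a plain weighted sum, J_ij = alpha * (scaled coevolution) + beta * (dynamical correlation) with beta = 1 - alpha, as the point describes. The function names, the residue-pair labels, and all numerical values are hypothetical and are not taken from the manuscript.

```python
def combined_score(coev, dyn, alpha):
    """Weighted sum of two per-pair scores, with beta = 1 - alpha."""
    beta = 1.0 - alpha
    return {pair: alpha * coev[pair] + beta * dyn[pair] for pair in coev}

def top_pairs(scores, k):
    """The k highest-scoring residue pairs, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

def jaccard(a, b):
    """Overlap between two sets of pairs (1.0 = identical, 0.0 = disjoint)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

# Hypothetical scaled scores in [0, 1] for four residue pairs.
coev = {("A", "B"): 0.9, ("A", "C"): 0.2, ("B", "C"): 0.5, ("C", "D"): 0.1}
dyn = {("A", "B"): 0.3, ("A", "C"): 0.8, ("B", "C"): 0.6, ("C", "D"): 0.7}

# Reference ranking at the equal weighting mentioned in the review.
reference = top_pairs(combined_score(coev, dyn, 0.5), 2)

# Overlap of the top-ranked pairs with the reference as alpha is varied.
overlap = {alpha: jaccard(reference, top_pairs(combined_score(coev, dyn, alpha), 2))
           for alpha in (0.0, 0.25, 0.5, 0.75, 1.0)}
```

      Reporting an overlap curve of this kind, or the stability of the detected communities across alpha, would show how strongly the conclusions depend on the chosen weighting.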

    1. eLife Assessment

      Based on the perceived low efficacy of current therapies targeted to FGFR2 in gastric cancer (GC), the authors investigate an approach which combines SHP2 inhibition with existing FGFR2 inhibitors. The data were largely collected and analysed using solid and validated methodology. There is some useful data regarding combination therapy in a new clinical cohort, which supports previous studies that have reported the potential of targeting RTKs together with phosphatases.

    2. Reviewer #1 (Public review):

      The manuscript entitled "Blocking SHP2 benefits FGFR2 inhibitor and overcomes its resistance in FGFR2-amplified gastric cancer" by Zhang et al. reports that FGFR2 was amplified in 6.2% (10/161) of gastric cancer samples and that dual blockade of SHP2 and FGFR2 enhanced the effects of FGFR2 inhibitor (FGFR2i) in FGFR2-amplified GC both in vitro and in vivo by suppressing the RAS/ERK and PI3K/AKT pathways. Furthermore, the authors showed that SHP2 blockade suppressed PD-1 expression and promoted IFN-γ secretion by CD8+ T cells, enhancing the cytotoxic functions of T cells. Thus, the authors concluded that dual blockade of SHP2 and FGFR2 is a compelling strategy for treatment of FGFR2-amplified gastric cancer. Although the work is interesting, the finding that FGFR2 is amplified in gastric cancer and that FGFR inhibitors have some effect in treating gastric cancer is not novel. The data quality is not high, and the effects of double inhibition are not significant. The conclusions appear to be largely overstated, and the supporting data are weak and not compelling.

      The data in Figure 1 are not novel; similar data have been reported elsewhere.

      It is unclear why the two panels in Fig. 2a and 2b cannot be integrated into one panel, which would make it easier to compare the activities.

      The synergistic effects of AZD4547 and SHP099 are not significant in Fig. 2e and 2f, or in Fig. 3g and Fig. 4f.

      The data in Fig. 5 are weak and can be removed. It is unclear why the FGFR inhibitor has some activity toward T cells, since T cells do not express FGFR.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript reports the application of a combined targeted therapeutic approach to gastric cancer treatment. The RTK FGFR2 and the phosphatase SHP2 are targeted with existing drugs, AZD4547 and SHP099, respectively. Having shown increased mRNA levels of FGFR2 and SHP2 in a patient population and highlighted the issue of resistance to single therapies, the authors show that the combination of inhibitors reduces cancer-related signalling in two gastric cancer cell lines. The efficacy of the dual therapy is further demonstrated in a single patient case study and mouse xenograft models. Finally, the rationale for SHP2 inhibition is shown to be linked to immune response.

      Strengths:

      The data is generally well presented and the study invokes a novel patient data set which could have wider value. The study provides additional evidence to support the combined therapeutic approach of RTK and phosphatase inhibition.

      Weaknesses:

      Combined therapy approaches targeting RTKs and SHP2 have been widely reported. Indeed, SHP099 in combination with FGFR inhibitors has been shown to overcome adaptive resistance in FGFR-driven cancers. Furthermore, the inhibition of SHP2 has been documented to have important implications in both targeting proliferative signalling as well as immune response. Thus, it is difficult to see novelty or a significant scientific advance in this manuscript. Although the data is generally well presented, there is inconsistency in the interpretation of the experimental outcomes from ex vivo, patient and mouse systems investigated. In addition, the study provides only minor or circumstantial understanding of the dual mechanism.

      Using data from a 161-patient cohort, FGFR2 amplification was identified in ~6% of patients, with a concomitant elevation of FGFR2 mRNA that correlated with PTPN11 (SHP2) mRNA expression. The broader context of this data is of value and could add a different patient demographic to other data on gastric cancer. However, there is no detail on patient stratification or prior therapeutic intervention.

      In SNU16 and KATOIII cells the combined therapy is shown to be effective and appears to be correlated with increased apoptotic effects (i.e. not immune response).

      Fig. 2E suggests that the combined therapy in SNU16 cells is a little better than the FGFR2-directed inhibitor AZD4547 alone, particularly at the higher dose.

      The individual patient case study described via Fig 3 suggests efficacy of the combined therapy (at very high dosage), however, the cell biopsies only show reduced phosphorylation of ERK, but not AKT. This is at odds with the ex vivo cell-based assays. Thus, it is not clear how relevant this study is.

      The mouse xenograft study shows a convincing reduction in tumor mass/volume and clear reduction in pAKT, whilst pERK remains largely unaffected by the combined therapeutic approach. This is in conflict with the previous data which seems to show the opposite effect. In all, the impact of the dual therapy is unclear with respect to the two pathways mediated by ERK and AKT.

      Finally, the authors demonstrate the impact of SHP2 on PD-1 expression and propose that the SHP099/AZD4547 combination therapy significantly induces the production of IFN-γ in CD8+ T cells. This part of the study is unconvincing and would benefit from the investigation of the tumor micro-environment to assess T cell infiltration.

    4. Reviewer #3 (Public review):

      Summary:

      Fibroblast growth factor receptor 2 (FGFR2) is a receptor tyrosine kinase that can be amplified in gastric cancer and serves as a potential therapeutic target for this patient population. However, targeting FGFR2 has shown limited efficacy. Thus, this study seeks to identify additional molecules that can be effectively targeted in FGFR2-amplified gastric cancer, with a focus on Src homology region 2-containing protein tyrosine phosphatase 2 (SHP2). The authors first demonstrate that 6% of gastric cancer patients in a cohort of human patient samples exhibit FGFR2 amplification. Furthermore, they demonstrate that FGFR2 mRNA expression is positively correlated with expression of PTPN11 (the gene that encodes the SHP2 protein). Using human gastric cancer cell lines with amplified FGFR2, the authors then test the effects of combining the FGFR inhibitor AZD4547 with the SHP2 inhibitor SHP099 on tumor cell death and signaling molecules. They demonstrate that combining the two inhibitors is more effective at tumor cell killing and reducing activation of downstream signaling pathways than either inhibitor alone. In further studies, the authors obtained gastric cancer cells with FGFR2 amplification from a patient who was treated with an FGFR2 inhibitor. While this patient initially showed a partial response, the patient ultimately progressed, demonstrating resistance to FGFR2 inhibition. Following isolation of tumor cells from the patient's ascites, the authors demonstrate that these cells are sensitive to the combination treatment of AZD4547 and SHP099. Further studies were performed using a xenograft model in athymic nude mice, in which the combination of SHP099 and AZD4547 was found to reduce tumor growth more significantly than either treatment alone. Finally, the authors demonstrate using an in vitro culture model that this combination treatment enhances T cell mediated cytotoxicity. The authors conclude that targeting FGFR2 and SHP2 represents a potential combination strategy in gastric cancer patients with FGFR2 amplification.

      Strengths:

      The authors demonstrate that FGFR2 amplification positively correlates with PTPN11 in human gastric cancer samples, providing rationale for combination therapies. Furthermore, convincing data are provided demonstrating that targeting both FGFR and SHP2 is more effective than targeting either pathway alone using in vitro and in vivo models. The use of cells derived from a gastric cancer patient that progressed following treatment with an FGFR inhibitor is also a strength. The findings from this study support the conclusion that SHP2 inhibitors enhance the efficacy of FGFR-targeted therapies in cancer patients. This study also suggests that targeting SHP2 may also be an effective strategy for targeting cancers that are resistant to FGFR-targeted therapies.

      Weaknesses:

      The main caveat with these studies is the lack of an immune competent model with which to test the finding that this combination therapy enhances T cell cytotoxicity in vivo. Discussing this limitation within the context of these findings and future directions for this work, particularly since the combination therapy appears to work quite well without the presence of T cells in the environment, would be beneficial.

    1. eLife Assessment

      In this manuscript, the authors present useful findings demonstrating that the RNA modification enzyme Mettl5 regulates sleep in Drosophila. Through transcriptome- and proteome-wide analyses, the authors identified downstream targets affected in heterozygous mutants and proposed that Mettl5 regulates the translation and degradation of clock genes to maintain normal sleep function. However, the mechanisms by which Mettl5 achieves these functions, and whether they are direct or indirect, remain incomplete and would benefit from further analysis.

    2. Reviewer #1 (Public review):

      Summary:

      Here the authors attempted to test whether the function of Mettl5 in sleep regulation is conserved in Drosophila and, if so, by which molecular mechanisms. To do so they performed sleep analysis, as well as RNA-seq and ribo-seq, in order to identify the downstream targets. They found that the loss of one copy of Mettl5 affects sleep and that its catalytic activity is important for this function. Transcriptional and proteomic analyses show that multiple pathways were altered, including the clock signaling pathway and the proteasome. Based on these changes the authors propose that Mettl5 modulates sleep through regulation of the clock genes, at the level of both their production and their degradation.

      Strengths:

      The phenotypical consequence of the loss of one copy of Mettl5 on sleep function is clear and well-documented.

      Weaknesses:

      The imaging and molecular parts are less convincing.

      - The colocalization of Mettl5 with glial and neuronal cells is not very clear.

      - The section on gene ontology analysis is long and confusing.

      - Among all the pathways affected, the focus on the proteasome sounds like cherry picking, and there is no experiment demonstrating its impact on the Mettl5 phenotype.

      - The ribo-seq shows some changes at the level of translation efficiency, but there is no connection with the Mettl5 phenotypes. In other words, how does the increased usage of some codons impact clock signalling? Are the clock genes enriched for these codons?

      - A few papers have already demonstrated the role of Mettl5 in translation, even at the structural level (Rong et al, Cell Reports 2020), and this was not discussed by the authors. In Peng et al, 2022 the authors show that the m6A bridges the 18S rRNA with RPL24. Is this conserved in Drosophila?

      - The text will require strong editing, and the authors should check and review it extensively to improve the use of English.

      Conclusion

      Despite the effort to identify the underlying molecular defects following the loss of Mettl5, the authors fell short in doing so. Some of the results are over-interpreted, and more experiments will be needed to understand how Mettl5 controls the translation of its targets. Previous work was poorly referenced and discussed.

    3. Reviewer #2 (Public review):

      Summary:

      The authors define the m6A methyltransferase Mettl5 as a novel sleep-regulatory gene that contributes to specific aspects of Drosophila sleep behaviors (i.e., sleep drive and arousal at early night; sleep homeostasis) and propose the possible implication of Mettl5-dependent clocks in this process. The model was primarily based on the assessment of sleep changes upon genetic/transgenic manipulations of Mettl5 expression (including CRISPR-deletion allele); differentially expressed genes between wild-type vs. Mettl5 mutant; and interaction effects of Mettl5 and clock genes on sleep. These findings exemplify how a subclass of m6A modifications (i.e., Mettl5-dependent m6A) and possible epi-transcriptomic control of gene expression could impact animal behaviors.

      Strengths:

      Comprehensive DEG analyses between control and Mettl5 mutant flies reveal the landscape of Mettl5-dependent gene regulation at both transcriptome and translatome levels. The molecular/genetic features underlying Mettl5-dependent gene expression may provide important clues to molecular substrates for circadian clocks, sleep, and other physiology relevant to Mettl5 function in Drosophila.

      Weaknesses:

      While these findings indicate the potential implication of Mettl5-dependent gene regulation in circadian clocks and sleep, several key data require substantial improvement and rigor of experimental design and data interpretation for fair conclusions. Weaknesses of this study and possible complications in the original observations include but are not limited to:

      (1) Genetic backgrounds in Mettl5 mutants: the heterozygosity of Mettl5 deletion causes sleep suppression at early night and long-period rhythms in circadian behaviors. The transgenic rescue using Gal4/UAS may support the specificity of the Mettl5 effects on sleep. However, it does not necessarily exclude the possibility that the Mettl5 deletion stocks somehow acquired long-period mutation allelic to other clock genes. Additional genetic/transgenic models of Mettl5 (e.g., homozygous or trans-heterozygous mutants of independent Mettl5 alleles; Mettl5 RNAi etc.) can address the background issue and determine 1) whether sleep suppression tightly correlates with long-period rhythms in Mettl5 mutants; and 2) whether Mettl5 effects are actually mapped to circadian pacemaker neurons (e.g., PDF- or tim-positive neurons) to affect circadian behaviors, clock gene expression, and synaptic plasticity in a cell-autonomous manner and thereby regulate sleep. Unfortunately, most experiments in the current study rely on a single genetic model (i.e., Mettl5 heterozygous mutant).

      (2) Gene expression and synaptic plasticity: gene expression profiles and synaptic plasticity should be assessed by multiple time-point analyses since 1) they display high-amplitude oscillations over the 24-h window and 2) any phase-delaying mutation (e.g., Mettl5 deletion) could significantly affect their circadian changes. The current study performed a single time-point assessment of circadian clock/synaptic gene expression, which could lead to misleading conclusions about Mettl5 effects. Considering the long-period rhythms in Mettl5 mutant clocks, transcriptome/translatome profiles in Mettl5 mutants cannot distinguish between direct vs. indirect targets of Mettl5 (i.e., gene regulation by the loss of Mettl5-dependent m6A vs. by the delayed circadian phase in Mettl5 mutants).

      (3) The text description for gene expression profiling and Mettl5-dependent gene regulation was very detailed, yet there is a huge gap between gene expression profiling and sleep/behavioral analyses. The model in Figure 5 should be better addressed and validated.

    4. Reviewer #3 (Public review):

      Xiaoyu Wu and colleagues examined the potential role in sleep of a Drosophila ribosomal RNA methyltransferase, mettl5. Based on sleep defects reported in CRISPR-generated mutants, the authors performed both RNA-seq and Ribo-seq analyses of head tissue from mutants and compared them to control animals collected at the same time point. While these data were subjected to a thorough analysis, it was difficult to understand the relative direction of differential expression between the two genotypes. In any case, a major conclusion was that the mutant showed altered expression of circadian clock genes, and that the altered expression of the period gene in particular accounted for the sleep defect reported in the mettl5 mutant. As noted above, a strength of this work is its relevance to a human developmental disorder as well as the transcriptomic and ribosomal profiling of the mutant. However, there are numerous weaknesses in the manuscript, most of which stem from misinterpretation of the findings, some methodological approaches, and a lack of methodological detail. The authors seem to have missed a major phenotype associated with the mettl5 mutant, which is that it caused a significant increase in period length, apparent even in a light:dark cycle. Thus the effect of the mutant on clock gene expression more likely contributed to this phenotype than to any associated with changes in sleep behavior.

    1. eLife Assessment

      This manuscript describes a method using EM polyclonal epitope mapping to help elucidate endogenous antibodies. Overall the work described is interesting, and the contribution will be of use to the field, with impact and value that are expected to increase over time. The significance of the work is considered valuable and the strength of evidence to support its findings is considered solid.

    2. Reviewer #1 (Public review):

      Summary:

      The paper addresses the problem of optimising the mapping of serum antibody responses against a known antigen. It relates the cryoEM analysis of polyclonal Fabs to antibody genes, with the ultimate aim of obtaining complete and accurate antibody sequences. The method, commonly termed EMPEM, is increasingly used to understand responses in convalescent sera, and optimisation of the workflows and provision of openly available tools is of genuine value to a growing number of people.

      The authors do not address the experimental aspects of the methods and do not present novel computational tools, rather they use a series of established computational methods to provide workflows that simplify the interpretation of the EM map in terms of the sequences of dominant antibodies.

      Strengths:

      The paper is well-written and clearly argued. The tests constructed seem appropriate and fair and demonstrate that the workflow works pretty well. For a small subset (~17%) of the EMPEM maps analysed the workflow was able to get convincing assignments of the V-genes.

      Weaknesses:

      The AI methods used are not a substitute for high quality data and at present very few of the results obtained from EMPEM will be of sufficient quality to robustly assign the sequence of the antibody. However, rather more are likely to be good enough, especially in combination with MS data, to provide a pretty good indication of the V-gene family.

    3. Reviewer #2 (Public review):

      In this manuscript, the authors seek to demonstrate that it is possible to sequence antibody variable domains from cryoEM reconstructions in combination with bottom-up LC-MSMS. In particular, they extract de novo sequences from single particle-cryo-EM-derived maps of antibodies using the "deep-learning tool ModelAngelo", which are run through the program Stitch to try to select the top scoring V-gene and construct a placeholder sequence for the CDR3 of both the heavy and light chain of the antibody under investigation. These reconstructed variable domains are then used as templates to guide the assembly of de novo peptides from LC-MS/MS data to improve the accuracy of the candidate sequence.

      Using this approach the authors claim to have demonstrated that "cryoEM reconstructions of monoclonal antigen-antibody complexes may contain sufficient information to accurately narrow down candidate V-genes and that this can be integrated with proteomics data to improve the accuracy of candidate sequences".

      While the approach is clearly a work in progress, the manuscript should be made easier to understand for the general reader. Indeed, I had a hard time understanding the workflow until I got to Fig. 3, so re-ordering the figures, for example, may be helpful in this regard.

      It would be useful to provide additional concrete examples where the described workflow would assist in the elucidation of CDR3s in cases where these are not already known. (In the benchmark dataset from the Electron Microscopy Data Bank, all the antibodies and Fabs are presumably known, as is the case for the monoclonal antibody CR3022.) I am having difficulty envisioning how one would prepare samples from actual plasma that would be appropriate for single particle cryo-EM and MS data on dominant antibodies of interest. In my experience, most of these samples tend to be quite complex mixtures, so additional discussion of this point would be helpful.

    1. eLife Assessment

      This fundamental work presents two clinically relevant BMP4 mutations that contribute to vertebrate development. The convincing evidence supports that the site-specific cleavage at the BMP4 pro-domain precisely regulates its function and provides mechanistic insight into how homodimers and heterodimers behave differently. The work will be of broad interest to researchers working on growth factor signaling mechanisms and vertebrate development.

    2. Reviewer #1 (Public review):

      Summary:

      The authors demonstrate that two human preproprotein mutations in the BMP4 gene cause a defect in proprotein cleavage and BMP4 mature ligand formation, leading to hypomorphic phenotypes in mouse knock-in alleles and in Xenopus embryo assays.

      Strengths:

      They provide compelling biochemical and in vivo analyses supporting their conclusions, showing the reduced processing of the proprotein and the concomitant reduction in mature BMP4 ligand protein, impressively, in mouse embryonic lysates. They perform excellent analysis of the embryo and post-natal phenotypes demonstrating the hypomorphic nature of these alleles. Interesting phenotypic differences between the S91C and E93G mutants are shown with excellent hypotheses for the differences. Their results support that BMP4 heterodimers act predominantly throughout embryogenesis whereas BMP4 homodimers play essential roles at later developmental stages.

      Weaknesses:

      A control of BMP7 alone in the Xenopus assays seems important to exclude BMP7 homodimer activity in these assays.

      The Discussion could be strengthened by more in-depth explanations of how BMP4 homodimer versus heterodimer signaling is supported by the results, so that readers do not have to think it all through themselves. Similarly, a discussion early in the Discussion of why the S91C mutant has a stronger phenotype than E93G would be helpful, or at least a mention that this will be addressed later.

    3. Reviewer #2 (Public review):

      Summary:

      Kim et al. report that two disease mutations in proBMP4, Ser91Cys and Glu93Gly, which disrupt the Ser91 FAM20C phosphorylation site, block the activation of proBMP4 homodimers. Consequently, analysis of DMZ explants from Xenopus embryos expressing the proBMP4 S91C or E93G mutants showed reduced pSmad1 and tbxt1 expression. The block in BMP4 activity caused by the mutations could be overcome by co-expression of BMP7, suggesting that the missense mutations selectively affect the activity of BMP4 homodimers but not BMP4/7 heterodimers. The expert amphibian tissue transplant studies were extended to in vivo studies in Bmp4S91C/+ and Bmp4E93G/+ mice, demonstrating the impact of these mutations on embryonic development, particularly in female mice, in line with patient studies. Finally, studies in MEFs revealed that the mutations did not affect proBMP4 glycosylation or ER-to-Golgi transport but appeared to inhibit the furin-dependent cleavage of proBMP4 to BMP4. Based on these findings and AI (AlphaFold) modeling of proBMP4, the authors speculate that pSer91 influences access of furin to its cleavage site at Arg289AlaLysArg292.

      Strengths:

      The Xenopus and mouse studies are valuable and elegantly describe the impact of the S91C and E93G disease mutations on BMP signaling and embryonic development.

      Weaknesses:

      The interpretation of how the mutations may disturb the furin-mediated cleavage of proBMP4 is underdeveloped and does not consider all of their data. Understanding how pS91 influences the furin-dependent cleavage at Arg292 seems to be the crux of this work and thus warrants more consideration. Specifically:

      (1) Figure S1 may be significantly more informative than implied. The authors report that BMP4S91D activates pSmad1 only incrementally better than S91C and much less than WT BMP4. However, Fig. S1B does not support the conclusion on page 7 (numbering beginning with title page); "these findings suggest that phosphorylation of S91 is required to generate fully active BMP4 homodimers". The authors rightly note that the S91C change likely has manifold effects beyond inhibiting furin cleavage. The E93G change may also affect proBMP4 beyond disturbing FAM20C phosphorylation. Additional mutation analyses would strengthen the work.

      (2) These findings in Figure S1 are potentially significant because they may inform how proBMP4 is protected from cleavage during transit through the TGN and entry into peripheral cellular compartments. Intriguing modeling studies in Figure 6 suggest that pSer91 is proximal to the furin cleavage site. Based on their presentation, pSer91 may contact Arg289, the critical P4 residue at the furin site. If so, might that suggest how pS91 may prevent furin cleavage, thus explaining why the S91D mutation inhibits processing as presented, and possibly how proBMP4 processing is delayed until transit to distal compartments (perhaps activated by a change in the endosomal microenvironment or a Ser91 phosphatase)? Have the authors considered or ruled out these possibilities? In addition to additional mutation analyses of the FAM20C site, moving the discussion of this model to an "Ideas and Speculation" subsection may be warranted.

      (3) The lack of an in vitro protease assay to test the effect of the S91 mutations on furin cleavage is problematic.

    4. Reviewer #3 (Public review):

      Summary:

      The authors describe important new biochemical elements in the synthesis of a class of critical developmental signaling molecules, BMP4. They also present a highly detailed description of developmental anomalies in mice bearing known human mutations at these specific elements.

      Strengths:

      Exceptionally detailed descriptions of pathologies occurring in mutant mice. Novel findings regarding the interaction of propeptide phosphorylation and convertase cleavage, both of which will move the field forward. Provocative hypothesis regarding furin access to cleavage sites, supported by AlphaFold predictions.

      Weaknesses:

      Figure 6A presents two testable models for pre-release access of furin to cleavage sites. Since physical separation of enzyme from substrate occurs in only one model, could immunocytochemistry distinguish between them?

    1. eLife Assessment

      This useful study introduces the peptidisc-TPP approach as a promising solution to challenges in membrane proteomics, enabling thermal proteome profiling in a detergent-free system. While the concept is innovative and holds significant potential, the demonstration of its utility and validation remains incomplete. The method presents a strong foundation for broader applications in identifying physiologically and pharmacologically relevant membrane protein-ligand interactions.

    2. Reviewer #1 (Public review):

      Summary:

      The idea is appealing, but the authors have not sufficiently demonstrated the utility of this approach.

      Strengths:

      Novelty of the approach, potential implications for discovering novel interactions

      Weaknesses:

      The Duong group introduced their highly elegant peptidisc approach several years ago. In the present work, they combine it with thermal proteome profiling (TPP) and attempt to demonstrate the utility of this combination for identifying novel membrane protein-ligand interactions.<br /> While I find this idea intriguing, and the approach potentially useful, I do not feel that the authors have sufficiently demonstrated the utility of this approach.<br /> My main concern is that no novel interactions are identified and validated. For the presentation of any new methodology, I think this is quite necessary.<br /> In addition, except for MsbA, no orthogonal methods are used to support the conclusions, and the authors rely entirely on quantifying rather small differences in abundance using either iBAQ or LFQ.<br /> These reported changes in abundance must be supported by a more quantitative approach, such as SILAC or labeled peptides.<br /> In summary, I think this story requires a stronger and broader demonstration of the ability of peptidisc-TPP to identify novel physiologically/pharmacologically relevant interactions.

    3. Reviewer #2 (Public review):

      Summary:

      The membrane mimetic thermal proteome profiling (MM-TPP) presented by Jandu et al. seems to be a useful way to minimize the interference of detergents in efficient mass spectrometry analysis of membrane proteins. Thermal proteome profiling is a mass spectrometric method that measures binding of a drug to different proteins in a cell lysate by monitoring thermal stabilization of the proteins caused by their interaction with the ligands being studied. This method has been underexplored for the membrane proteome because of the inefficient mass spectrometric detection of membrane proteins and because of interference from the detergents that are often used for membrane protein solubilization.

      Strengths:

      In this report the binding of ligands to membrane protein targets has been monitored in crude membrane lysates or tissue homogenates exalting the efficacy of the method to detect both intended and off-target binding events in a complex physiologically relevant sample setting.

      The manuscript is lucidly written and the data presented seems clear. The only insignificant grammatical error I found was that the 'P' in the word peptidisc is not capitalized in the beginning of the methods section "MM-TPP profiling on membrane proteomes". The clear writing made it easy to understand and evaluate what has been presented. Kudos to the authors.

      Weaknesses:

      While this is a solid report and a promising tool for analyzing membrane protein drug interactions, addressing some of the minor caveats listed below could make it much more impactful.

      The authors claim that MM-TPP is done by "completely circumventing structural perturbations invoked by detergents". This may not be entirely accurate, because before reconstitution of the membrane proteins in peptidisc, the membrane fractions are solubilized by 1% DDM. The solubilization and subsequent centrifugation steps last at least 45 min. It is unlikely that all the structural perturbations caused by DDM to various membrane proteins and their transient interactions are completely reversed or rescued by peptidisc reconstitution. In the introduction, the authors make statements such as "..it is widely acknowledged that even mild detergents can disrupt protein structures and activities, leading to challenges in accurately identifying drug targets.." and "[peptidisc] libraries are instrumental in capturing and stabilizing IMPs in their functional states while preserving their interactomes and lipid allosteric modulators...". These need to be rephrased, as countless studies have shown that, even with membrane proteins suspended in micelles, robust ligand-binding assays and binding kinetics can be performed, leading to physiologically relevant conclusions and the identification of protein-protein and protein-ligand interactions.

      If the method involves detergent solubilization, for example using 1% DDM, it is a bit disingenuous to argue that 'interactomes and lipid allosteric modulators' characterized by low-affinity interactions will remain intact or can be rescued upon detergent removal. Authors should discuss this or at least highlight the primary caveat of the peptidisc method of membrane protein reconstitution - which is that it begins with detergent solubilization of the proteome and does not completely circumvent structural perturbations invoked by detergents.

      It would also be important to test detergents that are milder and harsher than 1% DDM to show that this method of reconstitution can indeed rescue the perturbations to the structure and interactions of membrane proteins caused by detergents during the solubilization step. Based on the methods provided, it appears that the final amount of detergent in the peptidisc membrane protein library was 0.008%, which is ~150 µM. The CMC of DDM, depending on the amount of NaCl, could be between 120 and 170 µM. Perhaps, to completely circumvent the perturbations from detergents, other methods of detergent-free solubilization, such as SMA polymers and SMALP reconstitution, could be explored. A comparison of the peptidisc reconstitution with such detergent-free extraction strategies could lend more strength to the presented method.
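The concentration estimate in the paragraph above can be checked with a short calculation. This is a minimal sketch, assuming that the quoted percentages are weight/volume and using the molar mass of DDM (C24H46O11, about 510.6 g/mol); the function name is illustrative and does not come from the manuscript.

```python
# Molar mass of n-dodecyl-beta-D-maltoside (DDM, C24H46O11), in g/mol.
DDM_MOLAR_MASS = 510.62


def percent_wv_to_micromolar(percent_wv, molar_mass):
    """Convert a % (w/v) concentration to micromolar.

    Assumes weight/volume percent: 1% (w/v) = 1 g per 100 mL = 10 g/L.
    """
    grams_per_litre = percent_wv * 10.0
    return grams_per_litre / molar_mass * 1e6


# Residual detergent in the peptidisc library, as quoted by the reviewer.
final_um = percent_wv_to_micromolar(0.008, DDM_MOLAR_MASS)

# Detergent concentration during the solubilization step.
solubilization_um = percent_wv_to_micromolar(1.0, DDM_MOLAR_MASS)

print(f"0.008% DDM is about {final_um:.0f} uM")
print(f"1% DDM is about {solubilization_um / 1000:.1f} mM")
```

Under these assumptions the residual detergent comes out slightly above 150 µM, which falls inside the 120 to 170 µM CMC range the reviewer quotes, while the solubilization step uses a concentration 125 times higher.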

      Cross-verification of the identified interactions, and the subsequent stabilization or destabilization, should be demonstrated by other in vitro methods of thermal stability and ligand binding analysis using purified protein to support the efficacy of the MM-TPP method. An example cross-verification using SDS-PAGE, of the well-studied MsbA, is shown in Figure 2. In a similar fashion, other discussed targets that display substantial stabilization or destabilization, such as BCS1L, P2RX4, DgkA, Mao-B, and some un-annotated IMPs shown in supplementary figure 3, should be cross-verified.

    1. eLife Assessment

      This valuable study addresses a gap in our understanding of how the size of the attentional field is represented within the visual cortex. The evidence supporting the role of visual cortical activity is solid, based on a novel modeling analysis of fMRI data. The results will be of interest to psychologists and cognitive neuroscientists.

    2. Reviewer #1 (Public review):

      The authors conducted an fMRI study to investigate the neural effects of sustaining attention to areas of different sizes. Participants were instructed to attend to alphanumeric characters arranged in a circular array. The size of the attentional field was manipulated across four levels, ranging from small (18 deg) to large (162 deg). They used a model-based method to visualize attentional modulation in early visual cortex (V1 to V3) and found spatially congruent modulations of the BOLD response, i.e., as the attended area increased in size, the neural modulation also increased in size in the visual cortex. They suggest that this result is a neural manifestation of the zoom-lens model of attention and that the model-based method can effectively reconstruct the neural modulation in the cortical space.

      The study is well-designed with sophisticated and comprehensive data analysis. The results are robust and show strong support for a well-known model of spatial attention, the zoom-lens model. Overall, I find the results interesting and useful for the field of visual attention research. I have questions about some aspects of the results and analysis as well as the bigger picture.

      (1) It appears that the modulation in V1 is weaker than V2 and V3 (Fig 2). In particular, the width modulation in V1 is not statistically significant (Fig 5). This result seems a bit unexpected. Given the known RF properties of neurons in these areas, in particular, smaller RF in V1, one might expect more spatially sensitive modulation in V1 than V2/V3. Some explanations and discussions would be helpful. Relatedly, one would also naturally wonder if this method can be applied to other extrastriate visual areas such as V4 and what the results look like.

      (2) I'm a bit confused about the angular error result. Fig 4 shows that the mean angular error is close to zero, but Fig 5 reports these values to be about 30-40 deg. Why the big discrepancy? Is it due to the latter reporting absolute errors? It seems reporting the overall bias is more useful than absolute value.
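The distinction raised in point (2) can be illustrated with a toy calculation. This is a sketch with made-up estimate/target pairs, not the study's data; it only shows that a mean signed error (bias) near zero is compatible with a mean absolute error of tens of degrees once errors are wrapped onto a circle.

```python
def wrap_deg(angle):
    """Wrap an angle in degrees to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def signed_error(estimate, target):
    """Signed angular error in degrees, taking the shortest way around."""
    return wrap_deg(estimate - target)


# Hypothetical (estimate, target) pairs in degrees of polar angle.
# The last two cross the 0/360 boundary to exercise the wrapping.
pairs = [(50.0, 90.0), (130.0, 90.0), (20.0, 350.0), (340.0, 10.0)]

errors = [signed_error(est, tgt) for est, tgt in pairs]
mean_signed = sum(errors) / len(errors)                   # overall bias
mean_absolute = sum(abs(e) for e in errors) / len(errors)  # typical miss size

print("signed errors:", errors)
print("mean signed error:", mean_signed)
print("mean absolute error:", mean_absolute)
```

In this toy set the errors are symmetric around the target, so the bias is zero even though the typical miss is 35 deg. Whether this accounts for the difference between Fig 4 and Fig 5 is for the authors to confirm.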

      (3) A significant effect is reported for amplitude in V3 (line 78), but the graph in Fig 5 shows hardly any difference. Please confirm the finding and also explain the directionality of the effect if there is indeed one.

      (4) The purpose of the temporal interval analysis is rather unclear. I assume it has to do with how much data is needed to recover the cortical modulation and hence how dynamic a signal the method can capture. While the results make sense (i.e., more data is better), there is no obvious conclusion and/or interpretation of its meaning.

      (5) I think it would be useful for the authors to make a more explicit connection to previous studies in this literature. Two studies seem particularly relevant. First, how do the present results relate to those in Muller et al (2003, reference 37), which also found a zoom-lens type of neural effect? Second, how does the present method compare with the spatial encoding model in Sprague & Serences (2013, reference 56), which also reconstructs the neural modulation of spatial attention? More discussion of these studies will help put the current study in the larger context.

      (6) Fig 4b, referenced on line 123, does not exist.

    3. Reviewer #2 (Public review):

      Summary:

      The study in question utilizes functional magnetic resonance imaging (fMRI) to dynamically estimate the locus and extent of covert spatial attention from visuocortical activity. The authors aim to address an important gap in our understanding of how the size of the attentional field is represented within the visual cortex. They present a novel paradigm that allows for the estimation of the spatial tuning of the attentional field and demonstrate the ability to reliably recover both the location and width of the attentional field based on BOLD responses.

      Strengths:

      (1) Innovative Paradigm: The development of a new approach to estimate the spatial tuning of the attentional field is a significant strength of this study. It provides a fresh perspective on how spatial attention modulates visual perception.<br /> (2) Refined fMRI Analysis: The use of fMRI to track the spatial tuning of the attentional field across different visual regions is methodologically rigorous and provides valuable insights into the neural mechanisms underlying attentional modulation.<br /> (3) Clear Presentation: The manuscript is well-organized, and the results are presented clearly, which aids in the reader's comprehension of the complex data and analyses involved.

      Weaknesses:

      (1) Lack of Neutral Cue Condition: The study does not include a neutral cue condition where the cue width spans 360°, which could serve as a valuable baseline for assessing the BOLD response enhancements and diminishments in both attended and non-attended areas.<br /> (2) Clarity on Task Difficulty Ratios: The explicit reasoning for the chosen letter-to-number ratios for various cue widths is not detailed. Ensuring clarity on these ratios is crucial, as it affects the task difficulty and the comparability of behavioral performance across different cue widths. It is essential that observed differences in behavior and BOLD signals are attributable solely to changes in cue width and not confounded by variations in task difficulty.

    4. Reviewer #3 (Public review):

      Summary:

      In this report, the authors tested how manipulating the contiguous set of stimuli on the screen that should be used to guide behavior - that is, the scope of visual spatial attention - impacts the magnitude and profile of well-established attentional enhancements in visual retinotopic cortex. During fMRI scanning, participants attended to a cued section of the screen for blocks of trials and performed a letter vs digit discrimination task at each attended location (and judged whether the majority of characters were letters/digits). Importantly, the visual stimulus was identical across attention conditions, so any observed response modulations are due to top-down task demands rather than visual input. The authors employ population receptive field (pRF) models, which are used to sort voxel activation with respect to the location and scope of spatial attention and fit a Gaussian-like function to the profile of attentional enhancement from each region and condition. The authors find that attending to a broader region of space expands the profile of attentional enhancement across the cortex (with a larger effect in higher visual areas), but does not strongly impact the magnitude of this enhancement, such that each attended stimulus is enhanced to a similar degree. Interestingly, these modulations, overall, mimic changes in response properties caused by changes to the stimulus itself (increase in contrast matching the attended location in the primary experiment). The finding that attentional enhancement primarily broadens, but does not substantially weaken in most regions, is an important addition to our understanding of the impact of distributed attention on neural responses, and will provide meaningful constraints to neural models of attentional enhancement.
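The kind of analysis summarized above, fitting a Gaussian-like function to a profile of attentional enhancement, can be sketched as follows. This is an illustration only: the circular Gaussian, the grid search, the parameter names (`center`, `fwhm`, `gain`, `baseline`), and the synthetic profile are assumptions made for this example and are not taken from the authors' pipeline.

```python
import math


def circ_dist(a, b):
    """Smallest absolute angular distance between a and b, in degrees."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def gaussian_profile(angles, center, fwhm, gain, baseline):
    """Gaussian bump over polar angle, parameterized by its FWHM."""
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return [baseline + gain * math.exp(-0.5 * (circ_dist(a, center) / sigma) ** 2)
            for a in angles]


def fit_profile(angles, response, centers, fwhms):
    """Grid search over center and FWHM; gain and baseline by least squares."""
    n = len(angles)
    mean_r = sum(response) / n
    best = None
    for c in centers:
        for w in fwhms:
            shape = gaussian_profile(angles, c, w, 1.0, 0.0)
            mean_s = sum(shape) / n
            var_s = sum((s - mean_s) ** 2 for s in shape)
            if var_s == 0.0:
                continue  # flat template: gain is not identifiable
            # Closed-form linear regression: response ~ baseline + gain * shape
            gain = sum((s - mean_s) * (r - mean_r)
                       for s, r in zip(shape, response)) / var_s
            baseline = mean_r - gain * mean_s
            sse = sum((baseline + gain * s - r) ** 2
                      for s, r in zip(shape, response))
            if best is None or sse < best["sse"]:
                best = {"center": c, "fwhm": w, "gain": gain,
                        "baseline": baseline, "sse": sse}
    return best


# Synthetic, noise-free profile sampled every 10 deg of polar angle.
angles = [float(a) for a in range(0, 360, 10)]
observed = gaussian_profile(angles, center=90.0, fwhm=60.0, gain=1.5, baseline=0.2)

fit = fit_profile(angles, observed,
                  centers=[float(c) for c in range(0, 360, 10)],
                  fwhms=[float(w) for w in range(20, 181, 10)])
print(fit)
```

Separating the nonlinear parameters (location and width) from the linear ones (gain and baseline) keeps the search small and makes it easy to ask whether width and amplitude change independently across conditions, which is the comparison the reviewers focus on.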

      Strengths:

      - Well-designed manipulations (changing location and scope of spatial attention), and careful retinotopic/pRF mapping, allow for a robust assay of the spatial profile of attentional enhancement, which has not been carefully measured in previous studies<br /> - Results are overall clear, especially concerning width of the spatial region of attentional enhancement, and lack of clear and consistent evidence for reduction in the amplitude of enhancement profile<br /> - Model-fitting to characterize spatial scope of enhancement improves interpretability of findings

      Weaknesses:

      - Task difficulty seems to vary as a function of spatial scope of attention, with varying ratios of letters/digits across spatial scope conditions, which may complicate interpretations of neural modulation results<br /> - Some aspects of analysis/data sorting are unclear (e.g., how are voxels selected for analyses?)<br /> - While the focus of this report is on modulations of visual cortex responses due to attention, the lack of inclusion of results from other retinotopic areas (e.g. V3AB, hV4, IPS regions like IPS0/1) is a weakness<br /> - Additional analyses comparing model fits across amounts of data analyzed suggest the model fitting procedure is biased, with some parameters (e.g., FWHM, error, gain) scaling with noise.

    5. Author response:

      We thank the three reviewers for their insightful feedback. We look forward to addressing the raised concerns in a revised version of the manuscript. There were a few common themes among the reviews that we will briefly touch upon now, and we will provide more details in the revised manuscript. 

      First, the reviewers asked for the reasoning behind the task ratios we implemented for the different attentional width conditions. The different ratios were selected to be as similar as possible given the size and spacing of our stimuli (aside from the narrowest cue width of one bin, the ratios for the others were 0.67, 0.60, and 0.67). As Figure 1b shows, task accuracy showed small and non-monotonic changes across the three larger cue widths, dissociable from the monotonic pattern seen for the model-estimated width of the attentional field. Furthermore, prior work has indicated that there is a relationship between task difficulty and the overall magnitude of the BOLD response; however, we do not expect this to influence the width of the modulation. How task difficulty influences the BOLD response is an important topic, and we hope that future work will investigate this relationship more directly.

      Second, reviewers expressed interest in the distribution of spatial attention in higher visual areas. In our study we focus only on early visual regions (V1-V3). This was primarily driven by pragmatic considerations, in that we only have retinotopic estimates for our participants in these early visual areas. Our modeling approach is dependent on having access to the population receptive field estimates for all voxels, and while the main experiment was scanned using whole brain coverage, retinotopy was measured in a separate session using a field of view only covering the occipital cortex.  

      Lastly, we appreciate the opportunity to clarify the purpose of the temporal interval analysis. The reviewer is correct in assuming we set out to test how much data is needed to recover the cortical modulation and how dynamic a signal the method can capture. This analysis does show that more data provided more reliable estimates. The more important finding, however, is that the model was still able to recover the location and width of the attentional cue at shorter timescales of as few as two TRs. This has implications for the potential applicability of our approach to paradigms that involve more dynamic adaptation of the attentional field.

    1. eLife Assessment

      Understanding bacterial growth mechanisms can potentially help uncover novel drug targets that are crucial for maintaining cellular viability, particularly for bacterial pathogens. In this important study, the authors investigate the role of mycobacterial Wag31 in lipid and peptidoglycan biosynthesis. A detailed analysis of Wag31 domain architecture revealed a role in membrane tethering, more specifically, the N-terminal and C-terminal domains appear to display distinct functional roles therein. Whilst the data presented are of use, the experimental evidence is currently incomplete and does not yet fully support the conclusions made.

    2. Reviewer #1 (Public review):

      This is a comprehensive study that sheds light on how Wag31 functions and localises in mycobacterial cells. A clear link to interactions with CL is shown using microscopy in combination with fluorescent fusion constructs and lipid-specific dyes. Furthermore, studies using mutant versions of Wag31 shed light on the functionalities of each domain in the protein. My concerns/suggestions for the manuscript are minor:

      (1) Ln 130. A better clarification/discussion is required here. It is clear that both depletion and overexpression have an effect on levels of various lipids, but subsequent descriptions show that they affect different classes of lipids.<br /> (2) The pulldown assay results are interesting, but the links are tentative.<br /> (3) The authors may perhaps like to rephrase claims of effects on lipid homeostasis, as my understanding is that lipid localisation rather than catabolism/breakdown is affected.

    3. Reviewer #2 (Public review):

      Summary:

      Kapoor et al. investigated the role of the mycobacterial protein Wag31 in lipid and peptidoglycan synthesis and sought to delineate the role of the N- and C-terminal domains of Wag31. They demonstrated that modulating Wag31 levels influences lipid homeostasis in M. smegmatis and cardiolipin (CL) localisation in cells. Wag31 was found to preferentially bind CL-containing liposomes, and deleting the N-terminus of the protein significantly decreased this interaction. Novel interactions between Wag31 and proteins involved in lipid metabolism and cell wall synthesis were identified, suggesting that Wag31 recruits proteins to the intracellular membrane domain by direct interaction.

      Strengths:

      (1) The importance of Wag31 in maintaining lipid homeostasis is supported by several lines of evidence.<br /> (2) The interaction between Wag31 and cardiolipin, and the role of the N-terminus in this interaction was convincingly demonstrated.

      Weaknesses:

      (1) MS experiments provide some evidence for novel protein-protein interactions; however, the pull-down experiments lack a valid negative control.<br /> (2) The role of the N-terminus in the protein-protein interaction has not been ruled out.

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript describes the characterization of mycobacterial cytoskeleton protein Wag31, examining its role in orchestrating protein-lipid and protein-protein interactions essential for mycobacterial survival. The most significant finding is that Wag31, which directs polar elongation and maintains the intracellular membrane domain, was revealed to have membrane tethering capabilities.

      Strengths:

      The authors provided a detailed analysis of Wag31 domain architecture, revealing distinct functional roles: the N-terminal domain facilitates lipid binding and membrane tethering, while the C-terminal domain mediates protein-protein interactions. Overall, this study offers a robust and new understanding of Wag31 function.

      Weaknesses:

      The following major concerns should be addressed.

      • Authors use 10-N-Nonyl-acridine orange (NAO) as a marker for cardiolipin localization. However, given that NAO is known to bind to various anionic phospholipids, how do the authors know that what they are seeing is specifically visualizing cardiolipin and not a different anionic phospholipid? For example, phosphatidylinositol is another abundant anionic phospholipid in mycobacterial plasma membrane.

      • Authors' data show that the N-terminal region of Wag31 is important for membrane tethering. The authors' data also show that the N-terminal region is important for sustaining mycobacterial morphology. However, the authors' statement in Line 256 "These results highlight the importance of tethering for sustaining mycobacterial morphology and survival" requires additional proof. It remains possible that the N-terminal region has another unknown activity, and this yet-unknown activity rather than the membrane tethering activity drives the morphological maintenance. Similarly, the N-terminal region is important for lipid homeostasis, but the statement in Line 270, "the maintenance of lipid homeostasis by Wag31 is a consequence of its tethering activity" requires additional proof. The authors should tone down these overstatements, or provide additional data to support their claims.

      • Authors suggest that Wag31 acts as a scaffold for the IMD (Fig. 8). However, Meniche et al. have shown that MurG as well as GlfT2, two well-characterized IMD proteins, do not colocalize with Wag31 (DivIVA) (https://doi.org/10.1073/pnas.1402158111). IMD proteins are always slightly subpolar while Wag31 is located at the tip of the cell. Therefore, the authors' biochemical data cannot be easily reconciled with microscopic observations in the literature. This raises a question regarding the validity of the protein-protein interactions shown in Figure 7. Since this pull-down assay was conducted by mixing E. coli lysate expressing Wag31 and Msm lysate expressing Wag31 interactors like MurG, it is possible that the interactions are not direct. Authors should interpret their data more cautiously. If authors cannot provide additional data and sufficient justifications, they should avoid proposing a confusing model like Figure 8 that contradicts published observations.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      This is a comprehensive study that sheds light on how Wag31 functions and localises in mycobacterial cells. A clear link to interactions with CL is shown using microscopy in combination with fluorescent fusion constructs and lipid-specific dyes. Furthermore, studies using mutant versions of Wag31 shed light on the functionalities of each domain in the protein. My concerns/suggestions for the manuscript are minor:

      (1) Ln 130. A better clarification/discussion is required here. It is clear that both depletion and overexpression have an effect on levels of various lipids, but subsequent descriptions show that they affect different classes of lipids.

      We thank the reviewer for the comments. We will improve Ln130 in the manuscript. The lipid classes that get impacted by the depletion of Wag31 vs overexpression are different. Wag31 is an adaptor protein that interacts with proteins of the ACCase complex (Meniche et al., 2014; Xu et al., 2014) that synthesize fatty acid precursors and regulate their activity (Habibi Arejan et al., 2022).

      The varied response to lipid homeostasis could be attributed to a change in the stoichiometry of these interactions with Wag31. While Wag31 depletion would prevent such interactions from occurring and might affect lipid synthesis that directly depends on Wag31-protein partner interactions, its overexpression would lead to promiscuous interactions and a change in the stoichiometry of native interactions, ultimately modulating lipid synthesis pathways.

      (2) The pulldown assay results are interesting, but the links are tentative.

      The interactome of Wag31 was identified through the immunoprecipitation of FLAG-tagged Wag31 complemented at an integrative locus in a Wag31 mutant background to avoid overexpression artifacts. We used Msm::gfp expressing an integrative copy (at the L5 locus) of FLAG-GFP as a control to subtract non-specific interactions. The experiment was performed in biological triplicates, and interactors that appeared in all replicates were selected for further analysis. Although we identified more than 100 interactors of Wag31, we analyzed only the top 25 hits, with a PSM cut-off ≥18 and unique peptides ≥5. Additionally, two of Wag31's established interactors, AccD5 and Rne, were among the top five hits, thus validating our data.

      Though we agree that the interactions can either be direct or through a third partner, the fact that we obtained known interactors of Wag31 makes us believe these interactions are genuine. Moreover, we performed pulldown experiments for validation by mixing E. coli lysates expressing His-Wag31 full-length or truncated protein with M. smegmatis lysates expressing FLAG-tagged interacting proteins. The wash conditions used for these pull-down assays were quite stringent: the wash buffer contained 1% Triton X-100, eliminating all non-specific and indirect interactions. However, we agree that we cannot conclusively state that the interactions are direct without purifying the proteins and performing the experiment. We will describe this caveat in the revised manuscript.
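The selection criteria stated in this response (present in all three replicates, absent from the FLAG-GFP control, PSM ≥18, unique peptides ≥5) can be written down as a small filter. This is a sketch under two assumptions that the response does not specify: that the thresholds are applied within each replicate, and that control subtraction means removing any protein seen in the control. All counts below are invented, and `ProteinX` and `ProteinY` are hypothetical placeholders; only AccD5 and Rne are named in the text.

```python
PSM_CUTOFF = 18
UNIQUE_PEPTIDE_CUTOFF = 5


def passes_thresholds(hit):
    """True if a hit meets both cut-offs quoted in the response."""
    return (hit["psm"] >= PSM_CUTOFF
            and hit["unique_peptides"] >= UNIQUE_PEPTIDE_CUTOFF)


def select_interactors(replicates, control_proteins):
    """Keep proteins that pass in every replicate and are absent from the control."""
    passing_sets = []
    for replicate in replicates:
        passing_sets.append({
            name for name, hit in replicate.items()
            if passes_thresholds(hit) and name not in control_proteins
        })
    return sorted(set.intersection(*passing_sets))


# Invented counts for illustration only.
replicates = [
    {"AccD5": {"psm": 40, "unique_peptides": 12},
     "Rne": {"psm": 35, "unique_peptides": 9},
     "ProteinX": {"psm": 20, "unique_peptides": 3},   # too few unique peptides
     "ProteinY": {"psm": 50, "unique_peptides": 10}},  # also seen in control
    {"AccD5": {"psm": 38, "unique_peptides": 11},
     "Rne": {"psm": 30, "unique_peptides": 8},
     "ProteinY": {"psm": 45, "unique_peptides": 9}},
    {"AccD5": {"psm": 42, "unique_peptides": 13},
     "Rne": {"psm": 33, "unique_peptides": 7},
     "ProteinX": {"psm": 25, "unique_peptides": 6},
     "ProteinY": {"psm": 48, "unique_peptides": 10}},
]
control_proteins = {"ProteinY"}

selected = select_interactors(replicates, control_proteins)
print(selected)
```

A filter of this kind selects reproducible co-purifying proteins. It does not distinguish direct from indirect binding, which is the caveat the authors acknowledge above.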

      (3) The authors may perhaps like to rephrase claims of effects on lipid homeostasis, as my understanding is that lipid localisation rather than catabolism/breakdown is affected.

      In this manuscript, we are trying to convey that Wag31 is a spatiotemporal regulator of lipid metabolism. It is a peripheral protein that is hooked to the membrane via Cardiolipin and forms a scaffold at the poles, which helps localize several enzymes involved in lipid metabolism.

      Homeostasis is the process by which an organism maintains a steady state of balance and stability in response to changes. Depletion of Wag31 not only results in delocalisation of lipids in intracellular lipid inclusions but also leads to changes in the levels of various lipid classes. Advancement in the field of spatial biology underscores the importance of the native localization of biological molecules for maintaining the steady state of the cell. Hence, we have used the word “homeostasis” to describe both of the changes observed in lipid metabolism.

      Reviewer #2 (Public review):

      Summary

      Kapoor et al. investigated the role of the mycobacterial protein Wag31 in lipid and peptidoglycan synthesis and sought to delineate the role of the N- and C-terminal domains of Wag31. They demonstrated that modulating Wag31 levels influences lipid homeostasis in M. smegmatis and cardiolipin (CL) localisation in cells. Wag31 was found to preferentially bind CL-containing liposomes, and deleting the N-terminus of the protein significantly decreased this interaction. Novel interactions between Wag31 and proteins involved in lipid metabolism and cell wall synthesis were identified, suggesting that Wag31 recruits proteins to the intracellular membrane domain by direct interaction.

      Strengths:

      (1) The importance of Wag31 in maintaining lipid homeostasis is supported by several lines of evidence.

      (2) The interaction between Wag31 and cardiolipin, and the role of the N-terminus in this interaction was convincingly demonstrated.

      Weaknesses:

      (1) MS experiments provide some evidence for novel protein-protein interactions. However, the pull-down experiments lack a valid negative control.

      We thank the reviewer for the comments. We will include a valid negative control in the experiment. We would choose ~2 mycobacterial proteins that are not a part of our interactome study and perform a similar pull-down experiment with them and a positive control (known interactor of Wag31).

      (2) The role of the N-terminus in the protein-protein interaction has not been ruled out.

      Previously, we attempted to express the N-terminal (1-60 aa) and the C-terminal (60-212 aa) proteins in various mycobacterial shuttle vectors to perform MS/MS experiments. Despite numerous efforts, neither was expressed, with or without an N/C-terminal FLAG tag, in episomal or integrative vectors, owing to the instability of the protein. Eventually, we successfully expressed the C-terminal Wag31 with N- and C-terminal hexa-His tags. However, this expression was not sufficient or stable enough for us to perform Ni affinity pull-down experiments for mass spectrometry. The N-terminal of Wag31 could not be expressed in M. smegmatis even with N- and C-terminal hexa-His tags.

      To rule out the role of the N-terminal in mediating protein-protein interactions, we plan to attempt to express the N-terminal of Wag31 with N- and C-terminal hexa-His tags in E. coli. If this clone successfully expresses in E. coli, we will perform pull-down experiments as described in Figure 7.

      Reviewer #3 (Public review):

      Summary:

      This manuscript describes the characterization of mycobacterial cytoskeleton protein Wag31, examining its role in orchestrating protein-lipid and protein-protein interactions essential for mycobacterial survival. The most significant finding is that Wag31, which directs polar elongation and maintains the intracellular membrane domain, was revealed to have membrane tethering capabilities.

      Strengths:

      The authors provided a detailed analysis of Wag31 domain architecture, revealing distinct functional roles: the N-terminal domain facilitates lipid binding and membrane tethering, while the C-terminal domain mediates protein-protein interactions. Overall, this study offers a robust and new understanding of Wag31 function.

      Weaknesses:

      The following major concerns should be addressed.

      • Authors use 10-N-Nonyl-acridine orange (NAO) as a marker for cardiolipin localization. However, given that NAO is known to bind to various anionic phospholipids, how do the authors know that what they are seeing is specifically visualizing cardiolipin and not a different anionic phospholipid? For example, phosphatidylinositol is another abundant anionic phospholipid in mycobacterial plasma membrane.

      We thank the reviewer for the comments. Despite its promiscuous binding to other anionic phospholipids, 10-N-Nonyl-acridine orange is widely used to stain Cardiolipin and determine its localisation in bacterial cells and mitochondria of eukaryotes (Garcia Fernandez et al., 2004; Mileykovskaya & Dowhan, 2000; Renner & Weibel, 2011).  This is because it has a stronger affinity for Cardiolipin than other anionic phospholipids with the affinity constant being 2 × 10<sup>6</sup> M<sup>−1</sup> for Cardiolipin association and 7 × 10<sup>4</sup> M<sup>−1</sup> for that of phosphatidylserine and phosphatidylinositol association (Petit et al., 1992). Additionally, there is not yet another stain available for detecting Cardiolipin. Our protein-lipid binding assays suggest that Wag31 preferentially binds to Cardiolipin over other anionic phospholipids (Fig. 4b), hence it is likely that the majority of redistribution of NAO fluorescence that we observe might be contributed by Cardiolipin mislocalization due to altered Wag31 levels, with smaller degree of NAO redistribution intensity coming indirectly from other anionic phospholipids displaced from the membrane due to the loss of membrane integrity and cell shape changes due to Wag31.

      • Authors' data show that the N-terminal region of Wag31 is important for membrane tethering. The authors' data also show that the N-terminal region is important for sustaining mycobacterial morphology. However, the authors' statement in Line 256 "These results highlight the importance of tethering for sustaining mycobacterial morphology and survival" requires additional proof. It remains possible that the N-terminal region has another unknown activity, and this yet-unknown activity rather than the membrane tethering activity drives the morphological maintenance. Similarly, the N-terminal region is important for lipid homeostasis, but the statement in Line 270, "the maintenance of lipid homeostasis by Wag31 is a consequence of its tethering activity" requires additional proof. The authors should tone down these overstatements or provide additional data to support their claims.

      We agree with the reviewer that there exists a possibility for another function of the N-terminal that may contribute to sustaining mycobacterial physiology and survival. We would revise our statements in the paper to accurately reflect the data. Results shown suggest that the tethering activity of the N-terminal region may contribute to mycobacterial morphology and survival. However, additional functions of this region can’t be ruled out. Similarly, the maintenance of lipid homeostasis by Wag31 may be associated with its tethering activity, although other mechanisms could also contribute to this process. 

      • Authors suggest that Wag31 acts as a scaffold for the IMD (Fig. 8). However, Meniche et al. have shown that MurG as well as GlfT2, two well-characterized IMD proteins, do not colocalize with Wag31 (DivIVA) (https://doi.org/10.1073/pnas.1402158111). IMD proteins are always slightly subpolar while Wag31 is located at the tip of the cell. Therefore, the authors' biochemical data cannot be easily reconciled with microscopic observations in the literature. This raises a question regarding the validity of the protein-protein interactions shown in Figure 7. Since this pull-down assay was conducted by mixing E. coli lysate expressing Wag31 and Msm lysate expressing Wag31 interactors like MurG, it is possible that the interactions are not direct. Authors should interpret their data more cautiously. If authors cannot provide additional data and sufficient justifications, they should avoid proposing a confusing model like Figure 8 that contradicts published observations.

      In the literature, MurG and GlfT2 have been shown to have polar localization (Freeman et al., 2023; Hayashi et al., 2016; Kado et al., 2023), and two groups have shown slightly sub-polar localization of MurG (García-Heredia et al., 2021; Meniche et al., 2014). Additionally, Freeman et al. (2023) showed SepIVA to be a spatio-temporal regulator of MurG. MS/MS analysis of Wag31 immunoprecipitation data identified both MurG and SepIVA as interactors of Wag31 (Fig. 3). Given that Wag31 also displays polar localisation, it likely associates with the polar MurG. However, since a sub-polar localization of MurG has also been reported, it is possible that they do not interact directly, and another protein mediates their interaction. We will modify the model proposed in Fig. 8 based on the above.

      We agree that for validation of interaction, we performed pulldown experiments by mixing E. coli lysates expressing His-Wag31 full-length or truncated protein with M. smegmatis lysates expressing FLAG-tagged interacting proteins. The wash conditions used for these pull-down assays were quite stringent: the wash buffer contained 1% Triton X100, which eliminates all non-specific and indirect interactions. However, we agree that we cannot conclusively state that the interactions are direct without purifying the proteins and performing the experiment. We will describe this caveat in the revised manuscript and propose a model reflecting our results.

      References:

      Freeman, A. H., Tembiwa, K., Brenner, J. R., Chase, M. R., Fortune, S. M., Morita, Y. S., & Boutte, C. C. (2023). Arginine methylation sites on SepIVA help balance elongation and septation in Mycobacterium smegmatis. Mol Microbiol, 119(2), 208-223. https://doi.org/10.1111/mmi.15006

      Garcia Fernandez, M. I., Ceccarelli, D., & Muscatello, U. (2004). Use of the fluorescent dye 10-N-nonyl acridine orange in quantitative and location assays of cardiolipin: a study on different experimental models. Anal Biochem, 328(2), 174-180. https://doi.org/10.1016/j.ab.2004.01.020

      García-Heredia, A., Kado, T., Sein, C. E., Puffal, J., Osman, S. H., Judd, J., Gray, T. A., Morita, Y. S., & Siegrist, M. S. (2021). Membrane-partitioned cell wall synthesis in mycobacteria. eLife, 10. https://doi.org/10.7554/eLife.60263

      Habibi Arejan, N., Ensinck, D., Diacovich, L., Patel, P. B., Quintanilla, S. Y., Emami Saleh, A., Gramajo, H., & Boutte, C. C. (2022). Polar protein Wag31 both activates and inhibits cell wall metabolism at the poles and septum. Front Microbiol, 13, 1085918. https://doi.org/10.3389/fmicb.2022.1085918

      Hayashi, J. M., Luo, C. Y., Mayfield, J. A., Hsu, T., Fukuda, T., Walfield, A. L., Giffen, S. R., Leszyk, J. D., Baer, C. E., Bennion, O. T., Madduri, A., Shaffer, S. A., Aldridge, B. B., Sassetti, C. M., Sandler, S. J., Kinoshita, T., Moody, D. B., & Morita, Y. S. (2016). Spatially distinct and metabolically active membrane domain in mycobacteria. Proc Natl Acad Sci U S A, 113(19), 5400-5405. https://doi.org/10.1073/pnas.1525165113

      Kado, T., Akbary, Z., Motooka, D., Sparks, I. L., Melzer, E. S., Nakamura, S., Rojas, E. R., Morita, Y. S., & Siegrist, M. S. (2023). A cell wall synthase accelerates plasma membrane partitioning in mycobacteria. eLife, 12, e81924. https://doi.org/10.7554/eLife.81924

      Meniche, X., Otten, R., Siegrist, M. S., Baer, C. E., Murphy, K. C., Bertozzi, C. R., & Sassetti, C. M. (2014). Subpolar addition of new cell wall is directed by DivIVA in mycobacteria. Proc Natl Acad Sci U S A, 111(31), E3243-3251. https://doi.org/10.1073/pnas.1402158111

      Mileykovskaya, E., & Dowhan, W. (2000). Visualization of phospholipid domains in Escherichia coli by using the cardiolipin-specific fluorescent dye 10-N-nonyl acridine orange. J Bacteriol, 182(4), 1172-1175. https://doi.org/10.1128/JB.182.4.1172-1175.2000

      Petit, J. M., Maftah, A., Ratinaud, M. H., & Julien, R. (1992). 10N-nonyl acridine orange interacts with cardiolipin and allows the quantification of this phospholipid in isolated mitochondria. Eur J Biochem, 209(1), 267-273. https://doi.org/10.1111/j.1432-1033.1992.tb17285.x

      Renner, L. D., & Weibel, D. B. (2011). Cardiolipin microdomains localize to negatively curved regions of Escherichia coli membranes. Proc Natl Acad Sci U S A, 108(15), 6264-6269. https://doi.org/10.1073/pnas.1015757108

      Xu, W. X., Zhang, L., Mai, J. T., Peng, R. C., Yang, E. Z., Peng, C., & Wang, H. H. (2014). The Wag31 protein interacts with AccA3 and coordinates cell wall lipid permeability and lipophilic drug resistance in Mycobacterium smegmatis. Biochem Biophys Res Commun, 448(3), 255-260. https://doi.org/10.1016/j.bbrc.2014.04.116

    1. eLife Assessment

      This valuable study combines massively parallel reporter assays and regression analysis to identify sequence features in untranslated regions contributing to the stability of in vitro transcribed mRNA delivered to cells. The strength of evidence presented is solid, although some points about half-life measurements and the relevance of identified sequence features to native transcript stability will inform future discussion surrounding the present study. Taken together, the work will be of interest to a broad swath of colleagues studying post-transcriptional gene regulation and especially to those using massively parallel reporter assays.

    2. Reviewer #1 (Public review):

      In the manuscript by Su et al., the authors present a massively parallel reporter assay (MPRA) measuring the stability of in vitro transcribed mRNAs carrying wild-type or mutant 5' or 3' UTRs transfected into two different human cell lines. The goal presented at the beginning of the manuscript was to screen for effects of disease-associated point mutations on the stability of the reporter RNAs carrying partial human 5' or 3' UTRs. However, the majority of the manuscript is dedicated to identifying sequence components underlying the differential stability of reporter constructs. This analysis showed that UA dinucleotides are the most predictive feature of RNA stability in both cell lines and both UTRs.

      The effect of AU rich elements (AREs) on RNA stability is well established in multiple systems, and the present study confirms this general trend, but points out variability in the consequence of seemingly similar motifs on RNA stability. For example, the authors report that a long stretch of Us has extreme opposite effects on RNA stability depending on whether it is preceded by an A (strongly destabilizing) or followed by an A (strongly stabilizing). While the authors' interpretation of a context-dependence of the effect is certainly well-founded, it seems counterintuitive that the preceding or following A would be the (only) determining factor. This points to a generally reductionist approach taken by the authors in the analysis of the data and in their attempt to dissect the contribution of "AU rich sequences" to RNA stability, with a general tendency to reduce the size and complexity of the features (e.g. to dinucleotides). While this certainly increases the statistical power of the analysis due to the number of occurrences of these motifs, it limits the interpretability of the results. How do UA dinucleotides per se contribute to destabilizing the RNA, both in 5' and 3' UTRs, but (according to limited data presented) not in coding sequences? What is the mechanism? RBPs binding to UA dinucleotide containing sequences are suggested to "mask" the destabilizing effect, thereby leading to a more stable RNA. Gain of UA dinucleotides is reported to have a destabilizing effect, but again no hypothesis is provided as to the underlying molecular mechanism. In addition to reducing the motif length to dinucleotides, the notion of "context dependence" is used in a very narrow sense.

      The present MPRA measures the effect of UTR sequences in one specific reporter context and using one experimental approach (following the decay of in vitro transcribed and transfected RNAs). While this method certainly has its merits compared to other approaches, it also comes with some caveats: RNA is delivered naked, without bound RBPs and no nuclear history, e.g. of splicing (no EJCs), editing and modifications. Therefore, it remains to be seen whether UA dinucleotide frequency is a substantial factor in determining the half-lives of endogenous mRNAs.

      The authors conclude their study with a meta-analysis of genes with increased UA dinucleotides in 5' and 3'UTRs, showing that specific functional groups are overrepresented among these genes. In addition, they provide evidence for an effect of disease-associated UTR mutations on endogenous RNA stability. While these elements link back to the original motivation of the study (screening for effects of point mutations in 5' and 3' UTRs), they provide only a limited amount of additional insights.

      In summary, this manuscript presents an interesting addition to the long-standing attempts at dissecting the sequence basis of RNA stability in human cells. The analysis is in general comprehensive and sound; however, it remains unclear to what extent the findings can be generalized beyond the method and the experimental system used here.

      Comments on revisions:

      Parts of my original comments have been adequately addressed by the authors.

      After reading the revised manuscript and the rebuttal, my main concern is related to the figure comparing the half-lives as measured in the two different cell lines that was included in the response to reviewer 2, but not in the revised manuscript. The complete lack of correlation between the half-lives of the 3'UTR library measured in the two cell lines is concerning. While variability and cell type-specific effects can be expected, some principles should be the same (such as the effect of UA dinucleotides that the authors report), leading to at least some correlation.

      In addition, it is unclear to me why the half-lives measured for the two libraries in HEK cells are shifted (median ln(t 1/2)=6-7 for the 5'UTR library and ln(t 1/2)=4-4.5 for the 3'UTR library), but not in SH.

      I feel that this figure contains important information that should be included in the final manuscript.

    3. Reviewer #2 (Public review):

      Summary of goals:

      Untranslated regions are key cis-regulatory elements that control mRNA stability, translation, and translocation. Through interactions with small RNAs and RNA binding proteins, UTRs form complex transcriptional circuitry that allows cells to fine-tune gene expression. Functional annotation of UTR variants has been very limited, and improvements could offer insights into disease relevant regulatory mechanisms. The goals were to advance our understanding of the determinants of UTR regulatory elements and characterize the effects of a set of "disease-relevant" UTR variants.

      Strengths:

      The use of a massively parallel reporter assay allowed for analysis of a substantial set (6,555 pairs) of 5' and 3' UTR fragments compiled from known disease associated variants. Two cell types were used.

      The findings confirm previous work about the importance of AREs, which helps show validity and adds some detailed comparisons of specific AU-rich motif effects in these two cell types.

      Using a Lasso regression, TA-dinucleotide content is identified as a strong regulator of RNA stability in a context dependent manner based on GC content and presence of RNA binding protein binding motifs. The findings have potential importance, drawing attention to a UTR feature that is not well characterized.

      The use of complementary datasets, including from half-life analyses of RNAs and from random sequence library MPRAs, is a useful addition and supports several important findings. The finding that TA dinucleotides have explanatory power separate from (and in some cases interacting with) GC content is valuable.

      The functional enrichment analysis suggests some new ideas about how UTRs may contribute to regulation of certain classes of genes.

      Weaknesses:

      In this section, original reviewer comments about the initial submission and the responses of the authors are listed together with new reviewer responses to the authors:

      Reviewer original comment 1: It is difficult to understand how the calculations for half-life were performed. The sequencing approach measures the relative frequency of each sequence at each time point (less stable sequences become relatively less frequent after time 0, whereas more stable sequences become relatively more frequent after time 0). Since there is no discussion of whether the abundance of the transfected RNA population is referenced to some external standard (e.g., housekeeping RNAs), it is not clear how absolute (rather than relative) half-lives were determined.

      Author response: [The authors showed the equations used to calculate half lives based on read counts.] They stated that "The absolute abundance was not required for the half-life calculation."

      Reviewer response to authors: The methods section states that DESeq2 was used to normalize read counts. DESeq2 normalization assumes that levels of most RNAs are not different between samples. That assumption is not valid here, since RNAs in the library are introduced into cells at time 0 and all RNAs decrease over time. If DESeq2 is applied without modification to normalize across timepoints, normalized reads from less stable RNAs will decrease over time (as expected) but normalized reads from more stable RNAs will increase. Can the authors please clarify in the methods how the read counts were normalized to account for this issue?
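      The normalization concern above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the half-lives, starting abundance, and sequencing depth are hypothetical, and the size factors are computed with a plain NumPy version of the median-of-ratios idea rather than by calling DESeq2 itself. It shows that when every RNA decays, normalized counts for the most stable RNA rise over time even though its true abundance falls.

```python
import numpy as np

def median_of_ratios_size_factors(counts):
    """Median-of-ratios size factors for a (genes x samples) matrix of positive counts."""
    log_counts = np.log(counts)
    # Per-gene reference: log of the geometric mean across samples.
    log_geo_mean = log_counts.mean(axis=1, keepdims=True)
    # Per-sample size factor: median over genes of the ratio to the reference.
    return np.exp(np.median(log_counts - log_geo_mean, axis=0))

# Hypothetical library of five RNAs with different half-lives (minutes).
half_lives = np.array([30.0, 60.0, 120.0, 240.0, 600.0])
times = np.array([30.0, 75.0, 120.0])

# True abundance: every RNA decays from the same (arbitrary) starting amount.
true_abundance = 1000.0 * 0.5 ** (times[None, :] / half_lives[:, None])

# Sequencing reports relative frequencies at a fixed (hypothetical) depth.
depth = 1e6
counts = depth * true_abundance / true_abundance.sum(axis=0, keepdims=True)

size_factors = median_of_ratios_size_factors(counts)
normalized = counts / size_factors

for h, row in zip(half_lives, normalized):
    print(f"t1/2 = {h:5.0f} min  normalized counts: {np.round(row, 1)}")
```

      In this toy case the RNA with the median half-life appears flat after normalization, shorter-lived RNAs appear to decay, and longer-lived RNAs appear to accumulate, which is why an external reference or a modified normalization would be needed to recover absolute decay rates.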

      Reviewer original comment 2: Fig. S1A and B are used to assess reproducibility. They show that read counts at a given time point correlate well across replicate experiments. However, this is not a good way to assess the reproducibility or accuracy of the t1/2 measurements. (The major source of variability in read counts in these plots - especially at early time points - is likely starting abundance of each RNA sequence, not stability.) This creates concerns about how well the method is measuring t1/2. Also creating concern is the observation that many RNAs are associated with half-lives that are much longer than the time points analyzed in the study. For example, if we are reading Figure S1 and Table S1 correctly, the median t1/2 for the 5' UTR library in HEK cells appears to be >700 minutes. Given that RNA was collected at 30, 75, and 120 minutes, accurate measurements of RNAs with such long half-lives would seem to be very difficult.

      Author response: ... The calculation of the half-life involves first determining the decay constant 𝜆, which represents a constant rate of decay. Since 𝜆 is a constant, it is possible to accurately calculate it without needing data over the entire decay range. Our experimental design considers this by selecting appropriate time points to ensure a reliable estimation of 𝜆, and thus, the half-life. To determine the most suitable time points, we conducted preliminary experiments using RT-PCR. These experiments indicated that 30, 75, and 120 minutes provided an effective range for capturing the decay dynamics of the transcripts.
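      For readers unfamiliar with the calculation described in this response, the sketch below shows a standard way to estimate a decay constant and half-life from a short time course. It assumes simple first-order decay, N(t) = N0·exp(−λt), and a log-linear least-squares fit; the authors' exact equations are not reproduced in this review, so the function name and the example abundances are hypothetical.

```python
import numpy as np

def half_life_from_timecourse(times, abundance):
    """Estimate t1/2 assuming first-order decay, via a linear fit of ln(abundance) vs time."""
    slope, _intercept = np.polyfit(times, np.log(abundance), 1)
    decay_constant = -slope  # lambda, in 1/min
    return np.log(2) / decay_constant

# Hypothetical noise-free transcript with a true half-life of 45 minutes.
times = np.array([30.0, 75.0, 120.0])
abundance = 100.0 * np.exp(-np.log(2) / 45.0 * times)

t_half = half_life_from_timecourse(times, abundance)
print(f"estimated half-life: {t_half:.1f} min")
```

      With noise-free data the fit recovers the half-life from any time window; the practical question raised by the reviewers is how precisely the slope can be estimated when it is close to zero.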

      Reviewer response to author comments: Based on Fig. S1D, for 3' UTRs in both cell types and for 5' UTRs in SH-SY5Y cells, median t1/2 is in the range of ~30 to 90 minutes (corresponding to ln t1/2 = 3.5 to 4.5). Measuring RNAs at 30, 75, and 120 minutes would therefore be a good choice for these cases. However, median t1/2 for 5' UTRs in HEK cells appears to be ~600 minutes (corresponding to ln t1/2 ~6.4). For a t1/2 of 600 minutes, RNA levels at the final time point (120 minutes) would be ~90% of those at the first time point (30 minutes), which illustrates why the method would need to be able to reliably capture very small changes in RNA abundance to accurately measure t1/2 for transcripts with half-lives much longer than 120 minutes. As suggested in our original review, this concern could be addressed by showing the correlation of half-lives across replicates for the 5' and 3' UTR libraries in both cell types. Alternatively, the authors could show other measures of reproducibility for the half-life measurements across replicates. This requires no additional experimentation and can be done using the data from replicate runs shown in Fig. S1A and B. We remain concerned that for sequences with very long half-lives, extrapolating the half-life from small changes between 30 and 120 minutes will lead to imprecise measurements.
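      The arithmetic behind this concern can be checked directly. The sketch below assumes first-order decay and uses the 30 and 120 minute time points from the study; the ±5% measurement error is a hypothetical value chosen only to illustrate how sensitive a long half-life estimate is to small errors in the measured ratio.

```python
import math

def fraction_remaining(t_half, t_start=30.0, t_end=120.0):
    """Fraction of RNA at t_end relative to t_start, assuming first-order decay."""
    return 0.5 ** ((t_end - t_start) / t_half)

def half_life_from_ratio(ratio, t_start=30.0, t_end=120.0):
    """Invert the relation above: half-life implied by an observed ratio (0 < ratio < 1)."""
    return (t_end - t_start) * math.log(2) / -math.log(ratio)

for t_half in (60.0, 600.0):
    print(f"t1/2 = {t_half:.0f} min -> fraction remaining: {fraction_remaining(t_half):.3f}")

# Hypothetical +/-5% relative error on the measured ratio for a 600-minute half-life.
true_ratio = fraction_remaining(600.0)
low_estimate = half_life_from_ratio(true_ratio * 0.95)
high_estimate = half_life_from_ratio(true_ratio * 1.05)
print(f"implied half-life range: {low_estimate:.0f} to {high_estimate:.0f} min")
```

      For a 600-minute half-life the ratio is about 0.90, and a few percent of error in that ratio shifts the implied half-life by several hundred minutes, whereas a 60-minute half-life gives a much larger and more easily measured drop.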

      Reviewer original comment 3: There is no direct comparison of t1/2 between the two cell types studied for the full set of sequences studied. This would be helpful in understanding whether the regulatory effects of UTRs are generally similar across cell lines (as has been shown in some previous studies) or whether there are fundamental differences. The distribution of t1/2's is clearly quite different in the two cell lines, but it is important to know if this reflects generally slow RNA turnover in HEK cells or whether there are a large number of sequence-specific effects on stability between cell lines. A related issue is that it is not clear whether the relatively small number of significant variant effects detected in HEK cells versus SH-SY5Y cells is attributable to real biological differences between cell types or to technical issues (many fewer read counts and much longer half lives in HEK cells).

      Author response: For both cell lines, we selected oligonucleotides with R2 > 0.5 and mean squared error (MSE) < 1 for analysis when estimating half-life (λ) by linear regression. This selection criterion was implemented to minimize the effect of experimental noise. After quality control, we selected common UTRs and compared the RNA half-lives of the two cell lines using a scatter plot. The figure below shows that RNA half-lives are quite different between the cell lines, with a moderate similarity observed in the 5' UTRs (R = 0.21), while the correlation in the 3' UTRs is non-significant. Despite the low correlation of mRNA half-life between the two cell lines, UA-dinucleotide and UA-rich sequences consistently emerge as the most significant destabilizing features, suggesting a shared regulatory mechanism across diverse cellular environments.
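      The quality-control step described in this response can be sketched as follows. This is an illustration only: it assumes the regression is fit to log-transformed abundance against time and that R2 and MSE are computed from that fit, which the response does not spell out. The function names and the two example time courses are hypothetical; only the thresholds (R2 > 0.5, MSE < 1) come from the response.

```python
import numpy as np

def fit_decay(times, log_abundance):
    """Linear fit of log abundance vs time; returns (decay constant, R^2, MSE)."""
    slope, intercept = np.polyfit(times, log_abundance, 1)
    residuals = log_abundance - (slope * times + intercept)
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((log_abundance - log_abundance.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    mse = ss_res / len(times)
    return -slope, r2, mse

def passes_qc(r2, mse, r2_min=0.5, mse_max=1.0):
    """Thresholds quoted in the author response."""
    return r2 > r2_min and mse < mse_max

times = np.array([30.0, 75.0, 120.0])
clean = np.array([5.0, 4.5, 4.0])   # hypothetical oligo with steady log-linear decay
noisy = np.array([5.0, 3.0, 5.2])   # hypothetical oligo with no consistent trend

for name, y in (("clean", clean), ("noisy", noisy)):
    decay_constant, r2, mse = fit_decay(times, y)
    print(name, f"R2={r2:.2f}", f"MSE={mse:.2f}", "kept" if passes_qc(r2, mse) else "dropped")
```

      Note that with only three time points a filter of this kind removes poorly fitting oligos but does not by itself establish that the retained half-lives are reproducible across replicates.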

      Reviewer response to author comments: We appreciate that the authors shared this additional analysis of the data. We believe that this is an important finding and that the additional figure showing correlations of half-lives across cell types should be included in the manuscript or supplement. Discussion of this result in the manuscript would also be useful for readers. This result is surprising to us since we would have expected that widely expressed RNA-binding proteins would have led to more similar effects between the two cell types, as previously found using other approaches (e.g., studies of 3' UTR effects in MPRAs). It would also be appropriate to discuss that differences seen between the two cell types indicate that caution is warranted when trying to generalize the results of this study to other cell types.

      Reviewer original comment 4 has been addressed adequately in the revised manuscript.

      Appraisal and impact:

      Reviewer original comment 1: The work adds to existing studies that previously identified sequence features, including AREs and other RNA binding protein motifs, that regulate stability and puts a new emphasis on the role of "TA" (better "UA") dinucleotides. It is not clear how potential problems with the RNA stability measurements discussed above might influence the overall conclusions, which may limit the impact unless these can be addressed.

      It is difficult to understand whether the importance of TA dinucleotides is best explained by their occurrence in a related set of longer RBP binding motifs (see Fig 5J, these motifs may be encompassed by the "WWWWWW cluster") or whether some other explanation applies. Further discussion of this would be helpful. Does the LASSO method tend to collapse a more diverse set of longer motifs that are each relatively rare compared to the dinucleotide? It remains unclear whether TA dinucleotides are associated with less stability independent of the presence of the known larger WWWWWW motif. As noted above, the importance of TA dinucleotides in the HEK experiments appears to be less than is implied in the text.

      Author response: To ensure the representativeness of the features entered into the LASSO model, we pre-selected those with an occurrence greater than 10% among all UTRs. There is no evidence to support a preference for dinucleotides by LASSO. To address whether the destabilizing effect of UA dinucleotides is part of the broader WWWWWW motif, we divided UA dinucleotides into two groups: those within the WWWWWW motif and those outside of it. Specifically, we divided UTRs into two categories: 'at least one UA within a WWWWWW motif' and 'no UA within a WWWWWW motif,' and visualized the results using a boxplot. As shown in [figures provided to the reviewers], the destabilizing trend still remains for UA dinucleotides outside of the WWWWWW motif, although the effect appears to be more pronounced when UA is within the WWWWWW motif. This suggests that while UA dinucleotides have a destabilizing effect independently, their impact is amplified when they are part of the broader WWWWWW motif.
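      The grouping described in this response can be sketched in a few lines. The sketch assumes that W follows the IUPAC convention (A or U/T) and that a UA counts as "within" the motif when it lies inside a run of at least six consecutive W bases; the authors' exact matching rule is not given here, and the function names, the extra "no UA" category, and the example sequences are hypothetical.

```python
import re

# A run of six or more consecutive A/U bases contains at least one WWWWWW window.
W_RUN = re.compile(r"[AU]{6,}")

def to_rna(seq):
    """Upper-case the sequence and convert DNA-style T to U."""
    return seq.upper().replace("T", "U")

def ua_in_w_motif(seq):
    """Count UA dinucleotides that fall inside a WWWWWW (or longer) run."""
    return sum(run.group().count("UA") for run in W_RUN.finditer(to_rna(seq)))

def classify_utr(seq):
    """Assign a UTR to one of the comparison groups."""
    rna = to_rna(seq)
    if rna.count("UA") == 0:
        return "no UA"
    if ua_in_w_motif(rna) > 0:
        return "at least one UA within WWWWWW"
    return "no UA within WWWWWW"

for utr in ("GGCAUUUAUUGC", "GGCUAGGCC", "GGCCGG"):
    print(utr, "->", classify_utr(utr))
```

      Half-lives could then be compared between the groups, for example with a boxplot as the authors describe.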

      Reviewer response to authors: These are useful additional analyses, and we suggest that the additional figure and discussion should be included in the manuscript/supplement so that readers can benefit from them.

      Reviewer original comment 2: The inclusion of more than a single cell type is an acknowledgement of the importance of evaluating cell type-specific effects. The work suggests a number of cell type-specific differences, but due to technical issues (especially with the HEK data, as outlined above) and the use of only two cell lines, it is difficult to understand cell type effects from the work.

      The inclusion of both 3' and 5' UTR sequences distinguishes this work from most prior studies in the field. Contrasting the effects of these regions on stability is of interest, although the role of these UTRs (especially the 5' UTR) in translational regulation is not assessed here.

      Author response: We examined the role of UTR and UTR variants in translation regulation using polysome profiling. By both univariate analysis and an elastic regression model, we identified motifs of short repeated sequences, including SRSF2 binding sites, as mutation hotspots that lead to aberrant translation. Furthermore, these polysome-shifting mutations had a considerable impact on RNA secondary structures, particularly in upstream AUG-containing 5' UTRs. Integrating these features, our model achieved high accuracy (AUROC > 0.8) in predicting polysome-shifting mutations in the test dataset. Additionally, metagene analysis indicated that pathogenic variants were enriched at the upstream open reading frame (uORF) translation start site, suggesting changes in uORF usage underlie the translation deficiencies caused by these mutations. Illustrating this, we demonstrated that a pathogenic mutation in the IRF6 5' UTR suppresses translation of the primary open reading frame by creating a uORF. Remarkably, site-directed ADAR editing of the mutant mRNA rescued this translation deficiency. Because the regulation of translation and stability does not converge, we illustrate these two mechanisms in two separate manuscripts (this one and doi.org/10.1101/2024.04.11.589132).

      Reviewer response to authors: This is useful context. No further comment.

    4. Reviewer #3 (Public review):

      Summary:

      In their manuscript titled "Multiplexed Assays of Human Disease‐relevant Mutations Reveal UTR Dinucleotide Composition as a Major Determinant of RNA Stability" the authors aim to investigate the effect of sequence variations in 3'UTR and 5'UTRs on the stability of mRNAs in two different human cell lines.

      To do so, the authors use a massively parallel reporter assay (MPRA). They transfect cells with a set of mRNA reporters that contain sequence variants in their 3' or 5' UTRs, which were previously reported in human diseases. They follow their clearance from cells over time relative to the matching non-variant sequence. To analyze their results, they define a set of factors (RBP and miRNA binding sites, sequence features, secondary structure etc.) and test their association with differences in mRNA stability. For features with a significant association, they use clustering to select a subset of factors for LASSO regression and identify factors that affect mRNA stability.

      They conclude that the TA dinucleotide content of UTRs is the strongest destabilizing sequence feature. Within that context, elevated GC content and protein binding can protect susceptible mRNAs from degradation. They also show that TA dinucleotide content of UTRs affects native mRNA stability and that it is associated with specific functional groups. Finally, they link disease associated sequence variants with differences in mRNA stability of reporters.

      Strengths:

      (1) This work introduces a different MPRA approach to analyze the effect of genetic variants. While previous works in tissue culture use DNA transfections that require normalization for transcription efficiency, here the mRNA is directly introduced into cells at fixed amounts, allowing a more direct view of the mRNA regulation.

      (2) The authors also introduce a unique analysis approach, which takes into account multiple factors that might affect mRNA stability. This approach allows them to identify general sequence features that affect mRNA stability beyond specific genetic variants, and reach important insights on mRNA stability regulation. Indeed, while the conclusions regarding the genetic variants identified in this work are interesting, the main strength of the work involves the general effect of sequence features rather than specific variants.

      (3) The authors provide adequate support for their claims and validate their analysis using both their reporter data and native genes. For the main feature identified, TA di-nucleotides, they perform follow-up experiments with modified reporters that further strengthen their claims, and also validate the effect on native cellular transcripts (beyond reporters), demonstrating its validity also within native scenarios.

      (4) The work provides a broad analysis of mRNA stability, across two mRNA regulatory segments (3'UTR and 5'UTR) and is performed in two separate cell-types. Comparison between two different cell-types is adequate, and the results demonstrate, as expected, the dependence of mRNA stability on the cellular context. Analysis of 3'UTR and 5'UTR regulatory effects also shows interesting differences and similarities between these two regulatory regions.

      Weaknesses:

      In their revised manuscript, the authors successfully address many of the weaknesses raised in the original review, including possible confounding effects and additional methodology details. Notably, two of the issues raised in the original report have only been partially addressed in the revision.

      (1) The analysis and regression models built in this work are not thoroughly investigated relative to native genes within cells.<br /> While using MPRAs indeed allows isolating regulatory effects that are less influential in-vivo, the resulting effects still provide some regulatory function in-vivo. The goal of such an analysis would not be to demonstrate the predictive power of the models, or to make any claims regarding using these models to fully explain or predict the stability of native transcripts. Clearly, additional more prominent factors could function in controlling endogenous RNA stability.<br /> Instead, the goal of such an investigation is simply to assess the fraction of in-vivo regulation that the factors identified in this work contribute in native contexts, and what the relative contribution of the phenomena captured by the well-controlled MPRA study is.<br /> This reviewer believes that even if the effects identified by the current MPRA study only contribute a small fraction of in-vivo variation, an analysis that aims to estimate this fraction will be very relevant to this study for several reasons. First, in order to appreciate the results of this study within their in-vivo context. Second, in light of the questions raised as motivation for this study, and particularly the need to identify the effect of disease-associated 3'UTR variants, which clearly have an in-vivo effect.

      (2) Methodology validation can be performed with simulated data (generated in-silico by the authors) to provide an independent support for the ability of the current methodology to correctly extract regulatory effects from the data.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In the manuscript by Su et al., the authors present a massively parallel reporter assay (MPRA) measuring the stability of in vitro transcribed mRNAs carrying wild-type or mutant 5' or 3' UTRs transfected into two different human cell lines. The goal presented at the beginning of the manuscript was to screen for effects of disease-associated point mutations on the stability of the reporter RNAs carrying partial human 5' or 3' UTRs. However, the majority of the manuscript is dedicated to identifying sequence components underlying the differential stability of reporter constructs. This shows that TA dinucleotides are the most predictive feature of RNA stability in both cell lines and both UTRs.

      The effect of AU rich elements (AREs) on RNA stability is well established in multiple systems, and the present study confirms this general trend but points out variability in the consequence of seemingly similar motifs on RNA stability. For example, the authors report that a long stretch of Us has extreme opposite effects on RNA stability depending on whether it is preceded by an A (strongly destabilizing) or followed by an A (strongly stabilizing). While the authors' interpretation of a context-dependence of the effect is certainly well-founded, it seems counterintuitive that the preceding or following A would be the (only) determining factor. This points to a generally reductionist approach taken by the authors in the analysis of the data and in their attempt to dissect the contribution of "AU rich sequences" to RNA stability, with a general tendency to reduce the size and complexity of the features (e.g. to dinucleotides). While this certainly increases the statistical power of the analysis due to the number of occurrences of these motifs, it limits the interpretability of the results. How do TA dinucleotides per se contribute to destabilizing the RNA, both in 5' and 3' UTRs, but (according to limited data presented) not in coding sequences? What is the mechanism? RBPs binding to TA dinucleotide containing sequences are suggested to "mask" the destabilizing effect, thereby leading to a more stable RNA. Gain of TA dinucleotides is reported to have a destabilizing effect, but again no hypothesis is provided as to the underlying molecular mechanism.

      In addition to reducing the motif length to dinucleotides, the notion of "context dependence" is used in a very narrow sense; especially when focusing on simple and short motifs, a more extensive analysis of the interdependence of these features (beyond the existing analysis of the relationship between TA-diNTs and GC content) could potentially reveal more of the context dependence underlying the seemingly opposite behavior of very similar motifs.

      (We have used UA instead of TA, as per the reviewer's suggestion)

      The contribution of coding region sequence to RNA stability has been extensively discussed (For example: doi.org/10.1016/j.molcel.2022.03.032; doi.org/10.1186/s13059-020-02251-5; doi.org/10.15252/embr.201948220; doi.org/10.1371/journal.pone.0228730; doi.org/10.7554/eLife.45396). While UA content at the third codon position (wobble position) has been implicated as a pro-degradation signal, codon optimality has emerged as the most prominent determinant for RNA stability. This indicates that the role of coding regions in RNA stability differs from that of UTRs due to the involvement of translation elongation. We did not intend to suggest that UA-dinucleotides in UTRs and coding regions have the same effect. 

      To ensure the representativeness of the features entered into the LASSO model, we pre-selected those with an occurrence greater than 10% among all UTRs. As a result, while motifs with very low occurrences were excluded from the analysis, there is no evidence to indicate a preference for dinucleotides by the LASSO model.

      We hypothesize that UA dinucleotides may recruit endonucleases of the RNase A family, whose catalytic pockets exhibit a strong bias for UA dinucleotides (doi.org/10.1016/j.febslet.2010.04.018). Structures or protein binding that block this recognition might stabilize RNAs. To gain further insight into the motif interactions, we conducted a linear regression analysis of the interactions between UA and each of the other 15 dinucleotides, using the following model:

      $$\text{half-life}_i = \beta_0 + \beta_{UA}\,UA_i + \beta_{DiNT}\,DiNT_i + \beta_{GC\%}\,GC\%_i + \beta_{UA \times DiNT}\,UA_i \cdot DiNT_i + \beta_{UA \times GC\%}\,UA_i \cdot GC\%_i + \epsilon_i$$

      where all 𝛽 terms represent the regression coefficients; UA<sub>i</sub>, DiNT<sub>i</sub>, and GC%<sub>i</sub> represent the number of UA dinucleotides, the number of other dinucleotides (other than UA), and the GC content of the i<sup>th</sup> UTR, respectively; and 𝜖<sub>i</sub> denotes the error term. For each dinucleotide, we tested the significance of 𝛽<sub>UAxGC%</sub> and 𝛽<sub>UAxDiNT</sub>, and compared their p-values using a quantile-quantile (QQ) plot. Author response image 1 shows that the interaction effect of UA dinucleotides with GC% is much more significant than interactions with the other 15 dinucleotides, as indicated by the inflated QQ plot of p-values. This suggests that GC content is a more critical contextual factor influencing UA dinucleotides' impact on RNA stability.
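
      The interaction model described above can be sketched as an ordinary least-squares fit. This is a minimal illustration on synthetic data, not the authors' code: the function name `fit_interaction_model`, the simulated effect sizes, and the use of a generic response `y` are all assumptions made for the example, and significance testing of the coefficients is left out.

```python
import numpy as np

def fit_interaction_model(ua, dint, gc, y):
    """OLS fit of y on UA, DiNT, GC% plus the UAxDiNT and UAxGC% interactions."""
    ua, dint, gc, y = (np.asarray(v, dtype=float) for v in (ua, dint, gc, y))
    # Design matrix: intercept, three main effects, two interaction terms.
    X = np.column_stack([np.ones_like(ua), ua, dint, gc, ua * dint, ua * gc])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    names = ["intercept", "UA", "DiNT", "GC%", "UAxDiNT", "UAxGC%"]
    return dict(zip(names, coef))

# Synthetic UTR features (hypothetical values, for illustration only).
rng = np.random.default_rng(0)
n = 500
ua = rng.integers(0, 15, n)      # UA dinucleotide count per UTR
dint = rng.integers(0, 15, n)    # count of one other dinucleotide
gc = rng.uniform(0.2, 0.8, n)    # GC content per UTR

# Simulated response: UA destabilizes, and GC content buffers that effect.
y = 5.0 - 0.3 * ua + 0.4 * ua * gc + rng.normal(0, 0.1, n)

coefs = fit_interaction_model(ua, dint, gc, y)
for name, value in coefs.items():
    print(f"{name:10s} {value: .3f}")
```

      In this toy setting the fitted UAxGC% coefficient recovers the simulated interaction, while the UAxDiNT coefficient stays near zero. Repeating the fit once per dinucleotide would give the two sets of interaction estimates that the response compares.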

      Author response image 1.

      The present MPRA measures the effect of UTR sequences in one specific reporter context and using one experimental approach (following the decay of in vitro transcribed and transfected RNAs). While this approach certainly has its merits compared to other approaches, it also comes with some caveats: RNA is delivered naked, without bound RBPs and no nuclear history, e.g. of splicing (no EJCs), editing and modifications. One way to assess the generalizability of the results as well as the context dependence of the effects is to perform the same analysis on existing datasets of RNA stability measurements obtained through other methods (e.g. transcription inhibition). Are TA dinucleotides universally the most predictive feature of RNA half-lives?

      Our system studies the stability control of RNA synthesized in vitro and delivered into human cells. While we did not intend to generalize our conclusions to endogenous RNAs, our approach contributes to the understanding of in vitro synthesized RNA used for cellular expression, such as in vaccines. It is known that endogenous RNAs undergo very different regulation. The most prominent factors controlling endogenous RNA stability are the density of splice junctions and the length of UTRs (doi.org/10.1186/s13059-022-02811-x; doi.org/10.1186/s12915-021-00949-x). To decipher the sequence regulation, these factors are controlled in our experiments. Therefore, we do not expect the dinucleotide features found by our approach to be generalized as the most predictive feature of RNA half-life in vivo. 

      The authors conclude their study with a meta-analysis of genes with increased TA dinucleotides in 5' and 3'UTRs, showing that specific functional groups are overrepresented among these genes. In addition, they provide evidence for an effect of disease-associated UTR mutations on endogenous RNA stability. While these elements link back to the original motivation of the study (screening for effects of point mutations in 5' and 3' UTRs), they provide only a limited amount of additional insights.

      We utilized the Taiwan Biobank to investigate whether mutations significantly affecting RNA stability also impact human biochemical measurements. Our findings indicate that these mutations indeed have a significant effect on various biochemical indices. This highlights the importance of our study, as it bridges basic science with potential applications in precision medicine. By linking specific UTR mutations with measurable changes in biochemical indices, our research underscores the potential for these findings to inform targeted medical interventions in the future.

      In summary, this manuscript presents an interesting addition to the long-standing attempts at dissecting the sequence basis of RNA stability in human cells. The analysis is in general very comprehensive and sound; however, at times the goal of the authors to find novelty and specificity in the data overshadows some analyses. One example is the case where the authors try to show that TA-dinucleotides and GC content are decoupled and not merely two sides of the same coin.

      They claim that the effect of TA dinucleotides is different between high- and low-GC content contexts but do not control for the fact that low GC-content regions naturally will contain more TA dinucleotides and therefore the effect sizes and the resulting correlation between TA-diNT rate and stability will be stronger (Fig. 5A). A more thorough analysis and greater caution in some of the claims could further improve the credibility of the conclusions.

      Low GC content implies a higher UA content but does not directly equate to a high UA-dinucleotide ratio. For instance, the sequence AUUGAACCUU has a lower GC content (0.3) compared to UAUAGGCCGC (0.6), yet it also has a lower UA-dinucleotide ratio (0 vs. 0.22). To address this concern more rigorously, we performed a stratified analysis based on UA-diNT rate. As shown in our Fig. S7C, even after stratifying by UA-dinucleotide ratio (upper panel: high UA-dinucleotide ratio; lower panel: low UA-dinucleotide ratio), we still observe that the destabilizing effect of UA is stronger in the low GC content group.
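
      The two example sequences can be checked with a few lines of code. The sketch below assumes that the UA-dinucleotide ratio is the number of overlapping UA dinucleotides divided by the number of dinucleotide positions (sequence length minus one); this definition is an assumption, chosen because it reproduces the quoted values of 0 and 0.22. The function names are illustrative.

```python
def gc_content(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)

def ua_dinucleotide_ratio(seq):
    """UA dinucleotides divided by the number of dinucleotide positions.

    Assumed definition: overlapping windows of width 2, so a sequence
    of length L has L - 1 positions. T is treated as U.
    """
    seq = seq.upper().replace("T", "U")
    n_positions = len(seq) - 1
    n_ua = sum(seq[i:i + 2] == "UA" for i in range(n_positions))
    return n_ua / n_positions

for seq in ("AUUGAACCUU", "UAUAGGCCGC"):
    print(seq, round(gc_content(seq), 2), round(ua_dinucleotide_ratio(seq), 2))
```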

      Reviewer #2 (Public Review):

      Summary of goals:

      Untranslated regions are key cis-regulatory elements that control mRNA stability, translation, and translocation. Through interactions with small RNAs and RNA binding proteins, UTRs form complex transcriptional circuitry that allows cells to fine-tune gene expression. Functional annotation of UTR variants has been very limited, and improvements could offer insights into disease relevant regulatory mechanisms. The goals were to advance our understanding of the determinants of UTR regulatory elements and characterize the effects of a set of "disease-relevant" UTR variants.

      Strengths:

      The use of a massively parallel reporter assay allowed for analysis of a substantial set (6,555 pairs) of 5' and 3' UTR fragments compiled from known disease associated variants. Two cell types were used.

      The findings confirm previous work about the importance of AREs, which helps show validity and adds some detailed comparisons of specific AU-rich motif effects in these two cell types.

      Using a Lasso regression, TA-dinucleotide content is identified as a strong regulator of RNA stability in a context dependent manner based on GC content and presence of RNA binding protein binding motifs. The findings have potential importance, drawing attention to a UTR feature that is not well characterized.

      The use of complementary datasets, including from half-life analyses of RNAs and from random sequence library MPRAs, is a useful addition and supports several important findings. The finding that TA dinucleotides have explanatory power separate from (and in some cases interacting with) GC content is valuable.

      The functional enrichment analysis suggests some new ideas about how UTRs may contribute to regulation of certain classes of genes.

      Weaknesses:

      It is difficult to understand how the calculations for half-life were performed. The sequencing approach measures the relative frequency of each sequence at each time point (less stable sequences become relatively less frequent after time 0, whereas more stable sequences become relatively more frequent after time 0). Since there is no discussion of whether the abundance of the transfected RNA population is referenced to some external standard (e.g., housekeeping RNAs), it is not clear how absolute (rather than relative) half-lives were determined.

      We estimated decay constant λ and half-life (t<sub>1/2</sub>) by the following equations:

      $$\ln\frac{C_{i(t)}}{C_{i(t=0)}} = -\lambda t, \qquad t_{1/2} = \frac{\ln 2}{\lambda}$$

      where C<sub>i(t)</sub> and C<sub>i(t=0)</sub> are read count values of the ith replicate at time points 𝑡 and 0 (see also Methods). The absolute abundance was not required for the half-life calculation.
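
      The estimation of λ and t<sub>1/2</sub> from counts relative to time 0 can be sketched as follows. This is a minimal illustration assuming first-order decay and an ordinary least-squares fit of ln(C(t)/C(0)) against time; the function name `estimate_half_life` and the synthetic counts are hypothetical, and the authors' exact fitting procedure may differ.

```python
import math

def estimate_half_life(times, counts):
    """Return (decay constant, half-life) from counts at the given times.

    Assumes counts[0] is the time-0 measurement and that decay is first
    order, so ln(C(t)/C(0)) falls linearly with slope -lambda.
    """
    logs = [math.log(c / counts[0]) for c in counts]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    lam = -sxy / sxx
    if lam <= 0:
        # No measurable decay: the half-life is not finite.
        return lam, math.inf
    return lam, math.log(2) / lam

# Synthetic example: a transcript decaying with lambda = 0.01 per minute.
times = [0, 30, 75, 120]
counts = [1000 * math.exp(-0.01 * t) for t in times]
lam, t_half = estimate_half_life(times, counts)
print(f"lambda = {lam:.4f} per min, half-life = {t_half:.1f} min")
```

      Multiplying every count of a transcript by the same constant leaves the slope unchanged, because the constant cancels in the ratio to time 0. A normalization factor that differs between time points would not cancel in this way.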

      Fig. S1A and B are used to assess reproducibility. They show that read counts at a given time point correlate well across replicate experiments. However, this is not a good way to assess the reproducibility or accuracy of the measurements of t1/2. (The major source of variability in read counts in these plots - especially at early time points - is likely the starting abundance of each RNA sequence, not stability.) This creates concerns about how well the method is measuring t1/2. Also creating concern is the observation that many RNAs are associated with half-lives that are much longer than the time points analyzed in the study. For example, based upon Figure S1 and Table S1, the median t1/2 for the 5' UTR library in HEK cells appears to be >700 minutes. Given that RNA was collected at 30, 75, and 120 minutes, accurate measurements of RNAs with such long half lives would seem to be very difficult.

      We estimated the half-life based on the following equations:

      $$\ln\frac{C_{i(t)}}{C_{i(t=0)}} = -\lambda t, \qquad t_{1/2} = \frac{\ln 2}{\lambda}$$

      where C<sub>i(t)</sub> and C<sub>i(t=0)</sub> are read count values of the ith replicate at time points 𝑡 and 0 (see also Methods). The calculation of the half-life involves first determining the decay constant 𝜆, which represents a constant rate of decay. Since 𝜆 is a constant, it is possible to accurately calculate it without needing data over the entire decay range. Our experimental design considers this by selecting appropriate time points to ensure a reliable estimation of 𝜆, and thus, the half-life. To determine the most suitable time points, we conducted preliminary experiments using RT-PCR.

      These experiments indicated that 30, 75, and 120 minutes provided an effective range for capturing the decay dynamics of the transcripts.

      There is no direct comparison of t1/2 between the two cell types studied for the full set of sequences studied. This would be helpful in understanding whether the regulatory effects of UTRs are generally similar across cell lines (as has been shown in some previous studies) or whether there are fundamental differences. The distribution of t1/2's is clearly quite different in the two cell lines, but it is important to know if this reflects generally slow RNA turnover in HEK cells or whether there are a large number of sequence-specific effects on stability between cell lines. A related issue is that it is not clear whether the relatively small number of significant variant effects detected in HEK cells versus SH-SY5Y cells is attributable to real biological differences between cell types or to technical issues (many fewer read counts and much longer half lives in HEK cells).

      For both cell lines, we selected oligonucleotides with R<sup>2</sup> > 0.5 and mean squared error (MSE) < 1 for analysis when estimating the decay constant (λ), and hence the half-life, by linear regression. This selection criterion was implemented to minimize the effect of experimental noise. After quality control, we selected common UTRs and compared the RNA half-lives of the two cell lines using a scatter plot. Author response image 2 shows that RNA half-lives are quite different between the cell lines, with a moderate similarity observed in the 5' UTRs (R = 0.21), while the correlation in the 3' UTRs is non-significant.
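
      The quality-control filter can be illustrated with a short sketch. It assumes that R<sup>2</sup> and MSE are computed from an ordinary least-squares fit of log count ratios against time, with MSE on the log-ratio scale; the response does not state the scale, so that choice, the function names, and the example values are assumptions.

```python
def fit_quality(times, log_ratios):
    """Return (slope, R^2, MSE) for a least-squares line through the points."""
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(log_ratios) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, log_ratios))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    ss_res = sum((y - (intercept + slope * t)) ** 2
                 for t, y in zip(times, log_ratios))
    ss_tot = sum((y - mean_y) ** 2 for y in log_ratios)
    r2 = 1 - ss_res / ss_tot if ss_tot > 0 else 0.0
    mse = ss_res / n
    return slope, r2, mse

def passes_qc(times, log_ratios, min_r2=0.5, max_mse=1.0):
    """Apply the thresholds quoted in the response (R^2 > 0.5, MSE < 1)."""
    _, r2, mse = fit_quality(times, log_ratios)
    return r2 > min_r2 and mse < max_mse

times = [0, 30, 75, 120]
clean = [-0.01 * t for t in times]   # follows a decay line exactly
noisy = [0.0, -2.0, 0.5, -1.5]       # no consistent trend over time
print(passes_qc(times, clean), passes_qc(times, noisy))
```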

      Author response image 2.

      Despite the low correlation of mRNA half-life between the two cell lines, UA-dinucleotide and UA-rich sequences consistently emerge as the most significant destabilizing features, suggesting a shared regulatory mechanism across diverse cellular environments.

      The general assertion is made in many places that TA dinucleotides are the most prominent destabilizing element in UTRs (e.g., in the title, the abstract, Fig. 4 legend, and on p. 12). This appears to be true for only one of the two cell lines tested based on Fig. 3.

      UA-dinucleotides and other UA-rich sequences exhibit similar effects on RNA stability, as illustrated in Fig. S5A-C. In the two cell lines, UA-dinucleotide and WWWWWW sequences were representatives of the same stability-affecting cluster. While the impact of UA-dinucleotides can be generalized, we have rephrased some statements to avoid any potential misunderstanding. For example:

      Abstract: “...We found that UA dinucleotides and UA-rich motifs are the most prominent destabilizing element.”

      p.10: “UA dinucleotides and UA-rich motifs are the most common and effective RNA destabilizing factor” 

      Figure 4: “The UTR UA dinucleotides and UA-rich motifs are the most common and influential RNA destabilizing factor.”

      Appraisal and impact:

      The work adds to existing studies that previously identified sequence features, including AREs and other RNA binding protein motifs, that regulate stability and puts a new emphasis on the role of "TA" (better "UA") dinucleotides. It is not clear how potential problems with the RNA stability measurements discussed above might influence the overall conclusions, which may limit the impact unless these can be addressed.

      It is difficult to understand whether the importance of TA dinucleotides is best explained by their occurrence in a related set of longer RBP binding motifs (see Fig 5J, these motifs may be encompassed by the "WWWWWW cluster") or whether some other explanation applies. Further discussion of this would be helpful. Does the LASSO method tend to collapse a more diverse set of longer motifs that are each relatively rare compared to the dinucleotide? It remains unclear whether TA dinucleotides are associated with less stability independent of the presence of the known larger WWWWWWW motif. As noted above, the importance of TA dinucleotides in the HEK experiments appears to be less than is implied in the text.

      To ensure the representativeness of the features entered into the LASSO model, we pre-selected those with an occurrence greater than 10% among all UTRs. There is no evidence to support a preference for dinucleotides by LASSO. To address whether the destabilizing effect of UA dinucleotides is part of the broader WWWWWW motif, we divided UA dinucleotides into two groups: those within the WWWWWW motif and those outside of it. Specifically, we divided UTRs into two categories: 'at least one UA within a WWWWWW motif' and 'no UA within a WWWWWW motif,' and visualized the results using a boxplot. As shown in Author response image 3, the destabilizing trend still remains for UA dinucleotides outside of the WWWWWW motif, although the effect appears to be more pronounced when UA is within the WWWWWW motif. This suggests that while UA dinucleotides have a destabilizing effect independently, their impact is amplified when they are part of the broader WWWWWW motif.
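
      The split between UA dinucleotides inside and outside a WWWWWW motif can be sketched as below. The example assumes the IUPAC meaning of W (A or U) and counts a UA as inside the motif when both of its bases lie within a run of at least six W bases; the function names and the test sequence are illustrative, not taken from the study.

```python
import re

# Lookahead pattern so that overlapping 6-base windows of A/U are all found.
W6 = re.compile(r"(?=([AU]{6}))")

def ua_positions_by_context(seq):
    """Return (inside, outside): start indices of UA dinucleotides that do
    or do not fall within a WWWWWW window (W = A or U)."""
    seq = seq.upper().replace("T", "U")
    covered = set()
    for match in W6.finditer(seq):
        covered.update(range(match.start(), match.start() + 6))
    inside, outside = [], []
    for i in range(len(seq) - 1):
        if seq[i:i + 2] == "UA":
            if i in covered and i + 1 in covered:
                inside.append(i)
            else:
                outside.append(i)
    return inside, outside

def has_ua_within_w6(seq):
    """UTR-level category: at least one UA within a WWWWWW motif."""
    inside, _ = ua_positions_by_context(seq)
    return bool(inside)

inside, outside = ua_positions_by_context("GGAUUUAAUCCGUACG")
print(inside, outside)
```

      Grouping UTRs by `has_ua_within_w6` gives the two categories described in the response, which can then be compared by half-life.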

      Author response image 3.

      The inclusion of more than a single cell type is an acknowledgement of the importance of evaluating cell type-specific effects. The work suggests a number of cell type-specific differences, but due to technical issues (especially with the HEK data, as outlined above) and the use of only two cell lines, it is difficult to understand cell type effects from the work.

      The inclusion of both 3' and 5' UTR sequences distinguishes this work from most prior studies in the field. Contrasting the effects of these regions on stability is of interest, although the role of these UTRs (especially the 5' UTR) in translational regulation is not assessed here.

      We examined the role of UTR and UTR variants in translation regulation using polysome profiling. By both univariate analysis and an elastic regression model, we identified motifs of short repeated sequences, including SRSF2 binding sites, as mutation hotspots that lead to aberrant translation. Furthermore, these polysome-shifting mutations had a considerable impact on RNA secondary structures, particularly in upstream AUG-containing 5’ UTRs. Integrating these features, our model achieved high accuracy (AUROC > 0.8) in predicting polysome-shifting mutations in the test dataset. Additionally, metagene analysis indicated that pathogenic variants were enriched at the upstream open reading frame (uORF) translation start site, suggesting changes in uORF usage underlie the translation deficiencies caused by these mutations. Illustrating this, we demonstrated that a pathogenic mutation in the IRF6 5’ UTR suppresses translation of the primary open reading frame by creating a uORF. Remarkably, site-directed ADAR editing of the mutant mRNA rescued this translation deficiency. Because the regulation of translation and stability does not converge, we illustrate these two mechanisms in two separate manuscripts (this one and doi.org/10.1101/2024.04.11.589132).

      Reviewer #3 (Public Review):

      Summary:

      In their manuscript titled "Multiplexed Assays of Human Disease‐relevant Mutations Reveal UTR Dinucleotide Composition as a Major Determinant of RNA Stability" the authors aim to investigate the effect of sequence variations in 3'UTR and 5'UTRs on the stability of mRNAs in two different human cell lines.

      To do so, the authors use a massively parallel reporter assay (MPRA). They transfect cells with a set of mRNA reporters that contain sequence variants in their 3' or 5' UTRs, which were previously reported in human diseases. They follow their clearance from cells over time relative to the matching non-variant sequence. To analyze their results, they define a set of factors (RBP and miRNA binding sites, sequence features, secondary structure etc.) and test their association with differences in mRNA stability. For features with a significant association, they use clustering to select a subset of factors for LASSO regression and identify factors that affect mRNA stability.

      They conclude that the TA dinucleotide content of UTRs is the strongest destabilizing sequence feature. Within that context, elevated GC content and protein binding can protect susceptible mRNAs from degradation. They also show that TA dinucleotide content of UTRs affects native mRNA stability, and that it is associated with specific functional groups. Finally, they link disease associated sequence variants with differences in mRNA stability of reporters.

      Strengths:

      This work introduces a different MPRA approach to analyze the effect of genetic variants. While previous works in tissue culture use DNA transfections that require normalization for transcription efficiency, here the mRNA is directly introduced into cells at fixed amounts, allowing a more direct view of the mRNA regulation.

      The authors also introduce a unique analysis approach, which takes into account multiple factors that might affect mRNA stability. This approach allows them to identify general sequence features that affect mRNA stability beyond specific genetic variants, and reach important insights on mRNA stability regulation. Indeed, while the conclusions regarding the genetic variants identified in this work are interesting, the main strength of the work is the general effect of sequence features rather than specific variants.

      The authors provide adequate support for their claims, and validate their analysis using both their reporter data and native genes. For the main feature identified, TA di-nucleotides, they perform follow-up experiments with modified reporters that further strengthen their claims, and also validate the effect on native cellular transcripts (beyond reporters), demonstrating its validity also within native scenarios.

      The work provides a broad analysis of mRNA stability, across two mRNA regulatory segments (3'UTR and 5'UTR) and is performed in two separate cell-types. Comparison between two different cell-types is adequate, and the results demonstrate, as expected, the dependence of mRNA stability on the cellular context. Analysis of 3'UTR and 5'UTR regulatory effects also shows interesting differences and similarities between these two regulatory regions.

      Weaknesses:

      (1) The authors fail to acknowledge several possible confounding factors of their MPRA approach in the discussion.

      First, while transfection of mRNA directly into cells avoids the need to normalize for differences in transcription, the introduction of naked mRNA molecules is different than native cellular mRNAs and could introduce biases due to differences in mRNA modifications, protein associations etc. that may occur co-transcriptionally.

      Second, along those lines, the authors also use in-vitro polyadenylation. The length of the polyA tail of the transfected transcripts could potentially be very different than that of native mRNAs and also affect stability.

      The transcripts used in our study were polyadenylated in vitro with approximately 100 nucleotides (Fig. S1C), similar to the polyA tail lengths typically observed in vivo (dx.doi.org/10.1016/j.molcel.2014.02.007). Additionally, these transcripts were capped to emulate essential mRNA characteristics and to minimize immune responses in recipient cells. This design allows us to study RNA decay for in vitro-synthesized RNA delivered into human cells, akin to RNA vaccines, but it does not necessarily extend to endogenous RNAs. As mentioned, endogenous RNAs undergo nuclear processing and are decorated by numerous trans factors, resulting in distinct regulatory mechanisms. We therefore provided more discussion of these differences and their implications in the revised manuscript: “However, while our approach effectively assesses the stability of synthesized RNA in human cells, it may not fully capture the decay dynamics of nuclear-synthesized RNA, which can be influenced by endogenous modifications and trans-acting RNA binding factors. (p. 18)”

      (2) The analysis approach used in this work for identifying regulatory features in UTRs was not previously used. As such, lack of in-depth details of the methodology, and possibly also more general validation of the approach, is a drawback in convincing the reader of the validity of this approach and its results.

      In particular, a main point that is not addressed is how the authors decide on the set of "factors" used in their analysis, as choosing different sets of factors might affect the results of the analysis.

      In our study, we employed the calculation of the Variance Inflation Factor (VIF) as a basis for selecting variables. This well-established method is widely used to detect variables with high collinearity, thus ensuring the robustness and reliability of our analysis. By identifying and excluding highly collinear variables, we aimed to minimize multicollinearity and improve the accuracy of our regression models. For more detailed information on the use of VIF in regression analysis, please refer to Akinwande, M., Dikko, H., and Samson, A. (2015). Variance Inflation Factor: As a Condition for the Inclusion of Suppressor Variable(s) in Regression Analysis. Open Journal of Statistics, 5, 754-767. doi: 10.4236/ojs.2015.57075. We have included the method details in the revised manuscript (p. 28): “… to avoid multicollinearity caused by similar features that perturb feature selection, all features were clustered using single-linkage hierarchical clustering with the distance metric defined as one minus the absolute value of the Spearman correlation coefficient. We cut the tree at a specific height, and the feature that had the greatest influence on RNA stability, which was examined using a simple linear regression model, was selected to be the representative of each cluster. Then we calculated the variance inflation factor (VIF) value of the representative features. The VIFs were obtained by the following linear model and equations:

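      $$\hat{X}_{ij} = \hat{\beta}_0 + \sum_{k \neq j} \hat{\beta}_k X_{ik}, \qquad R_j^2 = 1 - \frac{\sum_i (X_{ij} - \hat{X}_{ij})^2}{\sum_i (X_{ij} - \bar{X}_j)^2}, \qquad \mathrm{VIF}_j = \frac{1}{1 - R_j^2}$$

      where $\hat{X}_{ij}$ and $X_{ik}$ are the estimated value of the jth feature and the value of the kth feature of the ith UTR (note that the kth feature is a feature other than the jth feature), $\hat{\beta}_0$ and $\hat{\beta}_k$ are the intercept and the regression coefficients of the linear model that regressed the jth feature on the other remaining features, and $\bar{X}_j$ is the mean level of the jth feature of all UTRs.”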
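
      The VIF calculation described in the quoted Methods can be sketched with NumPy. Each feature is regressed on all the others, and the VIF is taken as 1 / (1 - R<sup>2</sup>) of that regression, which is the standard definition. The function name `variance_inflation_factors` and the synthetic feature matrix are assumptions for illustration only.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF of each column of X (rows are UTRs, columns are features)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = []
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress feature j on the remaining features, with an intercept.
        design = np.column_stack([np.ones(n), others])
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        fitted = design @ beta
        ss_res = np.sum((target - fitted) ** 2)
        ss_tot = np.sum((target - target.mean()) ** 2)
        r2 = 1 - ss_res / ss_tot
        vifs.append(1 / (1 - r2))
    return np.array(vifs)

# Synthetic features: c is nearly a copy of a, b is independent of both.
rng = np.random.default_rng(1)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.1 * rng.normal(size=200)
vifs = variance_inflation_factors(np.column_stack([a, b, c]))
print(np.round(vifs, 2))
```

      In this toy matrix the two nearly collinear features receive large VIFs while the independent one stays close to 1, which is the pattern the filter is meant to flag.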

      For example, the choice to use 7-mer sequences within the factors set is not explained, particularly when almost all motifs that are eventually identified (Figure 3B-E) are shorter.

      The known RBP motifs are primarily 6-mers. To explore the possibility of discovering novel motifs that could significantly impact our model, we started with 7-mer sequences. However, our analysis revealed that including these additional variables did not improve the explanatory power of the model; instead, it reduced it. Consequently, our final model focuses on motifs shorter than 7-mers. We explained the motif selections in the revised manuscript (p. 9): “Given our discovery that the effect of AREs is heavily dependent on sequence content, we decided to further explore the effects of other sequence elements, i.e., beyond known regulatory motifs, in more detail. Since most reported RBP motifs are 6-mers, we initiated a search for novel motifs by analyzing the presence of all 7-mers in our massively parallel reporter assay (MPRA) library, correlating their occurrence with mRNA half-life.”
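
      A 7-mer presence scan of this kind can be sketched as follows. The example records which sequences contain each 7-mer and then compares mean half-life between sequences with and without a given 7-mer. The difference in means is a simple stand-in for the association measure, which the response does not specify; the function names, sequences, and half-life values are hypothetical.

```python
from collections import defaultdict

def kmer_presence(seqs, k=7):
    """Map each k-mer to the set of sequence indices that contain it."""
    presence = defaultdict(set)
    for idx, seq in enumerate(seqs):
        seq = seq.upper().replace("T", "U")
        for i in range(len(seq) - k + 1):
            presence[seq[i:i + k]].add(idx)
    return presence

def mean_half_life_difference(presence, half_lives, kmer):
    """Mean half-life of sequences with the k-mer minus those without it."""
    with_idx = presence.get(kmer, set())
    with_vals = [half_lives[i] for i in with_idx]
    without_vals = [h for i, h in enumerate(half_lives) if i not in with_idx]
    if not with_vals or not without_vals:
        raise ValueError("k-mer must be present in some sequences and absent in others")
    return sum(with_vals) / len(with_vals) - sum(without_vals) / len(without_vals)

# Toy library (hypothetical sequences and half-lives, in minutes).
seqs = ["AUUUAUUUACG", "GGCCGGCCGG", "CCAUUUAUUGG"]
half_lives = [10.0, 50.0, 20.0]
presence = kmer_presence(seqs)
diff = mean_half_life_difference(presence, half_lives, "AUUUAUU")
print(sorted(presence["AUUUAUU"]), diff)
```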

      In addition, the authors do not perform validations to demonstrate the validity of their approach on simulated data or well-established control datasets. Such analysis would be helpful to further convince the reader of the usefulness and robustness of the analysis.

      We acknowledge the importance of validating our approach on simulated data or well-established control datasets to demonstrate its robustness and reliability. However, to the best of our knowledge, there are currently no well-established control datasets available that perfectly correspond to our specific study context. Despite this, we will continue to search for any relevant datasets that could be utilized for this purpose in future work. This effort will help to further reinforce the confidence in our methodology and its findings.

      (3) The analysis and regression models built in this work are not thoroughly investigated relative to native genes within cells. The effect of sequence "factors" on native cellular transcripts' stability is not investigated beyond TA di-nucleotides, and it is unclear to what degree do other predicted factors also affect native transcripts.

      Our system studies the stability control of RNA synthesized in vitro and delivered into human cells. While we validated the UTR UA-dinucleotide effect in vivo, we did not intend to conclude that this is the most influential regulation for endogenous RNAs. It is known that endogenous RNAs undergo very different regulation. The most prominent factors controlling endogenous RNA stability are the density of splice junctions and the length of UTRs (doi.org/10.1186/s13059-022-02811-x; doi.org/10.1186/s12915-021-00949-x). To decipher the sequence regulation, we controlled for these factors in our experiments. Therefore, we acknowledge that several endogenous features, which were excluded by our approach, may serve as predictive features of RNA half-life in vivo. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Specific comments:  

      Some references are missing, e.g., for the sentence:

      Please see the response below.

      "Similarly, point mutation of the GFPT1 3' UTR results in congenital myasthenic syndrome." (p5)

      The reference has been added to the text:

      Dusl, M., Senderek, J., Muller, J. S., Vogel, J. G., Pertl, A., Stucka, R., Lochmuller, H., David, R., & Abicht, A. (2015). A 3'-UTR mutation creates a microRNA target site in the GFPT1 gene of patients with congenital myasthenic syndrome. Human Molecular Genetics, 24(12), 3418-3426. https://doi.org/10.1093/hmg/ddv090

      "...but there have been no systematic assessments of the explicit effects of variants of both UTRs on stability regulation." (not true in the current phrasing; e.g. PMIDs 32719458, 36156153, 34849835)

      These references have been added to the text. However, we have to point out that these studies do not focus on the effects of the disease-relevant variants. To clarify, we modified the sentence to "... systematic assessments of the explicit effects of disease-relevant variants in both UTRs on stability regulation are still absent."

      "Multiple approaches have revealed AREs as exerting a destabilizing effect on RNA stability (Barreau et al., 2005)." (p8)

      The reference has been added to the text:

      Barreau, C., Paillard, L., & Osborne, H. B. (2005). AU-rich elements and associated factors: are there unifying principles? Nucleic Acids Research, 33(22), 7138-7150. https://doi.org/10.1093/nar/gki1012 

      "This effect is specific, as such ratios in the coding region are inconsequential." (p12)

      This refers to our findings of Fig. 4G and Supplemental Fig. S5F.

      What are the sequences at the 5' and 3'UTR without insertion of a library? 5'UTR library (especially in SH) has much longer half-life compared to 3'utr library (Fig S1D).

      The 3’ UTR library has no designed 5’ UTR, only the Kozak sequence derived from the pEGFPC1 vector. This may partially underlie the shorter half-life of the 3’ UTR library.

      Fig2A: What are the units? "half-life (log)" Do the numbers correspond to log10(min)?

      It represents ln (min). To clarify, we now use ‘ln t<sub>1/2</sub> (min)’ in all figures.

      Fig 2 and 3: This was done only on the wild-type sequences? Or all tested sequences together, wt and mut?

      It was done only on the wild-type sequences. To clarify, we modified the text to “we examined the effect of AREs on RNA stability of the ref alleles according to specific sequence content….(p.8)” and “We considered as many factors as possible to explain the half-life of our ref UTR libraries,…. (p.9)”. ‘ref’ stands for reference.

      "Furthermore, to avoid collinearity confounding our model, e.g., the effects of very similar factors (such as 'AA' and 'AAA' sequences), we clustered the factors according to their properties, and then only one representative factor from within a cluster (i.e., the one with the highest correlation to half-life within a cluster) was subjected to LASSO regression": Given the observed context dependence, e.g., in the case of poly-U stretches, doesn't this clustering lead to similar or identical motifs with different contexts being grouped together (such as poly-U preceded by an A, which is strongly destabilizing according to Fig 2B, or followed by one, which is strongly stabilizing according to Fig 2B)? This would result in ignoring the context, or in using one potential outcome while a motif from the same cluster can have the opposite effect.

      Thank you very much for pointing this out. To determine if considering different contextual effects within each feature cluster would enhance model performance, we modified our feature selection by choosing both the feature with the largest positive and the feature with the largest negative effect on RNA half-life in Step III of Figure 3A. We then split the data into a 2:1 training and testing set and repeated this process 100 times. Model performance was evaluated using mean absolute error (MAE), root mean squared error (RMSE), and adjusted R-squared. From Author response image 4, we observed no significant improvement in model performance using this new approach. Notably, in the SH-SY5Y 5' UTR model, our original method even outperformed the modified one, with statistically lower MAE and RMSE and a higher adjusted R-squared. Therefore, we believe our current approach remains appropriate.
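
      The repeated 2:1 split evaluation described above can be sketched as follows. This is a minimal illustration under stated assumptions: the one-feature least-squares model stands in for the LASSO model, the toy data are invented, and computing adjusted R-squared on the held-out set is a guess at the procedure rather than a confirmed detail.

```python
import random
from math import sqrt

def mae(y, y_hat):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

def rmse(y, y_hat):
    """Root mean squared error."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

def adjusted_r2(y, y_hat, n_features):
    """R-squared penalized for the number of features in the model."""
    n = len(y)
    mean_y = sum(y) / n
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)

def fit_line(x, y):
    """Stand-in model (assumption): one-feature least squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    return lambda xs: [intercept + slope * v for v in xs]

def repeated_split_eval(x, y, n_repeats=100, train_frac=2 / 3, seed=0):
    """Shuffle, split 2:1, fit on the training part, score on the test part."""
    rng = random.Random(seed)
    idx = list(range(len(x)))
    scores = []
    for _ in range(n_repeats):
        rng.shuffle(idx)
        cut = round(len(idx) * train_frac)
        train, test = idx[:cut], idx[cut:]
        predict = fit_line([x[i] for i in train], [y[i] for i in train])
        y_test = [y[i] for i in test]
        y_hat = predict([x[i] for i in test])
        scores.append({
            "mae": mae(y_test, y_hat),
            "rmse": rmse(y_test, y_hat),
            "adj_r2": adjusted_r2(y_test, y_hat, n_features=1),
        })
    return scores

# Hypothetical toy data: a noisy line, invented for illustration.
x = [float(v) for v in range(30)]
y = [2.0 * v + 1.0 + ((int(v) * 7) % 5 - 2) * 0.1 for v in x]
scores = repeated_split_eval(x, y)
```

      Comparing two feature-selection strategies then amounts to running the same loop for each and comparing the 100 paired scores.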

      Author response image 4.

      "Overall, motifs that are at least two nucleotides long proved critical for RNA stability, supporting the sequence specificity of the decay process." Unclear why this supports the "sequence specificity"

      No monomers were selected as an explanatory factor. On the contrary, specific sequence combinations and order are important for the regulation. These findings suggest sequence-specific recognition for the decay process.

      Fig3: The same features were used in both cell lines? If yes: Since they were selected for their highest correlation with half-life, how was a common set chosen? If no: problematic to compare.

      Thank you for your question regarding feature selection across cell lines. Initially, the features were collected uniformly for both cell lines. However, subsequent feature selection steps were cell-type specific, focusing on identifying features with the greatest impact on RNA half-life in each context. This approach allows us to still compare model performance and discuss the similarities and differences in selected features across cell types. By maintaining a consistent starting point, we ensure that any observed differences reflect cell-specific regulatory dynamics.

      uORFs were not used as features?

      Thank you for pointing this out. At the beginning of our study, we investigated the impact of Kozak sequence strength (categorized as weak, moderate, strong, or optimal) on RNA half-life. However, we found that this feature performed poorly in predicting RNA stability, and as a result, we decided not to include upstream open reading frames (uORFs) or Kozak sequences in our subsequent analyses.

      Experimental reproducibility: Only correlations between replicates for the same time point are shown, but no comparison between time points or between decay rates. How reproducible were the paired differences between mut/wt?

      The decay rate was calculated by modeling the slope of a linear regression of all time points. Therefore, there is only one decay rate associated with a genotype. To rule out inconsistent data, we excluded any regression with a mean square error greater than 1, as this indicates a poor fit of the data points. 
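
      The slope-based decay estimate and the fit filter can be sketched as follows. This sketch assumes first-order decay, so that the natural log of RNA abundance falls linearly with time and the half-life is ln(2) divided by the decay constant; the function names, the toy time course, and the use of n (rather than n - 2) in the mean square error are assumptions, not details taken from the manuscript.

```python
from math import log

def fit_decay(times_min, ln_abundance):
    """Least-squares line ln(abundance) = intercept + slope * t.

    Returns (slope, intercept, mse), where mse is the mean squared residual.
    """
    n = len(times_min)
    mt = sum(times_min) / n
    ma = sum(ln_abundance) / n
    sxx = sum((t - mt) ** 2 for t in times_min)
    sxy = sum((t - mt) * (a - ma) for t, a in zip(times_min, ln_abundance))
    slope = sxy / sxx
    intercept = ma - slope * mt
    residuals = [a - (intercept + slope * t)
                 for t, a in zip(times_min, ln_abundance)]
    mse = sum(r * r for r in residuals) / n
    return slope, intercept, mse

def half_life(slope):
    """t1/2 = ln(2) / k with k = -slope; None if the fit shows no decay."""
    if slope >= 0:
        return None
    return log(2) / -slope

# Hypothetical time course (minutes) with invented ln-abundance values.
slope, intercept, mse = fit_decay([0.0, 30.0, 60.0, 120.0],
                                  [0.0, -0.5, -1.0, -2.0])
keep = mse <= 1.0  # mirrors the exclusion of fits with MSE greater than 1
```

      Because every time point feeds one regression, each genotype yields a single decay rate, which is why reproducibility cannot be reported per time point in this design.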

      Fig 7C/p17: This does not establish a "causal relationship" as the authors claim.

      We agree with the reviewer’s suggestion. We have modified the text on p.17 to “to establish a correlation between UTR variants and health outcomes,…..”

      In the discussion, the authors claim that TA-diNTs are not only an opposite of the GC percentage and base this on Fig 5A.

      Fig 5A: The range of TA-diNTs is naturally much higher in the low GC group. To make the high and low GC content comparable (as the authors aim to do), the correlation should be assessed for the same range of TA dint in both cases.

      To address this concern more rigorously, we performed a stratified analysis based on UA-diNT rate. As shown in our Fig. S7C, even after stratifying by UA-dinucleotide ratio (upper panel: high UA-dinucleotide ratio; lower panel: low UA-dinucleotide ratio), we still observe that the destabilizing effect of UA is stronger in the low GC content group.

      Supplemental Figure S7. Interplay of GC content and TA dinucleotide on stability regulation, related to Figure 5. (C) Stratifications of both TA dinucleotide ratio and GC content showed that the destabilizing effect of TA dinucleotide is the most prominent under conditions of low TA dinucleotide ratio and low GC content. The same trend was observed for 5’ UTR (left) and 3’ UTR (right).
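
      The stratification by UA-dinucleotide ratio and GC content can be sketched as follows. The response does not define the ratio, so this sketch assumes it is the fraction of overlapping dinucleotide positions that read UA; the cut-offs and the record layout are likewise hypothetical.

```python
def ua_dinucleotide_ratio(seq):
    """Fraction of overlapping dinucleotide positions that are 'UA'."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input
    if len(seq) < 2:
        return 0.0
    hits = sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "UA")
    return hits / (len(seq) - 1)

def gc_content(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(1 for base in seq if base in "GC") / len(seq)

def stratify(records, gc_cut, ua_cut):
    """Group (sequence, ln half-life) records into four GC x UA strata."""
    strata = {}
    for seq, ln_t_half in records:
        key = (
            "highGC" if gc_content(seq) >= gc_cut else "lowGC",
            "highUA" if ua_dinucleotide_ratio(seq) >= ua_cut else "lowUA",
        )
        strata.setdefault(key, []).append(ln_t_half)
    return strata

# Hypothetical records: sequences and ln(t1/2) values are invented.
strata = stratify([("GGCC", 5.0), ("UAUA", 3.0)], gc_cut=0.5, ua_cut=0.3)
```

      Within each stratum, the correlation between UA-dinucleotide ratio and ln half-life can then be compared across the GC groups over a matched UA range.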

      The injection of in vitro transcribed and polyA/capped RNA certainly has advantages over other methods, but delivering naked mRNA without nuclear history might also lead to artifacts. The caveats of the approach should be discussed more extensively.

      We appreciate the suggestion and have hence added the following in the Discussion (p.18): “However, while our approach effectively assesses the stability of synthesized RNA in human cells, it may not fully capture the decay dynamics of nuclear-synthesized RNA, which can be influenced by endogenous modifications and trans-acting RNA binding factors.”

      "We unexpectedly identified many crucial regulatory features in 5' UTRs." Why was this unexpected?

      We initially thought the 3’ UTR would play a major role in stability regulation. To avoid confusion, we have removed the word ‘unexpected’ from the text (p. 20): "We identified many crucial regulatory features in 5' UTRs."

      "...a massively parallel reporter assay in which coding regions and human 5'/3' UTRs with disease-relevant mutations were generated in vitro and then directly transfected into human cell lines to assess their decay patterns by next-generation sequencing": also coding regions?

      Thanks for the question. Indeed, the coding region was not synthesized together with the UTR library. Therefore, we modified the text of p. 6 to “…we developed a massively parallel reporter assay in which human 5’/3’ UTRs with disease-relevant mutations were generated in vitro, ligated with the enhanced green fluorescence protein (EGFP) coding region, and then directly transfected into human cell lines to assess their decay patterns by next-generation sequencing.”

      Reviewer #2 (Recommendations For The Authors):

      Nomenclature: When discussing RNA sequences, "U" should be used in place of "T" (e.g., "UA dinucleotide").

      We have replaced the RNA sequence “T” with “U” in the text and figures.

      Abstract: "We examined the RNA degradation patterns mediated by the UTR library in multiple cell lines" - It would be clearer to state that two cell lines (rather than multiple) were used.

      We appreciate the suggestion. We have modified the abstract as suggested: “We examined the RNA degradation patterns mediated by the UTR library in two cell lines…"

      The manuscript refers to "wild-type (WT) and mutant (mt) alleles." (p. 7 and elsewhere). It would be better to use "reference" instead of "wild type" given that these are human populations.

      We appreciate the suggestion. All instances of ‘wild-type’ or ‘WT’ in the text and figures have been replaced with ‘reference’ or ‘ref’.

      In the introduction, it is stated that traditional MPRAs "cannot differentiate the effect of the UTRs on transcription, stability and, in some cases, even protein production, greatly limiting scientific interpretation." This is confusing, since these assays can and have been used in association with both RNA decay measurements and measurements of reporter protein levels that allow assessment of effects on stability and protein production (including in the cited references).

      We reason that the RNA steady-state level (e.g., sequencing the overall RNA normalized to DNA) or protein steady-state level (e.g., detecting the fluorescence signal) does not precisely reveal the decay kinetics of the RNA. Steady-state level is a result of production and decay, both of which UTRs contribute to. Similarly, the protein level is not a perfect estimate of the RNA decay.

      To clarify, we have modified the introduction (p. 5) to “Nevertheless, because the steady-state level is a result of production and decay, these approaches cannot differentiate the effect of the UTRs on transcription, stability and, in some cases, even protein production, greatly limiting scientific interpretation.” 
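
      The steady-state argument can be made explicit with a one-compartment model. This is a standard textbook sketch rather than a model from the manuscript; the symbols R (RNA level), \alpha (production rate), and k (first-order decay constant) are introduced here only for illustration.

```latex
\frac{dR}{dt} = \alpha - kR
\quad\Longrightarrow\quad
R_{\mathrm{ss}} = \frac{\alpha}{k},
\qquad
t_{1/2} = \frac{\ln 2}{k}
```

      Two UTRs can therefore yield the same steady-state level R_ss with different half-lives whenever their production rates differ in proportion, so a steady-state readout alone does not identify k.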

      Adding raw and normalized read count data from individual experiments (e.g., to Table S1) would make it more likely for others to use this dataset to address additional questions.

      All raw and processed sequencing data generated in this study have been submitted to the NCBI Gene Expression Omnibus (GEO; https://www.ncbi.nlm.nih.gov/geo/) under accession number GSE217518 (reviewer token snspaakujtsdpcv).

      The manuscript would benefit from further clarification about model selection. Additional details regarding how the features were clustered, and the actual clusters themselves should be included.

      It should be discussed why Lasso was chosen vs Ridge or Elastic Net, in the context of handling multicollinearity. Often, data is subsetted for training and validation, and model performance metrics are presented.

      Thank you for pointing out the need for further clarification on model selection. The features were clustered using single-linkage hierarchical clustering with the distance metric defined as one minus the absolute value of the Spearman correlation coefficient (this information has been added to the manuscript on p. 28: “…to avoid multicollinearity caused by similar features that perturb feature selection, all features were clustered using single-linkage hierarchical clustering with the distance metric defined as one minus the absolute value of the Spearman correlation coefficient.”). The resulting feature clusters are available in Supplemental Table S3. 
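
      The clustering step can be sketched as follows. This is a minimal pure-Python illustration: it relies on the fact that cutting a single-linkage tree at height h gives the connected components of the graph that links every pair of features with distance at most h. The cut height, the feature names, and the toy values are assumptions; the response does not state the threshold used.

```python
from math import sqrt

def ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            out[order[pos]] = mean_rank
        i = j + 1
    return out

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def single_linkage_clusters(features, max_distance):
    """Cut a single-linkage tree at max_distance using union-find."""
    names = list(features)
    parent = {name: name for name in names}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path compression
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            distance = 1.0 - abs(spearman(features[a], features[b]))
            if distance <= max_distance:
                parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())

# Hypothetical feature values across five UTRs (invented numbers).
features = {
    "AA": [1, 2, 3, 4, 5],
    "AAA": [2, 4, 6, 8, 11],
    "GC": [5, 1, 4, 2, 3],
}
clusters = single_linkage_clusters(features, max_distance=0.2)
```

      Using the absolute value of the correlation means that strongly anti-correlated features land in the same cluster, which is the situation the context-dependence question above is concerned with.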

      Regarding model selection, we chose LASSO over ridge and elastic net primarily for feature selection, as ridge does not perform feature selection. Elastic net is essentially a hybrid of ridge regression and LASSO regularization, but we opted for LASSO for its simplicity and effectiveness in selecting a sparse set of important features.
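
      For reference, the three penalties being compared can be written side by side. The scaling below (the 1/2n factor and the mixing parameter \alpha) follows one common convention and is an assumption; the response does not state which parameterization was used.

```latex
\begin{align*}
\hat{\beta}_{\mathrm{LASSO}} &= \arg\min_{\beta}\ \frac{1}{2n}\lVert y - X\beta\rVert_2^2 + \lambda\lVert\beta\rVert_1 \\
\hat{\beta}_{\mathrm{ridge}} &= \arg\min_{\beta}\ \frac{1}{2n}\lVert y - X\beta\rVert_2^2 + \lambda\lVert\beta\rVert_2^2 \\
\hat{\beta}_{\mathrm{EN}} &= \arg\min_{\beta}\ \frac{1}{2n}\lVert y - X\beta\rVert_2^2 + \lambda\Bigl(\alpha\lVert\beta\rVert_1 + \tfrac{1-\alpha}{2}\lVert\beta\rVert_2^2\Bigr)
\end{align*}
```

      The L1 term can set coefficients exactly to zero, which is what makes LASSO a feature selector, whereas the squared L2 term only shrinks them toward zero.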

      We also performed a 2:1 training and testing set analysis and have included these details in the manuscript. Model performance metrics, including correlation coefficient between observed and predicted values in the testing set, mean absolute error (MAE), root mean squared error (RMSE), mean absolute percentage error (MAPE), and R-squared, are provided in new Supplemental Table S4.

      Recommend reviewing and correcting verb tenses in the methods section.

      We appreciate the reviewer’s suggestion. We have corrected verb tenses in the methods section, which includes “The UTRs were defined by NCBI RefSeq and ENCODE V27. (p.21)”, “The variant was placed in the middle of the sequence….(p.22)”, and “eCLIP signals with value < 1 or p value > 0.05 were removed. (p.26)”

      Please add information about which cell type(s) are being used in each of the figure legends (e.g., in Figs. 2B and 5).

      We appreciate the reviewer’s suggestion. We have added the cell type information in the figure legends: “Figure 2…. (B) The ten most influential AREs in terms of RNA stability in SH-SY5Y cells.” And “Figure 5…..(A) MPRA data of SH-SY5Y cells stratified according to the GC content (GC%) of UTRs.”

      Recommend review of axis labels and consistency in formatting the log(half-lives) and including the base of the log and the time unit (minutes). Even better, converting axis labels from log minutes to minutes would make this easier to understand.

      Thank you for the suggestion regarding axis labels and consistency. We have unified the half-life label to ‘ln t<sub>1/2</sub> (min)’ in all figures. We chose not to convert the axis from logarithmic minutes to minutes because the original scale is highly skewed, which would hinder clear data visualization.

      The discussion refers to Figure 1D but Figure 1 only has A-C

      Thank you for pointing out this mistake. ‘Fig. 1D’ has been changed to ‘Fig. 1B’ in the text (p. 7 and p. 20).

      The analyses in Fig. 2 are interpreted as demonstrating that AREs destabilize RNAs. These analyses are examining associations, so it would be more appropriate to say that AREs are associated with destabilization (since it is formally possible that other sequences that are present in these UTR fragments cause destabilization). A similar issue arises on p. 10: "TA dinucleotides alone can negatively regulate RNA stability, with a Pearson's correlation coefficient of -0.287 for 5' UTRs and -0.377 for 3' UTRs (Fig. 4A,C)." This is an association and does not establish causation. Again on p. 17: "We identified several SNPs in UTRs that induce aberrant RNA expression and/or protein expression (Supplemental Table S7)." These may be causal but may simply be in LD with other variants that are causal.

      We agree that the association observed is not proven to be causal. Therefore, we modified the text as suggested: 

      “AUUUA/AUUA-containing AREs are associated with RNA destabilization.” (p. 8)

      “UA dinucleotides alone present a negative correlation with RNA stability, with a Pearson’s correlation coefficient of -0.287 for 5’ UTRs and -0.377 for 3’ UTRs.”  (p.10)

      “We identified several SNPs in UTRs that correlated with aberrant RNA expression and/or protein expression.”  (p. 17)

      Figure 4C is important in that it examines whether variant sequences that differ in a manner that changes the number of dinucleotide repeats affect stability. Please show the number (not just the percentage) of sequences in each category.

      Thank you for your insightful comment. We believe the figure you referred to is Figure 4E. We have updated the figure to include the number of sequences in each category.

      Figure 6A and B: The horizontal axes appear to be misaligned since the dotted vertical lines do not cross at 0?

      The dotted vertical lines represent the genomic background of the UA-diNT ratio. To clarify it, we have modified the legend to: “Figure 6……(A) The top ten biological processes for which the 5’ UTR UA-dinucleotide ratio most significantly deviated from the genomic background (dashed line).”

      It may be helpful to state what the dashed and solid lines represent on Figure 6 E/F. Please correct spelling of "Biological" in 6E.

      As per the reviewer’s suggestions, we have modified the legend of Figure 6 to: “… (E) Biological processes for RNAs in which the UA-dinucleotide ratios of both 5’ and 3’ UTRs are significantly different from the genomic background (dashed lines). (F) Molecular functions for RNAs in which the UA-dinucleotide ratios of both 5’ and 3’ UTRs are significantly different from the genomic background (dashed lines). The thin solid lines represent the standard deviation of the UA-dinucleotide ratio within the gene group.”

      In addition, the spelling of “Biological” in Fig. 6E has been corrected.

      Reviewer #3 (Recommendations For The Authors):

      I have 3 points that I think could improve science and its presentation within the manuscript.

      (1) Most importantly, how well do LASSO regression models predict the stability of native transcripts? Such analysis can also be useful for comparison between two different cell types. How well does the regression model learned (on reporters) within one cell type predict mRNA stability (of reporters and native genes) in this cell type and in the other cell type? Similarly, models can also help to analyze the effects of 5'UTR and 3'UTR sequences on mRNA stability. In particular, how well is the regression model of each separate regulatory sequence (3'UTR or 5'UTR) able to predict the stability of native genes in the cell? Can the predictions be improved by combining both 3'UTR and 5'UTR sequence features within the regression models?

      The decay model for native transcripts has been established in prior research (doi.org/10.1186/s13059-022-02811-x; doi.org/10.1186/s12915-021-00949-x), which indicates that exon junction density and transcript length are the primary determinants of RNA stability. Based on these findings, we designed the MPRA with fixed length and without splicing to focus on the contribution of primary sequences. We validated the destabilizing effect of UA dinucleotide on endogenous RNAs (Fig. 4G and Supplemental Fig. S5F) but do not recommend using our model to fully explain or predict the stability of native transcripts.

      To assess the model's cross-cell type predictive performance for RNA half-life, we employed the Regression Error Characteristic (REC) curve (Bi & Bennett, 2003). Similar to the receiver operating characteristic (ROC) curve, the REC curve illustrates the trade-off between error tolerance and accuracy, with better performance indicated by curves trending toward the upper left. We also computed the Area Over the Curve (AOC) as a performance metric, where lower values indicate better predictive ability. From Author response image 5, the REC curves reveal that cross-cell type prediction performance is suboptimal. The y-axis represents prediction accuracy, while the x-axis denotes error tolerance for the natural logarithm of RNA half-life (ln(𝑡<sub>1/2</sub>), in minutes).
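
      The REC curve and its Area Over the Curve can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the tolerance grid, the trapezoidal approximation of the area, and the toy values are all assumptions.

```python
def rec_curve(y_true, y_pred, tolerances):
    """Accuracy at each tolerance: fraction of samples with |error| <= tol."""
    errors = [abs(a - b) for a, b in zip(y_true, y_pred)]
    n = len(errors)
    return [sum(1 for e in errors if e <= tol) / n for tol in tolerances]

def area_over_curve(tolerances, accuracies):
    """Trapezoidal area between the line accuracy = 1 and the REC curve."""
    area = 0.0
    for i in range(1, len(tolerances)):
        width = tolerances[i] - tolerances[i - 1]
        gap = (1.0 - accuracies[i]) + (1.0 - accuracies[i - 1])
        area += width * gap / 2.0
    return area

# Hypothetical observed and predicted ln(t1/2) values (invented numbers).
y_true = [4.0, 5.0, 3.0, 6.0]
y_pred = [4.1, 4.5, 3.0, 7.0]
tolerances = [0.0, 0.25, 0.5, 1.0]
accuracies = rec_curve(y_true, y_pred, tolerances)
aoc = area_over_curve(tolerances, accuracies)
```

      A model trained in one cell line and tested in the other would be scored the same way, with a larger area indicating poorer transfer.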

      Author response image 5.

      In response to the suggestion of combining 5' and 3' UTR sequence features in the regression model, we believe this approach may not be ideal. As shown in Figure S1D, the distribution of RNA half-lives between 5' and 3' UTRs is significantly different, reflecting their distinct regulatory roles. Additionally, the base composition differs, with 5' UTRs having a higher GC content compared to 3' UTRs. Combining these datasets would likely make the origin of the sequence (5' or 3' UTR) the most predictive feature, thereby reducing the model's interpretability. Furthermore, our MPRA results, derived from separate 5’ and 3’ UTR libraries, do not support a combined model, further suggesting this approach may not be suitable with our data.

      The conclusions regarding genetic variants are interesting, yet the main strength of the work involves identifying general sequence features that affect mRNA stability rather than specific variants. I wonder if the authors have considered to shift the focus of the motivation part to reflect that?

      We appreciate the reviewer’s suggestion. We have revised the abstract and introduction to emphasize the general UTR regulation. Here is the revised abstract:

      UTRs contain crucial regulatory elements for RNA stability, translation and localization, so their integrity is indispensable for gene expression. Approximately 3.7% of genetic variants associated with diseases occur in UTRs, yet a comprehensive understanding of UTR variant functions remains limited due to inefficient experimental and computational assessment methods. To systematically evaluate the effects of UTR variants on RNA stability, we established a massively parallel reporter assay on 6,555 UTR variants reported in human disease databases. We examined the RNA degradation patterns mediated by the UTR library in two cell lines, and then applied LASSO regression to model the influential regulators of RNA stability. We found that UA dinucleotides and UA-rich motifs are the most prominent destabilizing elements. Gain of UA dinucleotide outlined mutant UTRs with reduced stability. Studies on endogenous transcripts indicate that high UA-dinucleotide ratios in UTRs promote RNA degradation. Conversely, elevated GC content and protein binding on UA dinucleotides protect high-UA RNA from degradation. Further analysis reveals polarized roles of UA-dinucleotide-binding proteins in RNA protection and degradation. Furthermore, the UA-dinucleotide ratio of both UTRs is a common characteristic of genes in innate immune response pathways, implying a coordinated stability regulation through UTRs at the transcriptomic level. We also demonstrate that stability-altering UTRs are associated with changes in biobank-based health indices, underscoring the importance of precise UTR regulation for wellness. Our study highlights the importance of RNA stability regulation through UTR primary sequences, paving the way for further exploration of their implications in gene networks and precision medicine.

      Plots presenting correlations (e.g., Figure 4A, 4C) are more informative when plotted as density plots (i.e., using colorscale to show density of the dots at each part of the plot).

      We greatly appreciate the reviewer's insightful suggestion regarding the use of density plots for presenting correlations. We have modified Figures 4A and 4C in the revised manuscript to implement density plotting. The updated figures now utilize a colorscale that highlights areas of high and low data density.

    1. eLife Assessment

      This important study enhances our understanding of the foraging behaviour of aerial insectivorous birds. Using solid methodology, the authors have collected extensive data on bird movements and prey availability, which in turn provide support for the main claim of the study. The work will be of broad interest to behavioural ecologists.

    2. Reviewer #1 (Public review):

      This study tests whether Little Swifts exhibit optimal foraging, which the data seem to indicate is the case. This is unsurprising as most animals would be expected to optimize the energy income : expenditure ratio, however it hasn't been explicitly quantified before the way it was in this manuscript.

      The major strength of this work is the sheer volume of tracking data and the accuracy of those data. The ATLAS tracking system really enhanced this study and allowed for pinpoint monitoring of the tracked birds. These data could be used to ask and answer many questions beyond just the one tested here.

      The major weakness of this work lies in the sampling of insect prey abundance at a single point on the landscape, 6.5 km from the colony. This sampling then requires the authors to work under the assumption that prey abundance is simultaneously even across the study region. It may be fair to say that prey populations might be correlated over space but are not equal. It is uncertain whether other aspects of the prey data are problematic. For example, the radar only samples insects at 50m or higher from the ground - how often do Little Swifts forage under 50m high?

      The finding that Little Swifts forage optimally is indeed supported by the data, notwithstanding some of the shortcomings in the prey abundance data. The authors achieved their aims and the results support their conclusions.

      At its centre, this work adds to our understanding of Little Swift foraging and extends to a greater understanding of aerial insectivores in general. While unsurprising that Little Swifts act as optimal foragers, it is good to have quantified this and show that the population declines observed in so many aerial insectivores are not necessarily a function of inflexible foraging habits. Further, the methods used in this research have great potential for other work. For example, the ATLAS system poses some real advantages and an exciting challenge to existing systems, like MOTUS. The radar that was used to quantify prey abundance also presents exciting possibilities if multiple units could be deployed to get a more spatially-explicit view.

      To improve the context of this work, it is worth noting that this research goes into much further depth than any previous studies on a similar topic in several flycatcher and swallow species. A further justification is posited that this research is needed due to dramatic insect population declines, however, the magnitude and extent of such declines are fiercely debated in the literature.

    3. Reviewer #2 (Public review):

      Summary:

      Bloch et al. studied the relationships between aerial foragers (lesser swifts) tracked using an automated radio telemetry system (Atlas) and their prey (flying insects) monitored using a small vertical-looking radar (BirdScan MR1). The aim of the study was to check whether swifts optimise their foraging according to the abundance of their prey. The results provide evidence that small swifts can increase their foraging rate when aerial insect abundance is high, but found no correlation between insect abundance and flight energy expenditure.

      Key points:

      This study fills gaps in fundamental knowledge of prey-predator dynamics in the air. It describes the coincidence between the abundance of flying insects and the characteristics derived from monitoring individual swifts.

      Weaknesses:

      The paper uses assumptions largely derived from optimal foraging theory, but mixes up the form of natural selection: parental energy, parental survival (predation risk), nestling foraging and reproductive success. The results are partly inconsistent, and confounding factors (e.g., the brooding phase versus the nestling phase) remained ignored. In conclusion, the analyses performed are insufficient to rigorously assess whether lesser swifts are optimising their foraging beyond making shorter foraging trips.

      The filters applied to the monitoring data are necessary but may strongly influence the characteristics derived based on maximum or mean values. Sensitivity tests or the use of characteristics that are less dependent on extreme values could provide more robust results.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Recommendations for the authors):

      I am generally satisfied with the authors' revisions and response to my previous comments. I have amended my previous review.

      We thank Reviewer #1 for his valuable comments and suggestions, which improved this manuscript.

      Thank you for considering the comments in your revised version. I still feel a strong mismatch between the claims of optimal foraging behaviour and the results with little compelling evidence.

      On terminology: MTR means Migration Traffic Rates. The authors responded that in their study, MTR is defined as Movement traffic rates. I have two problems with this definition: i) it creates confusion in the literature on the definition of MTR, ii) a traffic inherently describes a movement, and this pleonasm is not necessary.

      We revised the acronyms in this article, replacing MTR with MoTR to clearly distinguish between Migration Traffic Rate (MTR) and Movement Traffic Rate (MoTR).

      Minimal size of insects: Please detail radar settings (power sent, STC; detection thresholds). These parameters define the minimal size of the detected animals.

      We added the following paragraph to provide additional information regarding the radar's detection capabilities:

      " with decreasing detection probability at increasing altitudes. The detection threshold, defined by the STC setting, was 93 dBm, and the transmit power was 25 kW."

    1. eLife Assessment

      In this valuable study, Li et al. set out to understand the mechanisms of audiovisual temporal recalibration - the brain's ability to adjust to the latency differences that emerge due to different (distance-dependent) transduction latencies of auditory and visual signals - through psychophysical measurements and modeling. The analysis and specification of a formal model for this process provide convincing evidence to support a role for causal inference in recalibration.

    2. Reviewer #1 (Public review):

      This study asks whether the phenomenon of crossmodal temporal recalibration, i.e. the adjustment of time perception by consistent temporal mismatches across the senses, can be explained by the concept of multisensory causal inference. In particular, the authors ask whether causal inference explains temporal recalibration better than a model assuming that crossmodal stimuli are always integrated, regardless of how discrepant they are.

      The study is motivated by previous work in the spatial domain, where it has been shown consistently across studies that the use of crossmodal spatial information is explained by the concept of multisensory causal inference. It is also motivated by the observation that the behavioral data showcasing temporal recalibration feature nonlinearities that, by their nature, cannot be explained by a fixed integration model (sometimes also called mandatory fusion).

      To probe this the authors implemented a sophisticated experiment that probed temporal recalibration in several sessions. They then fit the data using the two classes of candidate models and rely on model criteria to provide evidence for their conclusion. The study is sophisticated, conceptually and technically state-of-the-art and theoretically grounded. The data clearly support the authors' conclusions.

      I find the conceptual advance somewhat limited. First, by design the fixed integration model cannot explain data with a nonlinear dependency on multisensory discrepancy, as already explained in many studies on spatial multisensory perception. Hence, it is not surprising that the causal inference model better fits the data. Second, and again similar to studies on spatial paradigms, the causal inference model fails to predict the behavioral data for large discrepancies. The model predictions in Figure 5 show the (expected) vanishing recalibration for large delta, while the behavioral data don't decay to zero. Either the range of tested SOAs is too small to show that both the model and data converge to the same vanishing effect at large SOAs, or the model's formula is not the best for explaining the data. Again, the studies using spatial paradigms have the same problem, but in my view this poses the most interesting question here.

      In my view there is nothing generally wrong with the study, it does extend the 'known' to another type of paradigm. However, it covers little new ground on the conceptual side.

      On that note, the small sample size of n=10 is likely not an issue, but still it is on the very low end for this type of study.

      Comments on revision:

      The revision has addressed most of these points and makes for a much stronger contribution. The issue of sample size remains.

    3. Reviewer #2 (Public review):

      Summary:

      Li et al.'s goal is to understand the mechanisms of audiovisual temporal recalibration. This is an interesting challenge that the brain readily solves in order to compensate for real-world latency differences in the time of arrival of audio/visual signals. To do this they perform a 3-phase recalibration experiment on 9 observers that involves a temporal order judgment (TOJ) pretest and posttest (in which observers are required to judge whether an auditory and visual stimulus were coincident, auditory leading or visual leading) and a conditioning phase in which participants are exposed to a sequence of AV stimuli with a particular temporal disparity. Participants are required to monitor both streams of information for infrequent oddballs, before being tested again in the TOJ, although this time there are 3 conditioning trials for every 1 TOJ trial. Like many previous studies, they demonstrate that conditioning stimuli shift the point of subjective simultaneity (pss) in the direction of the exposure sequence.

      These shifts are modest - maxing out at around -50 ms for auditory leading sequences and slightly less than that for visual leading sequences. Similar effects are observed even for the longest offsets where it seems unlikely listeners would perceive the stimuli as synchronous (and therefore under a causal inference model you might intuitively expect no recalibration, and indeed simulations in Figure 5 seem to predict exactly that which isn't what most of their human observers did). Overall I think their data contribute evidence that a causal inference step is likely included within the process of recalibration.

      Strengths:

      The manuscript performs comprehensive testing over 9 days and 100s of trials and accompanies this with mathematical models to explain the data. The paper is reasonably clearly written and the data appear to support the conclusions.

      Comments on revision:

      In the revised manuscript the authors incorporate an alternative model (the asynchrony-contingent model), and demonstrate that the causal inference model still outperforms it. They provide additional analysis with Bayes factors to perform model comparisons, and provide significant individual subject data in the supplementary materials. Overall they have addressed most of the key points that my original review raised, including a demonstration of the conditions under which recalibration effects do not decay to zero over long delays. The number of subjects remains rather low, but at least we can now appreciate the heterogeneity within them. I still have some reservations about the magnitude of the conceptual advance that this study makes.

    4. Reviewer #3 (Public review):

      Summary:

      Li et al. describe an audiovisual temporal recalibration experiment in which participants perform baseline sessions of ternary order judgments about audiovisual stimulus pairs with various stimulus-onset asynchronies (SOAs). These are followed by adaptation at several adapting SOAs (each on a different day), followed by post-adaptation sessions to assess changes in psychometric functions. The key novelty is the formal specification and application/fit of a causal-inference model for the perception of relative timing, providing simulated predictions for the complete set of psychometric functions both pre and post adaptation.

      Strengths:

      (1) Formal models are preferable to vague theoretical statements about a process, and prior to this work, certain accounts of temporal recalibration (specifically those that do not rely on a population code) had only qualitative theoretical statements to explain how/why the magnitude of recalibration changes non-linearly with the stimulus-onset asynchrony of the adaptor.

      (2) The experiment is appropriate, the methods are well described, and the average model prediction is a good match to the average data (Figure 4). Conclusions are supported by the data and modelling.

      (3) The work should be impactful. There seems a good chance that this will become the go-to modelling framework for those exploring non population-code accounts of temporal recalibration (or comparing them with population-code accounts).

      (4) Key issues for the generality of the model, such as recalibration asymmetries reported by other authors that are inconsistent with those reported here, are thoughtfully discussed.

      Weaknesses:

      (1) Models are not compared using a gold-standard measure such as leave-one-out cross validation. However, this is legitimate given lengthy model fitting times, and a sensible approximation is presented.

      (2) The model misses in a systematic way for the psychometric functions of some participants/conditions. In addition to misses relating to occasional failures to estimate the magnitude of recalibration, some of the misses are because all functions are only permitted to shift in central tendency (whereas some participants show changes better characterized at one or both decision criteria). Given the fact that the modelling in general embraces individual differences, it might have been worth allowing different kinds of change for different participants. However, this is not really critical for the central concern (changes in the magnitude of recalibration for different adaptors) and there is a limit to how much can be done along these lines without making the model too flexible to test.

      (3) As a minor point, the model relies on simulation, which may limit its take-up/application by others in the field (although open access code will be provided).

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      This study asks whether the phenomenon of crossmodal temporal recalibration, i.e. the adjustment of time perception by consistent temporal mismatches across the senses, can be explained by the concept of multisensory causal inference. In particular, the authors ask whether causal inference explains temporal recalibration better than a model assuming that crossmodal stimuli are always integrated, regardless of how discrepant they are.

      The study is motivated by previous work in the spatial domain, where it has been shown consistently across studies that the use of crossmodal spatial information is explained by the concept of multisensory causal inference. It is also motivated by the observation that the behavioral data showcasing temporal recalibration feature nonlinearities that, by their nature, cannot be explained by a fixed integration model (sometimes also called mandatory fusion).

      To probe this the authors implemented a sophisticated experiment that probed temporal recalibration in several sessions. They then fit the data using the two classes of candidate models and rely on model criteria to provide evidence for their conclusion. The study is sophisticated, conceptually and technically state-of-the-art, and theoretically grounded. The data clearly support the authors’ conclusions.

      I find the conceptual advance somewhat limited. First, by design, the fixed integration model cannot explain data with a nonlinear dependency on multisensory discrepancy, as already explained in many studies on spatial multisensory perception. Hence, it is not surprising that the causal inference model better fits the data.

      We have addressed this comment by including an asynchrony-contingent model, which is capable of predicting the nonlinearity of recalibration effects by employing a heuristic approximation of the causal-inference process (Fig. 3). We also updated the previous competitor model with a more reasonable asynchrony-correction model as the baseline of model comparison, which assumes recalibration aims to restore synchrony whenever the sensory measurement of SOA indicates an asynchrony. The causal-inference model outperformed both models, as indicated by model evidence (Fig. 4A). Furthermore, model predictions show that the causal-inference model more accurately captures recalibration at large SOAs at both the group (Fig. 4B) and the individual levels (Fig. S4).

      Second, and again similar to studies on spatial paradigms, the causal inference model fails to predict the behavioral data for large discrepancies. The model predictions in Figure 5 show the (expected) vanishing recalibration for large delta, while the behavioral data don’t decay to zero. Either the range of tested SOAs is too small to show that both the model and data converge to the same vanishing effect at large SOAs, or the model's formula is not the best for explaining the data. Again, the studies using spatial paradigms have the same problem, but in my view, this poses the most interesting question here.

      We included an additional simulation (Fig. 5B) to show that the causal-inference model can predict non-zero recalibration for long adapter SOAs, especially in observers with a high common-cause prior and low sensory precision. This ability to predict a non-zero recalibration effect even at large SOA, such as 0.7 s, is one key feature of the causal-inference model that distinguishes it from the asynchrony-contingent model.
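
      The intuition behind this response can be sketched numerically. The following is only a minimal illustration, not the authors' model: it assumes Gaussian measurement noise and zero-mean Gaussian priors over SOA under each causal structure. The function names, the prior widths `sigma_c1` and `sigma_c2`, and all numerical values are hypothetical.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_common_cause(m, p_common, sigma_m, sigma_c1=0.05, sigma_c2=0.5):
    """Posterior probability of a common cause given a measured SOA m (seconds).

    Assumed generative model (illustrative only):
      common cause:    SOA ~ N(0, sigma_c1^2)   (narrow prior)
      separate causes: SOA ~ N(0, sigma_c2^2)   (broad prior)
      measurement:     m   ~ N(SOA, sigma_m^2)
    Marginalising over SOA gives m ~ N(0, sigma_c^2 + sigma_m^2) per structure.
    """
    like_c1 = gaussian_pdf(m, 0.0, math.hypot(sigma_c1, sigma_m))
    like_c2 = gaussian_pdf(m, 0.0, math.hypot(sigma_c2, sigma_m))
    numerator = p_common * like_c1
    return numerator / (numerator + (1.0 - p_common) * like_c2)


# Hypothetical observers evaluated at a measured SOA of 0.7 s.
imprecise_high_prior = posterior_common_cause(0.7, p_common=0.9, sigma_m=0.3)
precise_low_prior = posterior_common_cause(0.7, p_common=0.5, sigma_m=0.05)
```

      Under these assumed values, the observer with a high common-cause prior and noisy measurements still assigns substantial probability to a common cause at 0.7 s, whereas the precise observer with a weaker prior assigns essentially none. If recalibration is weighted by this posterior, only the former would continue to recalibrate at long adapter SOAs.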

      In my view there is nothing generally wrong with the study, it does extend the 'known' to another type of paradigm. However, it covers little new ground on the conceptual side.

      On that note, the small sample size of n=10 is likely not an issue, but still, it is on the very low end for this type of study.

      This study used a within-subject design, which included 3 phases each repeated in 9 sessions, totaling 13.5 hours per participant. This extensive data collection allows us to better constrain the model for each participant. Our conclusions are based on the different models’ ability to fit individual data.

      Reviewer #2 (Public Review):

      Summary:

      Li et al.’s goal is to understand the mechanisms of audiovisual temporal recalibration. This is an interesting challenge that the brain readily solves in order to compensate for real-world latency differences in the time of arrival of audio/visual signals. To do this they perform a 3-phase recalibration experiment on 9 observers that involves a temporal order judgment (TOJ) pretest and posttest (in which observers are required to judge whether an auditory and visual stimulus were coincident, auditory leading or visual leading) and a conditioning phase in which participants are exposed to a sequence of AV stimuli with a particular temporal disparity. Participants are required to monitor both streams of information for infrequent oddballs, before being tested again in the TOJ, although this time there are 3 conditioning trials for every 1 TOJ trial. Like many previous studies, they demonstrate that conditioning stimuli shift the point of subjective simultaneity (pss) in the direction of the exposure sequence.

      These shifts are modest - maxing out at around -50 ms for auditory leading sequences and slightly less than that for visual leading sequences. Similar effects are observed even for the longest offsets where it seems unlikely listeners would perceive the stimuli as synchronous (and therefore under a causal inference model you might intuitively expect no recalibration, and indeed simulations in Figure 5 seem to predict exactly that which isn't what most of their human observers did). Overall I think their data contribute evidence that a causal inference step is likely included within the process of recalibration.

      Strengths:

      The manuscript performs comprehensive testing over 9 days and 100s of trials and accompanies this with mathematical models to explain the data. The paper is reasonably clearly written and the data appear to support the conclusions.

      Weaknesses:

      While I believe the data contribute evidence that a causal inference step is likely included within the process of recalibration, this to my mind is not a mechanism but might be seen more as a logical checkpoint to determine whether whatever underlying neuronal mechanism actually instantiates the recalibration should be triggered.

      We have addressed this comment by replacing the fixed-update model with an asynchrony-correction model, which assumes that the system first evaluates whether the measurement of SOA is asynchronous, thus indicating a need for recalibration (Fig. 3). If it does, it shifts the audiovisual bias by a proportion of the measured SOA. We additionally included an asynchrony-contingent model, which is capable of replicating the nonlinearity of recalibration effects by a heuristic approximation of the causal-inference process.

      Model comparisons indicate that the causal-inference model of temporal recalibration outperforms both alternative models (Fig. 4A). Furthermore, the model predictions demonstrate that the causal-inference model more accurately captures recalibration at large SOAs at both the group level (Fig. 4B) and individual level (Fig. S4).
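
      The asynchrony-correction rule described in this response can be written as a short update function. This is a sketch under stated assumptions rather than the authors' implementation: the parameter names, the symmetric criterion, and the sign convention of the shift are all hypothetical.

```python
def asynchrony_correction_update(bias, measured_soa, criterion, learning_rate):
    """Return the audiovisual bias after one exposure trial.

    A measured SOA within +/- criterion is treated as synchronous and leaves
    the bias unchanged. Otherwise the bias is shifted by a fixed proportion
    (learning_rate) of the measured SOA. The direction of the shift is an
    assumed convention chosen so that repeated exposure reduces the measured
    asynchrony.
    """
    if abs(measured_soa) <= criterion:
        return bias
    return bias - learning_rate * measured_soa


# Hypothetical values: 0.3 s measured SOA, 0.1 s criterion, 1% learning rate.
updated_bias = asynchrony_correction_update(0.0, 0.3, 0.1, 0.01)
```

      Because the size of the shift grows with the measured SOA whenever the criterion is exceeded, a rule of this form does not by itself produce the decline in recalibration at large adapter SOAs, which is what the asynchrony-contingent and causal-inference models are designed to capture.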

      The authors’ causal inference model strongly predicts that there should be no recalibration for stimuli at 0.7 s offset, yet only 3/9 participants appear to show this effect. They note that a significant difference between their design and that of others is the inclusion of longer lags, which are unlikely to originate from the same source, but don’t offer any explanation for this key difference between their data and the predictions of a causal inference model.

      We added further simulations to show that the causal-inference model can predict non-zero recalibration also for longer adapter SOAs, especially in observers with a large common-cause prior (Fig. 5A) and low sensory precision (Fig. 5B). This ability to predict a non-zero recalibration effect even at longer adapter SOAs, such as 0.7 s, is a key feature of the causal-inference model that distinguishes it from the asynchrony-contingent model.

      I’m also not completely convinced that the causal inference model isn’t ‘best’ simply because it has sufficient free parameters to capture the noise in the data. The tested models do not (I think) have equivalent complexity - the causal inference model fits best, but has more parameters with which to fit the data. Moreover, while it fits ‘best’, is it a good model? Figure S6 is useful in this regard but is not completely clear - are the red dots the actual data or the causal inference prediction? This suggests that it does fit the data very well, but is this based on predicting held-out data, or is it just that by having more parameters it can better capture the noise? Similarly, S7 is a potentially useful figure but it's not clear what is data and what are model predictions (what are the differences between each row for each participant; are they two different models or pre-test post-test or data and model prediction?!).

      I'm not an expert on the implementation of such models but my reading of the supplemental methods is that the model is fit using all the data rather than fit and tested on held-out data. This seems problematic.

      We recognize the risk of overfitting with the causal-inference model. We now rely on Bayesian model comparisons, which use model evidence for model selection. This method automatically incorporates a penalty for model complexity through the marginalization over the parameter space (MacKay, 2003).

      Our design is not suitable for cross-validation because the model-fitting process is computationally intensive and time-consuming. Each fit of the causal-inference model takes approximately 30 hours, and multiple fits with different initial starting points are required to rule out that the parameter estimates correspond to local minima.

      I would have liked to have seen more individual participant data (which is currently in the supplemental materials, albeit in a not very clear manner as discussed above).

      We have revised Supplementary Figures S4-S6 to show additional model predictions of the recalibration effect for individual participants, and participants’ temporal-order judgments are now shown in Supplementary Figure S7. These figures confirm the better performance of the causal-inference model.

      The way that S3 is described in the text (line 141) makes it sound like the effect was in the same direction for everyone; however, it is clear that 2/9 listeners show the opposite pattern, and 2 have confidence intervals close to zero (albeit on the -ve side).

      We have revised the text to clarify that the asymmetry occurs in both directions and is idiosyncratic (lines 168-171). We summarized the distribution of the individual asymmetries of the recalibration effect across visual-leading and auditory-leading adapter SOAs in Supplementary Figure S2.

      Reviewer #3 (Public Review):

      Summary:

      Li et al. describe an audiovisual temporal recalibration experiment in which participants perform baseline sessions of ternary order judgments about audiovisual stimulus pairs with various stimulus-onset asynchronies (SOAs). These are followed by adaptation at several adapting SOAs (each on a different day), followed by post-adaptation sessions to assess changes in psychometric functions. The key novelty is the formal specification and application/fit of a causal-inference model for the perception of relative timing, providing simulated predictions for the complete set of psychometric functions both pre and post-adaptation.

      Strengths:

      (1) Formal models are preferable to vague theoretical statements about a process, and prior to this work, certain accounts of temporal recalibration (specifically those that do not rely on a population code) had only qualitative theoretical statements to explain how/why the magnitude of recalibration changes non-linearly with the stimulus-onset asynchrony of the adapter.

      (2) The experiment is appropriate, the methods are well described, and the average model prediction is a fairly good match to the average data (Figure 4). Conclusions may be overstated slightly, but seem to be essentially supported by the data and modelling.

      (3) The work should be impactful. There seems a good chance that this will become the go-to modelling framework for those exploring non-population-code accounts of temporal recalibration (or comparing them with population-code accounts).

      (4) A key issue for the generality of the model, specifically in terms of recalibration asymmetries reported by other authors that are inconsistent with those reported here, is properly acknowledged in the discussion.

      Weaknesses:

      (1) The evidence for the model comes in two forms. First, two trends in the data (non-linearity and asymmetry) are illustrated, and the model is shown to be capable of delivering patterns like these. Second, the model is compared, via AIC, to three other models. However, the main comparison models are clearly not going to fit the data very well, so the fact that the new model fits better does not seem all that compelling. I would suggest that the authors consider a comparison with the atheoretical model they use to first illustrate the data (in Figure 2). This model fits all sessions but with complete freedom to move the bias around (whereas the new model constrains the way bias changes via a principled account). The atheoretical model will obviously fit better, but will have many more free parameters, so a comparison via AIC/BIC or similar should be informative.

      In the revised manuscript, we switched from AIC to Bayesian model selection, which approximates and compares model evidence. This method incorporates a strong penalty for model complexity through marginalization over the parameter space (MacKay, 2003).

      We have addressed this comment by updating the former competitor model into a more reasonable version that induces recalibration only for some measured SOAs and by including another (asynchrony-contingent) model that is capable of predicting the nonlinearity and asymmetry of recalibration (Fig. 3) while heuristically approximating the causal inference computations. The causal-inference model outperformed the asynchrony-contingent model, as indicated by model evidence (Fig. 4A). Furthermore, model predictions show that the causal-inference model more accurately captures recalibration at large SOAs at both the group (Fig. 4B) and the individual level (Fig. S4).

      (2) It does not appear that some key comparisons have been subjected to appropriate inferential statistical tests. Specifically, lines 196-207 - presumably this is the mean (and SD or SE) change in AIC between models across the group of 9 observers. So are these differences actually significant, for example via t-test?

      We statistically compared the models using Bayes factors (Fig. 4A). The model evidence for each model was approximated using Variational Bayesian Monte Carlo. Bayes factors provided strong evidence in support of the causal-inference model relative to the other models.
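
      For readers less familiar with this kind of comparison, the arithmetic behind a Bayes factor is simple once each model's log evidence has been approximated. The sketch below takes the log evidences as given; the function names and the numbers are hypothetical and are not taken from the manuscript.

```python
def log_bayes_factor(log_evidence_a, log_evidence_b):
    """Log Bayes factor of model A over model B from their log evidences."""
    return log_evidence_a - log_evidence_b


def summed_log_bayes_factor(log_evidences_a, log_evidences_b):
    """Sum per-participant log Bayes factors (a fixed-effects group summary)."""
    return sum(
        log_bayes_factor(a, b) for a, b in zip(log_evidences_a, log_evidences_b)
    )


# Hypothetical log evidences for three participants under two models.
model_a = [-1200.0, -1150.0, -1310.0]
model_b = [-1215.0, -1149.0, -1330.0]
group_log_bf = summed_log_bayes_factor(model_a, model_b)
```

      A positive value favours model A. Because the evidence integrates the likelihood over the prior on the parameters, a model with extra parameters is only favoured if the added flexibility improves the fit enough to offset the larger parameter space.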

      (3) The manuscript tends to gloss over the population-code account of temporal recalibration, which can already provide a quantitative account of how the magnitude of recalibration varies with adapter SOA. This could be better acknowledged, and the features a population code may struggle with (asymmetry?) are considered.

      We simulated a population-code model to examine its prediction of the recalibration effect for different adapter SOAs (lines 380–388, Supplement Section 8). The population-code model can predict the nonlinearity of recalibration, i.e., a decreasing recalibration effect as the adapter SOA increases. However, to capture the asymmetry of recalibration effects across auditory-leading and visual-leading adapter stimuli, we would need to assume that the auditory-leading and visual-leading SOAs are represented by neural populations with unequal tuning curves.

      (4) The engagement with relevant past literature seems a little thin. Firstly, papers that have applied causal inference modeling to judgments of relative timing are overlooked (see references below). There should be greater clarity regarding how the modelling here builds on or differs from these previous papers (most obviously in terms of additionally modelling the recalibration process, but other details may vary too). Secondly, there is no discussion of previous findings like that in Fujisaki et al.’s seminal work on recalibration, where the spatial overlap of the audio and visual events didn’t seem to matter (although admittedly this was an N = 2 control experiment). This kind of finding would seem relevant to a causal inference account.

      References:

      Magnotti JF, Ma WJ and Beauchamp MS (2013) Causal inference of asynchronous audiovisual speech. Front. Psychol. 4:798. doi: 10.3389/fpsyg.2013.00798

      Sato, Y. (2021). Comparing Bayesian models for simultaneity judgement with different causal assumptions. J. Math. Psychol., 102, 102521.

      We have revised the Introduction and Discussion to better situate our study within the existing literature. Specifically, we have incorporated the suggested references (lines 66–69) and provided clearer distinctions on how our modeling approach builds on or differs from previous work on causal-inference models, particularly in terms of modeling the recalibration process (lines 75–79). Additionally, we have discussed findings that might contradict the assumptions of the causal-inference model (lines 405–424).

      (5) As a minor point, the model relies on simulation, which may limit its take-up/application by others in the field.

      Upon acceptance, we will publicly share the code for all models (simulation and parameter fitting) to enable researchers to adapt and apply these models to their own data.

      (6) There is little in the way of reassurance regarding the model’s identifiability and recoverability. The authors might for example consider some parameter recovery simulations or similar.

      We conducted a model recovery for each of the six models described in the main text and confirmed that the asynchrony-contingent and causal-inference models are identifiable (Supplement Section 11). Simulations of the asynchrony-correction model were sometimes best fit by causal-inference models, because the latter behaves similarly when the prior of a common cause is set to one.

      We also conducted a parameter recovery for the winning model, the causal-inference model with modality-specific precision (Supplement Section 13).

      Key parameters, including the audiovisual bias, the amount of auditory latency noise, the amount of visual latency noise, the criterion, and the lapse rate, showed satisfactory recovery performance. The less accurate recovery of  is likely due to a tradeoff with the learning rate.

      (7) I don't recall any statements about open science and the availability of code and data.

      Upon acceptance of the manuscript, all code (simulation and parameter fitting) and data will be made publicly available on OSF.

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors):

      In addition to the comments below, we would like to offer the following summary based on the discussion between reviewers:

      The major shortcoming of the work is that there should ideally be a bit more evidence to support the model, over and above a demonstration that it captures important trends and beats an account that was already known to be wrong. We suggest you:

      (1) Revise the figure legends (Figure 5 and Figure 6E).

      We revised all figures and figure legends.

      (2) Additionally report model differences in terms of BIC (which will favour the preferred model less under the current analysis);

      We now base the model comparison on Bayesian model selection, which approximates and compares model evidence. This method incorporates a strong penalty for model complexity through marginalization over the parameter space (MacKay, 2003).
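
      The editor's remark that BIC would favour the more complex model less than AIC follows from the standard penalty terms of the two criteria. The short sketch below uses the textbook definitions; the log likelihoods, parameter counts, and number of observations are hypothetical.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical fits: the complex model gains 5 log-likelihood units
# at the cost of 3 extra parameters, with 1000 observations.
simple = {"ll": -1005.0, "k": 5}
complex_ = {"ll": -1000.0, "k": 8}
n_obs = 1000

delta_aic = aic(simple["ll"], simple["k"]) - aic(complex_["ll"], complex_["k"])
delta_bic = bic(simple["ll"], simple["k"], n_obs) - bic(
    complex_["ll"], complex_["k"], n_obs
)
```

      Each extra parameter costs 2 units under AIC but ln(n) units under BIC, so with more than about 7 observations BIC penalises complexity more heavily. In this hypothetical case the complex model wins under AIC (positive `delta_aic`) but loses under BIC (negative `delta_bic`).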

      (3) Move to instead fitting the models multiple times in order to get leave-one-out estimates of best-fitting loglikelihood for each left-out data point (and then sum those for the comparison metric).

      Unfortunately, our design is not suitable for cross-validation methods because the model-fitting process is computationally intensive and time-consuming. Each fit of the causal-inference model takes approximately 30 hours, and multiple fits with different initial starting points are required to rule out local minima.
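
      To make the computational cost concrete, the leave-one-out procedure the editor describes refits the model once per held-out data point and sums the held-out log likelihoods. The sketch below shows that loop with a toy Gaussian model standing in for the recalibration models; the function names and the toy model are illustrative assumptions only.

```python
import math
import statistics


def gaussian_logpdf(x, mean, sd):
    """Log density of a normal distribution evaluated at x."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def loo_log_likelihood(data):
    """Sum of held-out log likelihoods, refitting the model for each point.

    The toy model is a Gaussian whose mean and SD are re-estimated by
    maximum likelihood on the remaining points each time.
    """
    total = 0.0
    for i, held_out in enumerate(data):
        train = data[:i] + data[i + 1:]
        mean = statistics.fmean(train)
        sd = statistics.pstdev(train)  # maximum-likelihood SD estimate
        total += gaussian_logpdf(held_out, mean, sd)
    return total


# Three toy observations require three separate fits.
score = loo_log_likelihood([0.0, 2.0, 4.0])
```

      The number of fits equals the number of left-out units, so with a fit time of roughly 30 hours per model and several restarts per fit, as the authors report, the total cost grows quickly.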

      (4) Offer a comparison with a more convincing model (for example, an atheoretical fit with free parameters for all adapters, as suggested by Reviewer 3).

      We updated the previous competitor model and included an asynchrony-contingent model, which is capable of predicting the nonlinearity of recalibration (Fig. 3). The causal-inference model still outperformed the asynchrony-contingent model (Fig. 4A). Furthermore, model predictions show that only the causal-inference model captures non-zero recalibration effects for long adapter SOAs at both the group level (Fig. 4B) and individual level (Figure S4).

      Reviewer #1 (Recommendations For The Authors):

      A larger sample size would be better.

      This study used a within-subject design, which included 9 sessions, totaling 13.5 hours per participant. This extensive data collection allows us to better constrain the model for each participant. Our conclusions are based on the different models’ ability to fit individual data rather than on group statistics.

      It would be good to better put the study in the context of spatial ventriloquism, where similar model comparisons have been done over the last ten years and there is a large body of work to connect to.

      We now discuss our model in relation to models of cross-modal spatial recalibration in the Introduction (lines 70–78) and Discussion (lines 324–330).

      Reviewer #2 (Recommendations For The Authors):

      Previous authors (e.g. Yarrow et al.,) have described latency shift and criterion change models as providing a good fit of experimental data. Did the authors attempt a criterion shift model in addition to a shift model?

      We have considered criterion-shift variants of our atheoretical recalibration models in Supplement Section 1. To summarize the results, we varied two model assumptions: 1) the use of either a Gaussian or an exponential measurement distribution, and 2) recalibration being implemented either as a shift of bias or a criterion. We fit each model variant separately to the ternary TOJ responses of all sessions. Bayesian model comparisons indicated that the bias-shift model with exponential measurement distributions best captured the data of most participants.
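
      The distinction between a bias shift and a criterion shift can be illustrated with a simple ternary response model. This sketch uses the Gaussian measurement variant with a single symmetric criterion. The sign convention (negative SOA meaning auditory lead), the parameter names, and the values are assumptions for illustration, not the authors' fitted model.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def ternary_toj_probabilities(soa, bias, sigma, criterion):
    """Return P(auditory first), P(simultaneous), P(visual first).

    Assumed model: the measured SOA is Gaussian with mean soa + bias and
    SD sigma. Measurements below -criterion give "auditory first", above
    +criterion give "visual first", and anything in between "simultaneous".
    """
    mean = soa + bias
    p_auditory_first = normal_cdf(-criterion, mean, sigma)
    p_visual_first = 1.0 - normal_cdf(criterion, mean, sigma)
    p_simultaneous = 1.0 - p_auditory_first - p_visual_first
    return p_auditory_first, p_simultaneous, p_visual_first


# Hypothetical pre-test and post-test at a physical SOA of zero.
pre = ternary_toj_probabilities(0.0, bias=0.0, sigma=0.1, criterion=0.1)
post_bias_shift = ternary_toj_probabilities(0.0, bias=0.05, sigma=0.1, criterion=0.1)
```

      In this formulation a bias-shift account changes `bias` between pre-test and post-test, which moves all three response curves together along the SOA axis, whereas a criterion-shift account would change the decision boundaries instead.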

      Figure 4B - I'm not convinced that the modality-independent uncertainty is anything but a straw man. Models not allowed to be asymmetric do not show asymmetry? (the asymmetry index is irrelevant in the fixed update model as I understand it so it is not surprising the model is identical?).

      We included the assumption that temporal uncertainty might be modality-independent for several reasons. First, there is evidence suggesting that a central mechanism governs the precision of temporal-order judgments (Hirsh & Sherrick, 1961), indicating that precision is primarily limited by a central mechanism rather than the sensory channels themselves. Second, from a modeling perspective, it was necessary to test whether an audio-visual temporal bias alone, i.e., assuming modality-independent uncertainty, could introduce asymmetry across adapter SOAs. Additionally, most previous studies implicitly assumed symmetric likelihoods, i.e., modality-independent latency noise, by fitting cumulative Gaussians to the psychometric curves derived from 2AFC-TOJ tasks (Di Luca et al., 2009; Fujisaki et al., 2004; Harrar & Harris, 2005; Keetels & Vroomen, 2007; Navarra et al., 2005; Tanaka et al., 2011; Vatakis et al., 2007, 2008; Vroomen et al., 2004).

      Why does a zero SOA adapter shift the PSS towards auditory leading? Is this a consequence of the previous day’s conditioning? It is not clear from the Methods whether all listeners had the same SOA conditioning sequence across days.

      The auditory-leading recalibration effect for an adapter SOA of zero has been consistently reported in previous studies (e.g., Fujisaki et al., 2004; Vroomen et al., 2004). This effect reflects the asymmetry in recalibration, which can be explained by differences across modalities in the noisiness of the latencies (Figure 5C) in combination with audiovisual temporal bias (Figure S8).

      We added details about the order of testing to the Methods section (lines 456–457).

      Reviewer #3 (Recommendations For The Authors):

      Abstract

      “Our results indicate that human observers employ causal-inference-based percepts to recalibrate cross-modal temporal perception” Your results indicate this is plausible. However, this statement (basically repeated at the end of the intro and again in the discussion) is - in my opinion - too strong.

      We have revised the statement as suggested.

      Intro and later

      Within the wider literature on relative timing perception, the temporal order judgement (TOJ) task refers to a task with just two response options. Tasks with three response options, as employed here, are typically referred to as ternary judgments. I would suggest language consistent with the existing literature (or if not, the contrast to standard usage could be clarified).

      Ref: Ulrich, R. (1987). Threshold models of temporal-order judgments evaluated by a ternary response task. Percept. Psychophys., 42, 224-239.

      We revised the term for the task as suggested throughout the manuscript.

      Results, 2.2.2

      “However, temporal precision might not be due to the variability of arrival latency.” Indeed, although there is some recent evidence that it might be.

      Ref: Yarrow, K., Kohl, C, Segasby, T., Kaur Bansal, R., Rowe, P., & Arnold, D.H. Neural-latency noise places limits on human sensitivity to the timing of events. Cognition, 222, 105012 (2022).

      We included the reference as suggested (lines 245–248).

      Methods, 4.3.

      Should there be some information here about the order of adaptation sessions (e.g. random for each observer)?

      We added details about the order of testing to the Methods section (lines 456–457).

      Supplemental material section 1.

      Here, you test whether the changes resulting from recalibration look more like a shift of the entire psychometric function or an expansion of the psychometric function on one side (most straightforwardly compatible with a change of one decision criterion). Fine, but the way you have done this is odd, because you have introduced a further difference in the models (Gaussian vs. exponential latency noise) so that you cannot actually conclude that the trend towards a win for the bias-shift model is simply down to the bias vs. criterion difference. It could just as easily be down to the different shapes of psychometric functions that the two models can predict (with the exponential noise model permitting asymmetry in slopes). There seems to be no reason that this comparison cannot be made entirely within the exponential noise framework (by a very simple reparameterization that focuses on the two boundaries rather than the midpoint and extent of the decision window). Then, you would be focusing entirely on the question of interest. It would also equate model parameters, removing any reliance on asymptotic assumptions being met for AIC.

      We revised our exploration of atheoretical recalibration models. To summarize the results, we varied two model assumptions: 1) the use of either a Gaussian or an exponential measurement distribution, and 2) recalibration being implemented either as a shift of the cross-modal temporal bias or as a shift of the criterion. We fit each model separately to the ternary TOJ responses of all sessions. Bayesian model comparisons indicated that the bias-shift model with exponential measurement distributions best described the data of most participants.
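      The distinction between the two recalibration implementations can be sketched for the Gaussian case only. This is a minimal illustration under stated assumptions, not the fitted models: the function names, the sign convention (positive values stand for one stimulus order, here labelled "visual first"), and all parameter values are hypothetical. A bias shift moves the whole measurement distribution, so both outer response probabilities change; a shift of one criterion changes only the boundary it touches.

```python
from math import erf, sqrt


def norm_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative distribution function of a Gaussian."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))


def ternary_probs(soa, bias, sigma, lower, upper):
    """Return P(auditory first), P(simultaneous), P(visual first).

    The measured asynchrony is assumed to be Gaussian with mean
    soa + bias and standard deviation sigma (all in ms). The observer
    reports 'simultaneous' when the measurement falls between the
    two criteria `lower` and `upper`.
    """
    mu = soa + bias
    p_low = norm_cdf(lower, mu, sigma)          # measurement below lower criterion
    p_high = 1.0 - norm_cdf(upper, mu, sigma)   # measurement above upper criterion
    return p_low, 1.0 - p_low - p_high, p_high


# Hypothetical values, chosen only to make the contrast visible.
baseline = ternary_probs(0.0, bias=0.0, sigma=80.0, lower=-100.0, upper=100.0)
bias_shift = ternary_probs(0.0, bias=30.0, sigma=80.0, lower=-100.0, upper=100.0)
criterion_shift = ternary_probs(0.0, bias=0.0, sigma=80.0, lower=-100.0, upper=130.0)

for name, probs in [("baseline", baseline),
                    ("bias shift", bias_shift),
                    ("criterion shift", criterion_shift)]:
    print(name, [round(p, 3) for p in probs])
```

      In this sketch the criterion shift leaves the probability of the "auditory first" response untouched, whereas the bias shift changes both outer response probabilities in opposite directions.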

      References

      Di Luca, M., Machulla, T.-K., & Ernst, M. O. (2009). Recalibration of multisensory simultaneity: cross-modal transfer coincides with a change in perceptual latency. Journal of Vision, 9(12), Article 7.

      Fujisaki, W., Shimojo, S., Kashino, M., & Nishida, S. (2004). Recalibration of audiovisual simultaneity. Nature Neuroscience, 7(7), 773–778.

      Harrar, V., & Harris, L. R. (2005). Simultaneity constancy: detecting events with touch and vision. Experimental Brain Research, 166(3-4), 465–473.

      Hirsh, I. J., & Sherrick, C. E., Jr. (1961). Perceived order in different sense modalities. Journal of Experimental Psychology, 62(5), 423–432.

      Keetels, M., & Vroomen, J. (2007). No effect of auditory-visual spatial disparity on temporal recalibration. Experimental Brain Research, 182(4), 559–565.

      MacKay, D. J. (2003). Information theory, inference and learning algorithms. https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=201b835c3f3a3626ca07be68cc28cf7d286bf8d5

      Navarra, J., Vatakis, A., Zampini, M., Soto-Faraco, S., Humphreys, W., & Spence, C. (2005). Exposure to asynchronous audiovisual speech extends the temporal window for audiovisual integration. Cognitive Brain Research, 25(2), 499–507.

      Tanaka, A., Asakawa, K., & Imai, H. (2011). The change in perceptual synchrony between auditory and visual speech after exposure to asynchronous speech. Neuroreport, 22(14), 684–688.

      Vatakis, A., Navarra, J., Soto-Faraco, S., & Spence, C. (2007). Temporal recalibration during asynchronous audiovisual speech perception. Experimental Brain Research, 181(1), 173–181.

      Vatakis, A., Navarra, J., Soto-Faraco, S., & Spence, C. (2008). Audiovisual temporal adaptation of speech: temporal order versus simultaneity judgments. Experimental Brain Research, 185(3), 521–529.

      Vroomen, J., Keetels, M., de Gelder, B., & Bertelson, P. (2004). Recalibration of temporal order perception by exposure to audio-visual asynchrony. Cognitive Brain Research, 22(1), 32–35.

    1. eLife Assessment

      This valuable study presents the design of a new device for using high-density electrophysiological probes ('Neuropixels') in freely moving rodents. The evidence demonstrating the system's versatility and ability to record high-quality extracellular data in both mice and rats is compelling. This study will be of significant interest to neuroscientists performing chronic electrophysiological recordings.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript by Bimbard et al., a new method to perform stable recordings over long periods of time with Neuropixels, as well as the technical details on how the electrodes can be explanted for follow-up reuse, is provided. I think the description of all parts of the method is very clear, and the validation analyses (n of units per day over time, RMS over recording days...) are very convincing. I however missed a stronger emphasis on why this could provide a big impact on the ephys community, by enabling new analyses, new behavior correlation studies or neurophysiological mechanisms across temporal scales that were previously inaccessible with high temporal resolution (i.e. not with imaging).

      Strengths:

      Open source method. Validation across laboratories. Across species (mice and rats) demonstration of its use and in different behavioral conditions (head-fixed and freely moving). The implant offers a major advance compared to previous methods and that will help the community generate richer datasets.

      Weaknesses:

      None noted.

    3. Reviewer #2 (Public review):

      Summary:

      This work by Bimbard et al. introduces a new implant for Neuropixels probes. While Neuropixels probes have critically improved and extended our ability to record the activity of a large number of neurons with high temporal resolution, the use of these expensive devices in chronic experiments has so far been hampered by the difficulty of safely implanting them and, importantly, of explanting and reusing them after conclusion of the experiment. The authors present a newly designed two-part implant, consisting of a docking and a payload module, that allows for secure implantation and straightforward recovery of the probes. The implant is lightweight, making it amenable for use in mice and rats, and customizable. The authors provide schematics and files for printing of the implants, which can be easily modified and adapted to custom experiments by researchers with little to no design experience. Importantly, the authors demonstrate the successful use of this implant across multiple use cases, in head-fixed and freely moving experiments, in mice and rats, with different versions of Neuropixels probes and across 8 different labs. Taken together, the presented implants promise to make chronic Neuropixels recordings and long-term studies of neuronal activity significantly easier and attainable for both current and future Neuropixels users.

      Strengths:

      - The implants have been successfully tested across 8 different laboratories, in mice and rats, in head-fixed and freely moving conditions and have been adapted in multiple ways for a number of distinct experiments.
      - Implants are easily customizable and authors provide a straightforward approach for customization across multiple design dimensions even for researchers not experienced in design.
      - The authors provide clear and straightforward descriptions of the construction, implantation and explant of the described implants.
      - The split of the implant into a docking and payload module makes reuse even in different experiments (using different docking modules) easy.
      - The authors demonstrate that implants can be re-used multiple times and still allow for high-quality recordings.
      - The authors show that the chronic implantations allow for the tracking of individual neurons across days and weeks (using additional software tracking solutions), which is critical for a large number of experiments requiring the description of neuronal activity, e.g. throughout learning processes.
      - The authors show that implanted animals can even perform complex behavioral tasks, with no apparent reduction in their performance.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, Bimbard and colleagues describe a new implant apparatus called "Apollo Implant", which should facilitate recording in freely moving rodents (both mice and rats) using Neuropixels probes. The authors collected data from both mice and rats and used 3 different versions of Neuropixels, and multiple labs have already adopted this method, which is impressive. They openly share their CAD designs and surgery protocol to further facilitate the adoption of their method.

      Strengths:

      Overall, the "Apollo Implant" is easy to use and adapt, as it has been used in other laboratories successfully and custom modifications are already available. The device is reproducible using common 3D printing services and can be easily modified thanks to its CAD design (the video explaining this is extremely helpful). The weight and price are amazing compared to other systems for rigid silicon probes allowing a wide range of use of the "Apollo Implant".

      Weaknesses:

      The "Apollo Implant" can only handle Neuropixels probes. It cannot hold other widely used and commercially available silicon probes. Certain angles and distances may be better served by 2 implants.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1 (Public Review):

      Summary:

      In this manuscript by Bimbard et al., a new method to perform stable recordings over long periods of time with Neuropixels, as well as the technical details on how the electrodes can be explanted for follow-up reuse, is provided. I think the description of all parts of the method is very clear, and the validation analyses (n of units per day over time, RMS over recording days...) are very convincing. I however missed a stronger emphasis on why this could provide a big impact on the ephys community, by enabling new analyses, new behavior correlation studies, or neurophysiological mechanisms across temporal scales.

      Strengths:

      Open source method. Validation across laboratories. Across species (mice and rats) demonstration of its use and in different behavioral conditions (head-fixed and freely moving).

      Weaknesses:

      Weak emphasis on what can be enabled with this new method that didn't exist before.

      We thank the reviewer for highlighting the limited discussion around scientific impact. Our implant has several advantages which combine to make it much more accessible than previous solutions. This enables a variety of recording configurations that would not have been possible with previous designs, facilitating recordings from a wider range of brain regions, animals, and experimental setups. In short, there are three key advances which we now emphasise in the manuscript:

      Adaptability: The CAD files can be readily adapted to a wide range of configurations (implantation depth, angle, position of headstage, etc.). Labs have already modified the design for their needs, and re-shared with the community (Discussion, Para 5).

      Weight: Because of the lightweight design, experimenters can i) perform complex and demanding freely moving tasks as we exemplify in the manuscript, and ii) implant female and water restricted mice while respecting animal welfare weight limitations (Flexible design, Para 1).

      Cost: At ~$10, our implant is significantly cheaper than published alternatives, which makes it affordable to more labs and means that testing modifications is cost-effective (Discussion, Para 4).

      Reviewer 1 (Recommendations For The Authors):

      - Differences between mice and rats seem very significant. Although this is probably not surprising, I suggest that the authors comment on this to make it clear to anyone trying to use in different species that are not quantified in the main figures.

      The reviewer is correct: there are qualitative differences between mice and rats, particularly with respect to the unit median amplitude. We have added a comment in the discussion to highlight these inter-species variations (Discussion, Para 7).

      - Another comment that would be useful to have would be how to tackle the problem of tracking the same neuron across days. Even if currently impossible, it could be useful to provide discussion along those lines as to where future improvements (either in hardware or software) can be made.

      We thank the reviewer for highlighting this. Figure 5 does show data from tracking the same neuron across days (and even months). We have modified the language to make this clear.

      Reviewer 2 (Public Review):  

      Summary:

      This work by Bimbard et al. introduces a new implant for Neuropixels probes. While Neuropixels probes have critically improved and extended our ability to record the activity of a large number of neurons with high temporal resolution, the use of these expensive devices in chronic experiments has so far been hampered by the difficulty of safely implanting them and, importantly, of explanting and reusing them after conclusion of the experiment. The authors present a newly designed two-part implant, consisting of a docking and a payload module, that allows for secure implantation and straightforward recovery of the probes. The implant is lightweight, making it amenable for use in mice and rats, and customizable. The authors provide schematics and files for printing of the implants, which can be easily modified and adapted to custom experiments by researchers with little to no design experience. Importantly, the authors demonstrate the successful use of this implant across multiple use cases, in head-fixed and freely moving experiments, in mice and rats, with different versions of Neuropixels probes, and across 8 different labs. Taken together, the presented implants promise to make chronic Neuropixels recordings and long-term studies of neuronal activity significantly easier and attainable for both current and future Neuropixels users.

      Strengths:

      The implants have been successfully tested across 8 different laboratories, in mice and rats, in head-fixed and freely moving conditions, and have been adapted in multiple ways for a number of distinct experiments.

      Implants are easily customizable and the authors provide a straightforward approach for customization across multiple design dimensions even for researchers not experienced in design.

      The authors provide clear and straightforward descriptions of the construction, implantation, and explant of the described implants.

      The split of the implant into a docking and payload module makes reuse even in different experiments (using different docking modules) easy.

      The authors demonstrate that implants can be re-used multiple times and still allow for high-quality recordings.

      The authors show that the chronic implantations allow for the tracking of individual neurons across days and weeks (using additional software tracking solutions), which is critical for a large number of experiments requiring the description of neuronal activity, e.g. throughout learning processes.

      The authors show that implanted animals can even perform complex behavioral tasks, with no apparent reduction in their performance.

      Weaknesses:

      While implanted animals can still perform complex behavioral tasks, the authors describe that the implants may reduce the animals' mobility, as measured by prolonged reaction times. However, the presented data does not allow us to judge whether this effect is specifically due to the presented implant or whether any implant or just tethering of the animals per se would have the same effects.

      The reviewer is correct: some of the differences in mouse reaction time could be due to the tether rather than the implant. As these experiments were also performed in water-restricted female mice with the heavier Neuropixels 1.0 implant, our data represent the maximal impact of the implant, and we have highlighted this point in the revision (Freely behaving animals, Para 2).  

      While the authors make certain comparisons to other, previously published approaches for chronic implantation and re-use of Neuropixels probes, it is hard to make conclusive comparisons and judge the advantages of the current implant. For example, while the authors emphasize that the lower weight of their implant allows them to perform recordings in mice (and is surely advantageous), the previously described, heavier implants they mention (Steinmetz et al., 2021; van Daal et al., 2021), have also been used in mice. Whether the weight difference makes a difference in practice therefore remains somewhat unclear.

      The reviewer is correct: without a direct comparison, we cannot be certain that our smaller, lighter implant improves behavioural results (although this is supported by the literature, e.g. Newman et al., 2023). However, the reduced weight of our implant is critical for several laboratories represented in this manuscript due to animal welfare requirements. Indeed, in van Daal et al., the authors “recommend a [mouse] weight of >25 g for implanting Neuropixels 1.0 probes.” This limit precludes using (the vast majority of) female mice, or water-restricted animals. Conversely, our implant can be routinely used with lighter, water-restricted male and female mice. We emphasised this point in the revision (Discussion, Para 2).

      The non-permanent integration of the headstages into the implant, while allowing for the use of the same headstage for multiple animals in parallel, requires repeated connections and does not provide strong protection for the implant. This may especially be an issue for the use in rats, requiring additional protective components as in the presented rat experiments.

      We apologise for not clarifying the various headstage holder options in the manuscript and we have now addressed this in the revision (Freely behaving animals, Para 1&2). Our repository has headstage holder designs (in the XtraModifications/Mouse_FreelyMoving folder). These allow the headstage to be left on the implant, thus minimizing the number of connections (albeit increasing the weight for the mouse). Indeed, mice recorded while performing the task described in our manuscript had the headstage semi-permanently integrated into the implant, and we now highlight this in the revision (Freely behaving animals, Para 1).

      Reviewer 2 (Recommendations For The Authors): 

      The description of the different versions of the head-stage holders should be more clear, listing also advantages/disadvantages of the different solutions. It would be also useful if the authors could comment on the use of these head-stage holders in rats, since they do not seem to offer much protection.

      We thank the reviewer for this point, and we have added notes to the manuscript to clarify the various advantages of the different headstage-holders, and that the headstage can be permanently attached to the implant (Freely behaving animals, Para 1&2). This is the primary advantage of these solutions compared with the minimal implant—at the expense of increasing the implant weight.  

      The reviewer’s concerns regarding the lack of protection for implants in rats are well placed, and we now emphasise that these experiments benefited from the additional protection of an external 3D casing, which is likely critical for use in larger animals (Freely behaving animals, Para 1).

      While re-used probes seem to show similar yields across multiple uses (Figure 4C), it seems as if there is a much higher variability of the yield for probes that are used for the first (maybe also second) time. There are probes with much higher than average yields, but it seems none of the re-used probes show such high yields. Is this a real effect? Is this because the high-yield probes happened to have not been used multiple times? Is there an analysis the authors could provide to reduce the concern that yields may generally be lower for re-used probes/that there are no very high yields for re-used probes?

      We understand the reviewer’s concern with respect to Figure 4C; however, the re-use of any given probe was determined only by the experimental needs of the project. It is therefore not possible that there is a relationship between probes selected for re-use and unit yield. We now specify this in the revised legend of Figure 4C. This variability (and the consistency in yield across uses) likely stems from differences between labs, brain regions, and implantation protocols.

      The authors claim that a 'large fraction' of units could be tracked for the entire duration of the experiment (Figure 5A,B). They mention in the discussion that quantification can be found in a different manuscript (van Beest et al., 2023), but this should also be quantified here in at least some more detail, also for other animals in addition to the one mouse which was recorded for ~100 days. What fraction can be held for different durations? What is the average holding time, etc.?

      We agree with the reviewer, and have now added new panels quantifying the probability and reliability of tracking a neuron across days (Figure 5E-F). We also comment on the change in tracking probability across time, and its variability across recordings (Stability, Para 4).

      Reviewer 3 (Public Reviews):

      Summary:

      In this manuscript, Bimbard and colleagues describe a new implant apparatus called "Apollo Implant", which should facilitate recording in freely moving rodents (mice and rats) using Neuropixels probes. The authors collected data from both mice and rats and used 3 different versions of Neuropixels, and multiple labs have already adopted this method, which is impressive. They openly share their CAD designs and surgery protocol to further facilitate the adoption of their method.

      Strengths:

      Overall, the "Apollo Implant" is easy to use and adapt, as it has been used in other laboratories successfully and custom modifications are already available. The device is reproducible using common 3D printing services and can be easily modified thanks to its CAD design (the video explaining this is extremely helpful). The weight and price are amazing compared to other systems for rigid silicon probes allowing a wide range of use of the "Apollo Implant".

      Weaknesses:

      The "Apollo Implant" can only handle Neuropixels probes. It cannot hold other widely used and commercially available silicon probes. Certain angles and distances are not possible in their current form (distance between probes 1.8 to 4mm, implantation depth 2-6.5 mm, or angle of insertion up to 20 degrees).

      As we now discuss in the manuscript (Discussion, Para 4), one implant accommodating the diversity of the existing probes is beyond the scope of this project. However, because the design is adaptable, groups should be able to modify the current version of the implant to adapt to their electrodes’ size and format (and can highlight any issues in the GitHub “Discussions” area).

      With Neuropixels, the current range of depths covers practically all trajectories in the mouse brain. In rats, where deeper penetrations may be useful, the experimenter can attach the probe at a lower point in the payload module to expose more of the shank. We now specify this in the GitHub repository.

      We have now extended the range of inter-probe distances from a maximum of 4 mm to 6.5 mm. Distances beyond this may be better served by 2 implants, and smaller distances could be achieved by attaching two probes on the same side of the docking module. These points are now specified in the revised manuscript (Flexible design, Para 2).

      Reviewer 3 (Recommendations For The Authors):

      I have only a few questions and suggestions:

      Is it possible to create step-by-step instructions for explantation (similar to Figure-1 with CAD schematics)? You mention that payload holder is attached to a micromanipulator, but it is unclear how this is achieved. How was the payload secured with a screw (which screw)? My understanding is that as you turn the screw in the payload holder, it will grab onto the payload module from both sides, but the screw is not in contact with the payload module, correct? I found the screw type on your GitHub, but it would be great if you could add a bill of materials in a table format, so readers don't have to jump between GitHub and article.

      We have now added a bill of materials to the revised manuscript (Implant design and materials, Para 2), although up-to-date links are still provided on the GitHub repository due to changing availability.

      What happens if you do a dual probe implant and cannot avoid blood vessels in one or both of the craniotomies due to the pre-defined geometry? Is this a frequent issue? How can you overcome this during the surgery?

      Blood vessels can be difficult to avoid in some cases, but we are typically able to rotate/reposition the probes to solve this issue. In some cases, with 4-shank probes, the blood vessel can be positioned between individual probe shanks. We now detail this in the revised manuscript (Assembly and implantation, Para 3).

      I assume if the head is not aligned (line-332) the probe can break during recovery. Have you experienced this during explantation?

      As we now specify in the manuscript (Explantation, Para 2), we are careful when explanting the probe to avoid this issue, and due to the flexibility of the shanks, it does not appear to be a major concern.

      Why did you remove the UV glue (line 435)? How can you level the skull? I assume you have covered bregma and lambda in the first surgery which can create an uneven surface to measure even after you remove the UV glue.

      We thank the reviewer for highlighting this omission from the methods. We now explain (Implantation, Carandini-Harris laboratory) that the UV-glue is completely removed during the second surgery, and the skull is cleaned and scored. This improves the adhesion of the dental cement, and allows for reliable levelling of the skull.

      In line 112 you mentioned that the number of recorded neurons was stable; however, you found a 3% mean decrease in unit count per day (line 120). Stability is great until day 10 (in Figure 4A), but it deteriorates quickly after that. I think it would help readers if you could add the mean ± SEM of recorded units in the text for days 1-10, days 11-50, and days 51-100 (using the data from Figure 4A).

      We have now added Supplementary Figure 4 to show unit count across bins of days, and a corresponding comment in the text (Stability, Para 2).

      A full survey of the probe (Figure 4B) means that you recorded neuronal activity across 4000-5000 channels (depending on how many channels were in the brain). While it is clear that a full probe survey can reduce the number of animals needed for a study, it is also clear in this figure that by day 25 you can record ~300 neurons on 4000 channels. It would be great to discuss this in the discussion and give a balanced view of the long-term stability of these recordings.

      Overall, keeping a large number of units for a long time still remains a challenge. Here, we could record on average 85 neurons per bank during the first 10 days, and then only 45 after 50 days. It is important to note that our quantification averages across all banks recorded, including those in a ventricle or partly outside of the brain. Thus, our results represent a lower estimate of the total neurons recorded. Our new Supplementary Figure 4 helps to highlight the diversity of neuron number recorded per animal. Further improvements in surgical techniques and spike sorting will likely improve stability further and we have now added this comment in the manuscript (Stability, Para 2). For example, we observed excellent stability in a mouse where the craniotomy was stabilized with KwikSil (Supplementary Figure 5).

      The RMS value was around 20 µV in some of the recordings, and according to Figure 4G it is around 16 µV on average. Is it safe to accept putative single units with 20 µV amplitudes, when the baseline noise level is this close to the spike peak-to-peak amplitude?

      On average, less than 1% of the units selected using all the other metrics except the amplitude had an amplitude below 30 µV, and 2.6% below 50 µV. Increasing the threshold to 30 µV, or even 50 µV, did not affect the results. We have now added this comment in the Methods (Data processing, Para 3).

      Can you add the waveform and ISIH of the example unit from day 106 to Figure 5?

      We have now added 4 units tracked up to day 106 in Figure 5.  

      Could you move Supplementary Figure 3A to Figure 4? The number of units is more valuable information than the RMS noise level. I understand that you don't have such a nice coverage of all the days as in Figure 3 and 4, but you might be able to group for the first 3 days and the last 3 days (and if data is available, the middle 3 days) as a boxplot. The goal would be for the reader to be able to see whether there is any change in the number of single units over time.

      We agree with the reviewer that the number of units is more valuable. We had included this information in Figure 4A-F, but we have edited the text to make it clearer that this is what is being shown. The data from Supplementary Figure 3A are already contained within Figure 4, but in Supplementary Figure 3A they are separated by individual labs.

      Product numbers are missing in multiple places: line-285 (screw), line-288 (screw), line-290 (screw), line-309 (manipulator), line-374 (gold pin and silver wire), line-384 (Mill-Max), line-394 (silver wire), and many more. It would be great if you could add all these details, so people can replicate your protocol.

      We thank the reviewer for highlighting this, and we have added details of screw thread-size and length to relevant parts of the manuscript, although any type of screw can be used. Similarly, other components are non-specific (e.g. multiple silver-wire diameters were used across labs), so we have not included specific product numbers for general consumer items (like screws and silver wires) to avoid indicating that a specific part must be purchased.

      While it is great to see lab-specific methods, I am not sure in their current form it helps to understand the protocol better. The information is conveyed in different ways (I assume these were written by different people), in different orders, and in different depths (some mention probe implant location relative to bregma and midline, some don't). There are many different glues, epoxies, cement, wires, and pins. I would recommend rewriting these methods sections under a unified template, so it is easier to follow.

      We thank the reviewer for this suggestion and we have rewritten this section of the methods accordingly. We now use a template structure to simplify the comparisons between labs: the same template is used for each lab in each section (payload module assembly, implantation, and data acquisition).

      Line-307: why is a skull screw optional for grounding? What did you use for ground and reference if not a ground screw?

      We now specify in the manuscript that during head-fixed experiments, the animal’s headplate can be used for grounding and that, combined with the internal referencing provided by the Neuropixels probe, this yielded low-noise recordings (Implantation protocol, Methods).

    1. eLife Assessment

      The manuscript provides important new insights into the mechanisms of statistical learning in early human development, showing that statistical learning in neonates occurs robustly and is not limited to linguistic features but occurs across different domains. The evidence is convincing and the findings are highly relevant for researchers working in several domains, including developmental cognitive neuroscience, developmental psychology, linguistics, and speech pathology.

    2. Reviewer #1 (Public review):

      Summary:

      Parsing speech into meaningful linguistic units is a fundamental yet challenging task that infants face while acquiring their native language. Computing transitional probabilities (TPs) between syllables is a segmentation cue well-attested since birth. In this research, the authors examine whether newborns compute TPs over any available speech feature (linguistic and non-linguistic), or whether by contrast newborns favor computation of TPs over linguistic content over non-linguistic speech features such as speaker voice. Using EEG and the artificial language learning paradigm, they record the neural responses of two groups of newborns presented with speech streams in which either phonetic content or speaker voice is structured to provide TPs informative of word boundaries, while the other dimension provides uninformative information. They compare newborns' neural responses to these structured streams to their processing of a stream in which both dimensions vary randomly. After the random and structured familiarization streams, the newborns are presented with (pseudo)words as defined by their informative TPs, as well as partwords (that is, sequences that straddle a word boundary), extracted from the same streams. Analysis of the neural responses shows that, while newborns' neural activity entrained to the syllabic rate (2 Hz) when listening to the random and structured streams, it additionally entrained at the word rate (4 Hz) only when listening to the structured streams, with no differential response between the streams structured around voice or phonetic information. Newborns also showed different neural activity in response to the words and partwords. In sum, the study reveals that newborns compute TPs over linguistic and non-linguistic features of speech, that these are calculated independently, and that linguistic features do not lead to a processing advantage.

      Strengths:

      This interesting research furthers our knowledge of the scope of the statistical learning mechanism, which is confirmed to be a general-purpose powerful tool that allows humans to extract patterns of co-occurring events while revealing no apparent preferential processing for linguistic features. To answer its question, the study combines a highly replicated and well-established paradigm, i.e. the use of an artificial language in which pseudowords are concatenated to yield informative TPs to word boundaries, with a state-of-the-art EEG analysis, i.e. neural entrainment. The sample size of the groups is sufficient to ensure power, and the design and analysis are solid and have been successfully employed before.

      Weaknesses:

      There are no significant weaknesses to signal in the manuscript. However, in order to fully conclude that there is no obvious advantage for the linguistic dimension in neonates, future studies should pit both dimensions against each other, to determine whether statistical learning weighs linguistic and non-linguistic features equally, or whether phonetic content is preferentially processed.

      To sum up, the authors achieved their central aim of determining whether TPs are computed over both linguistic and non-linguistic features, and their conclusions are supported by the results. This research is important for researchers working on language and cognitive development, and language processing, as well as for those working on cross-species comparative approaches.

      Comments on revisions:

      The authors have addressed my suggestions. I have no further comments.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript investigates to what degree neonates show evidence for statistical learning from regularities in streams of syllables, either with respect to phonemes or with respect to speaker identity. Using EEG, the authors found evidence for both, stronger entrainment to regularities as well as ERP differences in response to violations of previously introduced regularities. In addition, violations of phoneme regularities elicited an ERP pattern which the authors argue might index a precursor of the N400 response in older children and adults.

      Strengths:

      All in all, this is a very convincing paper, which uses a clever manipulation of syllable streams to target the processing of different features. The combination of neural entrainment and ERP analysis allows for the assessment of different processing stages, and implementing this paradigm in a comparably large sample of neonates is impressive.

      Weaknesses:

      The authors addressed all the concerns I previously raised well and I have no further comments.

    4. Reviewer #3 (Public review):

      Summary:

      This study is focused on testing whether statistical learning (a mechanism for parsing the speech signal into smaller chunks) preferentially operates over certain features of the speech at birth in humans. The features under investigation are phonetic content and speaker identity. Newborns are tested in an EEG paradigm in which they are exposed to a long stream of syllables. In Experiment 1, newborns are familiarized with a sound stream that comprises regularities (transitional probabilities) over syllables (e.g., "pe" followed by "tu" in "petu" with 1.0 probability) while the voices uttering the syllables remain random. In Experiment 2, newborns are familiarized with the same sound stream but, this time, the regularities are built over voices (e.g., "green voice" followed by "red voice" with 1.0 probability) while the concatenation of syllables stays random. At the test, all newborns listened to duplets (individual chunks) that either matched or violated the structure of the familiarization. In both experiments, newborns showed neural entrainment to the regularities implemented in the stream, but only the duplets defined by transitional probabilities over syllables (aka word forms) elicited a N400 ERP component. These results suggest that statistical learning operates in parallel and independently on different dimensions of the speech already at birth and that there seems to be an advantage for processing statistics defining word forms rather than voice patterns.

      Strengths:

      This paper presents an original experimental design that combines two types of statistical regularities in a speech input. The design is robust and appropriate for EEG with newborns. I appreciated the clarity of the Methods section. There is also a behavioral experiment with adults that acts like a control study for newborns. The research question is interesting, and the results add new information about how statistical learning works at the beginning of postnatal life, and on which features of the speech. The figures are clear and helpful in understanding the methods, especially the stimuli and how the regularities were implemented.

      Weaknesses:

      I appreciated how the authors addressed my previous comments and concerns. I am satisfied with the changes made by the authors. I believe the paper reads much better. Also, the adjustment to the theoretical framework suits well.

    5. Author response:

      The following is the authors’ response to the original reviews.

      We thank the three reviewers for their positive comments and useful suggestions. We have implemented most of the reviewers’ recommendations and hope the manuscript is clearer now.

      The main modifications are:

      - A revision of the introduction to better explain what Transitional Probabilities are and clarify the rationale of the experimental design

      - A revision of the discussion

      - To tune down and better explain the interpretation of the different responses between duplets after a stream with phonetic or voice regularities (possibly an N400).

      - To better clarify the framing of statistical learning as a universal learning mechanism that might share computational principles across features (or domains).

      Below, we provide detailed answers to each reviewer's point.

      Response to Reviewer 1:

      There are no significant weaknesses to signal in the manuscript. However, in order to fully conclude that there is no obvious advantage for the linguistic dimension in neonates, it would have been most useful to test a third condition in which the two dimensions were pitted against each other, that is, in which they provide conflicting information as to the boundaries of the words comprised in the artificial language.

      This last condition would have allowed us to determine whether statistical learning weighs linguistic and non-linguistic features equally, or whether phonetic content is preferentially processed.

      We appreciate the reviewer's suggestion that a stream with conflicting information would provide valuable insights. In the present study, we started with a simpler case involving two orthogonal features (i.e., phonemes and voices), with one feature being informative and the other uninformative, and we found similar learning capacities for both. Future work should explore whether infants, and humans more broadly, can simultaneously track regularities in multiple speech features. However, creating a stream with two conflicting statistical structures is challenging. To use neural entrainment, the two features must lead to segmentation at different chunk sizes so that their effects lead to changes in power/PLV at different frequencies, for instance using duplets for the voice dimension and triplets for the linguistic dimension (or vice versa). Consequently, the two dimensions would not be directly comparable within the same participant in terms of the number of distinguishable syllables/voices, memory demand, or SNR, given the 1/f decrease in amplitude of background EEG activity. This would involve comparisons between two distinct groups, counterbalancing chunk size and dimension (linguistic versus non-linguistic). Considering the test phase, words for one dimension would have been part-words for the other dimension. As we are measuring differences and not preferences, interpreting the results would also have been difficult. Additionally, it may be difficult to find a sufficient number of clearly discriminable voices for such a design (triplets imply 12 voices). Therefore, an entirely different experimental paradigm would need to be developed.
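
      As a rough illustration of why different chunk sizes tag different frequencies, the sketch below divides the syllabic rate by the chunk size. It assumes the 4 Hz syllabic rate stated elsewhere in this response; the function name `chunk_rate_hz` is hypothetical and only serves the example.

```python
from fractions import Fraction

def chunk_rate_hz(syllable_rate_hz, chunk_size):
    """Rate at which chunk boundaries recur, in Hz."""
    return Fraction(syllable_rate_hz) / chunk_size

SYLLABLE_RATE_HZ = 4  # syllabic rate of the streams, as stated in this response

duplet_rate = chunk_rate_hz(SYLLABLE_RATE_HZ, 2)   # two-item chunks
triplet_rate = chunk_rate_hz(SYLLABLE_RATE_HZ, 3)  # three-item chunks

print(f"duplets:  {float(duplet_rate):.2f} Hz")
print(f"triplets: {float(triplet_rate):.2f} Hz")
```

      Under this assumption, duplets and triplets would be tagged at 2 Hz and about 1.33 Hz respectively, two frequencies at which the background EEG amplitude differs because of its 1/f profile.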

      If such a design were tested, one possibility is that the regularities for the two dimensions are calculated in parallel, in line with the idea that the calculation of statistical regularities is a ubiquitous implicit mechanism (see Benjamin et al., 2024, for a proposed neural mechanism). Yet, similar to our present study, possibly only phonetic features would be used as word candidates. Another possibility is that only one informative feature would be explicitly processed at a time due to the serial nature of perceptual awareness, which may prioritise one feature over the other.

      We added one sentence in the discussion stating that more research is needed to understand whether infants can track both regularities simultaneously (p.13, l.270 “Future work could explore whether they can simultaneously track multiple regularities.”).

      Note: The reviewer’s summary contains a typo: the syllabic rate is 4 Hz (not 2 Hz) and the word rate is 2 Hz (not 4 Hz).

      Response to Reviewer 2:

      N400: I am skeptical regarding the interpretation of the phoneme-specific ERP effect as a precursor of the N400 and would suggest toning it down. While the authors are correct in that infant ERP components are typically slower and more posterior compared to adult components, and the observed pattern is hence consistent with an adult N400, at the same time, it could also be a lot of other things. On a functional level, I can't follow the author's argument as to why a violation in phoneme regularity should elicit an N400, since there is no evidence for any semantic processing involved. In sum, I think there is just not enough evidence from the present paradigm to confidently call it an N400.

      The reviewer is correct that we cannot definitively determine the type of processing reflected by the ERP component that appears when neonates hear a duplet after exposure to a stream with phonetic regularities. We interpreted this component as a precursor to the N400, based on prior findings in speech segmentation tasks without semantic content, where a ~400 ms component emerged when adult participants recognised pseudowords (Sander et al., 2002) or during structured streams of syllables (Cunillera et al., 2006, 2009). Additionally, the component we observed had a similar topography and timing to those labelled as N400 in infant studies, where semantic processing was involved (Parise et al., 2010; Friedrich & Friederici, 2011).

      Given our experimental design, the difference we observed must be related to the type of regularity during familiarisation (either phonemes or voices). Thus, we interpreted this component as reflecting lexical search— a process which could be triggered by a linguistic structure but which would not be relevant to a non-linguistic regularity such as voices. However, we are open to alternative interpretations. In any case, this difference between the two streams reveals that computing regularities based on phonemes versus voices does not lead to the same processes.

      We revised the abstract (p.2, l.33) and the discussion of this result (p.15, l.299), toning them down. We hope the rationale of the interpretation is clearer now, as is the fact that it is just one possible interpretation of the results.

      Female and male voices: Why did the authors choose to include male and female voices? While using both female and male stimuli of course leads to a higher generalizability, it also introduces a second dimension for one feature that is not present for this other (i.e., phoneme for Experiment 1 and voice identity plus gender for Experiment 2). Hence, couldn't it also be that the infants extracted the regularity with which one gender voice followed the other? For instance, in List B, in the words, one gender is always followed by the other (M-F or F-M), while in 2/3 of the part-words, the gender is repeated (F-F and M-M). Wouldn't you expect the same pattern of results if infants learned regularities based on gender rather than identity?

      We used three female and three male voices to maximise acoustic variability. The streams were synthesised using MBROLA, which provides a limited set of artificial voices. Indeed, there were not enough French voices of acceptable quality, so we also used two Italian voices (the phonemes used existed in both Italian and French).

      Voices differ in timbre, and female voices tend to be higher pitched. However, it is sometimes difficult to categorise low-pitched female voices and high-pitched male voices. Given that gender may be an important factor in infants' speech perception (newborns, for instance, prefer female voices at birth), we conducted tests to assess whether this dimension could have influenced our results.

      We report these analyses in the SI and refer to them in the methods section (p.25, l.468 “We performed post-hoc tests to ensure that the results were not driven by a perception of two voices: female and male (see SI).”).

      We first quantified the transitional probabilities matrices during the structured stream of Experiment 2, considering that there are only two types of voices: Female and Male.

      For List A, all transition probabilities are equal to 0.5 (P(M|F), P(F|M), P(M|M), P(F|F)), resulting in flat TPs throughout the stream (see Author response image 1, top). Therefore, we would not expect neural entrainment at the word rate (2 Hz), nor would we anticipate ERP differences between the presented duplets in the test phase.

      For List B, P(M|F)=P(F|M)=0.66 while P(M|M)=P(F|F)=0.33. However, this does not produce a regular pattern of TP drops throughout the stream (see Author response image 1, bottom). As a result, strong neural entrainment at 2 Hz was unlikely, although some degree of entrainment might have occasionally occurred due to some drops occurring at a 2 Hz frequency. Regarding the test phase, all three Words and only one Part-word presented alternating patterns (TP=0.66). Therefore, the difference in the ERPs between Words and Part-words in List B might be attributed to gender alternation.

      However, it seems unlikely that gender alternation alone explains the entire pattern of results, as the effect is inconsistent and appears in only one of the lists. To rule out this possibility, we analysed the effects in each list separately.
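
      The gender-level transition probabilities above can be reproduced with a short calculation. The sketch below is illustrative only: the actual voice-to-duplet assignments of Lists A and B are not given in this response, so the two assignments, the voice labels (`F1`, `M1`, ...) and the function `gender_tps` are hypothetical, chosen because they yield the values reported above. It assumes three equally frequent duplets that never immediately repeat.

```python
from collections import defaultdict
from fractions import Fraction

def gender_tps(duplets, gender):
    """Exact gender-level TPs for a stream of randomly concatenated duplets.

    Assumes every duplet is equally frequent and never follows itself.
    Keys are (current, next); values are P(next | current).
    """
    weight = defaultdict(Fraction)  # (current gender, next gender) -> mass
    for i, (first, second) in enumerate(duplets):
        # Within a duplet, the second voice always follows the first.
        weight[gender[first], gender[second]] += 1
        # Across a boundary, each other duplet is equally likely to follow.
        others = [d for j, d in enumerate(duplets) if j != i]
        for next_first, _ in others:
            weight[gender[second], gender[next_first]] += Fraction(1, len(others))
    totals = defaultdict(Fraction)
    for (current, _), w in weight.items():
        totals[current] += w
    return {pair: w / totals[pair[0]] for pair, w in weight.items()}

# Hypothetical voices: three female (F1-F3) and three male (M1-M3).
gender = {v: v[0] for v in ["F1", "F2", "F3", "M1", "M2", "M3"]}

# Hypothetical assignments, not the actual lists used in the study.
list_a_like = [("F1", "F2"), ("M1", "M2"), ("F3", "M3")]
list_b_like = [("F1", "M1"), ("M2", "F2"), ("F3", "M3")]

tps_a = gender_tps(list_a_like, gender)
tps_b = gender_tps(list_b_like, gender)
for name, tps in (("A-like", tps_a), ("B-like", tps_b)):
    for (current, nxt), p in sorted(tps.items()):
        print(f"{name}: P({nxt}|{current}) = {float(p):.2f}")
```

      With these assumed assignments, the A-like list gives 0.5 for every gender transition, while the B-like list gives 2/3 for alternations and 1/3 for repetitions, matching the pattern described for Lists A and B.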

      Author response image 1.

      Transition probabilities (TPs) across the structured stream in Experiment 2, considering voices processed by gender (Female or Male). Top: List A. Bottom: List B.

      We computed the mean activation within the time windows and electrodes of interest and compared the effects of word type and list using a two-way ANOVA. For the difference between Words and Part-words over the positive cluster, we observed a main effect of word type (F(1,31) = 5.902, p = 0.021), with no effects of list or interactions (p > 0.1). Over the negative cluster, we again observed a main effect of word type (F(1,31) = 10.916, p = 0.0016), with no effects of list or interactions (p > 0.1). See Author response image 2.

      Author response image 2:

      Difference in ERP voltage (Words – Part-words) for the two lists (A and B); W=Words; P=Part-Words.

      We conducted a similar analysis for neural entrainment during the structured stream on voices. A comparison of entrainment at 2 Hz between participants who completed List A and List B showed no significant differences (t(30) = -0.27, p = 0.79). A test against zero for each list indicated significant entrainment in both cases (List A: t(17) = 4.44, p = 0.00036; List B: t(13) = 3.16, p = 0.0075). See Author response image 3.

      Author response image 3.

      Neural entrainment at 2 Hz during the structured stream of Experiment 2 for Lists A and B.

      Words entrainment over occipital electrodes: Do you have any idea why the duplet entrainment effect occurs over the electrodes it does, in particular over the occipital electrodes (which seems a bit unintuitive given that this is a purely auditory experiment with sleeping neonates).

      Neural entrainment might be considered as a succession of evoked responses induced by the stream. After applying an average reference in high-density EEG recordings, the auditory ERP in neonates typically consists of a central positivity and a posterior negativity with a source located at the electrical zero in a single-dipole model (i.e. approximately in the superior temporal region; Dehaene-Lambertz & Dehaene, 1994). In adults, because of the average reference (i.e. the sum of voltages is equal to zero at each time point) and because the electrodes cannot capture the negative pole of the auditory response, the negativity is distributed around the head. In infants, however, the brain is higher within the skull, allowing for a more accurate recording of the negative pole of the auditory ERP (see Author response image 4 for the location of electrodes in an infant head model).

      Besides the posterior electrodes, we can see some entrainment on more anterior electrodes that probably corresponds to the positive pole of the auditory ERP.

      We added a phrase in the discussion to explain why we can expect phase-locked activity in posterior electrodes (p.14, l.277: “Auditory ERPs, after reference-averaged, typically consist of a central positivity and posterior negativity”).

      Author response image 4:

      International 10–20 sensors' location on the skull of an infant template, with the underlying 3-D reconstruction of the grey-white matter interface and projection of each electrode to the cortex. Computed across 16 infants (from Kabdebon et al, Neuroimage, 2014). The O1, O2, T5, and T6 electrodes project lower than in adults.

      Response to Reviewer 3:

      (1) While it's true that voice is not essential for language (i.e., sign languages are implemented over gestures; the use of voices to produce non-linguistic sounds, like laughter), it is a feature of spoken languages. Thus I'm not sure if we can really consider this study as a comparison between linguistic and non-linguistic dimensions. In turn, I'm not sure that these results show that statistical learning at birth operates on non-linguistic features, being voices a linguistic dimension at least in spoken languages. I'd like to hear the authors' opinions on this.

      On one hand, it has been shown that statistical learning (SL) operates across multiple modalities and domains in human adults and animals. On the other hand, SL is considered essential for infants to begin parsing speech. Therefore, we aimed to investigate whether SL capacities at birth are more effective on linguistic dimensions of speech, potentially as a way to promote language learning.

      We agree with the reviewer that voices play an important role in communication (e.g., for identifying who is speaking); however, they do not contribute to language structure or meaning, and listeners are expected to normalize across voices to accurately perceive phonemes and words. Thus, voices are speech features but not linguistic features. Additionally, in natural speech, there are no abrupt voice changes within a word as in our experiment; instead, voice changes typically occur on a longer timescale and involve only a limited number of voices, such as in a dialogue. Therefore, computing regularities based on voice changes would not be useful in real-life language learning. We considered that contrasting syllables and voices was an elegant way to test SL beyond its linguistic dimension, as the experimental paradigm is identical in both experiments.

      We have rephrased the introduction to make this point clearer. See p.5, l.88-92: “To test this, we have taken advantage of the fact that syllables convey two important pieces of information for humans: what is being said and who is speaking, i.e. linguistic content and speaker’s identity. While statistical learning…”.

      Along the same line, in the Discussion section, the present results are interpreted within a theoretical framework showing statistical learning in auditory non-linguistic (string of tones, music) and visual domains as well as visual and other animal species. I'm not sure if that theoretical framework is the right fit for the present results.

      (2) I'm not sure whether the fact that we see parallel and independent tracking of statistics in the two dimensions of speech at birth indicates that newborns would be able to do so in all the other dimensions of the speech. If so, what other dimensions are the authors referring to?

      The reviewer is correct that demonstrating the universality of SL requires testing additional modalities and acoustic dimensions. However, we postulate that SL is grounded in a basic mechanism of long-term associative learning, as proposed in Benjamin et al. (2024), which relies on a slow decay in the representation of a given event. This simple mechanism, capable of operating on any representational output, accounts for many types of sequence learning reported in the literature (Benjamin et al., in preparation).

      We have revised the discussion to clarify this theoretical framework.

      In p.13, l.264: “This mechanism might be rooted in associative learning processes relying on the co-existence of event representations driven by slow activation decays (Benjamin et al., 2024).”

      In p., l. 364: “Altogether, our results show that statistical learning works similarly on different speech features in human neonates with no clear advantage for computing linguistically relevant regularities in speech. This supports the idea that statistical learning is a general learning mechanism, probably operating on common computational principles across neural networks (Benjamin et al., 2024)…”.

      (3) Lines 341-345: Statistical learning is an evolutionary ancient learning mechanism but I do not think that the present results are showing it. This is a study on human neonates and adults, there are no other animal species involved therefore I do not see a connection with the evolutionary history of statistical learning. It would be much more interesting to make claims on the ontogeny (rather than phylogeny) of statistical learning, and what regularities newborns are able to detect right after birth. I believe that this is one of the strengths of this work.

      We did not intend to make claims about the phylogeny of SL. Since SL appears to be a learning mechanism shared across species, we use it as a framework to suggest that SL may arise from general operational principles applicable to diverse neural networks. Thus, while it is highly useful for language acquisition, it is not specific to it.

      We have removed the sentence “Statistical learning is an evolutionary ancient learning mechanism.”, and replaced it by (p.18, l.364) “Altogether, our results show that statistical learning works similarly on different speech features in human neonates with no clear advantage for computing linguistically relevant regularities in speech.” We now emphasise in the discussion that infants compute regularities on both features and propose that SL might be a universal learning mechanism sharing computational principles (Benjamin et al., 2024) (see point 2).

      (4) The description of the stimuli in Lines 110-113 is a bit confusing. In Experiment 1, e.g., "pe" and "tu" are both uttered by the same voice, correct? ("random voice each time" is confusing). Whereas in Experiment 2, e.g., "pe" and "tu" are uttered by different voices, for example, "pe" by yellow voice and "tu" by red voice. If this is correct, then I recommend the authors to rephrase this section to make it more clear.

      To clarify, in Experiment 1, the voices were randomly assigned to each syllable, with the constraint that no voice was repeated consecutively. This means that syllables within the same word were spoken by different voices, and each syllable was heard with various voices throughout the stream. As a result, neonates had to retrieve the words based solely on syllabic patterns, without relying on consistent voice associations or specific voice relationships.

      In Experiment 2, the design was orthogonal: while the syllables were presented in a random order, the voices followed a structured pattern. Similar to Experiment 1, each syllable (e.g., “pe” and “tu”) was spoken by different voices. The key difference is that in Experiment 2, the structured regularities were applied to the voices rather than the syllables. In other words, the “green” voice was always followed by the “red” voice for example but uttered different syllables.
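
      The two stream designs described above can be sketched as follows. This is a minimal illustration, not the stimulus-generation code used in the study: the function names, the voice labels `v1` to `v6`, and the placeholder syllables `s5` and `s6` are hypothetical, and only the duplet "petu" and the pairing voice1-voice6 are taken from the examples in this response. It assumes that neither a duplet nor an item of the random dimension repeats immediately.

```python
import random

def random_no_repeat(items, n, rng):
    """Draw n items at random, never repeating the previous item."""
    seq = []
    for _ in range(n):
        candidates = [x for x in items if not seq or x != seq[-1]]
        seq.append(rng.choice(candidates))
    return seq

def make_stream(structured_duplets, random_items, n_duplets, rng):
    """Return the structured and the random dimension, of equal length."""
    chosen = random_no_repeat(structured_duplets, n_duplets, rng)
    structured = [item for duplet in chosen for item in duplet]
    unstructured = random_no_repeat(random_items, len(structured), rng)
    return structured, unstructured

rng = random.Random(0)  # fixed seed so the sketch is reproducible

syllables = ["pe", "tu", "bo", "da", "s5", "s6"]  # s5, s6 are placeholders
voices = ["v1", "v2", "v3", "v4", "v5", "v6"]
syllable_duplets = [("pe", "tu"), ("bo", "da"), ("s5", "s6")]
voice_duplets = [("v1", "v6"), ("v2", "v3"), ("v4", "v5")]

# Experiment 1: structure over syllables, voices random.
exp1_syllables, exp1_voices = make_stream(syllable_duplets, voices, 100, rng)
# Experiment 2: structure over voices, syllables random.
exp2_voices, exp2_syllables = make_stream(voice_duplets, syllables, 100, rng)

print(list(zip(exp1_syllables, exp1_voices))[:4])
print(list(zip(exp2_syllables, exp2_voices))[:4])
```

      In the Experiment 1 stream, "pe" is always followed by "tu" while the voice pairing changes at each occurrence; in the Experiment 2 stream, `v1` is always followed by `v6` while the syllables they utter vary.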

      We have revised the description of the stimuli and the legend of Figure 1 to clarify these important points.

      See p.6, l. 113: “The structure consisted of the random concatenation of three duplets (i.e., two-syllable units) defined only by one of the two dimensions. For example, in Experiment 1, one duplet could be petu with each syllable uttered by a random voice each time they appear in the stream (e.g. pe is produced by voice1 and tu by voice6 in one instance and in another instance pe is produced by voice3 and tu by voice2). In contrast, in Experiment 2, one duplet could be the combination [voice1-voice6], each uttering randomly any of the syllables.”

      p.20, l. 390 (Figure 1 legend): “For example, the two syllables of the word “petu” were produced by different voices, which randomly changed at each presentation of the word (e.g. “yellow” voice and “green” voice for the first instance, “blue” and “purple” voice for the second instance, etc.). In Experiment 2, the statistical structure was based on voices (TPs alternated between 1 and 0.5), while the syllables changed randomly (uniform TPs of 0.2). For example, the “green” voice was always followed by the “red” voice, but they were randomly saying different syllables “boda” in the first instance, “tupe” in the second instance, etc.”

      (5) Line 114: the sentence "they should compute a 36 x 36 TPs matrix relating each acoustic signal, with TPs alternating between 1/6 within words and 1/12 between words" is confusing as it seems like there are different acoustic signals. Can the authors clarify this point?

      Thank you for highlighting this point. To clarify, our suggestion is that neonates might not track regularities between phonemes and voices as separate features. Instead, they may treat each syllable-voice combination as a distinct item—for example, "pe" spoken by the "yellow" voice is one item, while "pe" spoken by the "red" voice is another. Under this scenario, there would be a total of 36 unique items (6 syllables × 6 voices), and infants would need to track regularities between these 36 combinations.

      We have modified this sentence in the manuscript to make it clearer.

      See p.7, l. 120: “If infants at birth compute regularities based on a neural representation of the syllable as a whole, i.e. comprising both phonetic and voice content, this would require computing a 36 × 36 TPs matrix relating each token.”
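
      The two scenarios described above (tracking syllables and voices as separate features versus tracking 36 combined syllable-voice tokens) can be illustrated with a minimal sketch. It is not the authors' stimulus-generation or analysis code. The third duplet ("kilo"), the voice labels, the stream length, and the rule that a duplet is never immediately repeated are assumptions made for illustration; the source states only that voices were never repeated consecutively.

```python
import random
from collections import Counter

# Hypothetical inventory: the source names only some syllables and voices.
DUPLETS = [("pe", "tu"), ("bo", "da"), ("ki", "lo")]
VOICES = [f"voice{i}" for i in range(1, 7)]


def pick_different(options, previous, rng):
    """Draw uniformly from options, excluding the previous draw."""
    return rng.choice([o for o in options if o != previous])


def make_stream(n_duplets, seed=0):
    """Build an Experiment-1-style stream of (syllable, voice) tokens.

    Syllables follow the duplet structure; voices are random, with the
    only constraint that no voice occurs twice in a row.
    """
    rng = random.Random(seed)
    tokens, last_duplet, last_voice = [], None, None
    for _ in range(n_duplets):
        # Assumption: the same duplet is never repeated immediately.
        last_duplet = pick_different(DUPLETS, last_duplet, rng)
        for syllable in last_duplet:
            last_voice = pick_different(VOICES, last_voice, rng)
            tokens.append((syllable, last_voice))
    return tokens


def transition_probabilities(sequence):
    """Estimate P(next | current) from adjacent pairs in the sequence."""
    pair_counts = Counter(zip(sequence, sequence[1:]))
    first_counts = Counter(sequence[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}


stream = make_stream(5000)
syllable_tps = transition_probabilities([s for s, _ in stream])  # 6 x 6 case
voice_tps = transition_probabilities([v for _, v in stream])     # 6 x 6 case
token_tps = transition_probabilities(stream)                     # 36 x 36 case
```

      In this sketch the structure is visible in `syllable_tps`, whereas `token_tps` spreads the same stream over up to 36 distinct items, so each individual transition is observed far less often.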

      Reviewer #1 (Recommendations for the authors):

      (1) The acronym TP should be spelled out, and a brief description of the fact that dips in TPs signal boundaries while high TPs signal a cohesive unit could be useful for non-specialist readers.

      We have added it at the beginning of the introduction (lines 52-60)

      (2) p.5, l.76: "Here, we aimed to further characterise the characteristics of this mechanism...". I suggest this is rephrased as "to further characterise this mechanism".

      We have changed it as suggested by the reviewer (now p.5, l.81)

      (3) p.9, l.172: "[...] this contribution is unlikely since the electrodes differ from the electrodes, showing enhanced word-rate activity at 2 Hz."

      It is unclear which electrodes differ from which electrodes. I figure that the authors mean that the electrodes showing stronger activity at 2 Hz differ from those showing it at 4 Hz, but the sentence could use rephrasing.

      This part has been rephrased (p.9, l.177-181)

      (4) p.10, l.182: "[...] the entrainment during the first minute of the structure stream [… ]".

      Structured stream.

      It has been corrected (p.10, l.190)

      (5) p.12, l.234: "we compared STATISTICAL LEARNING"

      Why the use of capitals?

      This was an error and it was corrected (p.12, l.242).

      (6) p.15, l.298: "[...] suggesting that such negativity might be related to semantic."

      The sentence feels incomplete. To semantics? To the processing of semantic information?

      The phrase has been corrected (p.15, l.314). Additionally, the discussion of the posterior negativity observed for duplets after familiarisation with a stream with regularities over phonemes has been rephrased (p.15, l.)

      (7) Same page, l.301: "3-mo-olds" 3-month-olds.

      It has been corrected (now in p.16, l.333)

      (8) Same page, l.307: "(see also (Bergelson and Aslin, 2017)" (see also Bergelson and Aslin, 2017).

      It has been corrected (now in p.17, l.340)

      (9) Same page, l.310: "[...] would be considered as possible candidate" As possible candidates.

      This has been rephrased and corrected (now in p.17, l.343)

      Reviewer #2 (Recommendations for the authors):

      (1) Figure 2: The authors mention a "thick orange line", which I think should be a "thick black line".

      We are sorry for this. It has been corrected.

      (2) Ln 166: Should be Figure 2C rather than 3C.

      It has been corrected (now in p.9, l.173)

      (3) Figure 4 is not referenced in the manuscript.

      We now refer to it on p. 12, l. 236.

    1. eLife Assessment

      This study presents a valuable finding on how the interplay between transcription factors SOX2 and OCT4 establishes the pluripotency network in early mouse embryos. The evidence supporting the claims of the authors is solid, although inclusion of additional omics data would further strengthen the study. The work will be of interest to biologists working on embryonic development and gene regulation.

    2. Reviewer #1 (Public review):

      Summary:

      Numerous mechanistic and structural studies have reported the cooperative role of Oct4 and Sox2 in the establishment of pluripotency during reprogramming. Due to the difficulty of sample collection and of RNA-seq with low cell numbers, the precise mechanisms remain unclear in early embryos. This manuscript reports the role of OCT4 and SOX2 in early mouse embryos using knockout models with low-input ATAC-seq and RNA-seq. Compared to the control, chromatin accessibility and the transcriptome were affected when Oct4 and Sox2 were deleted in the early ICM. Specifically, decreased ATAC-seq peaks showed enrichment of TF motifs such as OCT, SOX, and OCT-SOX, indicating their importance during early development. Moreover, through in-depth analysis of the ATAC-seq and RNA-seq data, the authors found that Oct4 and Sox2 target enhancers to activate their downstream genes. In addition, they uncovered the role of OS during development from the morula to the ICM, which provides the scientific community with a more comprehensive understanding.

      Strengths:

      On the whole, the manuscript is innovative, and the conclusions of this paper are mostly well supported by data.

      Weaknesses:

      Major Points:

      (1) In Figure 1, a more detailed description of the knockout strategy should be provided to clarify itself. The knockout strategy in Fig1 is somewhat obscure, such as how is OCT4 inactivated in Oct4mKO2 heterozygotes. As shown in Figure 1, the exon of OCT4 is not deleted, and its promoter is not destroyed. Therefore, how does OCT4 inactivate to form heterozygotes?

      (2) Is ZP3-Cre expressed in the zygotes? Is there any residual protein?

      (3) What motifs are enriched in the rising ATAC-seq peaks after knocking out of OCT4 and SOX2?

      (4) The ordinate of Fig4c is lost.

      (5) Signals of H3K4me1, H3K27ac, and so on are usually used to define enhancers, and the loci of enhancers vary greatly in different cells. In the manuscript, the authors defined ATAC-seq peaks far from the TSS as enhancers. The definition in this manuscript is not strictly an enhancer.

      (6) If Oct4 and Sox2 truly activate Sap30 and Uhrf1, what effect does interfering with both genes have on gene expression and chromatin accessibility?

      Comments on revisions:

      The authors have addressed my concerns so I am fine with revision in principle.

    3. Reviewer #2 (Public review):

      In this manuscript, Hou et al. investigate the interplay between OCT4 and SOX2 in driving the pluripotent state during early embryonic lineage development. Using knockout (KO) embryos, the authors specifically analyze the transcriptome and chromatin state within the ICM-to-EPI developmental trajectory. They emphasize the critical role of OCT4 and the supportive function of SOX2, along with other factors, in promoting embryonic fate. Although the paper presents high-quality data, several key claims are not well-supported, and direct evidence is generally lacking.

      Comments on revisions:

      The authors have addressed many of the concerns raised in the initial review and provided alternative analytical approaches to address the relevant questions in this revision. Some of these are useful; however, they have not fully addressed one critical point.

      In my original critique, I noted that the maternal KO might not be suitable as a control, given that there is no significant phenotypic difference between the maternal-only KO and the maternal-zygotic KO. While we did not dispute the molecular differences presented in Figure 2, how can the authors conclude in the Response that "embryos with a maternal KO or zygotic heterozygous KO of Oct4 or Sox2 show no noticeable ... molecular difference (Figure 2-figure supplement 4A)"? The authors should recheck whether this is a typographical error or a valid statement.

      Additionally, I recommend the removal of phrases such as "absolutely priority" and "pivotal" throughout the manuscript, as these terms are overly assertive without sufficient supporting evidence.

    4. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews: 

      Reviewer #1 Comments on revisions: 

      The authors have addressed my concerns so I am fine with revision in principle.

      Thank you for taking the time to review our work and for your thoughtful feedback. We’re glad to hear that your concerns have been addressed.

      Reviewer #2 Comments on revisions:

      The authors have addressed many of the concerns raised in the initial review and provided alternative analytical approaches to address the relevant questions in this revision. Some of these are useful; however, they have not fully addressed one critical point. 

      In my original critique, I noted that the maternal KO might not be suitable as a control, given that there is no significant phenotypic difference between the maternal-only KO and the maternal-zygotic KO. While we did not dispute the molecular differences presented in Figure 2, how can the authors conclude in the Response that "embryos with a maternal KO or zygotic heterozygous KO of Oct4 or Sox2 show no noticeable ... molecular difference (Figure 2-figure supplement 4A)"? The authors should recheck whether this is a typographical error or a valid statement.

      Additionally, I recommend the removal of phrases such as "absolutely priority" and "pivotal" throughout the manuscript, as these terms are overly assertive without sufficient supporting evidence.

      We sincerely appreciate the reviewer’s feedback and would like to take this opportunity to provide further clarification, as there might have been a misunderstanding.

      We respectfully disagree with the reviewer’s statement that “there is no significant phenotypic difference between the maternal-only KO and the maternal-zygotic KO.” Previous publications provide clear evidence that maternal-zygotic KO embryos exhibit significant defects: they fail to form a healthy primitive endoderm, are unable to give rise to embryonic stem cells (ESCs) in vitro, and die shortly after implantation (Frum et al., Dev Cell 2013; Wu et al., Nat Cell Biol 2013; Le Bin et al., Development 2014; Wicklow et al., PLoS Genet 2014). In contrast, maternal-only KO embryos develop as well as wild-type (WT) embryos and do not display any of these phenotypic abnormalities. We believe that this distinction validates our use of maternal KO embryos as proper controls in our experiments.

      To address the reviewer’s concerns and ensure clarity, we have also revised the following statement in the manuscript.

      Original manuscript: “Mouse embryos with a maternal KO or zygotic heterozygous KO of either factor show no noticeable phenotype or molecular difference (Figure 2-figure supplement 4A) (Avilion et al., 2003; Frum et al., 2013; Kehler et al, 2004; Nichols et al., 1998; Wicklow et al., 2014; Wu et al., 2013).” 

      Revised manuscript: “Maternal KO embryos (circles in Figure 2—figure supplement 4A) clustered together with wildtype embryos (triangles and squares) in the PCA analysis, consistent with previous studies reporting no observable phenotype in maternal KO embryos (Avilion et al., 2003; Frum et al., 2013; Kehler et al, 2004; Nichols et al., 1998; Wicklow et al., 2014; Wu et al., 2013).”

      We acknowledge that using maternal-only KO embryos as controls could lead us to underestimate the differences between control and KO samples. However, we believe this approach does not introduce false positives in our RNA-seq and ATAC-seq experiments; it can only make our conclusions more conservative, which minimizes the risk of overestimating the molecular impact.

      We appreciate the reviewer’s recommendation regarding the use of overly assertive terms. Upon careful review of the manuscript and response letter, we could not find instances of the term “absolutely priority.” However, we do use the term “pivotal” and would prefer to retain it as we believe it accurately reflects the importance of the findings presented in our manuscript.

      Thank you for your thoughtful comments and suggestions! We hope this response clarifies our rationale and addresses the concerns.

      ---

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review)

      Summary:

      Numerous mechanistic and structural studies have reported the cooperative role of Oct4 and Sox2 in the establishment of pluripotency during reprogramming. Due to the difficulty of sample collection and of RNA-seq with low cell numbers, the precise mechanisms remain unclear in early embryos. This manuscript reports the role of OCT4 and SOX2 in early mouse embryos using knockout models with low-input ATAC-seq and RNA-seq. Compared to the control, chromatin accessibility and the transcriptome were affected when Oct4 and Sox2 were deleted in the early ICM. Specifically, decreased ATAC-seq peaks showed enrichment of TF motifs such as OCT, SOX, and OCT-SOX, indicating their importance during early development. Moreover, through in-depth analysis of the ATAC-seq and RNA-seq data, the authors found that Oct4 and Sox2 target enhancers to activate their downstream genes. In addition, they uncovered the role of OS during development from the morula to the ICM, which provides the scientific community with a more comprehensive understanding.

      Strengths:

      On the whole, the manuscript is innovative, and the conclusions of this paper are mostly well supported by data, however, there are some issues that need to be addressed.

      Weaknesses:

      Major Points:

      (1) In Figure 1, a more detailed description of the knockout strategy should be provided to clarify itself. The knockout strategy in Fig1 is somewhat obscure, such as how is OCT4 inactivated in Oct4mKO2 heterozygotes. As shown in Figure 1, the exon of OCT4 is not deleted, and its promoter is not destroyed. Therefore, how does OCT4 inactivate to form heterozygotes?

      Thank you for helping clarify this. We will add a detailed description of the knockout strategy in the legends for Figure 1A and 1B, as shown below. Note that the same strategy was used by Nichols et al (Cell, 1998).

      Figure 1A. Schemes of mKO2-labeled Oct4 KO (Oct4<sup>mKO2</sup>) and Oct4<sup>flox</sup> alleles. In the Oct4<sup>mKO2</sup> allele, a PGK-pac∆tk-P2A-mKO2-pA cassette was inserted 3.6 kb upstream of the Oct4 transcription start site (TSS) and a promoter-less FRT-SA-IRES-hph-P2A-Venus-pA cassette was inserted into Oct4 intron 1. The inclusion of a stop codon followed by three sets of polyadenylation signal sequences (pA) after the Venus cassette ensures both transcriptional and translational termination, effectively blocking the expression of Oct4 exons 2–5.

      Figure 1B. Schemes of EGFP-labeled Sox2 KO (Sox2<sup>EGFP</sup>) and Sox2<sup>flox</sup> alleles. In the Sox2<sup>EGFP</sup> allele, the 5’ untranslated region (UTR), coding sequence and a portion of the 3’ UTR of Sox2 were deleted and replaced with a PGK-EGFP-pA cassette. Notably, 1,023 bp of the Sox2 3’ UTR remain intact.

      (2) Is ZP3-Cre expressed in the zygotes? Is there any residual protein?

      This is indeed a very important issue. Here is why we think we are on the safe side. ZP3 is specifically expressed in growing oocytes, thus making ZP3-Cre a widely used tool for deleting maternally inherited alleles. When we crossed Oct4<sup>flox/flox</sup>; ZP3-Cre<sup>-</sup> females with Oct4<sup>flox/flox</sup>; ZP3-Cre<sup>+</sup> males, we got ZP3-Cre<sup>+</sup> Oct4<sup>flox/flox</sup> but no Oct4<sup>flox/∆</sup> or Oct4<sup>∆/∆</sup> pups, suggesting that the paternally inherited ZP3-Cre allele is not functionally active in zygotes, which is consistent with reports from other researchers (e.g. Frum, et al., Dev Cell 2013; Wu, et al., Nat Cell Biol 2013).

      (3) What motifs are enriched in the rising ATAC-seq peaks after knocking out of OCT4 and SOX2?

      The motifs enriched in the rising ATAC-seq peaks in Oct4 KO and Sox2 KO ICMs are the GATA, TEAD, EOMES and KLF motifs, as shown in Figure 4A and Figure supplement 7.

      (4) The ordinate of Fig4c is lost.

      Thank you for pointing this out. The y-axis is average normalized signals (reads per million-normalized pileup signals). We will add it in the revised version.

      (5) Signals of H3K4me1, H3K27ac, and so on are usually used to define enhancers, and the loci of enhancers vary greatly in different cells. In the manuscript, the authors defined ATAC-seq peaks far from the TSS as enhancers. The definition in this manuscript is not strictly an enhancer.

      Thank you for this insightful comment. We analyzed the published H3K27ac ChIP-seq data of mouse ICM at 94-96 h post hCG (B. Liu, et al., Nat Cell Biol 2024) to assess the enrichment of H3K27ac around our ATAC-seq peaks. Unfortunately, the data quality is poor, e.g., inconsistent across replicates (Author response image 1A), and shows little enrichment around the well-defined enhancers (Author response image 1B). Nevertheless, as we admit not all the distal ATAC-seq peaks or open chromatin regions are enhancers, we have replaced “enhancers” with “open chromatin regions”, “ATAC-seq peaks” or “putative enhancers”.

      Author response image 1.

      Analysis of the published H3K27ac ChIP-seq dataset of mouse ICM at 94-96 h post hCG (B. Liu, et al., Nat Cell Biol 2024). A. ChIP-seq profiles of H3K27ac over the decreased, unchanged and increased ATAC-seq peaks in our Oct4-KO late ICMs. To exclude spurious peaks, only strong unchanged peaks (57,512 out of 142,096) were used in the analysis. B. IGV tracks displaying ATAC-seq and H3K27ac ChIP-seq profiles around Dppa3 and Oct4. Red boxes mark the known OCT-SOX enhancers.

      (6) If Oct4 and Sox2 truly activate Sap30 and Uhrf1, what effect does interfering with both genes have on gene expression and chromatin accessibility?

      This is indeed an interesting question. Unfortunately, we have not conducted this specific experiment, so we do not have direct results. However, Sap30 is a key component of the mSin3A corepressor complex, while Uhrf1 regulates the establishment and maintenance of DNA methylation. Both proteins are known to function as repressors. Therefore, we hypothesize that interfering with these two genes could alleviate repression of some genes, such as trophectoderm markers, similar to what we have observed in Oct4 KO and Sox2 KO ICMs.

      Reviewer #2 (Public review):

      In this manuscript, Hou et al. investigate the interplay between OCT4 and SOX2 in driving the pluripotent state during early embryonic lineage development. Using knockout (KO) embryos, the authors specifically analyze the transcriptome and chromatin state within the ICM-to-EPI developmental trajectory. They emphasize the critical role of OCT4 and the supportive function of SOX2, along with other factors, in promoting embryonic fate. Although the paper presents high-quality data, several key claims are not well-supported, and direct evidence is generally lacking.

      Major Points:

      (1) Although the authors claim that both maternal KO and maternal KO/zygotic hetero KO mice develop normally, the molecular changes in these groups appear overestimated. A wildtype control is recommended for a more robust comparison. (A complementary comment from the reviewer: “Both maternal KO and maternal-zygotic KO in this study exhibited phenotypic consistency but molecular disparity. Specifically, both KO and control groups could develop normally; however, their chromatin landscapes and transcriptomic profiles differed. This raises the question of whether the molecular differences are real. We suggest that inclusion of a completely wild-type control group would make the comparison more robust.”)

      Thank you for your feedback as this point was obviously not clear in the manuscript. Here is our explanation: Mouse embryos with a maternal KO or zygotic heterozygous KO of Oct4 or Sox2 show no noticeable phenotype or molecular difference (Figure 2-figure supplement 4A) (Avilion et al., 2003; Frum et al., 2013; Kehler et al, 2004; Nichols et al., 1998; Wicklow et al., 2014; Wu et al., 2013). We have clarified this point in the revised manuscript.

      (2) The authors assert that OCT4 and SOX2 activate the pluripotent network via the OCT-SOX enhancer. However, the definition of this enhancer is based solely on proximity to TSSs, which is a rough approximation. Canonical enhancers are typically located in intronic and intergenic regions and marked by H3K4me1 or H3K27ac. Re-analyzing enhancer regions with these standards could be beneficial. Additionally, the definitions of "close to" or "near" in lines 183-184 are unclear and not defined in the legends or methods.

      Thank you for this insightful and helpful comment. As stated in the response to Reviewer #1’s point (5), we have replaced “enhancers” with “open chromatin regions”, “ATAC-seq peaks” or “putative enhancers”.

      The definition of "close to" or "near" in lines 183-184 is in the legend of Figure 2E and Methods. In the GSEA analysis, Ensembl protein-coding genes with TSSs located within 10 kb of ATAC-seq peak centers were included, so that some of the intronic ATAC-seq peaks were taken into consideration. We have also added the information in the main text of the revised manuscript.
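
      The gene-selection rule described above (protein-coding genes whose TSS lies within 10 kb of an ATAC-seq peak center) can be sketched as follows. This is a minimal illustration, not the authors' pipeline. The peak coordinates and gene names are made up, and treating a distance of exactly 10 kb as "within" is an assumption.

```python
from bisect import bisect_left
from collections import defaultdict

WINDOW_BP = 10_000  # distance threshold stated in the response


def peak_centers_by_chrom(peaks):
    """Map each chromosome to a sorted list of peak-center positions."""
    centers = defaultdict(list)
    for chrom, start, end in peaks:
        centers[chrom].append((start + end) // 2)
    for positions in centers.values():
        positions.sort()
    return centers


def genes_near_peaks(genes, peaks, window=WINDOW_BP):
    """Return names of genes whose TSS is within `window` bp of a peak center."""
    centers = peak_centers_by_chrom(peaks)
    selected = []
    for name, chrom, tss in genes:
        positions = centers.get(chrom, [])
        # First peak center at or to the right of (tss - window).
        i = bisect_left(positions, tss - window)
        # Assumption: the boundary is inclusive.
        if i < len(positions) and positions[i] <= tss + window:
            selected.append(name)
    return selected


# Hypothetical toy data: (chrom, start, end) peaks and (name, chrom, TSS) genes.
toy_peaks = [("chr1", 1_000, 1_400), ("chr2", 50_000, 50_600)]
toy_genes = [
    ("geneA", "chr1", 9_000),   # 7.8 kb from the chr1 peak center
    ("geneB", "chr1", 30_000),  # too far from any peak
    ("geneC", "chr2", 60_300),  # exactly 10 kb from the chr2 peak center
    ("geneD", "chr3", 100),     # no peaks on this chromosome
]
near = genes_near_peaks(toy_genes, toy_peaks)
```

      Because the window is centred on the TSS rather than restricted to upstream sequence, peaks located in introns close to the TSS are included, as noted in the response.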

      (3) There is no evidence that the decreased peaks/enhancers could be the direct targets of Oct4 and Sox2 throughout this manuscript. Figures 2 and 4 show only minimal peak annotations related to OCT and SOX motifs, and there is a lack of chromatin IP data. Therefore, claims about direct targets are not substantiated and should be appropriately revised.

      Yes indeed, you have a point. In Figure Supplement 3C, we analyzed the published Sox2 CUT&RUN data from E4.5 ICMs (Li et al., Science, 2023), which demonstrates that the reduced ATAC-seq peaks in our Sox2 KO ICMs are enriched with the Sox2 CUT&RUN signals. Unfortunately, we did not find similar published data for Oct4 in embryos. We have removed the statement indicating that these are the direct targets in the revised manuscript.

      (4) Lines 143-146 lack direct data to support the claim. Actually, the main difference in cluster 1, 11 and 3, 8, 14 is whether the peak contains OCT-SOX motif. However, the reviewer cannot get any information of peaks activated by OCT4 rather than SOX2 in cluster 1, 11.

      Thank you for the comment that we hope we can clarify.

      Lines 143-146 are: “Notably, the peaks activated by Oct4 but not by Sox2 in the ICM tended to be already open at the morula stage (Figure 2B, clusters 1 and 11), whereas those dependent on both Oct4 and Sox2 became open in the ICM (Figure 2B, clusters 3, 8 and 14).”

      We agree with you that clusters 3/8/14 are more enriched in OCT-SOX motifs than clusters 1/11. However, this is consistent with our observation that accessibility of peaks in clusters 1 and 11 relies mainly on Oct4, while accessibility in clusters 3, 8, 14 depends on both Oct4 and Sox2. But maybe the term “activate” is misleading. We have rephrased the text as below:

      “Notably, compared to the peaks that depend on Oct4 but not Sox2 (Figure 2B, clusters 1 and 11), those reliant on both Oct4 and Sox2 show greater enrichment of the OCT-SOX motif (Figure 2B, clusters 3, 8 and 14). The former group was generally already open in the morula, while the latter group only became open in the ICM.”

      Minor Points:

      (1) Lines 153-159: The figure panel does not show obvious enrichment of SOX2 signals or significant differences in H3K27ac signals across clusters, thus not supporting the claim.

      We hope to be able to explain this.

      Lines 153-159 refer to two datasets: Figure Supplement 3C and 3D.

      In Figure Supplement 3C, the average plots above the heatmaps show that the decreased ATAC-seq peaks (the indigo lines) have higher enrichment with Sox2 CUT&RUN signals than the increased or unchanged peaks (the yellow and light blue lines, respectively).

      In Figure Supplement 3D, the average plots indicate that H3K27ac signals around the center of the decreased ATAC-seq peaks (the indigo line) show higher enrichment compared to the unchanged and increased groups (the light blue and yellow lines, respectively). Notably, H3K27ac enrichment appears slightly offset from the central nucleosome-free regions.

      (2) Lines 189-190: The term "identify" is overstated for the integrative analysis of RNA-seq and ATAC-seq, which typically helps infer TF targets rather than definitively identifying them.

      You are right. We have replaced “identify” with “infer” in the revised manuscript.

      (3) The Discussion is lengthy and should be condensed.

      We have shortened the discussion in the revised manuscript.

    1. eLife Assessment

      This analysis of the formation of the oral-aboral body axis in cnidarians, the sister group of bilaterians, is a significant and fundamental contribution to the field of Wnt signalling and planar cell polarity. The evidence supporting the conclusions is compelling and has the potential to contribute to a deeper understanding of the origin and evolution of Wnt signalling in metazoans. These findings will be of broad interest to developmental and evolutionary biologists.

    2. Reviewer #1 (Public review):

      Summary:

      This noteworthy paper examines the role of planar cell polarity and Wnt signalling in the body axis formation of the hydrozoan Clytia. In contrast to the freshwater polyp Hydra or the sea anemone Nematostella, Clytia represents a cnidarian model system with a complete life cycle (planula-polyp-medusa). In this species, classical experiments have demonstrated that a global polarity is established from the oral end of the embryos (Freeman, 1981). Prior research has demonstrated that Wnt3 plays a role in the formation of the oral organiser in Clytia and other cnidarians, acting in an autocatalytic feedback loop with β-catenin. However, the question of whether and to what extent an oral-aboral gradient of Wnt activity is established remained unanswered. This gradient is thought to control both tissue differentiation and tissue polarity. The planar cell polarity (PCP) pathway has been linked to this polarity, although it is generally considered to be β-catenin independent.

      The authors have conducted a series of sophisticated experiments utilising morpholinos, mRNA microinjection, and immunofluorescent visualisation of PCP. The objective of these experiments was to address the function of Wnt3, β-catenin, and PCP core proteins in the coordination of the global polarity of Clytia embryos. The authors conclude that PCP plays a role in regulating polarity along the oral-aboral axis of embryos and larvae. This offers a conceivable explanation for how polarity information is established and distributed globally during Clytia embryogenesis, with implications for our understanding of axis formation in cnidarians and the evolution of Wnt signalling in general. While the experiments are well-designed and executed, there are some criticisms, questions, or suggestions that should be addressed.

      Comments:

      Beautiful and solid experiments to clarify the role of canonical Wnt signalling and PCP core factors in coordinating planar cell polarity. However, there are also several points that should be addressed.

      (1) Wnt3 cue and global PCP. PCP has been described in detail in a previous paper on Clytia (Momose et al, 2012): its orientation along the oral-aboral body axis (ciliary basal body positioning studies), and its function in directional polarity during gastrulation (Stbm-, Fz1-, and Dsh-MO experiments). I wonder if this part could be shortened. What is new, however, are the knockdown and Wnt3-mRNA rescue experiments, which provide a deeper insight into the link between Wnt3 function in the blastopore organiser as a source or cue for axis formation. These experiments demonstrate that the Wnt3 knockdown induces defects equivalent to PCP factor knockdown, but can be rescued by Wnt3-mRNA injection, even at a distance of 200 µm away from the Wnt-positive area. The experimental set-up of these new molecular experiments follows in important aspects those of Freeman's experiments of 1981 (who in turn was motivated to re-examine Teissier's work of 1931/1933 ...). Freeman did not use the term "global polarity" but the concept of an axis-inducing source and a long-range tissue polarity can be traced back to both researchers.

      (2) PCP propagation and β-catenin. The central but unanswered question in this study focuses on the interaction between Wnt3 and PCP and the propagation of PCP. Wnt3 has been described in cnidarians but also in vertebrates and insects as a canonical Wnt interacting with β-catenin in an autocatalytic loop. The surprising result of this study is that the action of Wnt3 on PCP orientation is not inhibited in the presence of a dominant-negative form of CheTCF (dnTCF) ruling out a potential function of β-catenin in PCP. This was supported by studies with constitutively active β-catenin (CA-β-cat) mRNA which was unable to restore PCP coordination nor elongation of Wnt3-depleted embryos but did restore β-catenin-dependent gastrulation. Based on these data, the authors conclude that Wnt3 has two independent roles: Wnt/β-catenin activation and initial PCP orientation (two-step model for PCP formation). However, the molecular basis for the interaction of Wnt3 with the PCP machinery and how the specificity of Wnt3 for both pathways is regulated at the level of Wnt-receiving cells (Fz-Dsh) remain unresolved. Also, with respect to PCP propagation, there is no answer with respect to the underlying mechanisms. The authors found that PCP components are expressed in the mid-blastula stage, but without any further indication of how the signal might be propagated, e.g., by a wavefront of local cell alignment. Here, it is necessary to address the underlying possible cellular interactions more explicitly.

      (3) The proposed two-step model for PCP formation has important evolutionary implications in that it excludes the current alternate model according to which a long-range Wnt3-gradient orients PCP ("Wnt/β-catenin-first"). Nevertheless, the initial PCP orientation by Wnt3 - as proposed in the two-step model - is not explained at all on the molecular level. Another possible, but less well-discussed and studied option for linking Wnt3 with PCP action could be the role of other Wnt pathways. The authors convincingly show that Wnt3 is the most highly expressed Wnt in Clytia at all stages of development (Figure S1). However, Wnt7 is also more highly expressed, which makes it a candidate for signal transduction from canonical Wnts to PCP Wnts. An involvement of Wnt7 in PCP regulation has been described in vertebrates (http://dx.doi.org/10.1016/j.celrep.2013.12.026). This would challenge the entire discussion and speculation on the evolutionary implications according to which PCP Wnt signaling comes first ("PCP-first scenario") and canonical Wnt signaling later in metazoan evolution.

      (4) The discussion, including Figure 6, is strongly biased towards the traditional evolutionary scenario postulating a choanozoan-sponge ancestry of metazoans. Chromosome-linkage data of pre-metazoans and metazoans (Schulz et al., 2023; https://doi.org/10.1038/s41586-023-05936-6) now indicate a radically different scenario according to which ctenophores represent the ancestral form and are sister to sponges, cnidarians and bilaterians (the Ctenophora-sister hypothesis). This also has implications for the evolution of Wnt signalling, as discussed in the recent Nature Reviews Genetics article by Holzem et al. (2024) (https://doi.org/10.1038/s41576-024-00699-w). Furthermore, it calls into question the hypothesis of a filter-feeding multicellular gastrula-like ancestor as proposed by Haeckel (Maegele et al., 2023). These papers have not yet been referenced, but they would provide a more robust discussion.

    3. Reviewer #2 (Public review):

      Summary:

      Canonical Wnt signaling has previously been shown to be responsible for correct patterning of the oral-aboral axis as well as germ layer formation in several cnidarians. In the post-gastrula stage, the planula larvae are not only elongated, they have a specific swimming direction due to the decentralized cellular positioning and slanted anchoring of the cilia. This in turn is in most other animals the result of a Wnt-Planar-cell polarity pathway. This paper by Uveira et al. investigates the role of Wnt3 signaling in serving as a local cue for the PCP pathway which then is responsible for the orientation of the cilia and elongation of the planula larva of the hydrozoan Clytia hemisphaerica. Wnt3 was shown before to activate the canonical pathway via β-catenin and to act as an axial organizer. The authors provide compelling evidence for this somewhat unusual direct link between the pathways through the same signaling molecule, Wnt3. In conclusion, they propose a two-step model: (1) local orientation by Wnt3 secretion and (2) global propagation by the PCP pathway over the whole embryo.

      Strengths:

      In a series of elegant and also seemingly sophisticated experiments, they show that Wnt3 activates the PCP pathway directly, as it happens in the absence of canonical Wnt signaling (e.g. through co-expression of dnTCF). Conversely, constitutively active β-catenin was not able to rescue PCP coordination upon Wnt3 depletion, yet restored gastrulation. This uncouples the effect of Wnt3 on axis specification and morphogenetic movements from the elongation via PCP. Through transplantation of single blastomeres providing a local source of Wnt3, they also demonstrate the reorganization of cellular polarity immediately adjacent to the Wnt3-expressing cell patch. These transplantation experiments also uncover that mechanical cues can trigger polarization, suggesting a mechanotransduction or direct influence on subcellular structures, e.g. actin fiber orientation.

      This is a beautiful and elegant study addressing an important question. The results have significant implications also for our understanding of the evolutionary origin of axis formation and the link of these two ancient pathways, which in most animals are controlled by distinct Wnt ligands and Frizzled receptors. The quality of the data is stunning and the paper is written in a clear and succinct manner. This paper has the potential to become a widely cited milestone paper.

      Weaknesses:

      I cannot detect any major weaknesses. The work only raises a few more follow-up questions, which the authors are invited to comment on.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The manuscript by Olah et al. uses in vitro electrophysiology and compartmental modeling (via NEURON) to investigate the expression and function of HCN channels in mouse L2/3 pyramidal neurons. The authors conclude that L2/3 neurons have developmentally regulated HCN channels, the activation of which can be observed when subjected to large hyperpolarizations. They further conclude via blockade experiments that HCN channels in L2/3 neurons influence cellular excitability and pathway-specific EPSP kinetics, which can be neuromodulated. While the authors perform a wide range of slice physiology experiments, concrete evidence that L2/3 cells express functionally relevant HCN channels is limited. There are serious experimental design caveats and confounds that make drawing strong conclusions from the data difficult. Furthermore, the significance of the findings is generally unclear, given modest effect sizes and a lack of any functional relevance, either directly via in vivo experiments or indirectly via strong HCN-mediated changes in known operations/computations/functions of L2/3 neurons.

      Specific points:

      (1) The interpretability and impact of this manuscript are limited due to numerous methodological issues in experimental design, data collection, and analysis. The authors have not followed best practices in the field, and as such, much of the data is ambiguous and/or weak and does not support their interpretations (detailed below). Additionally, the authors fail to appropriately explain their rationale for many of their choices, making it difficult to understand why they did what they did. Furthermore, many important references appear to be missing, both in terms of contextualizing the work and in terms of approach/method. For example, the authors do not cite Kalmbach et al 2018, which performed a directly comparable set of experiments on HCN channels in L2/3 neurons of both humans and mice. This is an unacceptable omission. Additionally, the authors fail to cite prior literature regarding the specificity or lack thereof of Cs+ in blocking HCN. In describing a result, the authors state "In line with previous reports, we found that L2/3 PCs exhibited an unremarkable amount of sag at 'typical' current commands" but they then fail to cite the previous reports.

      We thank the reviewer for the thorough examination of our manuscript; however, we disagree with many of the raised concerns for several reasons, as detailed here:

      To address the lack of certain citations, we would like to emphasize that in the introduction section, we did initially focus on the several decades-long line of investigation into the HCN channel content of layer 2/3 pyramidal cells (L2/3 PCs), where there has undoubtedly been some controversy as to their functional contribution. We did not explicitly cite papers that claimed to find few or no HCN channels or little sag, although this would be a significant list of publications from some excellent investigators, as the methods used may have differed from ours, leading to different interpretations. Simply stated, unless one was explicitly looking for HCN in L2/3 PCs, it might go unobserved. However, we have now addressed this more clearly in the revision:

      Just to take one example: in the publication mentioned by the reviewer (Kalmbach et al 2018), the investigators did not carry out voltage clamp or dynamic clamp recordings, as we did in our work here. Furthermore, the reported input resistance values in the aforementioned paper were far above other reports in mice (Routh et al. 2022, Brandalise et al 2022, Hedrick et al 2012; which were similar to our findings here), suggesting that recordings in Kalmbach were carried out at membrane potentials where HCN activation may be less available (Routh, Brager and Johnston 2022).

      Another reason for some mixed findings in the field is undoubtedly the small or nonexistent sag in L2/3 current clamp recordings (in mice). We also observed a very small sag, which can be explained by the following: the ‘sag’ potential is a biphasic voltage response emerging from a relatively fast passive membrane response and a slower Ih activation. In L2/3 PCs, hyperpolarization-activated currents are apparently faster than previously described, and are located proximally (Figure 2 & Figure 5). Therefore, their recruitment in mouse L2/3 PCs is on a similar timescale to the passive membrane response, resulting in a more monophasic response. We now include a fuller set of citations in the updated introduction section, to highlight the importance of HCN channels in L2/3 PCs in mice (and other species).

      The justification for using cesium (i.e., ‘best practices’) is detailed below.

      (2) A critical experimental concern in the manuscript is the reliance on cesium, a nonspecific blocker, to evaluate HCN channel function. Cesium blocks HCN channels but also acts at potassium channels (and possibly other channels as well). The authors do not acknowledge this or attempt to justify their use of Cs+ and do not cite prior work on this subject. They do not show control experiments demonstrating that the application of Cs+ in their preparation only affects Ih. Additionally, the authors write 1 mM cesium in the text but appear to use 2 mM in the figures. In later experiments, the authors switch to ZD7288, a more commonly used and generally accepted more specific blocker of HCN channels. However, they use a very high concentration, which is also known to produce off-target effects (see Chevaleyre and Castillo, 2002). To make robust conclusions, the authors should have used both blockers (at accepted/conservative concentrations) for all (or at least most) experiments. Using one blocker for some experiments and then another for different experiments is fraught with potential confounds.

      To address the concerns regarding the usage of cesium to block HCN channels, we would like to state that neither cesium nor ZD-7288 is without off-target effects; however, in our case the potential off-target effects of external cesium were deemed less impactful, especially concerning AP firing output experiments. Extracellular cesium has been widely accepted as a blocker of HCN channels (Lau et al. 2010, Wickenden et al. 2009, Rateau and Ropert 2005, Hemond et al. 2009, Yang et al. 2015, Matt et al. 2010). However, it is well known to act on potassium channels as well at higher concentrations, which has been demonstrated with intracellular and extracellular application (Puil et al. 1981, Fleidervish et al. 2008, Williams et al. 1991, 2008).

      Although we initially performed ‘internal’ control experiments to ensure the cesium concentration was unlikely to greatly block voltage gated K+ channels during our recordings, we recognize these were not included in the original manuscript. These are detailed as follows: during our recordings cesium had no significant effect on action potential halfwidth, ruling out substantial blocking of potassium channels, nor did it affect any other aspects of suprathreshold activity (now reported in results, page 4 - line 113). Furthermore, we observed similar effects on passive properties (resting membrane potential, input resistance) following ZD-7288 as with cesium, which we now also updated in our figures (Supplementary Figure 1). We did acknowledge that ZD-7288 is a widely accepted blocker of HCN, and for this reason we carried out some of our experiments using this pharmacological agent instead of cesium.

      On the other hand, ZD-7288 suffers from its own side effects, such as potential effects on sodium channels (Wu et al. 2012) and calcium channels (Sánchez-Alonso et al. 2008, Felix et al. 2003). As our aim was to provide functional evidence for the importance of HCN channels, we initially deemed these potential effects unacceptable in experiments where AP firing output (e.g., in cell-attached experiments) was measured. Nonetheless, in new experiments now included here, we found the effects of ZD and cesium on AP output were similar as shown in new Supplemental Figure 1.

      Many experiments were supported by complementary findings using external cesium and ZD-7288. For example, the effect of ZD-7288 on EPSPs was confirmed by similar synaptic stimulation experiments using cesium. This is important, as synaptic inputs of L2/3 PCs are modulated by both dendritic sodium (Ferrarese et al. 2018) and calcium channels (Landau 2022), therefore the application of ZD-7288 alone may have been difficult to interpret in isolation. We thank the reviewer for bringing up this important point.

      (3) A stronger case could be made that HCN is expressed in the somatic compartment of L2/3 cells if the authors had directly measured HCN-isolated currents with outside-out or nucleated patch recording (with appropriate leak subtraction and pharmacology). Whole-cell voltage-clamp in neurons with axons and/or dendrites does not work. It has been shown to produce erroneous results over and over again in the field due to well-known space clamp problems (see Rall, Spruston, Williams, etc.). The authors could have also included negative controls, such as recordings in neurons that do not express HCN or in HCN-knockout animals. Without these experiments, the authors draw a false equivalency between the effects of cesium and HCN channels, when the outcomes they describe could be driven simply by multiple other cesium-sensitive currents. Distortions are common in these preparations when attempting to study channels (see Williams and Womzy, J Neuro, 2011). In Fig 2h, cesium-sensitive currents look too large and fast to be from HCN currents alone given what the authors have shown in their earlier current clamp data. Furthermore, serious errors in leak subtraction appear to be visible in Supplementary Figure 1c. To claim that these conductances are solely from HCN may be misleading.

      We disagree with the argument that “Whole-cell voltage-clamp in neurons with axons and/or dendrites does not work”. Although this method is not without its confounds (i.e. space clamp), it is still a useful initial measure as demonstrated countless times in the literature. However, the reviewer is correct that the best approach to establish the somatodendritic distribution of ion channels is by direct somatic and dendritic outside-out patches. Due to the small diameter of L2/3 PC dendrites, these experiments haven’t been carried out yet in the literature for any other ion channel either to our knowledge. Mapping this distribution electrophysiologically may be outside the scope of the current manuscript, but it was hard for us to ignore the sheer size of the Cs<sup>+</sup> sensitive hyperpolarizing currents in whole cell. Thus, we will opt to report this data.

      Also, we should point out that space clamp-related errors manifest in the overestimation of frequency-dependent features, such as activation kinetics, and underestimation of steady-state current amplitudes. The activation time constants of our measured currents are somewhat faster than previously reported, reducing major concerns regarding space clamp errors. Furthermore, we simply do not understand what “too large… to be from HCN currents” means. Our voltage-clamp measured currents are similar to previously reported HCN currents (Meng et al. 2011, Li 2011, Zhao et al. 2019, Yu et al. 2004, Zhang et al. 2008, Spinelli et al. 2018, Craven et al. 2006, Ying et al. 2012, Biel et al. 2009).

      Furthermore, we should point out that our measured currents activated at hyperpolarized voltages, had the same voltage dependence as HCN currents, did not show inactivation, influenced both input resistance and resting membrane potential, and are blocked by low concentration extracellular cesium. Each of these features would point to HCN.

      (4) The authors present current-clamp traces with some sag, a primary indicator of HCN conductance, in Figure 2. However, they do not show example traces with cesium or ZD7288 blockade. Additionally, the normalization of current injected by cellular capacitance and the lack of reporting of input resistance or estimated cellular size makes it difficult to determine how much current is actually needed to observe the sag, which is important for assessing the functional relevance of these channels. The sag ratio in controls also varies significantly without explanation (Figure 6 vs Figure 7). Could this variability be a result of genetically defined subgroups within L2/3? For example, in humans, HCN expression in L2/3 varies from superficial and deep neurons. The authors do not make an effort to investigate this. Regardless of inconsistencies in either current injection or cell type, the sag ratio appears to be rather modest and similar to what has already been reported previously in other papers.

      We thank the reviewer for pointing out that our explanation for the modest sag ratio might not have been sufficient to properly understand why this measurement cannot be applied to layer 2/3 pyramidal cells. Briefly: sag potential emerges from a relatively (compared to I<sub>h</sub>) fast passive membrane response and a slower HCN recruitment. The opposing polarity and different timescales of these two mechanisms result in a biphasic response called “sag” potential. However, if the timescale of these two mechanisms is similar, the voltage response is not predicted to be biphasic. We have shown that hyperpolarization activated currents in our preparations are fast and proximal, therefore they are recruited during the passive response (see Figure 2g). This means that although a substantial amount of HCN currents are activated during hyperpolarization, their activation will not result in substantial sag. Therefore, sag ratio measurement is not necessarily applicable to approximate the HCN content of mouse L2/3 PCs. We would like to emphasize that sag ratio measurements are correct in the case of other cell types (i.e. L5 and CA1 PCs), and our aim is not to discredit the method, but rather to show that it cannot be applied similarly in the case of mouse L2/3 PCs.
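
      The timescale argument above can be illustrated with a minimal single-compartment sketch. This is not the NEURON model used in the manuscript: the leak and I<sub>h</sub> conductances, reversal potentials, activation curve, step amplitude, and the two I<sub>h</sub> time constants (100 ms and 5 ms) are all illustrative assumptions, chosen only to contrast a slow I<sub>h</sub> with one recruited on the timescale of the passive response.

```python
import math


def simulate_sag(tau_h_ms, i_step_pa=-100.0, dt_ms=0.05):
    """Return the sag ratio of a one-compartment cell with leak + Ih.

    All parameter values below are illustrative assumptions, not fits
    to any recorded L2/3 pyramidal cell.
    """
    c_pf = 100.0                          # membrane capacitance (pF)
    g_leak_ns, e_leak_mv = 5.0, -70.0     # leak conductance (nS), reversal (mV)
    g_h_ns, e_h_mv = 5.0, -30.0           # max Ih conductance (nS), reversal (mV)
    v_half_mv, slope_mv = -90.0, 8.0      # Ih steady-state activation curve

    def m_inf(v_mv):
        # Ih activates with hyperpolarization
        return 1.0 / (1.0 + math.exp((v_mv - v_half_mv) / slope_mv))

    def euler_step(v_mv, m, i_pa):
        # pA / pF = mV/ms, so the units are consistent
        dv = (-g_leak_ns * (v_mv - e_leak_mv)
              - g_h_ns * m * (v_mv - e_h_mv) + i_pa) / c_pf
        dm = (m_inf(v_mv) - m) / tau_h_ms
        return v_mv + dt_ms * dv, m + dt_ms * dm

    v, m = e_leak_mv, m_inf(e_leak_mv)
    n_steps = int(1000.0 / dt_ms)

    # Settle to the resting potential with no injected current
    for _ in range(n_steps):
        v, m = euler_step(v, m, 0.0)
    v_rest = v

    # Apply the hyperpolarizing current step and record the voltage
    trace = []
    for _ in range(n_steps):
        v, m = euler_step(v, m, i_step_pa)
        trace.append(v)

    v_min, v_steady = min(trace), trace[-1]
    # Sag ratio: rebound from the peak hyperpolarization, relative to the peak
    return (v_steady - v_min) / (v_rest - v_min)


sag_slow = simulate_sag(tau_h_ms=100.0)  # Ih much slower than the membrane
sag_fast = simulate_sag(tau_h_ms=5.0)    # Ih recruited during the passive response
```

      In this sketch the same maximal I<sub>h</sub> conductance is present in both runs; only its activation time constant differs, so a difference between `sag_slow` and `sag_fast` reflects kinetics rather than channel content.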

      Our own measurements, similar to others in the literature show that L2/3 PCs exhibit modest sag ratios, however, this does not mean that HCN is not relevant. I<sub>h</sub> activation in L2/3 PCs does not manifest in large sag potential but rather in a continuous distortion of steady-state responses (Figure 2b.). The reviewer is correct that L2/3 PCs are non-homogenous, therefore we sampled along the entire L2/3 axis. This yielded some potential variability in our results (i.e., passive properties); yet we did not observe any cells where hyperpolarizing-activated/Cs<sup>+</sup>-sensitive currents could not be resolved. As structural variability of L2/3 cells does result in variability in cellular capacitance, we compensated for this variability by injecting cellular capacitance-normalized currents. Our measured cellular capacitances were in accordance with previously published values, in the range of 50-120 pF. Therefore, the injected currents were not outside frequently used values. Together, we would like to state that whether substantial sag potential is present or not, initial estimates of the HCN content for each L2/3 PC should be treated with caution.

      (5) In the later experiments with ZD7288, the authors measured EPSP half-width at greater distances from the soma. However, they use minimal stimulation to evoke EPSPs at increasingly far distances from the soma. Without controlling for amplitude, the authors cannot easily distinguish between attenuation and spread from dendritic filtering and additional activation and spread from HCN blockade. At a minimum, the authors should share the variability of EPSP amplitude versus the change in EPSP half-width and/or stimulation amplitudes by distance. In general, this kind of experiment yields much clearer results if a more precise local activation of synapses is used, such as dendritic current injection, glutamate uncaging, sucrose puff, or glutamate iontophoresis. There are recording quality concerns here as well: the cell pictured in Figure 3a does not have visible dendritic spines, and a substantial amount of membrane is visible in the recording pipette. These concerns also apply to the similar developmental experiment in 6f-h, where EPSP amplitude is not controlled, and therefore, attenuation and spread by distance cannot be effectively measured. The outcome, that L2/3 cells have dendritic properties that violate cable theory, seems implausible and is more likely a result of variable amplitude by proximity.

      To resolve this issue, we made a supplementary figure showing elicited amplitudes, which showed no significant distance dependence and minimal variability (new Supplementary Figure 6). We thank the reviewer for suggesting an amplitude-halfwidth comparison control (now included as new Supplementary Figure 6). To address the issue of the non-visible spines, we would like to note that these images are of too low a magnification and resolving power to show them. The presence of dendritic spines was confirmed in every recorded pyramidal cell observed using 2P microscopy at higher magnification.

      We would like to emphasize that although our recordings “seemingly” violated the cable theory, this is only true if we assume a completely passive condition. As shown in our manuscript, cable theory was not violated, as the presence of NMDA receptor boosting explained the observed ‘non-Rallian’ phenomenon.

      (6) Minimal stimulation used for experiments in Figures 3d-i and Figures 4g-h does not resolve the half-width measurement's sensitivity to dendritic filtering, nor does cesium blockade preclude only HCN channel involvement. Example traces should be shown for all conditions in 3h; the example traces shown here do not appear to even be from the same cell. These experiments should be paired (with and without cesium/ZD). The same problem appears in Figure 4, where it is not clear that the authors performed controls and drug conditions on the same cells. 4g also lacks a scale bar, so readers cannot determine how much these measurements are affected by filtering and evoked amplitude variability. Finally, if we are to believe that minimal stimulation is used to evoke responses of single axons with 50% fail rates, NMDA receptor activation should be minimal to begin with. If the authors wish to make this claim, they need to do more precise activation of NMDA-mediated EPSPs and examine the effects of ZD7288 on these responses in the same cell. As the data is presented, it is not possible to draw the conclusion that HCN boosts NMDA-mediated responses in L2/3 neurons.

      As stated in the figure legends, the control and drug application traces are from the same cell, both in figure 3 and figure 4, and the scalebar is not included as the amplitudes were normalized for clarity. We have addressed the effects of dendritic filtering above in answer (5), and cesium blockade above in answer (2). To reiterate, dendritic filtering alone cannot explain our observations, and cesium is often a better choice for blocking HCN channels compared to ZD-7288, which blocks sodium channels as well.

      When an excitatory synaptic signal arrives onto a pyramidal cell in typical conditions, neurotransmitter sensitive receptors transmit a synaptic current to the dendritic spine. This dendritic spine is electrically isolated by the high resistance of the spine neck and due to the small membrane surface of the spine, the synaptic current can elicit remarkably large voltage changes. These voltage changes can be large enough to depolarize the spine close to zero millivolts upon even single small inputs (Jayant et al. 2016). Therefore, to state that single inputs arriving to dendritic spines cannot be large enough to recruit NMDA receptor activation is incorrect. This is further exemplified by the substantial literature showing ‘miniature’ NMDA recruitment via stochastic vesicle release alone.

      (7) The quality of recordings included in the dataset has concerning variability: for example, resting membrane potentials vary by >15-20 mV and the AP threshold varies by 20 mV in controls. This is indicative of either a very wide range of genetically distinct cell types that the authors are ignoring or the inclusion of cells that are either unhealthy or have bad seals.

      Although we are aware of the diversity of L2/3 PCs, resolving further layer depth differences is outside the scope of our current manuscript. However, as shown in Kalmbach et al., resting membrane potential can greatly vary (>15-20 mV) in L2/3 PCs depending on distance from pia. We acknowledge that the variance in AP threshold is large and could be due to genetically distinct cell types.

      (8) The authors make no mention of blocking GABAergic signaling, so it must be assumed that it is intact for all experiments. Electrical stimulation can therefore evoke a mixture of excitatory and inhibitory responses, which may well synapse at very different locations, adding to interpretability and variability concerns.

      We thank the reviewer for pointing out our lack of detail regarding the GABAergic signaling blocker SR 95531. We did include this drug in our recordings of (50Hz stim.) signal summation, so GABAergic responses did not contaminate our recordings. We have now included this information in the results section (page 5) and the methods section (page 15).

      (9) The investigation of serotonergic interaction with HCN channels produces modest effect sizes and suffers the same problems as described above.

      We do not agree with the reviewer that a 50% drop in neuronal AP firing responses (Figure 7b) is a modest effect size. Thus, we opted to keep this data in the manuscript.

      (10) The computational modeling is not well described and is not biologically plausible. Persistent and transient K channels are missing. Values for other parameters are not listed. The model does not seem to follow cable theory, which, as described above, is not only implausible but is also not supported by the experimental findings.

      The model was downloaded from the Cell Type Database from the Allen Institute, with only minor modifications including the addition of dendritic HCN channels and NMDA receptors, which were varied along a wide parameter space to find a ‘best fit’ to our observations. These additions were necessary to recapitulate our experimental findings. We agree the model likely does not fully recapitulate all aspects of the dendrites, which as we hope to convey in this manuscript, are not fully resolved in mouse L2/3 PCs. This is a previously published neuronal model, and despite its potential shortcomings, is one among a handful of open-source neuronal models of a fully reconstructed L2/3 PC.

      Reviewer #2 (Public Review):

      Summary:

      This paper by Olah et al. uncovers a previously unknown role of HCN channels in shaping synaptic inputs to L2/3 cortical neurons. The authors demonstrate using slice electrophysiology and computational modeling that, unlike layer 5 pyramidal neurons, L2/3 neurons have an enrichment of HCN channels in the proximal dendrites. This location provides a locus of neuromodulation for inputs onto the proximal dendrites from L4 without an influence on distal inputs from L1. The authors use pharmacology to demonstrate the effect of HCN channels on NMDA-mediated synaptic inputs from L4. The authors further demonstrate the developmental time course of HCN function in L2/3 pyramidal neurons. Taken together, this is a well-constructed investigation of HCN channel function and the consequences of these channels on synaptic integration in L2/3 pyramidal neurons.

      Strengths:

      The authors use careful, well-constrained experiments using multiple pharmacological agents to assess HCN channel contributions to synaptic integration. The authors also use a voltage clamp to directly measure the current through HCN channels across developmental ages. The authors also provide supplemental data showing that their observation is consistent across multiple areas of the cerebral cortex.

      Weaknesses:

      The gradient of the HCN channel function is based almost exclusively on changes in EPSP width measured at the soma. While providing strong evidence for the presence of HCN current in L2/3 neurons, there are space clamp issues related to the use of somatic whole-cell voltage clamps that should be considered in the discussion.

      We thank the reviewer for pointing out our careful and well-constrained experiments and for making suggestions. The potential effects of space clamp errors are detailed in the extended explanations under Reviewer 1, Specific points (3).

      Reviewer #3 (Public Review):

      Summary:

      The authors study the function of HCN channels in L2/3 pyramidal neurons, employing somatic whole-cell recordings in acute slices of visual cortex in adult mice and a bevy of technically challenging techniques. Their primary claim is a non-uniform HCN distribution across the dendritic arbor with a greater density closer to the soma (roughly opposite of the gradient found in L5 PT-type neurons). The second major claim is that multiple sources of long-range excitatory input (cortical and thalamic) are differentially affected by the HCN distribution. They further describe an interesting interplay of NMDAR and HCN, serotonergic modulation of HCN, and compare HCN-related properties at 1, 2 and 6 weeks of age. Several results are supported by biophysical simulations.

      Strengths:

      The authors collected data from both male and female mice, at an age (6-10 weeks) that permits comparison with in vivo studies, in sufficient numbers for each condition, and they collected a good number of data points for almost all figure panels. This is all the more positive, considering the demanding nature of multi-electrode recording configurations and pipette-perfusion. The main strength of the study is the question and focus.

      Weaknesses:

      Unfortunately, in its present form, the main claims are not adequately supported by the experimental evidence: primarily because the evidence is indirect and circumstantial, but also because multiple unusual experimental choices (along with poor presentation of results) undermine the reader's confidence. Additionally, the authors overstate the novelty of certain results and fail to cite important related publications. Some of these weaknesses can be addressed by improved analysis and statistics, resolving inconsistent data across figures, reorganizing/improving figure panels, more complete methods, improved citations, and proofreading. In particular, given the emphasis on EPSPs, the primary data (for example EPSPs, overlaid conditions) should be shown much more.

      However, on the experimental side, addressing the reviewer's concerns would require a very substantial additional effort: direct measurement of HCN density at different points in the dendritic arbor and soma; the internal solution chosen here (K-gluconate) is reported to inhibit HCN; bath-applied cesium at the concentrations used blocks multiple potassium channels, i.e. is not selective for HCN (the fact that the more selective blocker ZD7288 was used in a subset of experiments makes the choice of Cs+ as the primary blocker all the more curious); pathway-specific synaptic stimulation, for example via optogenetic activation of specific long-range inputs, to complement / support / verify the layer-specific electrical stimulation.

      We thank the reviewer for their very careful examination of our manuscript and helpful suggestions. We addressed the concerns raised in the review and presented more raw traces in our figures. Although direct dendritic HCN mapping measurements are outside the scope of the current manuscript due to the morphological constraints presented by L2/3 PCs (which explains why no other full dendritic nonlinearity distribution has been described in L2/3 PCs with this method), we nonetheless supplemented our manuscript with additional experiments as suggested. For example, we included the excellent suggestion of pathway-specific optogenetic stimulation to further validate the disparate effect of HCN channels for distal and proximal inputs. We agree that ZD-7288 is a widely accepted blocker of HCN channels. However, the off-target effects on sodium channels may have significantly confounded our measurements of AP output using extracellular stimulation. Therefore, we chose low concentration cesium as the primary blocker for those experiments, but have now validated several other Cs<sup>+</sup>-based results with ZD-7288 as well.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      I have some issues that need clarification or correction.

      (1) On page 3, line 90, the authors state "We found that bath application of Cs+ (1mM)..." but the methods and Figure 1 state "2mM Cs+". Please check and correct.

      Correct, typo corrected.

      (2) Related to Cs+ application, the methods state that "CsMeSO4 (2mM) was bath applied..." Is this correct? CsMeSO4 is typically used intracellularly while CsCl is used extracellularly. If so, please justify. If not, please correct.

      It is correct. The reason CsCl is typically restricted to extracellular use is that introducing intracellular chloride ions can significantly alter basic biophysical properties, independent of the cesium effect. No comparable concern has been raised for CsMeSO4 that would preclude its extracellular use.

      (3) The authors normalize the current injections by cell capacitance (pA/pF). Was this done because there is a significant variance in cell morphology? A bit of justification for why the authors chose to normalize the current injection this way would help. If there is significant variation in cell capacitance across cells (or developmental ages), the authors could also include these data.

      Indeed, we chose to normalize current injection to cellular capacitance due to the markedly different morphology of deep and superficial L2/3 PCs. Deeper L2/3 PCs have a pronounced apical branch, closely resembling other pyramidal cell types such as L5 PCs, while superficial L2/3 PCs lack a thick main apical branch and instead are equipped with multiple, thinner apical dendrites. This morphological variation would yield an inherent bias in several of the reported measurements. We therefore corrected for it by normalizing current injection to cellular capacitance, as in our recent publications (Olah, Goettemoeller et al., 2022, Goettemoeller et al. 2024, Kumar et al. 2024).
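
      The normalization described above amounts to scaling a fixed current density by each cell's capacitance. A minimal sketch follows; the function name and the example capacitance values are hypothetical and chosen only for illustration, not taken from the recordings.

```python
def injected_current_pA(density_pA_per_pF: float, capacitance_pF: float) -> float:
    """Convert a current density (pA/pF) into the absolute current (pA) for one cell."""
    if capacitance_pF <= 0:
        raise ValueError("capacitance must be positive")
    return density_pA_per_pF * capacitance_pF


# Hypothetical whole-cell capacitances (pF); each cell receives the same
# current density but a different absolute current.
for capacitance in (70.0, 100.0, 130.0):
    print(capacitance, injected_current_pA(-6.0, capacitance))
```

      With this scaling, a larger cell receives a proportionally larger current step, so cells of different size are compared at the same current density.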

      (4) On page 15, line 445, the section heading is "PV cell NEURON modeling". Is this a typo? The models are of L2/3 pyramidal neurons, correct?  

      Correct, typo corrected.

      (5) Figures 3F and 3I are plots of the voltage integral for different inputs before and after Cs+. The y-axis label units are "pA*ms". This should be "mV*ms" for a voltage integral.  

      Correct, typo corrected.

      (6) On page 9, line 273, the text reads "Voltage clamp experiments revealed that the rectification of steady-state voltage responses to hyperpolarizing current injection was amplified with 5-CT (Fig. 7c)". Both the text and Figure 7C describe current clamp, not voltage clamp, recordings. Please check and correct.

      Correct, typo corrected.

      (7) Figure 2i looks to be a normalized conductance vs voltage (i.e. activation) plot. The y-axis shows 0-1 but the units are in nS. Is that a coincidence or an error?

      Correct, typo corrected.

      Reviewer #3 (Recommendations For The Authors):

      This is your paper. My comments are my own opinion, I don't expect you to agree or to respond. But I hope that what I wrote below will help you to understand my perspective.

      Please pardon my directness (and sheer volume) in this section - I have a lot of notes/thoughts and hope you may find some of them helpful. My high-level comments are unfortunately rather critical, and in (small) part that is because I encountered too many errors/typos/ambiguities in figures, legend, and text. I expect many would be caught with good proofreading, but uncorrected caused confusion on my part, or an inability to interpret your figures with confidence, given some ambiguity.

      The paper reads a bit like patchwork - likely a result of many "helpful" reviewers who came before me. Consider starting with and focusing on the synaptic findings, expanding the number of figures and panels dedicated to that, showing example traces for all conditions, and giving yourself the space to portray these complex experiments and results. While I'm not a fan of a large number of supplemental figures, I feel you could move the "extra" results to the supplementals to improve the focus and get right to the meat of it.

      For me, the main concern is that the evidence you present for the non-uniform HCN distribution is rather indirect. Ideally, I'd like to see patch recordings from various dendritic locations (as others have done in rats, at least; I'm not sure if L2/3 mice have had such conductance density measurements made in basal and apical dendrites). Otherwise, perhaps optical mapping, either functional or via staining. I also mention some concerns about the choice of internal and cesium. More generally, I want to see more primary data (traces), in particular for the big synaptic findings (non-uniform, L1-vs-L4 differences, NMDAR).

      We thank the reviewer for the helpful suggestions. Indeed, direct patch clamp recording is widely considered to be the best method to identify dendritic ion channel distribution; however, we chose an in silico approach instead, for several reasons. Undoubtedly, one of the main reasons to omit direct dendritic recordings was that, due to the uniquely narrow apical dendrites, this method is extremely challenging, with no previous examples in the literature where isolated dendritic outside-out patch recordings were achieved from this cell type. However, there are theoretical considerations as well. In primates, it has been demonstrated that HCN1 channels are concentrated on dendritic spines (Datta et al., 2023); therefore, direct outside-out recordings are not adequate in these circumstances. In future experiments we could directly target L2/3 PC dendrites for outside-out recordings in order to resolve dendritic nonlinearity distribution, although a cell-attached methodology may be better suited because HCN biophysical properties are closely regulated by intracellular signaling pathways.

      The introduction and Figures 1 and 2 are not so interesting and not entirely accurate: L2/3 do not have "abundant" HCN, nor is there an actual controversy about whether they have HCN. It's been clear (published) for years that they have about the same as all other non-PT neocortical pyramidal neurons (see e.g. Larkum 2007; Sheets 2011). Your own Figure 1A has a logarithmic scale and shows L2/3 as having the lowest expression (?) of all pyramidals and roughly 10x lower than L5 PT, but the text says "comparable", which is misleading.

      We thank the reviewer for this comment. Although there are sporadic reports in the literature about the HCN content of L2/3 PCs, most of these publications arrive at the same conclusion from the negligible sag potential (as in the mentioned Larkum et al., 2007 publication); namely, that L2/3 PCs do not contain a significant amount of HCN channels. We have shown with voltage and current clamp recordings that this assumption is false, as sag potential is not a reliable indicator of HCN content in L2/3 PCs. With the term “controversial” we aimed to highlight the different conclusions of functional investigations (e.g. Sheets et al., 2011) and sag potential recordings (e.g. Larkum et al., 2007) regarding the importance of HCN channels in L2/3 PCs.

      Non-uniform HCN with distal lower density has already been published for a (rare) pyramidal neuron in CA1 (Bullis 2007), similar to what you found in L2/3, and different from the main CA1 population.

      We thank the reviewer for this suggestion. We have now included the mentioned citation in the introduction section (page 3).

      Express sag as a ratio or percentage, consistently. Figure out why in Figure 7 the average sag ratio is 0.02 while in Fig. S1 it is 0.07 (for V1) - that is a massive difference.

      The calculation of sag ratio is consistent across the manuscript (at -6 pA/pF), except for experiments depicted in Fig. 7, where sag ratio was calculated from -2 pA/pF steps. Explanation below:

      Sag should be measured at a common membrane potential, with each neuron receiving a current pulse appropriate to reach that potential. Your approach of capacitance-based may allow for the same, but it is not clear which responses are used to calculate a single sag value per cell (as in Figure 2d).

      Thank you, we have now included this information in the methods section. Sag potential was measured at the -6 pA/pF step peak voltage, except for Fig. 7 as noted above. We have now included this discrepancy in the methods section (page 14). The recordings in Fig. 7 took significantly longer than any other recording in the manuscript, as it took considerable time to reach a steady-state response to 5-CT application. -6 pA/pF is a current injection in the range of 400-800 pA, which proved too severe for continued application in cells after more than an hour of recording. Accordingly, we decided to lower the hyperpolarizing current step in these recordings. The absolute value of sag is thus different in Fig. 7, but the 5-CT effect was nonetheless still significant. Notably, we probably wouldn’t have noticed the small sag in L2/3 here (and thus begun the entire study), save for the fact that we looked at -6 pA/pF to begin with.
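
      To illustrate why the step size affects the reported sag value, a sag ratio can be sketched as below. The definition used here (sag amplitude divided by the peak deflection from baseline) is one common convention and is an assumption, since the response does not spell out the formula. The function name and the example voltages are hypothetical.

```python
def sag_ratio(v_baseline_mV: float, v_peak_mV: float, v_steady_mV: float) -> float:
    """Sag ratio for a hyperpolarizing step (assumed definition).

    v_peak_mV is the most negative voltage reached early in the step,
    v_steady_mV the voltage at the end of the step.
    """
    peak_deflection = v_baseline_mV - v_peak_mV
    if peak_deflection <= 0:
        raise ValueError("expected a hyperpolarizing response (peak below baseline)")
    return (v_steady_mV - v_peak_mV) / peak_deflection


# Hypothetical responses to a large and a small current step from -80 mV.
print(sag_ratio(-80.0, -100.0, -98.6))  # larger step, more sag
print(sag_ratio(-80.0, -90.0, -89.8))   # smaller step, less sag
```

      Under this definition, a smaller step that recruits less HCN current yields a lower ratio, so values measured at different step sizes are not directly comparable.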

      In a paper focused on HCN, I would have liked to see resonance curves in the passive characterization.

      We thank the reviewer for the suggestion. Resonance curves can indeed provide useful insights into the impact of HCN on a cell’s physiological behavior, however, these experiments are outside the scope of our current manuscript as without in vivo recordings, resonance curves do not contribute to the manuscript in our opinion.

      How did you identify L2/3? Did you target cells in L2 or L3 or in the middle, or did you sample across the full layer width for each condition? A quantitative diagram showing where you patched (soma) and where you stimulated (L1, L4) with actual measurements, would be helpful (supplemental perhaps). You mention in the text that some L2/3 don't have a tuft, suggesting some variability in morphology - some info on this would be useful, i.e. since you did fill at least some of the neurons (eg 3A), how similar/different are the dendritic arbors?

      We sampled the entire L2/3 region during our recordings. It has been published that deep and superficial L2/3 PCs are markedly different in their morphology, and a recent publication (Brandelise et al. 2023) has even separated these two subpopulations into broad-tufted and slender-tufted pyramidal cells, which receive distinct subcortical inputs. Although this differentiation opens exciting avenues for future research, examining potential layer gradients in our dataset would require a significantly larger sample and is currently outside the scope of our manuscript.

      Distal vs proximal: this could use more clarification, considering how central it is to your results. What about a synapse on a basal dendrite, but 150 or 200 um from the soma, is that considered proximal? Is the distance to the soma you report measured along the 3D dendrite, along the 2D dendrite, as a straight line to the soma, or just relative to some layers or cortical markers? (I apologize if I missed this).

      We thank the reviewer for pointing out the missing description in the results section. We have amended this oversight (p15).  Furthermore, although deeper L3 PCs have characteristic apical and basal dendritic branches, when recordings were made from more superficial L2 cells, a large portion of their dendrites extended radially, which made their classification ambiguous. Therefore, we did not use “apical” and “basal” terminology in the paper to avoid confusion. Distances were measured along the 3D reconstructed surface of the recovered pyramidal cells. This information is now included in the methods.

      Line 445, "PV cell NEURON modeling" ... hmm. Everyone re-uses methods sections to some degree, but this is not confidence-inspiring, and also not from a proofreading perspective.

      We have corrected the typo.

      It seems that you constructed a new HCN NEURON mechanism when several have been published/reviewed already. Please explain your reasons or at least comment on the differences.

      There are slight differences in our model compared to previously published models. Nevertheless, we took a previously published HCN model as a base (Gasparini et al, 2004), and created our own model to fit our whole-cell voltage clamp recordings.

      Bath-applied Cs+ can change synaptic transmission (in the hippocampus; Chevaleyre 2002). But also ZD7288 has some such effects. Also, see (Harris 1995) for a Cs+ and ZD7288 comparison. As well as (Harris 1994) for more Cs+ side-effects (it broadens APs, etc). Bath-applied blockers may affect both long-range and local synapses in your recordings, via K-channels or perhaps presynaptic HCN (though I am aware of your Fig. 1e). Since you can do intracellular perfusion, you could apply ZD7288 postsynaptically (Sheets 2011), an elegant solution.

      We thank the reviewer for the suggestion. We were aware of the potential presynaptic effects of cesium (i.e., presynaptic Kv or other channel effects) and did measure PPR after cesium application (Fig. 1h), noting no effect. At Cs<sup>+</sup> concentrations used here, we now also include new data in the results showing no effect on somatically recorded AP waveform (i.e., representative of a Kv channel effect). As stated earlier for reviewer 1, we now performed additional experiments using either cesium or ZD-7288 for comparison (e.g., see updated Fig. 1; Supplementary Figure 1; Fig. 3b-e). Intracellular ZD re-perfusion is an elegant solution which we will absolutely consider in future experiments.

      K-Gluconate is reported to inhibit Ih (Velumian 1997), consider at least some control experiments with a different internal for the main synaptic finding - maybe you'll find no big change ...

      We thank the reviewer for the suggestion. Although K-Gluconate can inhibit HCN current, this intracellular solution is often used in the literature to measure this current (Huang & Trussell 2014). We chose this intracellular solution to improve recording stability.

      (Biel 2009) is a very comprehensive HCN review, you may find it useful.

      We thank the reviewer for bringing this to our attention, we have now included the citation in the introduction.

      "Hidden" in your title seems too much.

      We changed the title to more accurately describe our findings and removed ‘hidden’.

      While I'm glad you didn't record at room temperature, the choice of 30C seems a bit unfortunate - if you go to the trouble to heat the bath, why not at least 34C, which is reasonably standard as an approximation for physiological temperature?

      We thank the reviewer for pointing this out. The choice of 30C was made to approach physiological temperature levels while preserving the slices for extended amounts of time, which is a standard approach. Future experiments in vivo should be performed to further understand the naturalistic relevance at ~37C.

      Line 506: do you mean "Hz" here? It's not a frequency, is it? I think it's a unitless ratio?

      Correct, we have amended the typo.

      Line 95: you have not shown that HCN is "essential" for "excess" AP firing.

      We have corrected the phrasing, we agree.

      Fig. 2b,c: is this data from a single example neuron, maybe the same neuron as in 2a? Or from all recorded neurons pooled?

      The data is from several recorded cells pooled.

      Fig. 3 (important figure):

      Why did you not use a paired test for panels e and f? You have the same number of neurons for each condition and the expectation is that you record each neuron in control and then in cesium condition, which would be a paired comparison. Or did you record only 1 condition per neuron?

      This figure presents your main finding (in my opinion). You should show examples of the synaptic responses, i.e. raw traces, for each condition and panel, and overlaid in such a way that the reader can immediately see the relevant comparison - it's worth the space it requires.

      We thank the reviewer for the suggestions. Traces are only overlaid in the paper when they come from the same cell. For Fig. 3d-i, EPSPs in every neuron were evoked in 2-3 different locations (i.e., 1-2 ‘L4’ locations for Type-I and Type-II synapses, and one ‘L1’ location in each) with the same stimulation pipette and one pharmacological condition per cell. Therefore, two-sample t-tests were used, since the control and cesium conditions came from separate cells (i.e., separate observations). This was necessary, as we can never assume that the stimulating electrode can return to the same synapse after moving it. We were not comfortable with showing overlaid traces from different cells; however, we did show representative traces from the control and Cs<sup>+</sup> conditions in Fig. 3h. Complementary ZD-7288 experiments can be found in panels b and c, where we did perform within-cell pharmacology (and thus used paired t-tests) from one stimulation area per cell. We hope these complementary experiments increase overall confidence, as neither pharmacological approach is entirely free of off-target effects. We now also included more overlaid traces where appropriate (i.e., Fig. 3b, and in the new Fig. 3k experiments using within-cell pharmacology comparisons). We do realize these complementary approaches could confuse the reader, and have now done our best to make the slightly different approaches in this figure clearer in the results section.

      Consider repeating at least some of these critical experiments with ZD7288 instead of Cs+ (and not K-gluc), or even with ZD7288 pipette perfusion, if it's technically feasible here.

      We thank the reviewer for the suggestions. Although many of our recordings using Cs<sup>+</sup> already had complementary experiments (such as synaptic experiments Figure 3e vs Figure 3b), we recognize the need to extend the manuscript with more ZD-7288 experiments. We have now extended Figure 1 with three panels (Figure 1 c,d,e), which recapitulate a fundamental finding, the change in overall excitability upon HCN channel blockade, using ZD-7288 as well.

      Fig. 3a, why show a schematic (and weirdly scaled) stimulating electrode? Don't you have a BF photo showing the actual stimulating electrode, which you could trace to scale or overlay? Could you use this panel to indicate what counts as "distal" and what as "proximal", visually?

      The stimulating electrode was unfortunately not filled with fluorescent material; therefore, it was not captured during the z-stack.

      Fig. 3b: is the y-axis labeled correctly? A "100% change" would mean a doubling, but based on the data points here I think y=100% means "no change"?

      The scale is labeled correctly, 100% means doubling.

      Fig. 3b, c: again, show traces representing distal and proximal, not just one example (without telling us how far it was). And use those traces to illustrate the half-width measurement, which may be non-trivial.

      We have extended Figure 3b with an inset showing the effect of ZD-7288 on a proximal stimulating site. The legend now includes additional information indicating a stimulating location 28 µm away from the soma in control conditions (black trace) and upon ZD-7288 application (green trace).

      Line 543, 549: it seems you swapped labels "h" and "i"?

      Typo corrected.

      Fig. 4b: to me, MK-801 only *partially* blocks amplification, but in the text L198 you write "abolish".

      We thank the reviewer for pointing this out. Indeed, there are several other subthreshold mechanisms that are still intact after pipette perfusion, which can cause amplification. We have now clarified this in the text (p7).

      Fig. 4e,f: what is the message? Uniform NMDAR? The red asterisk in (e) is at a proximal/distal ratio of roughly 1. I don't understand the meaning of the asterisk (the legend is too basic) and I'm surprised to see a ratio of 1 as the best fit, and also that the red asterisk is at a dendritic distance of 0 um in (f). This could use more explanation (if you feel it's relevant).

      We thank the reviewer for pointing this out. We have now included a better explanation in the results and figure legend. We have also updated the figure to make it clearer and added model traces in Fig. 4f, which correspond to example data from slices in Fig. 4g (both green). The graph suggests a nonuniform, proximally abundant NMDA distribution. The color coding corresponds to the proximal EPSP halfwidth divided by the distal EPSP halfwidth. It is true that the dendritic distance ‘center’ was best-fit very close to the soma, but note also that the dispersion (distribution) half-width was >150 µm, so there is quite a significant dendritic spread despite the predicted proximal bias. Based on this model there is likely NMDA spread throughout the entire dendrite, but biased proximally. Naturally, future work will need to map this at the spine level, so this is currently an oversimplification. Nonetheless, a proximal NMDA bias was necessary to recapitulate findings from Fig. 3, and additional slice recordings in Fig. 4 were consistent with this interpretation.

      Fig. 4g: I feel your choice of which traces to overlay is focusing on the wrong question. As the reader, what I want to see here is an overlay of all 4 conditions for one pathway. If this is a sequential recording in a single cell (Cs, Cs+MK801, wash out Cs, MK801), then the overlay would be ideal and need not be scaled. Otherwise, you can scale it. But the L1/L4 comparison does not seem appropriate to me. I find myself trying to imagine what all the dark lines would look like overlaid, and all the light lines overlaid separately. Also, the time axis is missing from this panel. Consider a subtraction of traces (if appropriate).

      In these recordings, all EPSPs were measured using a stimulating electrode that was moved between L1 and L4 (only once, to keep the exact input consistent) to measure the different inputs in a single neuron. In separate sets of experiments, the same method was used but in the presence of Cs<sup>+</sup>, Cs<sup>+</sup> + MK-801, or MK-801 alone. This was the most controlled method in our hands for this type of approach, as drug washouts were either impractical or not possible. Overlaying four traces would have produced a more cluttered image, and such sequential recordings were not actually performed experimentally. As our aim was to resolve the proximal-distal halfwidth relationship, we deemed the within-cell L1 vs. L4 comparison appropriate. We have nonetheless added model traces in Fig. 4f, which correspond to example data from slices in Fig. 4g (both green). The bar graphs should also serve to illustrate the input-specific relationship, i.e., that the only time the L1 and L4 EPSP relationship was inverted was in the presence of Cs<sup>+</sup> (green bars) and that this effect was occluded with simultaneous MK-801 in the pipette (red bars).

      Line 579: should "hyperpolarized" be depolarized?

      Corrected

      Fig. 5a: it looks like the HCN density is high in the most basal dendrites (black curve above), then drops towards the soma, then rises again in the apicals (red curve). Is that indeed how the density was modeled? If so, this is completely at odds with the impression I received from reading your text and experimental data - there, "proximal" seems to mean where the L4 axons are, and "distal" seems to mean where the L1 axons are, in other words, high HCN towards the pia and low HCN towards the white matter. But this diagram suggests a biphasic hill-valley-hill distribution of HCN (meaning there is a second "distal" region below the soma). In that case, would the laterally-distant basal dendrites also be considered distal? How does the model implement the distribution - is it 1D, 2D or 3D? As you can probably tell, this figure raised more questions for me and made me wonder why I don't have a better understanding yet of your definitions.

      We thank the reviewer for pointing this out. We agree that our initial cartoon of the parameter fitting procedure was not accurate and should have depicted just a single ‘curve’. We have now simplified it to better demonstrate what the model is testing, and also made the terms more consistent and accurate. There is no ‘second’ region in the model. We hope the revised cartoon illustrates this better. We also edited the legend to be clearer. Because the model description in Fig. 4d suffered from similar shortcomings, we modified it and its figure legend accordingly.

      Fig. 5b: why is the best fit at a proximal/distal ratio of 1, yet sigma is 50 um?

      Proximal/distal bias in this figure was fitted to 0.985 (prox/distal ratio), as we modeled control conditions with intact NMDA and HCN channels, which closely approximated the control recording comparisons.

      Fig. 6h, Line 662: "vs CsMeSO4 ... for putative LGN events" The panel shows proximal vs distal, not control vs Cs+. What's going on here?

      Typo corrected.

      Fig. 7e: the ctrl sag ratio here averages 0.02, while in Fig. S1 the average (for V1 and others) is about 0.07.

      Please refer to our answer given to the previous question regarding sag ratio measurements. Briefly, recordings made with 5-CT application were made using a less severe, -2 pA/pF current injection to test sag responses. This more modest hyperpolarization activated fewer HCN channels; therefore, the sag ratio is lower compared to previously reported datapoints.

      We have included this explanation in the methods section (page 14)

      Now here you are using a paired test for this pharmacology, but you didn't previously (see my earlier comments/questions).

      Paired t-tests were used for these experiments, as the control and test datapoints came from the same cell. Cells were recorded in control conditions and again after drug application.
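
      The distinction between the two designs (within-cell pharmacology versus separate cells per condition) can be sketched as two different t statistics. This is an illustrative sketch only: the function names and the example values are hypothetical, and the Welch (unequal-variance) form of the two-sample test is an assumption, as the response does not state which variant was used.

```python
from math import sqrt
from statistics import mean, stdev, variance


def paired_t(control, drug):
    """t statistic for within-cell comparisons (degrees of freedom = n - 1)."""
    if len(control) != len(drug):
        raise ValueError("paired data need one control and one drug value per cell")
    diffs = [d - c for c, d in zip(control, drug)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def welch_t(group_a, group_b):
    """t statistic for independent groups of cells (unequal variances allowed)."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_b) - mean(group_a)) / se


# Hypothetical EPSP half-widths (ms), for illustration only.
control = [10.0, 12.0, 11.0, 13.0]
drug = [12.0, 15.0, 13.0, 16.0]
print(paired_t(control, drug))  # same cells measured before and after drug
print(welch_t(control, drug))   # treating the same numbers as separate cells
```

      The paired form removes cell-to-cell variability by working on per-cell differences, which is only valid when each control value has a matching drug value from the same cell.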

      Line 137: single-axon activation: but cortical axons make multi-synaptic contacts, at least for certain types of pre- and post-synaptic neurons, and (e.g. in L5-L5 pairs) those contacts can be distributed across the entire dendritic arbor. In other words, it's possible that when you stimulate in L1, you activate local axons, and the signal could then propagate to multiple synaptic contact locations, some being distal and some proximal. Maybe you have reasons to believe you're able to avoid this?

      We thank the reviewer for this question. Cortical axons often make distributed contacts; however, bottom-up and top-down pathways innervating L2/3 PCs are at least somewhat restricted to L2/3/L4 and L1, respectively (Shen et al. 2022, Sermet et al. 2019). Therefore, due to the lack of evidence suggesting a heavily mixed topographical distribution for top-down and bottom-up inputs, we have reason to believe that L1 stimulation will result in mainly distal input recruitment, while L4 stimulation will mainly excite proximal dendritic regions. The resolution of our experiments was also improved by the minimal stimulation and visual guidance (subset of experiments) of the stimulation. Furthermore, new optogenetic experiments stimulating LGN and LM axons, which have been anatomically defined previously as biased to deeper layers and L1, respectively, were now also performed (Fig. 3j-l) with cesium effects analogous to those in our local electrical stimulation experiments. Future work using varying optogenetic stimulation parameters will expand on this.

      L140: "previous reports" ==> citation needed.

      We have inserted the citation needed.

      L149: "arriving to layer 1"; but I think earlier you noted that some or many L2/3 neurons lack a dendritic tuft; do they all nevertheless have dendrites in L1? Note that cortico-cortical long-range axons still need to pass through all cortical layers on their way up to L1.

      We thank the reviewer for the question. Although the more superficial L2/3 PCs lack a distinct apical tuft, their dendrites reach the pia similarly to deeper L2/3 PCs. All of our recorded and post-hoc recovered cells had dendrites in L1, except in cases where they were clearly cut during the slicing procedure; those cells were excluded from the study.

      When you write "L4 axons" or "L4 inputs", do you specifically mean long-range thalamic axons? Or axons from local L4 neurons? What about axons in L4 that originate from L5 pyramidal neurons?

      In the case of ‘L4’ axons, we cannot disambiguate these inputs a priori, as they are both part of the bottom-up pathway and are possibly experimentally indistinguishable. Even with restricted opto LGN stimulation, disynaptic inputs via L4 PCs cannot be completely ruled out under our conditions. On the other hand, the probability of L5 PC axons terminating on L2/3 PCs is exceedingly low (single reported connection out of 1145 potential connections; Hage et al. 2022). We did find two clearly different synaptic subpopulations (Supp. Fig 3) in L4, which it was tempting to classify as one or the other. However, we felt there was not enough evidence in the literature or in our additional optogenetic experiments to classify the source of these different L4 inputs. Thus, we designated them as Type-I or Type-II for now.

      Do you inject more holding current to compensate for the resting membrane potential when Cs+ or ZD7288 is in the bath?

      We thank the reviewer for the question. We did not inject a compensatory current, as we wanted to investigate the dual, physiologically relevant action of HCN channels (George et al. 2009).

      I'd like to see distributions (histograms) of L4 and L1 EPSP amplitudes, under control conditions and ideally also under HCN block.

      We have now extended the manuscript with a supplementary figure (Supplementary Figure 6) to show that EPSP peak was not distance dependent in control conditions, and there was no relationship between peak and halfwidth in our dataset.

      Line 186, custom pipette perfusion: why not use this for internal ZD7288, to make it cell-specific?

      We thank the reviewer for the question, this is a good point. In future work we will consider this when applicable. It is certainly a way to control for bath application confounds in many ways.

      L205: "recapitulate our experimental findings" - which findings do you mean? I think a bit of explanation/referencing would help.

      Corrected.

      Line 210: L4-evoked were narrower than L1-evoked: is this not expected based on filtering?

      We thank the reviewer for pointing this out, the word “Intriguingly” has been omitted.

      Line 231 and 235: "in L5 PCs" should be restricted to L5 PT-type PCs.

      We have corrected this throughout the manuscript.

      Neuromodulation, Fig. 7, L263-282: the neuromodulation finding is interesting. However, a bit like the developmental figure, it feels "tacked on" and the transition feels a bit awkward. I think you may want to discuss/cite more of the existing literature on neuromodulatory interactions with HCN (not just L2/3). Most importantly, what I feel is missing is a connection to your main finding, namely L1 and L4 inputs. Does serotonergic neuromodulation put L1 and L4 back on equal footing, or does it exaggerate the differences?

      We thank the reviewer for the question. We agree with the reviewer that Figure 7 does not give a complete picture of how the adult brain can capitalize on this channel distribution. Our intention was to show that HCN channels are not a stationary feature of L2/3 PCs, but a feature that can be regulated developmentally and even in the adult brain via neuromodulation. In other words, the subthreshold NMDA boosting we observed can be gated by HCN, depending on developmental stage and/or neuromodulatory state of the system. We have now added some brief language to better introduce the transition and its relevance to the current study in the results (p8), and discussed the implications in the discussion section of the original manuscript.

      General comment: different types/sources of synapses may have different EPSP kinetics. I feel this is not mentioned/discussed adequately, considering your emphasis on EPSPs/HCN.

      See points above on input-specific synaptic diversity.

      Line 319/320: enriched distal HCN is found in L5 PT-type, not in all L5 PCs.

      Corrected.

      L320: CA1 reportedly has a subset of pyramidal neurons that have higher proximal HCN than distal (I gave the citation above). In light of that, I think "unprecedented" is an overstatement.

      Corrected.

      Methods:

      L367: What form of anesthesia was used?

      Amended.

      Which brain areas, and how?

      Amended.

      Why did you first hold slices at 34C, but during recording hold at 30C?

      We held the slices at 34C to accelerate the degradation of superficial damaged parts of the slice, which is in line with currently used acute slice preparation methodologies, regardless of the subsequent recording temperature.

      Pipette resistance/tip size?

      Amended.

      Cell-attached recordings (L385): provide details of recordings. What was the command potential (fixed value, or did you adjust it per neuron by some criteria)?

      Amended.

      What type of stimulating electrode did you use? If glass, what solution is inside, and what tip size?

      We thank the reviewer for pointing these out; the specific points were added to the methods section.

      L392/393: you adjusted the holding (bias) current to sit at -80 mV. What were the range and max values of holding current? Was -80 mV the "raw" potential, or did it account for liquid junction? If you did not account for liquid junction potential, then would -80 in your hands effectively be between -95 and -90 mV? That seems unusually hyperpolarized.

      All cells were held with bias currents between -50 pA and 150 pA. To be clear, as mentioned below, we did not change the bias current after any drug applications. We did not correct for liquid junction potential, and cells were ‘held’ with bias current at -80 mV during our recordings 1) because this value was apparently close to the RMP (i.e. little bias current was needed at this voltage on average) (Fig. 2e) and 2) to keep conditions consistent across recordings. The uncorrected -80 mV is in the range of previously reported membrane potential values both in vivo and in vitro (Svoboda et al. 1999, Oswald et al. 2008, Luo et al. 2017), which found the (corrected) RMP to be below -80 mV. Naturally, this will not reflect every in vivo condition completely, and further investigation using naturalistic conditions is warranted in the future.

      Did you adjust the bias current during/after pharmacology?

      Bias current was not adjusted in order to resolve the effect on resting membrane potential.

      L398: sag calculation could use better explanation: how did you combine/analyze multiple steps from a single neuron when calculating sag? Did you choose one level (how) or did you average across step sizes or ...?

      Sag ratio was measured at the -6 pA/pF current step, except for one set of experiments in Fig. 7. The Methods section was amended.
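
      As an illustration of a capacitance-normalised sag measurement, a minimal sketch follows. The sag definition used here (peak minus steady-state deflection, divided by peak deflection) is one common convention and is an assumption; the response does not state the exact formula, and the function names and example values are hypothetical.

```python
def step_current_pa(capacitance_pf, density_pa_per_pf=-6.0):
    """Current step amplitude (pA) for a capacitance-normalised injection."""
    return density_pa_per_pf * capacitance_pf


def sag_ratio(trace_mv, baseline_mv, steady_window):
    """Sag ratio under an ASSUMED convention:
    (steady state - peak) / (baseline - peak), for a hyperpolarising step.

    trace_mv      -- membrane potential samples (mV) during the step
    baseline_mv   -- pre-step membrane potential (mV)
    steady_window -- (start, stop) sample indices of the steady-state period
    """
    peak = min(trace_mv)  # most hyperpolarised point during the step
    start, stop = steady_window
    steady = sum(trace_mv[start:stop]) / (stop - start)
    peak_deflection = baseline_mv - peak
    if peak_deflection <= 0:
        raise ValueError("trace does not hyperpolarise below baseline")
    return (steady - peak) / peak_deflection


# Hypothetical example: a 50 pF cell and a short, made-up voltage trace.
example_step = step_current_pa(50.0)
example_sag = sag_ratio([-85.0, -90.0, -89.0, -88.0, -88.0], -80.0, (3, 5))
```

      Under this convention a larger value indicates more sag; other definitions (for example, the ratio of steady-state to peak deflection) would give different numbers for the same trace.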

      L400, 401: 10 uM Alexa-594 or 30 um Alexa-594, which is correct?

      10 µM is correct; the typo was corrected.

      L445: "PV cell" seems like a typo?

      Typo is corrected.

      L450: "altered", please describe the algorithm or manual process.

      Alterations were made manually.

      L474: NDMA, typo.

      Typo is fixed.

      L474: "were adjusted", again please describe the process.

      Adjustments were made by a grid-search algorithm.
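
      For readers unfamiliar with the approach, a generic grid search can be sketched as below. This is not the authors' implementation: the parameter names, the grid values, and the toy loss function are all hypothetical and stand in for whatever model parameters and error measure were actually used.

```python
from itertools import product


def grid_search(loss, grid):
    """Evaluate `loss` on every combination of values in `grid` and
    return the combination with the lowest loss, plus that loss."""
    names = list(grid)
    best_params, best_loss = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        current = loss(params)
        if current < best_loss:
            best_params, best_loss = params, current
    return best_params, best_loss


def toy_loss(params):
    """Hypothetical squared error between model output and a target."""
    return (params["g_nmda"] - 0.5) ** 2 + (params["g_ampa"] - 1.0) ** 2


# Hypothetical grid of candidate conductance scalings.
grid = {"g_nmda": [0.0, 0.25, 0.5, 0.75], "g_ampa": [0.5, 1.0, 1.5]}
best, err = grid_search(toy_loss, grid)
```

      The search is exhaustive, so its cost grows with the product of the grid sizes; that is usually acceptable only for a small number of parameters.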

      Biel, M., Wahl-Schott, C., Michalakis, S., & Zong, X. (2009). Hyperpolarization-activated cation channels: from genes to function. Physiological reviews, 89(3), 847-885. https://journals.physiology.org/doi/full/10.1152/physrev.00029.2008 - (very comprehensive review of HCN)

      Bullis JB, Jones TD, Poolos NP. Reversed somatodendritic I(h) gradient in a class of rat hippocampal neurons with pyramidal morphology. J Physiol. 2007 Mar 1;579(Pt 2):431-43. doi: 10.1113/jphysiol.2006.123836. Epub 2006 Dec 21. PMID: 17185334; PMCID: PMC2075407. https://physoc.onlinelibrary.wiley.com/doi/full/10.1113/jphysiol.2006.123836 - (CA1 subset (PLPs) have a reversed HCN gradient; cell-attached patches, NMDAR)

      Velumian AA, Zhang L, Pennefather P, Carlen PL. Reversible inhibition of IK, IAHP, Ih, and ICa currents by internally applied gluconate in rat hippocampal pyramidal neurones. Pflugers Arch. 1997 Jan;433(3):343-50. doi: 10.1007/s004240050286. PMID: 9064651. https://link.springer.com/article/10.1007/s004240050286 - (K-Gluc internal inhibits HCN)

      Sheets, P. L., Suter, B. A., Kiritani, T., Chan, C. S., Surmeier, D. J., & Shepherd, G. M. (2011). Corticospinal-specific HCN expression in mouse motor cortex: I h-dependent synaptic integration as a candidate microcircuit mechanism involved in motor control. Journal of neurophysiology, 106(5), 2216-2231. https://journals.physiology.org/doi/full/10.1152/jn.00232.2011 - (L2/3 IT have same sag ratio as all other non-PT pyramidals, roughly 5% (vs 20% PT); intracellular ZD7288 used at 10 or 25 um)

      Harris NC, Constanti A. Mechanism of block by ZD 7288 of the hyperpolarization-activated inward rectifying current in guinea pig substantia nigra neurons in vitro. J Neurophysiol. 1995 Dec;74(6):2366-78. doi: 10.1152/jn.1995.74.6.2366. PMID: 8747199. https://journals.physiology.org/doi/abs/10.1152/jn.1995.74.6.2366 - (comparison Cs+ and ZD7288)

      Harris, N. C., Libri, V., & Constanti, A. (1994). Selective blockade of the hyperpolarization-activated cationic current (Ih) in guinea pig substantia nigra pars compacta neurones by a novel bradycardic agent, Zeneca ZM 227189. Neuroscience letters, 176(2), 221-225. https://www.sciencedirect.com/science/article/abs/pii/0304394094900876 - (Cs+ is not HCN-selective; it also broadens APs, reduces the AHP)

      Chevaleyre, V., & Castillo, P. E. (2002). Assessing the role of Ih channels in synaptic transmission and mossy fiber LTP. Proceedings of the National Academy of Sciences, 99(14), 9538-9543. https://pnas.org/doi/abs/10.1073/pnas.142213199 - (Cs+ blocks K channels, increases transmitter release; but also ZD7288 affects synaptic transmission)

      Thank you

    2. eLife Assessment

      In this valuable study, the authors use electrophysiology in brain slices and computer modeling to suggest that layer 2/3 pyramidal neurons of the mouse cortex have functional HCN channels on the proximal apical dendrite, which allow input at that location to be processed distinctly from input to the distal apical dendrites. The revisions improved this solid paper, but some concerns were not addressed sufficiently; many of these could be addressed by further revision.

    3. Reviewer #2 (Public review):

      Summary:

      This paper by Olah et al. uncovers a previously unknown role of HCN channels in shaping synaptic inputs to L2/3 cortical neurons. The authors demonstrate using slice electrophysiology and computational modeling that, unlike layer 5 pyramidal neurons, L2/3 neurons have an enrichment of HCN channels in the proximal dendrites. This location provides a locus of neuromodulation for inputs onto the proximal dendrites from L4 without an influence on distal inputs from L1. The authors use pharmacology to demonstrate the effect of HCN channels on NMDA-mediated synaptic inputs from L4. The authors further demonstrate the developmental time course of HCN function in L2/3 pyramidal neurons. Taken together, this is a well-constructed investigation of HCN channel function and the consequences of these channels on synaptic integration in L2/3 pyramidal neurons.

      Strengths:

      The authors use careful, well-constrained experiments using multiple pharmacological agents to assess HCN channel contributions to synaptic integration. The authors also use voltage-clamp to directly measure the current through HCN channels across developmental ages. The authors also provide supplemental data showing that their observation is consistent across multiple areas of the cerebral cortex.

      Weaknesses:

      The gradient of HCN channel function is based almost exclusively on changes in EPSP width measured at the soma. While providing strong evidence for the presence of HCN current in L2/3 neurons, there are space clamp issues related to the use of somatic whole-cell voltage clamp that should be considered in the discussion. One omission by the authors is related to cell morphology. They make a point of normalizing the current injections to cell capacitance to account for variability in neuronal morphology. It is not clear, however, how, if at all, this variability would affect EPSP propagation and modulation by proximal HCN channels. This should at least be discussed. Also, if there is high variability in cell morphology, was this considered in the modeling experiments?

    4. Reviewer #3 (Public review):

      Summary:

      The authors study the function of HCN channels in L2/3 pyramidal neurons, employing somatic whole-cell recordings in acute slices of visual cortex in adult mice and a bevy of technically challenging techniques. Their primary claim is a non-uniform HCN distribution across the dendritic arbor with greater density closer to the soma (roughly opposite of the gradient found in L5 PT-type neurons). The second major claim is that multiple sources of long-range excitatory input (cortical and thalamic) are differentially affected by the HCN distribution. They further describe an interesting interplay of NMDAR and HCN, serotonergic modulation of HCN, and compare HCN-related properties at 1-, 2- and 6-weeks of age. Several results are accompanied by biophysical simulations.

      Strengths:

      The authors collected data from both male and female mice, at an age (6-10 weeks) that permits comparison with in vivo studies, in sufficient numbers for each condition, and they collected a good number of data points for almost all figure panels. This is all the more positive, considering the demanding nature of multi-electrode recording configurations and pipette-perfusion. The main strength of the study is the question and focus.

      Weaknesses:

      Unfortunately, in its present form, the main claims are not adequately supported by the experimental evidence: primarily because the evidence is indirect and circumstantial, but also because multiple unusual experimental choices (along with poor presentation of results) undermine the reader's confidence. Additionally, the authors overstate the novelty of certain results and fail to cite important related publications. Some of these weaknesses can be addressed by improved analysis, statistics, resolving inconsistent data across figures, reorganizing/improving figure panels, more complete methods, improved citations, and proofreading. In particular, given the emphasis on EPSPs, the primary data (example EPSPs, overlaid conditions) should be shown much more.

      However, on the experimental side, addressing the reviewer's concerns would require a very substantial additional effort: direct measurement of HCN density at different points in the dendritic arbor and soma; the internal solution chosen here (K-gluconate) is reported to inhibit HCN; bath-applied cesium at the concentrations used blocks multiple potassium channels, i.e. is not selective for HCN (the authors have concerns about using the more selective blocker ZD7288, but did use it in a subset of experiments, some of which show quantitatively different results). In response to initial review, the authors performed pathway-specific synaptic stimulation, via optogenetic activation of specific long-range inputs. This approach is valuable and interesting; however, the results are presented very minimally and only partially match those obtained by layer-specific electrical stimulation.

    1. eLife assessment

      This important and detailed study presents the most comprehensive view of the functional organization and requirements for a mother centriole's distal appendage in primary cilia assembly published to date. CRISPR knockouts and super-resolution microscopy analysis of the distal appendage proteins provide convincing evidence to support the claims of the authors. This work will be of high value to cell biologists and biophysicists working on the structure and function of the centrosome as well as human geneticists exploring ciliary pathology.

    2. Reviewer #1 (Public Review):

      In this work, Kanie and colleagues explored the composition, structure, and assembly hierarchy of distal appendage proteins. The microscopy was well executed and appropriately quantified. Importantly, the quality of individual antibodies was documented with a discussion of how this might complicate results. The hierarchy of assembly was established by careful quantification of assembly in an extensive set of knockout cell lines. This work will be of interest to cell biologists exploring organelle assembly as well as human geneticists trying to understand the clinical implications of mutations.

    3. Reviewer #2 (Public Review):

      Kanie et al. have carried out a tour-de-force effort to further understand the hierarchy and function of centriole distal appendages in ciliogenesis. They made a thorough effort to understand the localization of all the known distal appendage proteins. To examine the distal appendage hierarchy, they used an automated analysis of centrosomal localization. It is not clear how this was quantified and pictures are not shown. They used CEP170, a marker for subdistal appendages, to define a mask around centrioles. It is not clear how the experiment was analyzed and normalized. The techniques used in this study cannot be compared with those commonly used in the field, which normally include STORM and other super-resolution techniques (which are less prone to artifacts) and correlated electron microscopy. Thus, it is not possible to make a head-to-head comparison. The lack of rescue experiments further weakens the conclusions of this paper.

    4. Reviewer #3 (Public Review):

      Distal appendages are multiprotein complexes that are only present on the mother centriole as a 9-fold symmetric structure that functions in ciliogenesis. How distal appendage proteins are organized and assembled still remains poorly understood. In this manuscript, Kanie et al. comprehensively analyzed the localizations of known and newly described distal appendage proteins using super-resolution microscopy. They investigated mechanisms associated with distal appendage assembly and their roles in the early stages of ciliogenesis in CRISPR-Cas9 knockout cells, which enabled a clearer investigation of these structures compared to previous RNAi depletion studies. These studies confirm previous findings for distal appendage protein ciliogenesis function and demonstrate the CEP83-SCLT1-CEP164-TTBK2 module is critical for both distal appendage assembly and the initiation of ciliogenesis. Notably, they find that CEP89 is dispensable for distal appendage assembly, but is needed for the recruitment of RAB34-positive ciliary vesicles to the mother centriole for ciliogenesis. Finally, this work introduces the application of single-molecule 3D super-resolution microscopy as a tool for interrogating the relationship between membranes and distal appendages. Overall this work extends our fundamental understanding of distal appendage structure/function in ciliogenesis.

      An interesting observation from this work is that CEP83 is found localized both at the innermost region and the outermost region of the distal appendages when detected by antibodies that recognize a different epitope of CEP83 (Figure 1A), suggesting a helical structure that could serve as a platform for distal appendage assembly. A previous study using STORM imaging also showed that another distal appendage protein CEP164 occupies a wider region of the distal appendages when using an antibody recognizing the N-terminal residues of Cep164 (M Bowler et al. 2019). Together these studies show the importance of evaluating the structure of distal appendage proteins and the challenges of using antibody detection to reveal distal appendage hierarchy.

      This work also highlights the potential differences in functional conclusions that can be drawn when comparing RNAi and CRISPR knockout depletion approaches. The latter can, as expected, lead to a more precise functional analysis of these small distal appendage structures, albeit with the potential for knockout cells to display compensatory regulation. Although not directly addressed in the text, the authors find that RPE-1 MYO5A knockout cells could ciliate, which differs from a report by Wu et al. (2018). Furthermore, in the case of RAB34 knockout cells, the authors find CP110 removal from the mother centriole, while in previously published RAB34 KO studies this was not observed. In the case of the RAB34 data, a plausible explanation for the results is that different assay conditions were used, as was noted by the authors.

    1. eLife Assessment

      This important paper explores the impact of early life stress (ELS) on adult brain and behavior. The significance of the convincing findings is that they implicate regulation of non-neuronal cells in the development of brain and behavioral dysfunction associated with ELS. With an elegant combination of behavioral models, morphological and functional assessments using immunostaining, electrophysiology, and viral-mediated loss-of-function approaches, the authors report that astrocyte dysfunction plays a role in ELS responses. The work is of interest to a broad behavioral and cellular neuroscience audience.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript asks the question of whether astrocytes contribute to behavioral deficits triggered by early life stress. This question is tested by experiments that monitor the effects of early life stress on anxiety-like behaviors, long-term potentiation in the lateral amygdala, and immunohistochemistry of astrocyte-specific (GFAP, Cx43, GLT-1) and general activity (c-Fos) markers. Secondarily, astrocyte activity in the lateral amygdala is impaired by viruses that suppress gap-junction coupling or reduce astrocyte Ca2+ followed by behavioral, synaptic plasticity, and c-Fos staining. Early life stress is found to reduce expression of GFAP and Cx43 and induce translocation of the glucocorticoid receptor to astrocytic nuclei. Both early life stress and astrocyte manipulations are found to result in generalization of fear to neutral auditory cues. All of the experiments are done well with appropriate statistics and control groups. The manuscript is very well-written and the data are presented clearly. The authors' conclusion that lateral amygdala astrocytes regulate amygdala-dependent behaviors is strongly supported by the data as is the conclusion that cellular and behavioral outcomes provoked by early life stress are similar to the outcomes provoked by astrocyte dysfunction. However, the extent to which early life stress requires astrocytes to generate these outcomes remains open to debate.

      Strengths:

      A strong combination of behavioral, electrophysiology, and immunostaining approaches is utilized and possible sex-differences in behavioral data are considered. The experiments clearly demonstrate that disruption of astrocyte networks or reduction of astrocyte Ca2+ provoke generalization of fear and impair long-term potentiation in lateral amygdala. The provocative finding that astrocyte dysfunction accounts for a subset of behavioral effects of early life stress (e.g. not elevated plus or distance traveled observations) is also perceived as a strength.

      Weaknesses:

      The main weakness is absence of direct evidence that behavioral and neuronal plasticity after early life stress can be attributed to astrocytes. It remains unknown what would happen if astrocyte activity were disrupted concurrently with early life stress or if changes in astrocyte Ca2+ could attenuate early life stress outcomes. As is, the only presented evidence that early life stress involves astrocytes is nuclear translocation of GR and downregulation of GFAP and Cx43 in Figure 3 which may or may not cause the reported astrocyte activity changes.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Guayasamin et al. show that early-life stress (ELS) can induce a shift in fear generalisation in mice. They took advantage of a fear conditioning paradigm followed by a discrimination test and complemented learning and memory findings with measurements for anxiety-like behaviors. Next, astrocytic dysfunction in the lateral amygdala was investigated at the cellular level by combining staining for c-Fos with astrocyte-related proteins. Changes in excitatory neurotransmission were observed in acute brain slices after ELS suggesting impaired communication between neurons and astrocytes. To confirm causality of astrocytic-neuronal dysfunction in behavioral changes, viral manipulations were performed in unstressed mice. Occlusion of functional coupling with a dominant negative construct for gap junction connexin 43 or reduction in astrocytic calcium with CalEx mimicked the behavioral changes observed after ELS suggesting that dysfunction of the astrocytic network underlies ELS-induced memory impairments.

      Strengths:

      Overall, this well written manuscript highlights a key role for astrocytes in regulating stress-induced behavioral and synaptic deficits in the lateral amygdala in the context of ELS. Results are innovative, and methodological approaches relevant to decipher the role of astrocytes in behaviors. As mentioned by the authors, non-neuronal cells are receiving increasing attention in the neuroscience, stress and psychiatry fields.

      Weaknesses:

      I did have several suggestions and comments that were addressed during the review process. I believe that it improved clarity and will increase the impact of the work.

    4. Reviewer #3 (Public review):

      Summary:

      The authors show that ELS induces a number of brain and behavioral changes in the adult lateral amygdala. These changes include enduring astrocytic dysfunction, and inducing astrocytic dysfunction via genetic interventions is sufficient to phenocopy the behavioral and neural phenotypes suggesting astrocyte dysfunction may play a causal role in ELS-associated pathologies.

      Strengths:

      A strength is the shift in focus to astrocytes to understand how ELS alters adult behavior.

      Weaknesses:

      The mechanistic links between some of the correlates - altered astrocytic function, changes in neural excitability and synaptic plasticity in the lateral amygdala and behavior - are underdeveloped.

      Comments on revisions:

      The authors have significantly improved the paper with the addition of new experimental data, analyses, and textual changes.

    5. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      Early-life adversity or stress can enhance stress susceptibility by causing changes in emotion, cognition, and reward-seeking behaviors. This important manuscript highlights the involvement of lateral amygdala astrocytes in fear generalization and the associated synaptic plasticity, which are parallel to the effects of early life stress. With an elegant combination of behavioral models, morphological and functional assessments using immunostaining, electrophysiology, and viral-mediated loss-of-function approaches, the authors provide solid correlational and causal evidence that is consistent with the hypothesis that early life stress produces neural and behavioral dysfunction via perturbing lateral amygdala astrocytic function.

      We would like to thank the reviewers and editors for taking the time to review our work, and re-review it now. Also, we are grateful for this very positive assessment of our work. In this revised manuscript we made a strong effort to address comments made by all reviewers, providing clarification where required and new data to our manuscript in order to further support our observations.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript asks the question of whether astrocytes contribute to behavioral deficits triggered by early life stress. This question is tested by experiments that monitor the effects of early life stress on anxiety-like behaviors, long-term potentiation in the lateral amygdala, and immunohistochemistry of astrocyte-specific (GFAP, Cx43, GLT-1) and general activity (c-Fos) markers. Secondarily, astrocyte activity in the lateral amygdala is impaired by viruses that suppress gap-junction coupling or reduce astrocyte Ca2+ followed by behavioral, synaptic plasticity, and c-Fos staining. Early life stress is found to reduce the expression of GFAP and Cx43 and to induce translocation of the glucocorticoid receptor to astrocytic nuclei. Both early life stress and astrocyte manipulations are found to result in the generalization of fear to neutral auditory cues. All of the experiments are done well with appropriate statistics and control groups. The manuscript is very well-written and the data are presented clearly. The authors' conclusion that lateral amygdala astrocytes regulate amygdala-dependent behaviors is strongly supported by the data. However, the extent to which astrocytes contribute to behavioral and neuronal consequences of early life stress remains open to debate.

      Strengths:

      A strong combination of behavioral, electrophysiology, and immunostaining approaches is utilized and possible sex differences in behavioral data are considered. The experiments clearly demonstrate that disruption of astrocyte networks or reduction of astrocyte Ca2+ provokes generalization of fear and impairs long-term potentiation in the lateral amygdala. The provocative finding that astrocyte dysfunction accounts for a subset of behavioral effects of early life stress (e.g. not elevated plus or distance traveled observations) is also perceived as a strength.

      Weaknesses:

      The main weakness is the absence of more direct evidence that behavioral and neuronal plasticity after early life stress can be attributed to astrocytes. It remains unknown what would happen if astrocyte activity were disrupted concurrently with early life stress or if the facilitation of astrocyte Ca2+ would attenuate early life stress outcomes. As is, the only evidence that early life stress involves astrocytes is nuclear translocation of GR and downregulation of GFAP and Cx43 in Figure 3 which may or may not provoke astrocyte Ca2+ or astrocyte network activity changes.

      We would like to thank the reviewer for their constructive feedback on our work. In the revised version we have added new experiments that further support a role of astrocytes in ELS-induced behavioural dysfunction. Specifically, we carried out two-photon calcium imaging in lateral amygdala astrocytes using viral overexpression of membrane-tethered GCaMP6f. These experiments revealed a decrease in astrocyte calcium activity following ELS (Figure 4). Interestingly, these data also showed a number of important sex differences (Figure 4 - Figure supplement 1).

      These new data allow us to strengthen the link between ELS-induced astrocyte hypofunction and behavioural changes. Indeed, we validated the impact of CalEx on astrocyte calcium activity in the lateral amygdala, again using two-photon microscopy, and show that CalEx resulted in an astrocyte calcium signature that very closely resembled that of ELS, i.e. reduced frequency and amplitude of events (Figure 5 - Figure supplement 2). As such, we feel that these data, while still correlative in nature, strengthen our findings and conclusion that astrocyte dysfunction alone is sufficient to recapitulate the effects of stress on excitability, synaptic function, and behaviour.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Guayasamin et al. show that early-life stress (ELS) can induce a shift in fear generalisation in mice. They took advantage of a fear conditioning paradigm followed by a discrimination test and complemented learning and memory findings with measurements for anxiety-like behaviors. Next, astrocytic dysfunction in the lateral amygdala was investigated at the cellular level by combining staining for c-Fos with astrocyte-related proteins. Changes in excitatory neurotransmission were observed in acute brain slices after ELS suggesting impaired communication between neurons and astrocytes. To confirm the causality of astrocytic-neuronal dysfunction in behavioral changes, viral manipulations were performed in unstressed mice. Occlusion of functional coupling with a dominant negative construct for gap junction connexin 43 or reduction in astrocytic calcium with CalEx mimicked the behavioral changes observed after ELS suggesting that dysfunction of the astrocytic network underlies ELS-induced memory impairments.

      Strengths:

      Overall, this well-written manuscript highlights a key role for astrocytes in regulating stress-induced behavioral and synaptic deficits in the lateral amygdala in the context of ELS. Results are innovative, and methodological approaches relevant to decipher the role of astrocytes in behaviors. As mentioned by the authors, non-neuronal cells are receiving increasing attention in the neuroscience, stress, and psychiatry fields.

      Weaknesses:

      I do have several suggestions and comments to address that I believe will improve the clarity and impact of the work. For example, there is currently a lack of information on the timeline for behavioral experiments, tissue collection, etc.

      We thank the reviewer for their kind comments and constructive feedback on our manuscript. We agree that certain aspects could have been made clearer, and we have revised the manuscript and figures to be more explicit regarding timelines, including the addition of timelines on figures and improved clarity in the text where possible. We have also addressed the private comments provided by the reviewers alluded to in this public review.

      Reviewer #3 (Public Review):

      Summary:

      The authors show that ELS induces a number of brain and behavioral changes in the adult lateral amygdala. These changes include enduring astrocytic dysfunction, and inducing astrocytic dysfunction via genetic interventions is sufficient to phenocopy the behavioral and neural phenotypes. This suggests that astrocyte dysfunction may play a causal role in ELS-associated pathologies.

      Strengths:

      A strength is the shift in focus to astrocytes to understand how ELS alters adult behavior.

      Weaknesses:

      The mechanistic links between some of the correlates - altered astrocytic function, changes in neural excitability, and synaptic plasticity in the lateral amygdala and behaviour - are underdeveloped.

      We thank the reviewer for their comments. We are happy that they found our shift in focus towards astrocytes to be a strength of our work. Regarding mechanistic links being underdeveloped, we have attempted to address this by placing more effort into understanding the functional changes in astrocytes and how this relates to behaviour.

      To address this comment, we have used two-photon calcium imaging to quantify the impact of ELS on astrocyte calcium activity. As such, the revised manuscript contains several new figures, including a detailed characterisation of the effects of ELS on astrocyte calcium activity (Figure 4), including sex differences in naive mice and in the effects of stress (Figure 4 - Figure supplement 1), and an important validation of the impact of CalEx on astrocyte calcium activity. CalEx mirrors the impact of stress on astrocyte calcium activity, reducing the frequency and amplitude of individual events (Figure 5 - Figure supplement 2).

      Considering the strong overlap of the effects of ELS and CalEx on synapses, excitability, behaviour, and now astrocyte calcium activity, we hope that this added detail addresses some of the points highlighted by the reviewer.

      Recommendations for the authors:

      The reviewers all agree on one major issue for the authors to address. The mechanistic link between astrocyte function and early life stress is underdeveloped, and the data are more correlational than causal in nature. This could be addressed either by scaling back the data interpretation and title to better reflect the data at hand or, if the authors would consider it, by performing the causal experiment of manipulating astrocyte activity following early life stress to see whether this influences the phenotype.

      We agree with reviewers on this issue and realise that we have overstated our findings somewhat. As an immediate fix, suggested by reviewers, we have changed the title to more closely align with our data, stating that astrocyte dysfunction is “associated with” rather than “induces”, and we have adjusted our interpretations accordingly.

      In addition to this one major comment, there is a list of minor comments that the authors should consider to improve the manuscript.

      (1) A major caveat is the lack of information on the timeline for behavioral experiments, tissue collection, etc. The authors mention "Mice between ages P45-70" but, considering the developmental changes occurring between late adolescence and young adulthood, I recommend adding timelines on all Figures clearly indicating when behavioral tests were performed, and tissue collected for electrophysiology or immunostaining. With corticosterone (CORT) back at baseline at P70 vs a difference observed at P45, was this time point favored? It should be clarified throughout.

      We apologise for the lack of clarity on this and have added more timelines on figures.

      The favoured age range (P45-P70) corresponds to adolescence, a time when latent psychiatric disorders tend to manifest in humans following early-life adversity. We have clarified this choice in the text.

      (2) Given the transient increase in corticosterone levels in early-life stress mice, peaking at P45 and declining to control levels by P70, it would be informative to know whether the reported behavioral and synaptic changes differ within this time window. This may not be doable in the current approach, but this should be addressed nonetheless. Furthermore, it wasn't clear why the increase in blood corticosterone was delayed. Was this expected? How does this relate to earlier work? Wouldn't it be expected to be elevated at P17 (end of ELS period)?

      We agree that this observation was very unexpected. Initially, we expected CORT to be elevated at P17, the end of the ELS period. We believe that low CORT levels during the ELS paradigm can be attributed to this paradigm coinciding with the stress hyporesponsive period (SHRP), which in rodents lasts until roughly postnatal day 14. During this period, mild stressors fail to elicit CORT responses. Considering our ELS paradigm lasts from P10-P17, there is a significant overlap with the SHRP.

      This point is now included in the discussion with several citations regarding this biological phenomenon, as well as other studies that report similar findings to our own, i.e. a delayed increase in blood corticosterone levels following early-life stress.

      (3) It is mentioned that behavioral tests were performed in both sexes with no sex differences observed. Were animals of both sexes also included in other experiments (ephys, immunostaining, blood CORT analysis)? Behavioral outcomes could be the same but underlying biological processes different. This is a topic that should be discussed. Identification of males vs females on graphs would be helpful.

      We apologise for not having provided this data in the previous version of the manuscript. In the revised manuscript we provide analysis of sex differences for our initial behavioural observations (Figure 2 - Figure supplement 1), c-Fos (Figure 2 - Figure supplement 2), for GFAP and Cx43 (Figure 3 - Figure supplement 1), calcium signalling (Figure 4 - Figure supplement 1), and for CalEx and dnCx43 experiments across behaviour (Figure 5 - Figure supplement 4) and c-Fos (Figure 5 - Figure supplement 5).

      (4) How long-lasting are the generalization phenotypes? Do they outlast the transient increase in blood corticosterone? Showing this would provide a more solid foundation for future explorations.

      The reviewers raise a very important point. It remains unclear as to how long these effects last and this is something we are keen to address in future studies, with careful experiments designed to explicitly test this question, as well as subsequent questions regarding whether long-lasting effects are due to impaired brain development or whether these effects emerge due to CORT changes, or other changes, or a combination of them all.

      As an aside, an additional manuscript from our lab (Depaauw-Holt et al. 2024 bioRxiv) which uses the same stressor but focuses on distinct brain regions and behaviours uses a prolonged time window in which the effects of stress are readily observable all the way to P90.

      So while we do not provide the answers in this work, it is a really great idea that we would like to follow up subsequently.

      (5) With the ELS-induced change in locomotion, I would recommend presenting open field (center, periphery) and elevated plus maze (open, closed arms) data independently. It could also be interesting to analyze corner time in the open field as well as center time in the elevated plus maze.

      We now provide data for the open field and elevated plus maze as requested. Our findings remain unchanged, but we agree with the reviewer that this way of representing the data is clearer.

      (6) For Figure 2C, the ideal stats would be an ANOVA with CS (+/-) as a within-subject variable and treatment (naive/ELS) as a between-subjects variable. Then the best support for the generalization claim would be a CS x treatment interaction. I encourage the authors to do these stats. I note that this point is mitigated by the discrimination analysis presented in 2D (where they compare naive and ELS groups directly).

      We have carried out the analysis as requested and these data further support the notion of fear generalisation in ELS mice (Figure 2 - Figure supplement 2A, B). Additionally, the analyses are included in a supplementary table. We hope that we have understood correctly, and this figure accurately reflects the reviewer’s suggestion.
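      The CS x treatment interaction requested in comment (6) can be illustrated with a minimal sketch. In a 2 x 2 mixed design (one two-level within-subject factor, one two-level between-subjects factor), the interaction F statistic equals the square of the pooled-variance t statistic that compares per-animal difference scores (CS+ minus CS-) between groups. The freezing values below are invented for illustration and are not data from the manuscript; the function names are hypothetical, and a permutation p-value stands in for the parametric one.

```python
import random
from statistics import mean, variance

def interaction_t(diff_a, diff_b):
    """Pooled-variance t statistic on per-animal difference scores (CS+ minus CS-)."""
    na, nb = len(diff_a), len(diff_b)
    pooled = ((na - 1) * variance(diff_a) + (nb - 1) * variance(diff_b)) / (na + nb - 2)
    return (mean(diff_a) - mean(diff_b)) / (pooled * (1 / na + 1 / nb)) ** 0.5

def permutation_p(diff_a, diff_b, n_perm=5000, seed=0):
    """Two-sided permutation p-value obtained by shuffling group labels."""
    rng = random.Random(seed)
    observed = abs(interaction_t(diff_a, diff_b))
    combined = list(diff_a) + list(diff_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(combined)
        t = interaction_t(combined[:len(diff_a)], combined[len(diff_a):])
        hits += abs(t) >= observed
    return (hits + 1) / (n_perm + 1)

# Hypothetical % freezing per animal as (CS+, CS-) pairs; invented values.
naive = [(62, 20), (70, 25), (55, 18), (66, 30), (59, 22), (73, 27)]
els = [(64, 52), (68, 60), (58, 49), (71, 55), (60, 57), (66, 50)]

naive_diff = [plus - minus for plus, minus in naive]
els_diff = [plus - minus for plus, minus in els]

t_stat = interaction_t(naive_diff, els_diff)
f_interaction = t_stat ** 2  # F(1, N - 2) for the CS x treatment interaction
p_value = permutation_p(naive_diff, els_diff)
print(f"t = {t_stat:.2f}, F = {f_interaction:.2f}, permutation p = {p_value:.4f}")
```

      A large interaction term here means the groups differ in how strongly they discriminate CS+ from CS-, which is the pattern a generalisation claim predicts.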

      (7) In Figure 2H, why not evaluate c-Fos levels after the discrimination test which is the main behavioral outcome? This statement in the Discussion should be modified if, as per my understanding, c-Fos was measured after the fear paradigm only: "We find that both ELS and astrocyte dysfunction both enhance neuronal excitability, assessed by local c-Fos staining in the lateral amygdala following auditory discriminative fear conditioning. One interpretation of these data is that astrocytes might tune engram formation, with astrocyte dysfunction, genetically or after stress, increasing c-Fos expression resulting in a loss of specificity of the memory trace and generalisation of fear."

      We agree that further evaluation of c-Fos levels following the discrimination test would be insightful. We honestly did not consider this time point in our initial experimental design, as we considered previous reports in the literature that investigated how the numbers of cells recruited to the engram (c-Fos density) could influence memory accuracy at a later time point. As such, investigating c-Fos levels following training was our initial target. We have modified the text to be more explicit in our experimental approach.

      This is nevertheless a fascinating point that we are keen to pursue in future studies.

      (8) Some thoughts on why dnCx43 suppression of astrocyte network activity is less effective at inducing fear generalization than CalEx suppression of astrocyte Ca2+ are warranted. One might predict that both manipulations should result in similar effects, as seen in fEPSP and c-Fos data in Figure 4.

      We agree that this is an interesting observation, and the fact that we did not observe the same behavioural phenotype despite the fEPSP and c-Fos data being the same is puzzling.

      Nevertheless, we do see increased fear generalisation in both dnCx43 and CalEx. We hypothesise that CalEx had a more profound effect due to the wide range of processes that are presumably affected by reduced astrocyte calcium activity, whereas blocking gap junction channels still leaves a large number of astrocyte functions intact.

      Overall, our conclusion is that behaviour is a more sensitive assay compared to the cellular phenotypes, which highlights the importance of answering these questions from multiple angles.

      (9) Ideally, changes in functional coupling following the dnCx43 manipulation should be shown here (line 169).

      We, unfortunately, did not directly evaluate functional coupling in dnCx43 mice in this manuscript. This would have been a useful experiment, but we rely on our previous data where we extensively characterised this tool (Murphy-Royal et al. 2020 Nat Comms).

      (10) It would be relevant to perform c-Fos staining with markers for astrocytes or neuronal cells. Is an increase in activity expected for both cell types?

      This is a fascinating question, given recent work on this topic showing that astrocytes can indeed express c-Fos and may be recruited into engrams. When we analysed our existing tissue, we found that astrocytes were indeed labelled with c-Fos following our behavioural conditioning paradigm. Our data align with recent reports, and we demonstrate a small percentage of astrocytes expressing c-Fos (Figure 2 - figure supplement 3). This modest number of astrocytes expressing c-Fos is discussed in the text and placed into the context of very recent papers that have been published since our submission to eLife.

      (11) Were the same mice used for both behavior analysis and immunostaining?

      We generated separate cohorts of mice for immunostaining and behaviour, and have made this clearer in the text.

      (12) Language describing learning paradigm. The CS+ (line 73) isn't in itself aversive (and shouldn't be described as such). It acquires that value after pairing with the US (which is aversive).

      We agree that this is poorly worded and have modified the text from “aversive cue” to “conditioned cue”.

      (13) It is hard to appreciate the glucocorticoid receptor translocation with the images provided. Would it be possible to increase magnification or at least, provide small inserts at higher magnification?

      We have re-imaged our brain sections to get more detailed images. These are provided in the revised manuscript (Figure 3).

      (14) For the viral injection experiment, for how long is the virus expressed before running behavior/recording/c-Fos staining? Is the age of the tested mice the same as in Figures 1-3, or were they injected at P45 and tested weeks later?

      We age-matched all mice for all experiments and tried to keep our experimental window as tight as possible (P45-70). All mice were injected at P25-30 in order to meet the experimental time window. To be more precise we have added timelines on all figures.

      (15) A validation of the virus is missing to confirm the reduction of Cx43 expression at mRNA and protein levels when compared to controls. A reference is provided but to my understanding age of the animals might be different.

      Here, I believe the reviewer is referring to dnCx43. In this experiment we used a viral approach to overexpress a non-functional connexin 43 protein (Murphy-Royal et al. 2020 Nat Comms). As such, a PCR or immuno against this protein would be expected to reveal higher expression levels. We have tried to clarify this approach in the text.

      It is true that we did not fully characterise this tool in the lateral amygdala, which would have been useful, but considering our extensive experience with this tool and in its development with our collaborators Baljit Khakh, Randy Stout, David Spray (see Murphy-Royal et al. 2020), we are confident in these data, despite the limitation of validation in this manuscript.

      (16) Same comment for the CalEx, a validation would be appreciated. Based on Yu et al. could a GCaMP6f virus be more appropriate as control?

      We agree this is an important experiment as our lab has not fully validated this tool in house (compared to dnCx43, which we previously validated).

      Importantly, we now have the capacity to do these experiments. Until very recently our two-photon microscope was not fully functional due to dodgy PMTs sent from the company we purchased our equipment from… Troubleshooting this issue took many months before we were convinced that we were not at fault and that the problem was the equipment.

      As such, mice were injected with both a membrane tethered GCaMP6f under the control of the short GFAP promoter - AAV2/5-gfaABC1D-lck-GCaMP6f and CalEx - AAV2/5-gfaABC1D-hPMCA2w/b-mCherry. Using this approach we were able to record calcium activity from CalEx positive and CalEx negative astrocytes in the same tissue (Figure 5 - figure supplement 2).

      We report that this approach does indeed reduce astrocyte calcium but does not entirely eliminate it. In fact, CalEx expressing astrocytes displayed calcium activity dynamics similar to those we observed following ELS. Together, this further strengthens our rationale to use CalEx in order to mimic the effects of stress on astrocytes, and determine downstream effects on excitability, synapses, and behaviour.

      (17) Have previous studies found ELS → generalization phenotypes in adulthood? If so, these should be discussed in more detail. If not, perhaps this point can be made more explicit.

      This is a great point. After looking deeper into the literature, we found an example of this in which ELS resulted in context fear generalisation in adult rats. This work is cited in the discussion in the context of our findings.

      (18) A paper by Krugers et al (Biol Psychiatry 2020) seems especially relevant (glucocorticoids, fear generalization, engram size) and should be discussed.

      Thank you for bringing this work to our attention. This is certainly important work that we had unfortunately overlooked. We have added a citation and discussed the manuscript Lesuis et al. Biol. Psychiatry 2021, which contains the data discussed in the conference proceeding by Krugers et al. Biol. Psychiatry 2020.

      Additionally, we added another great manuscript by Lesuis et al. recently published in Cell in which they investigated the cellular mechanisms by which acute stress results in fear generalisation via endocannabinoids.

      (19) Minor text revisions are necessary at lines 101 and 264 as well as p.5, line 58: "ratio" and p.10, line 128: "region of interest".

      Thank you for pointing out these typos and errors. We have corrected them.

      Editor's note:

      Should you choose to revise your manuscript, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05 in the main manuscript.

    1. eLife Assessment

      This important study combines the use of Fisher Kernels with Hidden Markov models aiming to improve brain-behaviour prediction. The evidence supporting the authors' conclusions is compelling, comparing brain-behaviour prediction accuracies across a range of different traits, including out of sample assessment. This work is timely and will be of interest to neuroscientists working on functional connectivity for brain-behaviour association.

    2. Reviewer #1 (Public review):

      Summary:

      The authors attempt to validate Fisher Kernels on top of an HMM as a way to better describe human brain dynamics at resting-state. The objective criterion was improved prediction of individual traits by the proposed pipeline.

      Comments on revisions:

      The authors addressed adequately all my comments.

    3. Reviewer #3 (Public review):

      Summary:

      In this work, the authors use a Hidden Markov Model (HMM) to describe dynamic connectivity and amplitude patterns in fMRI data, and propose to integrate these features with the Fisher kernel to improve the prediction of individual traits. The approach is tested using a large sample of healthy young adults from the Human Connectome Project. The HMM-Fisher Kernel approach was shown to achieve higher prediction accuracy with lower variance on many individual traits compared to alternate kernels and measures of static connectivity. As an additional finding, the authors demonstrate that parameters of the HMM state matrix may be more informative in predicting behavioral/cognitive variables in this data compared to state-transition probabilities.

      Comments on revisions:

      The authors have now addressed my comments, and I believe this work will be an interesting contribution to the literature.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors attempt to validate Fisher Kernels on top of an HMM as a way to better describe human brain dynamics at resting state. The objective criterion was improved prediction of individual traits by the proposed pipeline.

      Strengths:

      The authors analyzed an rs-fMRI dataset from the HCP, also providing results from other kernels.

      The authors also provided findings from simulation data.

      Weaknesses:

      (1) The authors should explain in detail how they applied cross-validation across the dataset for both optimization of parameters, and also for cross-validation of the models to predict individual traits.

      Indeed, there were details about the cross-validation for hyperparameter tuning and prediction missing. This problem was also raised by Reviewer #2. We have now rephrased this section in 4.4 and added details: ll. 804-813:

      “We used k-fold nested cross-validation (CV) to select and evaluate the models. We used 10 folds for both the outer loop (used to train and test the model) and the inner loop (used to select the optimal hyperparameters) such that 90% were used for training and 10% for testing. The optimal hyperparameters λ (and τ in the case of the Gaussian kernels) were selected using grid-search from the vectors λ=[0.0001,0.001,0.01,0.1,0.3,0.5,0.7,0.9,1] and . In both the outer and the inner loop, we accounted for family structure in the HCP dataset so that subjects from the same family were never split across folds (Winkler et al., 2015). Within the CV, we regressed out sex and head motion confounds, i.e., we estimated the regression coefficients for the confounds on the training set and applied them to the test set (Snoek et al., 2019).” and ll. 818-820: “We generated the 100 random repetitions of the 10 outer CV folds once, and then used them for training and prediction of all methods, so that all methods were fit to the same partitions.”
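      The nested scheme described in the quoted passage can be sketched as follows. This is a minimal pure-Python illustration, not the authors' code: the one-feature ridge model stands in for the kernel regression, the data are simulated, and all function names are hypothetical. Only the fold counts, the λ grid, and the rule that families are never split across folds come from the quoted text.

```python
import random
from statistics import mean

LAMBDAS = [0.0001, 0.001, 0.01, 0.1, 0.3, 0.5, 0.7, 0.9, 1]  # grid quoted above

def group_folds(groups, k, seed):
    """Map each sample to a fold so that all members of a group share one fold."""
    unique = sorted(set(groups))
    random.Random(seed).shuffle(unique)
    fold_of = {g: i % k for i, g in enumerate(unique)}
    return [fold_of[g] for g in groups]

def fit_ridge(x, y, lam):
    """Closed-form ridge fit for one feature; returns (slope, intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / (sxx + lam)
    return slope, my - slope * mx

def mse(model, x, y):
    """Mean squared error of a (slope, intercept) model."""
    slope, intercept = model
    return mean((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))

def pick(values, idx):
    return [values[i] for i in idx]

def nested_cv(x, y, families, k=10, seed=0):
    """Return out-of-sample predictions and the lambda chosen in each outer fold."""
    outer = group_folds(families, k, seed)
    preds, chosen = [None] * len(y), []
    for fold in range(k):
        train = [i for i, f in enumerate(outer) if f != fold]
        test = [i for i, f in enumerate(outer) if f == fold]
        # No family may appear on both sides of the outer split.
        assert not set(pick(families, train)) & set(pick(families, test))
        inner = group_folds(pick(families, train), k, seed + 1 + fold)

        def inner_error(lam):
            errors = []
            for j in range(k):
                tr = [i for i, f in zip(train, inner) if f != j]
                va = [i for i, f in zip(train, inner) if f == j]
                model = fit_ridge(pick(x, tr), pick(y, tr), lam)
                errors.append(mse(model, pick(x, va), pick(y, va)))
            return mean(errors)

        best = min(LAMBDAS, key=inner_error)  # hyperparameter chosen on inner folds only
        chosen.append(best)
        slope, intercept = fit_ridge(pick(x, train), pick(y, train), best)
        for i in test:
            preds[i] = slope * x[i] + intercept
    return preds, chosen

# Simulated data: 100 families of 3 siblings, one noisy predictive feature.
rng = random.Random(42)
families = [fam for fam in range(100) for _ in range(3)]
x = [rng.gauss(0, 1) for _ in families]
y = [2.0 * xi + rng.gauss(0, 1) for xi in x]
preds, chosen = nested_cv(x, y, families)
print(len(preds), chosen)
```

      Repeating the procedure with different values of `seed` would give repeated randomised partitions; storing those partitions once and reusing them for every method keeps the comparison between methods on identical folds.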

      (2) They discussed throughout the paper that their proposed (HMM+Fisher) kernel approach outperformed dynamic functional connectivity (dFC). However, they compared the proposed methodology with just static FC.

      We would like to clarify that the HMM is itself a method for estimating dynamic (or time-varying) FC, just like the sliding window approach, see also Vidaurre, 2024 (https://direct.mit.edu/imag/article/doi/10.1162/imag_a_00363/124983) for an overview of terminology.

      See also our response to Q3.

      (3) If the authors wanted to claim that their methodology is better than dFC, then they have to demonstrate results based on dFC with the trivial sliding window approach.

      We would like to be clear that we do not claim in the manuscript that our method outperforms other dynamic functional connectivity (dFC) approaches, such as sliding window FC. We have now made changes to the manuscript to make this clearer.

      First, we have clarified our use of the term “brain dynamics” to signify “time-varying amplitude and functional connectivity patterns” in this context, as Reviewer #2 raised the point that the former term is ambiguous (ll.33-35: “One way of describing brain dynamics are state-space models, which allow capturing recurring patterns of activity and functional connectivity (FC) across the whole brain.”).

      Second, our focus is on our method being a way of using dFC for predictive modelling, since there currently is no widely accepted way of doing this. One reason why dFC is not usually considered in prediction studies is that it is mathematically not trivial how to use the parameters from estimators of dynamic FC for a prediction. This includes the sliding window approach. We do not aim at comparing across different dFC estimators in this paper. To make these points clearer, we have revised the introduction to now say:

      Ll. 39-50:

      “One reason why brain dynamics are not usually considered in this context pertains to their representation: They are represented using models of varying complexity that are estimated from modalities such as functional MRI or MEG. Although there exists a variety of methods for estimating time-varying or dynamic FC (Lurie et al., 2019), like the commonly used sliding-window approach, there is currently no widely accepted way of using them for prediction problems. This is because these models are usually parametrised by a high number of parameters with complex mathematical relationships between the parameters that reflect the model assumptions. How to leverage these parameters for prediction is currently an open question.

      We here propose the Fisher kernel for predicting individual traits from brain dynamics, using information from generative models that do not assume any knowledge of task timings. We focus on models of brain dynamics that capture within-session changes in functional connectivity and amplitude from fMRI scans, in this case acquired during wakeful rest, and how the parameters from these models can be used to predict behavioural variables or traits. In particular, we use the Hidden Markov Model (HMM), which is a probabilistic generative model of time-varying amplitude and functional connectivity (FC) dynamics (Vidaurre et al., 2017).”
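      To make the idea of a Fisher kernel concrete, here is a minimal sketch. It assumes a one-dimensional Gaussian as the group-level generative model instead of the HMM used in the manuscript, and it uses the plain inner product of Fisher scores (the gradient of each subject's log-likelihood with respect to the group-level parameters), omitting the Fisher information matrix. The simulated time series and all names are hypothetical.

```python
import random
from statistics import mean, pvariance

def fisher_score(series, mu, var):
    """Gradient of the Gaussian log-likelihood of `series` w.r.t. (mu, var)."""
    d_mu = sum((v - mu) / var for v in series)
    d_var = sum(-0.5 / var + (v - mu) ** 2 / (2 * var ** 2) for v in series)
    return (d_mu, d_var)

def fisher_kernel(score_a, score_b):
    """Linear Fisher kernel: inner product of two subjects' score vectors."""
    return sum(a * b for a, b in zip(score_a, score_b))

# Simulated "subjects": each has a time series with its own mean.
rng = random.Random(1)
subject_means = [-1.0, -0.8, 0.9, 1.1]
subjects = [[rng.gauss(m, 1.0) for _ in range(200)] for m in subject_means]

# Group-level model fitted to all subjects' data pooled together.
pooled = [v for s in subjects for v in s]
mu, var = mean(pooled), pvariance(pooled)

# Each subject is described by how the group parameters would have to change
# to fit that subject better; the kernel compares these directions.
scores = [fisher_score(s, mu, var) for s in subjects]
gram = [[fisher_kernel(a, b) for b in scores] for a in scores]
for row in gram:
    print([round(v, 1) for v in row])
```

      The resulting Gram matrix can be passed to any kernel method. Subjects whose data pull the group parameters in the same direction receive a large positive kernel value, without the model ever being refitted per subject.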

      Reviewer #2 (Public Review):

      Summary:

      The manuscript presents a valuable investigation into the use of Fisher Kernels for extracting representations from temporal models of brain activity, with the aim of improving regression and classification applications. The authors provide solid evidence through extensive benchmarks and simulations that demonstrate the potential of Fisher Kernels to enhance the accuracy and robustness of regression and classification performance in the context of functional magnetic resonance imaging (fMRI) data. This is an important achievement for the neuroimaging community interested in predictive modeling from brain dynamics and, in particular, state-space models.

      Strengths:

      (1) The study's main contribution is the innovative application of Fisher Kernels to temporal brain activity models, which represents a valuable advancement in the field of human cognitive neuroimaging.

      (2) The evidence presented is solid, supported by extensive benchmarks that showcase the method's effectiveness in various scenarios.

      (3) Model inspection and simulations provide important insights into the nature of the signal picked up by the method, highlighting the importance of state rather than transition probabilities.

      (4) The documentation and description of the methods are solid including sufficient mathematical details and availability of source code, ensuring that the study can be replicated and extended by other researchers.

      Weaknesses:

      (1) The generalizability of the findings is currently limited to the young and healthy population represented in the Human Connectome Project (HCP) dataset. The potential of the method for other populations and modalities remains to be investigated.

      As suggested by the reviewer, we have added a limitations paragraph and included a statement about the dataset: Ll. 477-481: “The fMRI dataset we used (HCP 1200 Young Adult) is a large sample taken from a healthy, young population, and it remains to be shown how our findings generalise to other datasets, e.g. other modalities such as EEG/MEG, clinical data, older populations, different data quality, or smaller sample sizes both in terms of the number of participants and the scanning duration”.

      We would like to emphasise that this is a methodological contribution, rather than a basic science investigation about cognition and brain-behaviour associations. Therefore, the method would be equally usable on different populations, even if the results vary.

      (2) The possibility of positivity bias in the HMM, due to the use of a population model before cross-validation, needs to be addressed to confirm the robustness of the results.

      As pointed out by both Reviewers #2 and #3, we did not separate subjects into training and test set before fitting the HMM. To address this issue, we have now repeated the predictions for HMMs fit only to the training subjects. We show that this has no effect on the results. Since this question has consequences for the Fisher kernel, we have also added simulations showing how the different kernels react to increasing heterogeneity between training and test set. These new results are added as results section 2.4 (ll. 376-423).

      (3) The statistical significance testing might be compromised by incorrect assumptions about the independence between cross-validation distributions, which warrants further examination or clearer documentation.

      We have now replaced the significance testing with repeated k-fold cross-validated corrected tests. Note that this required re-running the models to be able to test differences in accuracies on the level of individual folds, resulting in different plots throughout the manuscript and different statistical results. This does not, however, change the main conclusions of our manuscript.
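      The response does not state which corrected test was used. As an assumption, the sketch below shows one common variant, the corrected repeated k-fold cross-validation t statistic of Nadeau and Bengio as extended by Bouckaert and Frank. It inflates the variance term by the test-to-train size ratio to account for the overlap between training sets across folds. The fold-level accuracy differences are simulated, and the function names are hypothetical.

```python
import random
from statistics import mean, variance

def corrected_repeated_kfold_t(diffs, k, n_repeats):
    """Corrected t statistic for per-fold score differences between two models.

    `diffs` holds one difference per fold and repetition (k * n_repeats values).
    Returns (t, degrees_of_freedom).
    """
    n = k * n_repeats
    if len(diffs) != n:
        raise ValueError("expected k * n_repeats fold-level differences")
    test_train_ratio = 1 / (k - 1)  # n_test / n_train for k-fold CV
    t = mean(diffs) / ((1 / n + test_train_ratio) * variance(diffs)) ** 0.5
    return t, n - 1

def naive_t(diffs):
    """Uncorrected one-sample t statistic, which treats folds as independent."""
    return mean(diffs) / (variance(diffs) / len(diffs)) ** 0.5

# Simulated accuracy differences for 100 repetitions of 10-fold CV.
rng = random.Random(0)
diffs = [rng.gauss(0.01, 0.05) for _ in range(10 * 100)]

t_corr, dof = corrected_repeated_kfold_t(diffs, k=10, n_repeats=100)
print(f"corrected t = {t_corr:.2f} (df = {dof}), naive t = {naive_t(diffs):.2f}")
```

      The corrected statistic is always smaller in magnitude than the naive one, which is why treating folds as independent tends to overstate significance.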

      (4) The inclusion of the R^2 score, sensitive to scale, would provide a more comprehensive understanding of the method's performance, as the Pearson correlation coefficient alone is not standard in machine learning and may not be sufficient (even if it is common practice in applied machine learning studies in human neuroimaging).

      We have now added the coefficient of determination to the results figures.
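      The distinction raised in comment (4) can be shown with a small sketch: Pearson's r is unchanged when predictions are rescaled or shifted, whereas the coefficient of determination penalises both. The values and function names below are made up for illustration.

```python
from statistics import mean

def pearson_r(y_true, y_pred):
    """Pearson correlation between observed and predicted values."""
    mt, mp = mean(y_true), mean(y_pred)
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    return cov / (var_t * var_p) ** 0.5

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mt = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mt) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
well_scaled = [1.1, 1.9, 3.2, 3.8, 5.0]
mis_scaled = [0.5 * p + 10 for p in well_scaled]  # same ordering, wrong scale and offset

for name, pred in [("well scaled", well_scaled), ("mis-scaled", mis_scaled)]:
    print(f"{name}: r = {pearson_r(y_true, pred):.3f}, R^2 = {r_squared(y_true, pred):.3f}")
```

      Both sets of predictions have the same correlation with the target, but only the first has a high coefficient of determination; the second is negative because the predictions are worse than simply predicting the mean.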

      (5) The process for hyperparameter tuning is not clearly documented in the methods section, both for kernel methods and the elastic net.

      As mentioned above in the response to Reviewer #1, we have now added details about hyperparameter tuning for the kernel methods and the non-kernelised static FC regression models (see also Reviewer #1 comment 1): Ll.804-813: “We used k-fold nested cross-validation (CV) to select and evaluate the models. We used 10 folds for both the outer loop (used to train and test the model) and the inner loop (used to select the optimal hyperparameters) such that 90% were used for training and 10% for testing. The optimal hyperparameters λ (and τ in the case of the Gaussian kernels) were selected using grid-search from the vectors λ=[0.0001,0.001,0.01,0.1,0.3,0.5,0.7,0.9,1] and . In both the outer and the inner loop, we accounted for family structure in the HCP dataset so that subjects from the same family were never split across folds (Winkler et al., 2015). Within the CV, we regressed out sex and head motion confounds, i.e., we estimated the regression coefficients for the confounds on the training set and applied them to the test set (Snoek et al., 2019).” and ll. 818-820: “We generated the 100 random repetitions of the 10 outer CV folds once, and then used them for training and prediction of all methods, so that all methods were fit to the same partitions.”, as well as ll.913-917: “All time-averaged FC models are fitted using the same (nested) cross-validation strategy as described above (10-fold CV using the outer loop for model evaluation and the inner loop for model selection using grid-search for hyperparameter tuning, accounting for family structure in the dataset, and repeated 100 times with randomised folds).”
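      The confound handling described at the end of the first quoted passage (coefficients estimated on the training set, then applied to the test set) can be sketched as below. The example assumes a single confound and a simple linear fit; the values and names are hypothetical, and the manuscript's own implementation may differ.

```python
from statistics import mean

def fit_confound(confound, target):
    """Least-squares slope and intercept of target on one confound."""
    mc, mt = mean(confound), mean(target)
    slope = sum((c - mc) * (t - mt) for c, t in zip(confound, target)) / sum(
        (c - mc) ** 2 for c in confound
    )
    return slope, mt - slope * mc

def remove_confound(confound, target, coefs):
    """Subtract the confound's fitted contribution using pre-estimated coefficients."""
    slope, intercept = coefs
    return [t - (slope * c + intercept) for c, t in zip(confound, target)]

# Hypothetical head-motion values and trait scores for one CV split.
motion_train = [0.10, 0.20, 0.15, 0.30, 0.25]
trait_train = [101.0, 104.0, 102.0, 108.0, 105.0]
motion_test = [0.12, 0.28]
trait_test = [103.0, 106.0]

coefs = fit_confound(motion_train, trait_train)  # training subjects only
trait_train_clean = remove_confound(motion_train, trait_train, coefs)
trait_test_clean = remove_confound(motion_test, trait_test, coefs)  # no refit on test data
print(coefs, trait_test_clean)
```

      Estimating the coefficients inside each training fold keeps information about the test subjects out of the preprocessing step, which is the leakage this procedure is meant to avoid.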

      (6) For the time-averaged benchmarks, a comparison with kernel methods using metrics defined on the Riemannian SPD manifold, such as employing the Frobenius norm of the logarithm map within a Gaussian kernel, would strengthen the analysis, cf. Jayasumana (https://arxiv.org/abs/1412.4172) Table 1, log-euclidean metric.

      We have now added the log-Euclidean Gaussian kernel proposed by the reviewer to the model comparisons. The additional model does not change our conclusions.
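      A minimal sketch of a Gaussian kernel built on the log-Euclidean distance, i.e. the Frobenius norm of the difference between matrix logarithms, is given below. To stay dependency-free it handles only 2 x 2 symmetric positive definite matrices through a closed-form eigendecomposition; real connectivity matrices would need a general routine such as numpy.linalg.eigh. The width parameter tau, its placement in the exponent, and the function names are assumptions rather than the manuscript's definitions.

```python
import math

def spd_log_2x2(m):
    """Matrix logarithm of a 2 x 2 symmetric positive definite matrix [[a, b], [b, d]]."""
    (a, b), (_, d) = m
    if b == 0:
        return [[math.log(a), 0.0], [0.0, math.log(d)]]
    half_trace = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    out = [[0.0, 0.0], [0.0, 0.0]]
    for lam in (half_trace + radius, half_trace - radius):
        vx, vy = b, lam - a  # unnormalised eigenvector for eigenvalue lam
        norm_sq = vx * vx + vy * vy
        w = math.log(lam) / norm_sq
        out[0][0] += w * vx * vx
        out[0][1] += w * vx * vy
        out[1][0] += w * vx * vy
        out[1][1] += w * vy * vy
    return out

def log_euclidean_gaussian_kernel(x, y, tau=1.0):
    """exp(-||log(x) - log(y)||_F^2 / (2 * tau^2))."""
    lx, ly = spd_log_2x2(x), spd_log_2x2(y)
    dist_sq = sum((lx[i][j] - ly[i][j]) ** 2 for i in range(2) for j in range(2))
    return math.exp(-dist_sq / (2 * tau ** 2))

identity = [[1.0, 0.0], [0.0, 1.0]]
stretched = [[math.e, 0.0], [0.0, 1.0]]
correlated = [[2.0, 1.0], [1.0, 2.0]]
print(log_euclidean_gaussian_kernel(identity, stretched))
print(log_euclidean_gaussian_kernel(identity, correlated))
```

      Taking the matrix logarithm first maps covariance matrices into a flat space where the ordinary Euclidean distance respects their positive definite geometry, so the Gaussian kernel built on it remains a valid kernel.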

      (7) A more nuanced and explicit discussion of the limitations, including the reliance on HCP data, lack of clinical focus, and the context of tasks for which performance is expected to be on the low end (e.g. cognitive scores), is crucial for framing the findings within the appropriate context.

      We have now revised the discussion section and added an explicit limitations paragraph: Ll. 475-484:

      “We here aimed to show the potential of the HMM-Fisher kernel approach to leverage information from patterns of brain dynamics to predict individual traits in an example fMRI dataset as well as simulated data. The fMRI dataset we used (HCP 1200 Young Adult) is a large sample taken from a healthy, young population, and it remains to be shown how the exhibited performance generalises to other datasets, e.g. other modalities such as EEG/MEG, clinical data, older populations, different data quality, or smaller sample sizes both in terms of the number of participants and the scanning duration. Additionally, we only tested our approach for the prediction of a specific set of demographic items and cognitive scores; it may be interesting to test the framework also on clinical variables, such as the presence of a disease or the response to pharmacological treatment.”

      (8) While further benchmarks could enhance the study, the authors should provide a critical appraisal of the current findings and outline directions for future research, considering the scope and budget constraints of the work.

      In addition to the new limitations paragraph (see previous comment), we have now rephrased our interpretation of the results and extended the outlook paragraph: Ll. 485-507:

      “There is growing interest in combining different data types or modalities, such as structural, static, and dynamic measures, to predict phenotypes (Engemann et al., 2020; Schouten et al., 2016). While directly combining the features from each modality can be problematic, modality-specific kernels, such as the Fisher kernel for time-varying amplitude and/or FC, can be easily combined using approaches such as stacking (Breiman, 1996) or Multi Kernel Learning (MKL) (Gönen & Alpaydın, 2011). MKL can improve prediction accuracy of multimodal studies (Vaghari et al., 2022), and stacking has recently been shown to be a useful framework for combining static and time-varying FC predictions (Griffin et al., 2024). A detailed comparison of different multimodal prediction strategies including kernels for time-varying amplitude/FC may be the focus of future work.

      In a clinical context, while there are nowadays highly accurate biomarkers and prognostics for many diseases, others, such as psychiatric diseases, remain poorly understood, diagnosed, and treated. Here, improving the description of individual variability in brain measures may have potential benefits for a variety of clinical goals, e.g., to diagnose or predict individual patients’ outcomes, find biomarkers, or to deepen our understanding of changes in the brain related to treatment responses like drugs or non-pharmacological therapies (Marquand et al., 2016; Stephan et al., 2017; Wen et al., 2022; Wolfers et al., 2015). However, the focus so far has mostly been on static or structural information, leaving the potentially crucial information from brain dynamics untapped. Our proposed approach provides one avenue of addressing this by leveraging individual patterns of time-varying amplitude and FC, and it can be flexibly modified or extended to include, e.g., information about temporally recurring frequency patterns (Vidaurre et al., 2016).”

      Reviewer #3 (Public Review):

      Summary:

      In this work, the authors use a Hidden Markov Model (HMM) to describe dynamic connectivity and amplitude patterns in fMRI data, and propose to integrate these features with the Fisher Kernel to improve the prediction of individual traits. The approach is tested using a large sample of healthy young adults from the Human Connectome Project. The HMM-Fisher Kernel approach was shown to achieve higher prediction accuracy with lower variance on many individual traits compared to alternate kernels and measures of static connectivity. As an additional finding, the authors demonstrate that parameters of the HMM state matrix may be more informative in predicting behavioral/cognitive variables in this data compared to state-transition probabilities.

      Strengths:

      - Overall, this work helps to address the timely challenge of how to leverage high-dimensional dynamic features to describe brain activity in individuals.

      - The idea to use a Fisher Kernel seems novel and suitable in this context.

      - Detailed comparisons are carried out across the set of individual traits, as well as across models with alternate kernels and features.

      - The paper is well-written and clear, and the analysis is thorough.

      Potential weaknesses:

      - One conclusion of the paper is that the Fisher Kernel "predicts more accurately than other methods" (Section 2.1 heading). I was not certain this conclusion is fully justified by the data presented, as it appears that certain individual traits may be better predicted by other approaches (e.g., as shown in Figure 3) and I found it hard to tell if certain pairwise comparisons were performed -- was the linear Fisher Kernel significantly better than the linear Naive normalized kernel, for example?

      We have revised the abstract and the discussion to state the results more appropriately. For instance, we changed the relevant section in the abstract to (ll. 24-26):

      “We show here, in fMRI data, that the HMM-Fisher kernel approach is accurate and reliable. We compare the Fisher kernel to other prediction methods, both time-varying and time-averaged functional connectivity-based models.”,

      and in the discussion, removing the sentence

      “resulting in better generalisability and interpretability compared to other methods”,

      and adding (given the revised statistical results) ll. 435-436:

      “though most comparisons were not statistically significant given the narrow margin for improvements.”

      In conjunction with the new statistical approach (see Reviewer #2, comment 3), we have now streamlined the comparisons. We explained which comparisons were performed in the methods ll.880-890:

      “For the main results, we separately compare the linear Fisher kernel to the other linear kernels, and the Gaussian Fisher kernel to the other Gaussian kernels, as well as to each other. We also compare the linear Fisher kernel to all time-averaged methods. Finally, to test for the effect of tangent space projection for the time-averaged FC prediction, we also compare the Ridge regression model to the Ridge Regression in Riemannian space. To test for effects of removing sets of features, we use the approach described above to compare the kernels constructed from the full feature sets to their versions where features were removed or reduced. Finally, to test for effects of training the HMM either on all subjects or only on the subjects that were later used as training set, we compare each kernel to the corresponding kernel constructed from HMM parameters, where training and test set were kept separate.”

      Model performance evaluation is done on the level of all predictions (i.e., across target variables, CV folds, and CV iterations) rather than for each of the target variables separately. That means different best-performing methods depending on the target variables are to be expected.

      - While 10-fold cross-validation is used for behavioral prediction, it appears that data from the entire set of subjects is concatenated to produce the initial group-level HMM estimates (which are then customized to individuals). I wonder if this procedure could introduce some shared information between CV training and test sets. This may be a minor issue when comparing the HMM-based models to one another, but it may be more important when comparing with other models such as those based on time-averaged connectivity, which are calculated separately for train/test partitions (if I understood correctly).

      The lack of separation between training and test set before fitting the HMM was also pointed out by Reviewer #2. We are addressing this issue in the new Results section 2.4 (see also our response to Reviewer #2, comment 2).

      Recommendations for the authors:

      The individual public reviews all indicate the merits of the study, however, they also highlight relatively consistent questions or issues that ought to be addressed. Most significantly, the authors ought to provide greater clarity surrounding the use of the cross-validation procedures they employ, and the use of a common atlas derived outside the cross-validation loop. Also, the authors should ensure that the statistical testing procedures they employ accommodate the dependencies induced between folds by the cross-validation procedure and give care to ensuring that the conclusions they make are fully supported by the data and statistical tests they present.

      Reviewer #1 (Recommendations For The Authors):

      Overall, the study is interesting but demands further improvements. Below, I summarize my comments:

      (1) The authors should explain in detail how they applied cross-validation across the dataset for both optimization of parameters, and also for cross-validation of the models to predict individual traits.

      How did you split the dataset for both parameters optimization, and for the CV of the prediction of behavioral traits?

      A review and a summary of various CVs that have been applied on the same dataset should be provided.

      We apologise for the oversight and have now added more details to the CV section of the methods, see our response to Reviewer #1 comment 1:

      In ll. 804-813:

      “We used k-fold nested cross-validation (CV) to select and evaluate the models. We used 10 folds for both the outer loop (used to train and test the model) and the inner loop (used to select the optimal hyperparameters) such that 90% were used for training and 10% for testing. The optimal hyperparameters λ (and  in the case of the Gaussian kernels) were selected using grid-search from the vectors λ=[0.0001,0.001,0.01,0.1,0.3,0.5,0.7,0.9,1] and . In both the outer and the inner loop, we accounted for family structure in the HCP dataset so that subjects from the same family were never split across folds (Winkler et al., 2015). Within the CV, we regressed out sex and head motion confounds, i.e., we estimated the regression coefficients for the confounds on the training set and applied them to the test set (Snoek et al., 2019).” and ll. 818-820: “We generated the 100 random repetitions of the 10 outer CV folds once, and then used them for training and prediction of all methods, so that all methods were fit to the same partitions.”
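
      The procedure quoted above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a precomputed kernel matrix `K`, uses plain kernel ridge regression as the predictor, and all function names (`grouped_folds`, `deconfound`, `kernel_ridge_predict`, `nested_cv`) are hypothetical. It shows the three ingredients named in the text: family-aware folds, an inner loop that selects λ on training subjects only, and confound regression fitted on the training set and applied to the test set.

```python
import numpy as np

def grouped_folds(groups, n_folds, rng):
    """Assign whole groups (e.g. families) to folds so that no group is split."""
    unique = rng.permutation(np.unique(groups))
    fold_of_group = {g: i % n_folds for i, g in enumerate(unique)}
    return np.array([fold_of_group[g] for g in groups])

def deconfound(y_train, c_train, y_test, c_test):
    """Estimate confound coefficients on the training set, apply them to both sets."""
    def design(c):
        return np.column_stack([np.ones(len(c)), c])
    beta, *_ = np.linalg.lstsq(design(c_train), y_train, rcond=None)
    return y_train - design(c_train) @ beta, y_test - design(c_test) @ beta

def kernel_ridge_predict(K_train, y_train, K_test_train, lam):
    """Kernel ridge regression with a precomputed kernel (assumed predictor)."""
    alpha = np.linalg.solve(K_train + lam * np.eye(len(y_train)), y_train)
    return K_test_train @ alpha

def nested_cv(K, y, confounds, groups, lambdas, n_folds=10, seed=0):
    rng = np.random.default_rng(seed)
    outer = grouped_folds(groups, n_folds, rng)
    y_pred = np.zeros(len(y))
    y_clean = np.zeros(len(y))
    for k in range(n_folds):
        tr, te = np.where(outer != k)[0], np.where(outer == k)[0]
        # Inner loop: choose lambda using the outer-training subjects only.
        inner = grouped_folds(groups[tr], n_folds, rng)
        errors = []
        for lam in lambdas:
            sq_err = 0.0
            for j in range(n_folds):
                itr, ite = tr[inner != j], tr[inner == j]
                if len(ite) == 0:
                    continue
                ytr, yte = deconfound(y[itr], confounds[itr], y[ite], confounds[ite])
                pred = kernel_ridge_predict(K[np.ix_(itr, itr)], ytr,
                                            K[np.ix_(ite, itr)], lam)
                sq_err += np.sum((pred - yte) ** 2)
            errors.append(sq_err)
        best = lambdas[int(np.argmin(errors))]
        # Outer loop: refit with the selected lambda and predict held-out subjects.
        ytr, yte = deconfound(y[tr], confounds[tr], y[te], confounds[te])
        y_pred[te] = kernel_ridge_predict(K[np.ix_(tr, tr)], ytr,
                                          K[np.ix_(te, tr)], best)
        y_clean[te] = yte
    return y_pred, y_clean, outer
```

      Repeating the call with different `seed` values would give repeated random partitions; reusing the same seeds for every kernel keeps all methods on identical folds, in the spirit of the quoted ll. 818-820.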

      (2) The authors should explain in more detail how they applied ICA-based parcellation at the group-level.

      A. Did you apply it across the whole group? If yes, then this is problematic since it rejects the CV approach. It should be applied within the folds.

      B. How did you define the representative time-source per ROI?

      A: How group ICA was applied was stated in the Methods section (4.1 HCP imaging and behavioural data), ll. 543-548:

      “The parcellation was estimated from the data using multi-session spatial ICA on the temporally concatenated data from all subjects.”

      We have now added a disclaimer about the divide between training and test set:

      “Note that this means that there is no strict divide between the subjects used for training and the subjects for testing the later predictive models, so that there is potential for leakage of information between training and test set. However, since this step does not concern the target variable, but only the preprocessing of the predictors, the effect can be expected to be minimal (Rosenblatt et al., 2024).”

      We understand that in order to make sure we avoid data leakage, it would be desirable to estimate and apply group ICA separately for the folds, but the computational load of this would be well beyond the constraints of this particular work, where we have instead used the parcellation provided by the HCP consortium.

      B: This was also stated in 4.1, ll. 554-559: “Timecourses were extracted using dual regression (Beckmann et al., 2009), where group-level components are regressed onto each subject’s fMRI data to obtain subject-specific versions of the parcels and their timecourses. We normalised the timecourses of each subject to ensure that the model of brain dynamics and, crucially, the kernels were not driven by (averaged) amplitude and variance differences between subjects.”
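
      For readers unfamiliar with dual regression, the two regression stages described above can be sketched as follows. This is a simplified illustration rather than the HCP or FSL pipeline: demeaning and variance normalisation inside the regressions are omitted, the function names are hypothetical, and per-subject z-scoring is only one assumed way of normalising the timecourses.

```python
import numpy as np

def dual_regression(data, group_maps):
    """Simplified dual regression.

    data:       (T, V) array, one subject's fMRI data (time x voxels)
    group_maps: (C, V) array, group-level spatial components
    """
    # Stage 1 (spatial regression): group maps -> subject-specific timecourses (T, C).
    timecourses = data @ np.linalg.pinv(group_maps)
    # Stage 2 (temporal regression): timecourses -> subject-specific maps (C, V).
    subject_maps = np.linalg.pinv(timecourses) @ data
    return timecourses, subject_maps

def standardise(timecourses):
    """Z-score each component's timecourse within a subject (assumed normalisation)."""
    centred = timecourses - timecourses.mean(axis=0)
    return centred / timecourses.std(axis=0)
```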

      (3) The authors discussed throughout the paper that their proposed (HMM+Fisher) kernel approach outperformed dynamic functional connectivity (dFC). However, they compared the proposed methodology with just static FC.

      A. The authors didn't explain how static and dFC have been applied.

      B. If the authors wanted to claim that their methodology is better than dFC, then they have to demonstrate results based on dFC with the trivial sliding window approach.

      C. Moreover, the static FC networks have been constructed by concatenating time samples that belong to the same state across the time course of resting-state activity.

      So, it's HMM-informed static FC analysis, which is problematic since it's derived from HMM applied over the brain dynamics.

      I don't agree that connectivity is derived exclusively from the clustering of human brain dynamics!

      D. A static approach of using the whole time course, and a dFC following the trivial sliding-window approach should be adopted and presented for comparison with (HMM+Fisher) kernel.

      We do not intend to claim in our manuscript that our method outperforms other methods for doing dynamic FC. Indeed, we would like to be clear that the HMM itself is a method for capturing dynamic FC. Please see our responses to public review comments 2 and 3 by reviewer #1, copied below, which are intended to clear up this misunderstanding:

      We would like to clarify that the HMM is itself a method for estimating dynamic (or time-varying) FC, just like the sliding window approach, see also Vidaurre, 2024 (https://direct.mit.edu/imag/article/doi/10.1162/imag_a_00363/124983) for an overview of terminology.

      We would like to be clear that we do not claim in the manuscript that our method outperforms other dynamic functional connectivity (dFC) approaches, such as sliding window FC. We have now made changes to the manuscript to make this clearer.

      First, we have clarified our use of the term “brain dynamics” to signify “time-varying amplitude and functional connectivity patterns” in this context, as Reviewer #2 raised the point that the former term is ambiguous.

      Second, our focus is on our method being a way of using dFC for predictive modelling, since there currently is no widely accepted way of doing this. One reason why dFC is not usually considered in prediction studies is that it is mathematically not trivial how to use the parameters from estimators of dynamic FC for a prediction. This includes the sliding window approach. We do not aim at comparing across different dFC estimators in this paper. To make these points clearer, we have revised the introduction to now say:

      Ll. 39-50:

      “One reason why brain dynamics are not usually considered in this context pertains to their representation: They are represented using models of varying complexity that are estimated from modalities such as functional MRI or MEG. Although there exists a variety of methods for estimating time-varying or dynamic FC (Lurie et al., 2019), like the commonly used sliding-window approach, there is currently no widely accepted way of using them for prediction problems. This is because these models are usually parametrised by a high number of parameters with complex mathematical relationships between the parameters that reflect the model assumptions. How to leverage these parameters for prediction is currently an open question.

      We here propose the Fisher kernel for predicting individual traits from brain dynamics, using information from generative models that do not assume any knowledge of task timings. We focus on models of brain dynamics that capture within-session changes in functional connectivity and amplitude from fMRI scans, in this case acquired during wakeful rest, and how the parameters from these models can be used to predict behavioural variables or traits. In particular, we use the Hidden Markov Model (HMM), which is a probabilistic generative model of time-varying amplitude and functional connectivity (FC) dynamics (Vidaurre et al., 2017).”
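
      To make the point about parameter counts concrete, the following sketch computes sliding-window FC, the estimator mentioned in the quoted passage. It is an illustration only and was not part of the analyses in the manuscript; the function name and the window and step sizes are hypothetical choices.

```python
import numpy as np

def sliding_window_fc(timecourses, window, step):
    """Correlation-based FC in overlapping windows.

    timecourses: (T, n) array of n regional timecourses
    returns:     (n_windows, n * (n - 1) / 2) array of upper-triangle edges
    """
    T, n = timecourses.shape
    upper = np.triu_indices(n, k=1)
    starts = range(0, T - window + 1, step)
    return np.array([np.corrcoef(timecourses[s:s + window].T)[upper]
                     for s in starts])
```

      Even this simple estimator yields one value per edge per window, so the number of features grows with both the number of regions and the scan length, and it is not obvious how to turn them into a fixed-size input for a predictive model.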

      To the additional points raised here:

      A: How static and dynamic FC have been estimated is explicitly stated in the relevant Methods sections 4.2 (The Hidden Markov Model), which explains the details of using the HMM to estimate dynamic functional connectivity; and 4.5 (Regression models based on time-averaged FC features), which explains how static FC was computed.

      B: We are not making this claim. We have now modified the Introduction to avoid further misunderstandings, as per ll. 33-36: “One way of describing brain dynamics are state-space models, which allow capturing recurring patterns of activity and functional connectivity (FC) across the whole brain.”

      C: This is not how static FC networks were constructed; we apologise for the confusion. We also do not perform any kind of clustering. The only “HMM-informed static FC analysis” is the static FC KL divergence model to allow for a more direct comparison with the time-varying FC KL divergence model, but we have included several other static FC models (log-Euclidean, Ridge regression, Ridge regression Riem., Elastic Net, Elastic Net Riem., and Selected Edges), which do not use HMMs. This is explained in Methods section 4.5.

      D: As explained above, we have included four (five in the revised manuscript) static approaches using the whole time course, and we do not claim that our method outperforms other dynamic FC models. We also disagree that using the sliding window approach for predictive modelling is trivial, as explained in the introduction of the manuscript and under public review comment 3.

      (4) Did you correct for multiple comparisons across the various statistical tests?

      All statistical comparisons have been corrected for multiple comparisons. Please find the relevant text in Methods section 4.4.1.

      (5) Do we expect that behavioral traits are encapsulated in resting-state human brain dynamics, and on which brain areas mostly? Please, elaborate on this.

      While this is certainly an interesting question, our paper is a methodological contribution about how to predict from models of brain dynamics, rather than a basic science study about the relation between resting-state brain dynamics and behaviour. The biological aspects and interpretation of the specific brain-behaviour associations are a secondary point and out of scope for this paper. Our approach uses whole-brain dynamics, which does not require selecting brain areas of interest.

      Reviewer #2 (Recommendations For The Authors):

      Beyond the general principles included in the public review, here are a few additional pointers to minor issues that I would wish to see addressed.

      Introduction:

      - The term "brain dynamics" encompasses a broad spectrum of phenomena, not limited to those captured by state-space models. It includes various measures such as time-averaged connectivity and mean EEG power within specific frequency bands. To ensure clarity and relevance for a diverse readership, it would be beneficial to adopt a more inclusive and balanced approach to the terminology used.

      The reviewer rightly points out the ambiguity of the term “brain dynamics”, which we use in the interest of readability. The HMM is one of several possible descriptions of brain dynamics. We have now included a statement early in the introduction to narrow this down:

      Ll. 32-35:

      “… the patterns in which brain activity unfolds over time, i.e., brain dynamics. One way of describing brain dynamics are state-space models, which allow capturing recurring patterns of activity and functional connectivity (FC) across the whole brain.”

      And ll. 503-507:

      “Our proposed approach provides one avenue of addressing this by leveraging individual patterns of time-varying amplitude and FC, as one of many possible descriptions of brain dynamics, and it can be flexibly modified or extended to include, e.g., information about temporally recurring frequency patterns (Vidaurre et al., 2016).”

      Figures:

      - The font sizes across the figures, particularly in subpanels 2B and 2C, are quite small and may challenge readability. It is advisable to standardize the font sizes throughout all figures to enhance legibility.

      We have slightly increased the overall font sizes, while we are generally following figure recommendations set out by Nature. The font sizes are the same throughout the figures.

      - When presenting performance comparisons, a horizontal layout is often more intuitive for readers, as it aligns with the natural left-to-right reading direction. This is not just a personal preference; it is supported by visualization best practices as outlined in resources like the NVS Cheat Sheet (https://github.com/GraphicsPrinciples/CheatSheet/blob/master/NVSCheatSheet.pdf) and Kieran Healy's book (https://socviz.co/lookatdata.html).

      We have changed all figures to use horizontal layout, hoping that this will ease visual comparison between the different models.

      - In the kernel density estimation (KDE) and violin plot representations, it appears that the data displays may be truncated. It is crucial to indicate where the data distribution ends. Overplotting individual data points could provide additional clarity.

      To avoid confusion about the data distribution in the violin plots, we have now overlaid scatter plots, as suggested by the reviewer. Overlaying the fold-level accuracies was not feasible (since this would result in ~1.5 million transparent points for a single figure), so we instead show the accuracies averaged over folds but separate for target variables and CV iterations. Only the newly added coefficient of determination plots had to be truncated, which we have noted in the figure legend.

      - Figure 3 could inadvertently suggest that time-varying features correspond to panel A and time-averaged features to panel B. To avoid confusion, consider reorganizing the labels at the bottom into two rows for clearer attribution.

      We have changed the layout of the time-varying and time-averaged labels in the new version of the plots to avoid this issue.

      Discussion:

      - The discussion on multimodal modeling might give the impression that it is more effective with multiple kernel learning (MKL) than with other methods. To present a more balanced view, it would be appropriate to rephrase this section. For instance, stacking, examples of which are cited in the same paragraph, has been successfully applied in practice. The text could be adjusted to reflect that Fisher Kernels via MKL adds to the array of viable options for multimodal modeling. As a side thought: additionally, a well-designed comparison between MKL and stacking methods, conducted by experts in each domain, could greatly benefit the field. In certain scenarios, it might even be demonstrated that the two approaches converge, such as when using linear kernels.

      We would like to thank the reviewer for the suggestion about the discussion concerning multimodal modelling. We agree that there are other relevant methods that may lead to interesting future work and have now included stacking and refined the section: ll. 487-494:

      “While directly combining the features from each modality can be problematic, modality-specific kernels, such as the Fisher kernel for time-varying amplitude and/or FC, can be easily combined using approaches such as stacking (Breiman, 1996) or Multi Kernel Learning (MKL) (Gönen & Alpaydın, 2011). MKL can improve prediction accuracy of multimodal studies (Vaghari et al., 2022), and stacking has recently been shown to be a useful framework for combining static and time-varying FC predictions (Griffin et al., 2024). A detailed comparison of different multimodal prediction strategies including kernels for time-varying amplitude/FC may be the focus of future work.”

      - The potential clinical applications of brain dynamics extend beyond diagnosis and individual outcome prediction. They play a significant role in the context of biomarkers, including pharmacodynamics, prognostic assessments, responder analysis, and other uses. The current discussion might be misinterpreted as being specific to hidden Markov model (HMM) approaches. For diagnostic purposes, where clinical assessment or established biomarkers are already available, the need for new models may be less pressing. It would be advantageous to reframe the discussion to emphasize the potential for gaining deeper insights into changes in brain activity that could indicate therapeutic effects or improvements not captured by structural brain measures. However, this forward-looking perspective is not the focus of the current work. A nuanced revision of this section is recommended to better reflect the breadth of applications.

      We appreciate the reviewer’s thoughtful suggestions regarding the discussion of potential clinical applications. We have included the suggestions and refined this section of the discussion: Ll. 495-507:

      “In a clinical context, while there are nowadays highly accurate biomarkers and prognostics for many diseases, others, such as psychiatric diseases, remain poorly understood, diagnosed, and treated. Here, improving the description of individual variability in brain measures may have potential benefits for a variety of clinical goals, e.g., to diagnose or predict individual patients’ outcomes, find biomarkers, or to deepen our understanding of changes in the brain related to treatment responses like drugs or non-pharmacological therapies (Marquand et al., 2016; Stephan et al., 2017; Wen et al., 2022; Wolfers et al., 2015). However, the focus so far has mostly been on static or structural information, leaving the potentially crucial information from brain dynamics untapped. Our proposed approach provides one avenue of addressing this by leveraging individual patterns of time-varying amplitude and FC, and it can be flexibly modified or extended to include, e.g., information about temporally recurring frequency patterns (Vidaurre et al., 2016).”

      Reviewer #3 (Recommendations For The Authors):

      - I wondered if the authors could provide, within the Introduction, an intuitive description for how the Fisher Kernel "preserves the structure of the underlying model of brain dynamics" / "preserves the mathematical structure of the underlying HMM"? Providing more background may help to motivate this study to a general audience.

      We agree that this would be helpful and have now added this to the introduction: Ll.61-67:

      “Mathematically, the HMM parameters lie on a Riemannian manifold (the structure). This defines, for instance, the relation between parameters, such as: how changing one parameter, like the probabilities of transitioning from one state to another, would affect the fitting of other parameters, like the states’ FC. It also defines the relative importance of each parameter; for example, how a change of 0.1 in the transition probabilities would not be the same as a change of 0.1 in one edge of the states’ FC matrices.”

      To communicate the intuition behind the concept, the idea was also illustrated in Figure 1, panel 4 by showing Euclidean distances as straight lines through a curved surface (4a, Naïve kernel), as opposed to the tangent space projection onto the curved manifold (4b, Fisher kernel).
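
      The general idea of a Fisher kernel can also be sketched in code. The example below deliberately swaps the HMM for a univariate Gaussian so that the gradients fit in a few lines; it is not the authors' implementation. It assumes the common "practical" Fisher kernel, which omits the Fisher information matrix, and the Gaussian-kernel parameterisation with width `tau` is likewise an assumption. Each subject is represented by the gradient of the log-likelihood of their data with respect to the group-level parameters, and kernels are built from these gradients.

```python
import numpy as np

def gaussian_score(x, mu, var):
    """Gradient of the Gaussian log-likelihood of samples x w.r.t. (mu, var)."""
    d_mu = np.sum(x - mu) / var
    d_var = np.sum((x - mu) ** 2 - var) / (2 * var ** 2)
    return np.array([d_mu, d_var])

def linear_fisher_kernel(scores):
    """Inner products between subjects' gradient vectors (Fisher information omitted)."""
    return scores @ scores.T

def gaussian_fisher_kernel(scores, tau):
    """Gaussian kernel on distances between gradient vectors; tau is an assumed width."""
    diff = scores[:, None, :] - scores[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    return np.exp(-sq_dist / (2 * tau ** 2))
```

      In this toy setting, a subject whose data would pull the group-level mean or variance in a particular direction gets a gradient pointing that way, so two subjects are similar under the kernel when they would change the group model in similar ways.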

      - Some clarifications regarding Figure 2a would be helpful. Was the linear Fisher Kernel significantly better than the linear Naive normalized kernel? I couldn't find whether this comparison was carried out. Apologies if I have missed it in the text. For some of the brackets indicating pairwise tests and their significance values, the start/endpoints of the bracket fall between two violins; in this case, were the results of the linear and Gaussian Fisher Kernels pooled together for this comparison?

      We have now streamlined the statistical comparisons and avoided plotting brackets falling between two violin plots. The comparisons that were carried out are stated in the methods section 4.4.1. Please see also our response above to Reviewer #3 public review, potential weaknesses, point 1, relevant point copied below:

      In conjunction with the new statistical approach (see Reviewer #2, comment 3), we have now streamlined the comparisons. We explained which comparisons were performed in the methods ll.880-890:

      “For the main results, we separately compare the linear Fisher kernel to the other linear kernels, and the Gaussian Fisher kernel to the other Gaussian kernels, as well as to each other. We also compare the linear Fisher kernel to all time-averaged methods. Finally, to test for the effect of tangent space projection for the time-averaged FC prediction, we also compare the Ridge regression model to the Ridge Regression in Riemannian space. To test for effects of removing sets of features, we use the approach described above to compare the kernels constructed from the full feature sets to their versions where features were removed or reduced. Finally, to test for effects of training the HMM either on all subjects or only on the subjects that were later used as training set, we compare each kernel to the corresponding kernel constructed from HMM parameters, where training and test set were kept separate”.

      - The authors may wish to include, in the Discussion, some remarks on the use of all subjects in fitting the group-level HMM and the implications for the cross-validation performance, and/or try some analysis to ensure that the effect is minor.

      As suggested by reviewers #2 and #3, we have now performed the suggested analysis and show that fitting the group-level HMM to all subjects, compared to only the training subjects, has no effect on the results. Please see our response to Reviewer #2, public review, comment 2.

      - The decision to use k=6 states was made here, and I wondered if the authors may include some support for this choice (e.g., based on findings from prior studies)?

      We have now refined and extended our explanation and rationale behind the number of states: Ll. 586-594: “The number of states can be understood as the level of detail or granularity with which we describe the spatiotemporal patterns in the data, akin to a dimensionality reduction, where a small number of states will lead to a very general, coarse description and a large number of states will lead to a very detailed, fine-grained description. Here, we chose a small number of states, K=6, to ensure that the group-level HMM states are general enough to be found in all subjects, since a larger number of states increases the chances of certain states being present only in a subset of subjects. The exact number of states is less relevant in this context, since the same HMM estimation is used for all kernels.”

      - (minor) Abstract: "structural aspects" - do you mean structural connectivity?

      With “structural aspects”, we refer to the various measures of brain structure that are used in predictive modelling. We have now specified: Ll. 14-15: “structural aspects, such as structural connectivity or cortical thickness”.

    1. eLife Assessment

      This important modeling study alters a previous model of the intact cat spinal locomotor network to simulate a lateral hemi-section of the spinal cord. The modeling and experimental work described provide convincing evidence that this model is capable of qualitatively predicting alterations to the swing and stance phase durations during locomotion at different speeds on intact or split-belt treadmills. This paper will interest neuroscientists studying vertebrate motor systems, including researchers working on motor dysfunction after spinal cord injury.

    2. Reviewer #1 (Public review):

      Summary:

      This study adapts a previously published model of the cat spinal locomotor network to make predictions of how phase durations of swing and stance at different treadmill speeds in tied-belt and split-belt conditions would be altered following a lateral hemisection. The simulations make several predictions that are replicated in experimental settings. This updated manuscript addressed well many of the reviewer comments made to the first version.

      Strengths:

      -Despite only altering the connections in the model, the model is able to replicate very well several experimental findings. This provides strong validation for the model and highlights its utility as a tool to investigate the operations of mammalian spinal locomotor networks.

      -The study provides insights about interactions between the left and right side of the spinal locomotor networks, and how these interactions depend on the mode of operation, as determined by speed and state of the nervous system.

      -The writing is logical, clear and easy to follow.

      Comments on revisions:

      My concerns were well addressed by the authors. I have no additional concerns.

    3. Reviewer #2 (Public review):

      This is a nice article that presents interesting findings. The model's predictions match the data, which is good. The discussion points to modeling plasticity after SCI, which will be important.

      The manuscript is well-written and interesting, and the putative neural circuit mechanisms that the model uncovers are super cool if they can be tested in an animal.

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      The modeling and experimental work described provide solid evidence that this model is capable of qualitatively predicting alterations to the swing and stance phase durations during locomotion at different speeds on intact or split-belt treadmills, but a revision of the figures to overlay the model predictions with the experimental data would facilitate the assessment of this qualitative agreement. This paper will interest neuroscientists studying vertebrate motor systems, including researchers investigating motor dysfunction after spinal cord injury.

      Figures showing the overlay of the experimental data with the modeling predictions have been included as figure supplements for Figures 5-7. This highlights how accurate the model predictions were.

      Public Reviews:

      Reviewer #1 (Public review):

      We thank the reviewer for the positive evaluation of our paper and emphasizing its strengths in the Summary.

      Weaknesses:

      (1) Could the authors provide a statement in the methods or results to clarify whether there were any changes in synaptic weight or other model parameters of the intact model to ensure locomotor activity in the hemisected model?

      Such a statement has been inserted in Materials and Methods, section “Modeling”. Also, in the 1st paragraph of section “Spinal sensorimotor network architecture and operation after a lateral spinal hemisection”, we stated that no “additional changes or adjustments” were made.

      (2) The authors should remind the reader what the main differences are between state-machine, flexor-driven, and classical half-center regimes (lines 77-79).

      Short explanations/reminders have been inserted (see lines 80-83 of tracked changes document).

      (3) There may be changes in the wiring of spinal locomotor networks after the hemisection. Yet, without applying any sort of plasticity, the model is able to replicate many of the experimental data. Based on what was experimentally replicated or not, what does the model tell us about possible sites of plasticity after hemisection?

      Quantitative correspondence of changes in locomotor characteristics predicted by the model and those obtained experimentally provides additional validation of the model proposed in the preceding paper and used in this paper. This was our ultimate goal. None of the plastic changes during recovery were modeled because of a lack of precise information on these changes. The absence of possible plastic changes may explain the small discrepancies between our simulations and experimental data (see Supplemental Figures that have been added). However, the model only has a simplified description of spinal circuits without motoneurons and without real simulation of leg biomechanics. This limits our analysis or predictions of possible plastic changes within a reasonable degree of speculation. This issue is discussed in the section “Limitations and future directions” of the Discussion. We have also inserted a sentence: “The lack of possible plastic changes in spinal sensorimotor circuits of our model may explain the absence of exact/quantitative correspondences between simulated and experimental data.”

      (4) Why are the durations on the right hemisected (fast) side similar to results in the full spinal transected model (Rybak et al. 2024)? Is it because the left is in slow mode and so there is not much drive from the left side to the right side even though the latter is still receiving supraspinal drive, as opposed to in the full transection model? (lines 202-203).

      This is correct. We have included this explanation in the text (lines 210-211 of tracked changes document).

      (5) There is an error with probability (line 280).

      This typo was corrected.

      Reviewer #2 (Public review):

      This is a nice article that presents interesting findings. One main concern is that I don't think the predictions from the simulation are overlaid on the animal data at any point - I understand the match is qualitative, which is fine, but even that is hard to judge without at least one figure overlaying some of the data.

      We thank the Reviewer for the constructive comments. Figures showing the overlay of the experimental data with the modeling predictions have been included as figure supplements for Figures 5-7. This highlights how accurate the model predictions were.

      Second is that it's not clear how the lateral coupling strengths of the model were trained/set, so it's hard to judge how important this hemi-split-belt paradigm is. The model's predictions match the data qualitatively, which is good; but does the comparison using the hemi-split-belt paradigm not offer any corrections to the model? The discussion points to modeling plasticity after SCI, which could be good, but does that mean the fit here is so good there's no point using the data to refine?

      The model has not been trained or retrained, but was used as it was described in the preceding paper. Quantitative correspondence of changes in locomotor characteristics predicted by the model and those obtained experimentally provides additional validation of the model proposed in the preceding paper and used in this paper. This was our ultimate goal. None of the plastic changes during recovery were modeled because of a lack of precise information on these changes. The absence of possible plastic changes may explain the small discrepancies between our simulations and experimental data (see figure supplements that have been added). However, the model only has a simplified description of spinal circuits without motoneurons and without real simulation of leg biomechanics. This limits our analysis or predictions of possible plastic changes within a reasonable degree of speculation. This issue is discussed in the section “Limitations and future directions” of the Discussion.

      The manuscript is well-written and interesting. The putative neural circuit mechanisms that the model uncovers are great, if they can be tested in an animal somehow.

      We agree and we are considering how we can do this in an animal model.

      Page 2, lines 75-6: Perhaps it belongs in the other paper on the model, but it's surprising that in the section on how the model has been revised to have different regimes of operation as speed increases, there is no reference to a lot of past literature on this idea. Just one example would be Koditschek and Full, 1999 JEB Figure 3, where they talk about exactly this idea, or similarly Holmes et al., 2006 SIAM review Figure 7, but obviously many more have put this forward over the years (Daley and Biewener, etc). It's neat in this model to have it tied down to a detailed neural model that can be compared with the vast cat literature, but the concept of this has been talked about for at least 25+ years. Maybe a review that discusses it should be cited?

      We have revised the Introduction to include the suggested references.

      Page 2, line 88: While it makes sense to think of the sides as supraspinal vs afferent driven, respectively, what is the added insight from having them coupled laterally in this hemisection model? What does that buy you beyond complete transection (both sides no supra) compared with intact?

      We are trying to make one model that could reproduce multiple experimental data in quadrupedal locomotion, including genetic manipulations (silencing/removal) of particular neuron types (and commissural interneurons), as pointed out in the section “Model Description” in the Results. These lateral connections are critical for reproducing and explaining other locomotor behaviors demonstrated experimentally. However, even in this study, these lateral interactions are necessary to maintain left-right coordination and equal left-right frequency (step period) during split-belt locomotion and after hemisection.

      I can see how being able to vary cycle frequencies separately of the two limbs is a good "knob" to vary when perturbing the system in order to refine the model. But there isn't a ton of context explaining how the hemi-section with split belt paradigm is important for refining the model, and therefore the science. Is it somehow importantly related to the new "regimes" of operation versus speed idea for the model?  

      We did not refine the model in this paper. We just used it for new simulations. The predictions strengthen the organization and operation of the model we recently proposed.

      Page 5, line 212: For the predictions from the model, a lot depends on how strong the lateral coupling of the model is, which, in turn, depends on the data the model was trained on. Were the model parameters (especially for lateral coupling of the limbs) trained on data in a context where limbs were pushed out of phase and neuronal connectivity was likely required to bring the limbs back into the same phase relationship? Because if the model had no need for lateral coupling, then it's not so surprising that the hemisected limbs behave like separate limbs, one with supraspinal intact and one without.

      Please see our response above concerning the need for lateral interactions incorporated into the model.

      Page 8, line 360: The discussion of the mechanisms (increased influence of afferents, etc) that the model reveals could be causing the changes is exciting, though I'm not sure if there is an animal model where it can be tested in vivo in a moving animal.

      We agree it may be difficult to test right now but we are considering experimental approaches.

      Page 9, line 395: There are some interesting conclusions that rely on the hemi-split-belt paradigm here.

      We agree with this comment. Thanks.

      Reviewer #2 (Recommendations for the authors):

      Figures: Why aren't there any figures with the simulation results overlaid on the animal data?

      We followed this suggestion. Figures showing the overlay of the experimental data with the modeling predictions have been included as figure supplements.

    1. eLife Assessment

      This valuable study revealed numerous distinct lineages that evolved within a local human population in Alberta, Canada, leading to persistent cases of E. coli O157:H7 infections for over a decade and highlighting the ongoing involvement of local cattle in disease transmission, as well as the possibility of intermediate hosts and environmental reservoirs. This study also showed a shift towards more virulent stx2a-only strains becoming predominant in the local lineages. The evidence supporting the role played by cattle in the transmission system of human cases of E. coli O157:H7 in Alberta is solid.

    2. Reviewer #1 (Public review):

      Summary:

      This is a high-quality, well-thought-through analysis of STEC transmission in Alberta, Canada.

      Strengths:

      * The combined human and animal sampling is a great foundation for this kind of study.

      * Phylogenetic analyses seem to have been carried out in a high-quality fashion.

      Comments on the revised version:

      I'd like to thank the authors for the diligence with which they addressed my comments. I agree with their points and am happy for the manuscript to proceed.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      A nice study trying to identify the relationship between E. coli O157 from cattle and humans in Alberta, Canada.

      Strengths:

      (1) The combined human and animal sampling is a great foundation for this kind of study.

      (2) Phylogenetic analyses seem to have been carried out in a high-quality fashion.

      Weaknesses:

      I think there may be a problem with the selection of the isolates for the primary analysis. This is what I'm thinking:

      (1) Transmission analyses are strongly influenced by the sampling frame.

      (2) While the authors have randomly selected from their isolate collections, which is fine, the collections themselves are not random.

      (3) The animal isolates are likely to represent a broad swathe of diversity, because of the structured sampling of animal reservoirs undertaken (as I understand it).

      (4) The human isolates are all from clinical cases. Clinical cases of the disease are likely to be closely related to other clinical cases, because of outbreaks (either detected, or undetected), and the high ascertainment rate for serious infections.

      (5) Therefore, taking an equivalent number of animal and clinical isolates, will underestimate the total diversity in the clinical isolates because the sampling of the clinical isolates is less "independent" (in the statistical sense) than sampling from the animal isolates.

      (6) This could lead to over-estimating of transmission from cattle to humans.

      We appreciate the reviewer’s careful thoughts about our sampling strategy. We agree with points (1) and (2), and we have provided additional details on the animal collections as requested (lines 95-101).

      We agree with point (3) in theory but not in fact. As shown in Figure 3, the cattle isolates were very closely related, despite the temporal and geographic breadth of sampling within Alberta. The median SNP distance between cattle sequences was 45 (IQR 36-56), compared to 54 (IQR 43-229) SNPs between human sequences from cases in Alberta during the same years. Additionally, as shown in Figure 2, only clade A and B isolates – clades that diverge substantially from the rest of the tree – were dominated by human cases in Alberta. We have better highlighted this evidence in the revision (lines 234-236 and 247-249).

      We agree with the reviewer in point (4) that outbreaks can be an important confounder of phylogenetic inference. This is why we down-sampled outbreaks (based on genetic relatedness, not external designation) in our extended analyses. We did not do this in the primary analysis, because there were no large clusters of identical isolates. Figure 3b shows a limited number of small clusters; however, clustered cattle isolates outnumbered clustered human isolates, suggesting that any bias would be in the opposite direction to the one the reviewer suggests. In the revision, we down-sampled all analyses and, indeed, the proportion of human lineages descending from cattle lineages increased (lines 259-261). Regarding severe cases being oversampled among the clinical isolates, this is absolutely true and a limitation of all studies utilizing public health reporting data. We made this limitation to generalizability clearer in the discussion. However, as noted above, clinical isolates were more variable than cattle isolates, so it does not appear to have heavily biased the analysis (lines 490-495).

      We disagree with the reviewer on point (5). While the bias toward severe cases could make the human isolates less independent, the relative sampling proportions are likely to induce greater distance between clinical isolates than cattle isolates, which is exactly what we observe (see response to point (3) above). Cattle are E. coli O157:H7’s primary reservoir, and humans are incidental hosts not able to sustain infection chains long-term. Not only is the bacterium prevalent among cattle, but cattle are also highly prevalent in Alberta. Thus, even with 89 sampling points, we are still capturing a small proportion of the E. coli O157:H7 in the province. Being able to sample only a small proportion of cattle’s E. coli O157:H7 increases the likelihood of only sampling from the center of the distribution, making extreme cases, such as that shown at the very bottom of the tree in Figure 4, rare and important. In comparison, sampling from human cases constitutes a higher proportion of human infections relative to cattle, and is therefore more representative of the underlying distribution, including extremes. We added this point to the limitations (lines 495-504). As with the clustering above, if anything, this outcome would have biased the study away from identifying cattle as the primary reservoir. Additionally, the relatively small proportion of cattle sampled makes our finding that 15.7% of clinical isolates were within 5 SNPs of a cattle isolate, the distance most commonly used to indicate transmission for E. coli O157:H7, all the more remarkable.

      Because of the aforementioned points, we disagree with the reviewer’s conclusion in point (6). If a bias exists, we believe transmission from cattle to humans is likely underestimated for the reasons given above. Not only do all prior studies indicate ruminants as the primary reservoirs of E. coli O157:H7, and humans as only incidental hosts, our specific data do not support the reviewer’s individual contentions. The results of the sensitivity analysis the reviewer recommended are consistent with the points we outlined above, estimating that 94.3% of human lineages arose from cattle lineages (vs. 88.5% in the primary analysis). We have opted to retain the more conservative estimate of the primary analysis, which includes a more representative number of clinical cases.

      (7) "We hypothesize that the large proportion of disease associated with local transmission systems is a principal cause of Alberta's high E. coli O157:H7 incidence" - this seems a bit tautological. There is a lot of O157 because there's a lot of transmission. What part of the fact it is local means that it is a principal cause of high incidence? It seems that they've observed a high rate of local transmission, but the reasons for this are not apparent, and hence the cause of Alberta's incidence is not apparent. Would a better conclusion not be that "X% of STEC in Alberta is the result of transmission of local variants"? And then, this poses a question for future epi studies of what the transmission pathway is.

      The reviewer is correct, and the suggestion for the direction of future studies was our intent with this statement. We have removed this sentence.

      Reviewer #1 (Recommendations For The Authors):

      (1) To address my concerns about the different sampling frames in humans and animals, I would suggest a sensitivity analysis, using something like the following strategy. Make a phylogeny of all the available genome sequences from humans and cattle from Alberta. Phylogenetically sub-sample the tree, using something like Treemmer (https://github.com/fmenardo/Treemmer), to remove phylogenetically redundant isolates from the same host type. Randomly select 100 human and 100 animal isolates from this non-redundant tree, and re-do your analysis.

      Although we originally down-sampled outbreaks for our analysis of the extended Alberta tree (2007-2019), we had not done this systematically for all analyses. We were not able to use the recommended Treemmer tool, because we did not see a way to incorporate the timing of sequences. Because the objective of our study was to evaluate persistence, we did not want to exclude identical sequences that were separated in time and thus could be indicating persistence. To accomplish this, we developed a utility that allowed us to incorporate the temporality of sequences. Using this utility, we systematically down-sampled all sequences that met the following conditions: 1) within 0-2 SNPs of another sequence and 2) no gaps in sequence set >2 months. The second condition means that for any set of sequences within 0-2 SNPs of one another, there can be no more than 2 months without a sequence from the set. Similar sequences that occur beyond this 2-month cutoff would be considered a separate set for down-sampling. This cutoff was chosen based on the epidemiology of E. coli O157 outbreaks, which are generally either point-source or continuous-source outbreaks. Intermittent outbreaks of a single strain are believed to arise from distinct contamination events and are exactly the type of phenomena we are seeking to identify. We have added details on down-sampling to the Methods (lines 178-180).
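
      The two down-sampling conditions described above can be illustrated with a minimal sketch. This is not the authors' utility: the function name `downsample`, the single-linkage grouping of sequences within 0-2 SNPs, the choice to keep the earliest isolate of each set, and the approximation of 2 months as 61 days are all assumptions made for illustration.

```python
from datetime import timedelta
from itertools import combinations


def downsample(isolates, snp_dist, max_snps=2, max_gap=timedelta(days=61)):
    """Keep one isolate per temporally contiguous set of near-identical sequences.

    isolates: dict mapping isolate id -> collection date (hypothetical input format)
    snp_dist: callable (id_a, id_b) -> pairwise SNP distance (hypothetical)
    """
    ids = sorted(isolates)
    parent = {i: i for i in ids}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Assumption: sequences are grouped by single linkage at <= max_snps.
    for a, b in combinations(ids, 2):
        if snp_dist(a, b) <= max_snps:
            parent[find(a)] = find(b)

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)

    kept = []
    for members in clusters.values():
        members.sort(key=lambda i: (isolates[i], i))
        kept.append(members[0])  # earliest isolate represents the first set
        for prev, cur in zip(members, members[1:]):
            # A gap longer than max_gap starts a separate set, which keeps
            # its own representative (possible persistence, not one outbreak).
            if isolates[cur] - isolates[prev] > max_gap:
                kept.append(cur)
    return sorted(kept)
```

      Under these assumptions, three near-identical isolates collected in January, February and August would be reduced to the January and August isolates, because the February-to-August gap exceeds the cutoff and the August isolate is treated as a separate set.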

      After down-sampling, our primary analysis included 115 human and 84 cattle isolates. To conduct the recommended sensitivity analysis, we further randomly subsampled the human isolates, selecting 84 to match the number of cattle isolates. As we suggested in our initial response, and contrary to the reviewer’s concern, subsampling in this way accentuated the results, with 94.3% of human lineages inferred as arising from cattle lineages, compared to 88.5% in the primary analysis. This sensitivity analysis also identified 10 of the 11 LPLs identified in the primary analysis. The LPL not identified had 5 isolates in the primary analysis, the minimum for definition as an LPL, and was reduced to 4 isolates through subsampling. This sensitivity analysis is shown in Suppl. Figure S3.

      (2) This is the first time I've seen target diagrams used for SNP distances, I'm not sure of their value compared with histograms. They seem to emphasise the maximum distance, rather than the largest number of isolates. I.e. most isolates are closely related, but the diagram emphasises the small number of divergent ones.

      In using the target diagrams, we sought to emphasize the bimodal distribution of human-to-closest-cattle SNP differences. However, this is still mostly visible in a histogram, so we have replaced the target diagrams with a histogram as suggested (Figure 3).
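
      For readers who want to reproduce the quantity plotted in such a histogram, the following is a minimal sketch of how a human-to-closest-cattle SNP difference, and the share of human isolates within a threshold such as the 5 SNPs mentioned in the response to point (5), could be computed. The function names and the `snp_dist` callable are hypothetical and are not taken from the authors' pipeline.

```python
def closest_cattle_distances(human_ids, cattle_ids, snp_dist):
    """Map each human isolate to its SNP distance from the nearest cattle isolate.

    snp_dist: callable (id_a, id_b) -> pairwise SNP distance (hypothetical)
    """
    return {h: min(snp_dist(h, c) for c in cattle_ids) for h in human_ids}


def fraction_within(distances, threshold=5):
    """Fraction of human isolates whose nearest cattle isolate is within `threshold` SNPs."""
    return sum(d <= threshold for d in distances.values()) / len(distances)
```

      A histogram of the values returned by `closest_cattle_distances` would show whether the distribution is bimodal, and `fraction_within` gives the proportion of isolates at or below the chosen cutoff.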

      (3) L130 - fastqc doesn't trim adapters and read ends, there will be something else like trimmomatic which does.

      The reviewer is correct, and we appreciate them catching this error. Trimmomatic is incorporated into the Shovill pipeline, which was the assembler we used through the Bactopia pipeline. We have updated the Methods to indicate this (lines 142-144).

      (4) I find the flow of the article a bit confusing. You have your primary analysis, but Figure 2, which is a secondary analysis, comes before Figure 3. Which is the primary analysis? For me, primary analysis results should come first, or at least signpost a bit better.

      Figure 2 is not a secondary analysis. It is intended to provide an overview of the isolates used from the phylogenetic perspective, just as the diagram in Figure 1 provides an overview of the isolates by analysis. The secondary analyses are shown in Figures 5-7. We have added a sub-header, “Description of Isolates”, to the section referring to Figure 2, to clarify (line 232).

      (5) Locally persistent lineage definition. What is the rationale for the different criteria signifying locally persistent lineages? There is nothing in some of your criteria e.g. all isolates <30 SNPs from each other, which indicates that it is locally persistent - could have been transmitted to Japan (just to pick a place at random), causing a bunch of cases there, and then come back for all we know. Would that be a locally persistent lineage? Did you use the MCC tree here? That is a sub-sample of your full dataset, I am not sure what exactly you're trying to say with the LPLs, but maybe using a larger dataset would be better? Also, there are lots of STEC genomes available from e.g. UK and USA, by only including a fraction of these, you limit the strength of the inferences you can make about locally persistent lineages unless you know that they don't see the G sub-lineage that you observe.

      The reviewer raises multiple points here. First, regarding our definition of LPLs, it is intended to identify those lineages that pose a threat to populations in the specific geographic area (“local”) for at least 1 year (“persistent”) that are likely to be harbored in local reservoirs. Each of the criteria contributes to this definition.

      (1) A single lineage of the MCC tree with a most recent common ancestor (MRCA) with ≥95% posterior probability: This criterion provides confidence in the given isolates being part of a single, defined lineage. The posterior probability gives the probability that the topology of the tree is accurate, based on the data provided and the chosen model of evolution. In other words, we required at least 95% probability that the lineage was correct, and in practice the posterior probability of the lineages we defined as LPLs was 99.7-100% (we have added this detail to the text, lines 269-270). We also added a sensitivity analysis, shown in Suppl. Figure S4, which shows all sampled trees. We find that the essential structure of the tree around the LPLs we defined is well-supported.

      (2) All isolates ≤30 core SNPs from one another: This criterion limited LPLs to those lineages where the isolates were closely related. We did not want to limit LPLs to those that might define an outbreak, for example using a 5-10 SNP threshold, because the point of the study is to identify lineages that persistently cause disease over longer periods than a normal outbreak. Pathogens evolve over time in their reservoirs, leading to greater SNP distances, and we wanted to allow for this. The U.S. CDC has acknowledged a similar concern for such persistent lineages in its definition of REP strains, which it has defined based on ranges of 13-104 allele differences by cgMLST. Thus, our choice of 30 core SNPs as the threshold is in line with current practice in the emerging science on persistence of enteric pathogens. We have also added a sensitivity analysis examining alternate SNP thresholds, shown in Suppl. Figure S5, which results in clusters of LPLs identified in the primary analysis being grouped into larger lineages. Additionally, in the tree showing our primary analysis (Figure 4), we now note the minimum number of SNPs all isolates within the lineage differ by.

      (3) Contained at least 1 cattle isolate: This criterion increases confidence that the lineage is indeed “local”. Unlike humans, cattle are not known to be routinely infected by imported food products, and they do not make roundtrip journeys to other locations, as humans infected during travel do. Cattle themselves may be imported into Alberta while infected, and cattle in Alberta can be infected by other imported animals. In these cases, if the STEC strains the cattle harbor persist for ≥1 year, they become the type of lineages we are interested in as LPLs, regardless of where they previously came from, because they are now potential persistent sources of infection in Alberta. By including at least one cattle isolate in each LPL, the only way an identified LPL is not actually local is if cattle are imported from the lineage’s reservoir community elsewhere (e.g., in Japan, as the reviewer suggested), the lineage is persisting in that non-Alberta reservoir, and newly infected cattle are imported repeatedly over 1 or more years. This could feasibly explain G(vi)-AB LPL 5 (Figure 4), which is entirely composed of cattle. Indeed, such an explanation would be consistent with the lack of new cases from this LPL after 2015 in the extended analysis (Figure 5). However, for all other LPLs, which contain both cattle and human isolates, for the LPL to not be local, both cattle and human cases would have to be imported from the same non-Alberta reservoir. While this is possible, the probability of such a scenario is low, and it decreases the more isolates are in an LPL. For the average LPL, this means 4 human and 6 cattle cases would need to be imported from a non-Alberta reservoir over several years. Given that our study is only a random sample of the total STEC cases and cattle in Alberta from 2007-2015, these numbers are underestimates of the true absolute number of cases and cattle associated with LPLs that would have to be explained by importation if the LPL were not local. We have added some explanation of the possibility of importation in the Discussion where we discuss the LPL criteria (lines 376-380).

      (4) Contained ≥5 isolates: In concert with criterion 3, this criterion guards against anomalies being counted as LPLs. By requiring at least 5 isolates in an LPL after down-sampling, at least 5 infection events must have occurred from the LPL, reducing the likelihood of importation explaining the LPL and emphasizing more significant LPLs.

      (5) The isolates were collected at sampling events (for cattle) or reported (for humans) over a period of at least 1 year: This criterion defines the persistence aspect of the LPL. In the primary analysis, the LPLs we identified persisted for an average of 8 years, with the shortest persisting for 5 years (these details have been added to the text, lines 268-269). Incorporating the extended analysis, several LPLs persisted for the full 13 years of the study.
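
      Taken together, the five criteria above amount to a simple filter over candidate lineages of the MCC tree. The sketch below is illustrative only and is not the authors' code: the `Isolate` record, the `is_lpl` function, the `snp_dist` callable, and the reading of "at least 1 year" as 365 days are assumptions.

```python
from dataclasses import dataclass
from datetime import date
from itertools import combinations


@dataclass(frozen=True)
class Isolate:
    name: str
    host: str        # "cattle" or "human"
    collected: date  # sampling date (cattle) or report date (human)


def is_lpl(isolates, posterior, snp_dist, *, min_posterior=0.95,
           max_snps=30, min_isolates=5, min_span_days=365):
    """Return True if a candidate lineage meets all five LPL criteria.

    posterior: posterior probability of the lineage's MRCA node
    snp_dist:  callable (name_a, name_b) -> core SNP distance (hypothetical)
    """
    # (1) Well-supported single lineage.
    if posterior < min_posterior:
        return False
    # (4) Enough isolates (after down-sampling).
    if len(isolates) < min_isolates:
        return False
    # (3) At least one cattle isolate.
    if not any(i.host == "cattle" for i in isolates):
        return False
    # (2) Every pair of isolates within the core SNP threshold.
    if any(snp_dist(a.name, b.name) > max_snps
           for a, b in combinations(isolates, 2)):
        return False
    # (5) Isolates span at least the minimum persistence period.
    dates = [i.collected for i in isolates]
    return (max(dates) - min(dates)).days >= min_span_days
```

      Keeping the thresholds as keyword arguments makes it straightforward to rerun the same filter with alternate SNP cutoffs, in the spirit of the sensitivity analysis described under criterion (2).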

      Regarding using additional non-Alberta isolates to help rule out importation, we have expanded the number of U.S. and global isolates included in the importation analysis, over-sampling clade G isolates from the U.S. (Figure 7). As cattle trade is substantially more common with the U.S. than other countries, we felt it most important to focus on the U.S. as a potential source of both imported cattle and human cases. Our results from this analysis show that only 9 of 494 (1.8%) U.S. isolates occurred in the LPLs we defined in the primary analysis, and all occurred after Alberta isolates (lines 313-317). Although we also added more global isolates, we still found that none were associated with the Alberta LPLs.

      (6) Given the importance of sampling for a study like this, some more information on animal sampling studies should be included here.

      We have added details on the cattle sampling to the Methods (lines 95-101).

      (7) L172 - do you mean an MRCA with ≥95% probability of location in Alberta?

      Location in Alberta was not determined from the primary analysis, which defined the LPLs, as only Alberta isolates were included in that analysis. As described above, this criterion meant that we required at least 95% probability that the tree topology at the lineage’s MRCA was correct, and in practice the posterior probability of the lineages we defined as LPLs was 99.7-100%.

      (8) Need a supplementary figure of just clade G from Figure 2.

      We have added a sub-tree diagram of clade G(vi) as Figure 2b.

      Reviewer #2 (Public Review):

      This study identified multiple locally evolving lineages transmitted between cattle and humans persistently associated with E. coli O157:H7 illnesses for up to 13 years. Furthermore, this study mentions a dramatic shift in the local persistent lineages toward strains with the more virulent stx2a-only profile. The authors hypothesized that the large proportion of disease associated with local transmission systems is a principal cause of Alberta's high E. coli O157:H7 incidence. These opinions more effectively explain the role of the cattle reservoir in the dynamics of E. coli O157:H7 human infections.

      (1) The authors acknowledge the possibility of intermediate hosts or environmental reservoirs playing a role in transmission. Further discussion on the potential roles of other animal species commonly found in Alberta (e.g., sheep, goats, swine) could enhance the understanding of the transmission dynamics. Were isolates from these species available for analysis? If not, the authors should clearly state this limitation.

      We have expanded the discussion of other species in Alberta, as suggested, including other livestock, wildlife, and the potential role of birds and flies (lines 353-360). Unfortunately, we did not have sequences available from other species, which we have added to the limitations (lines 487-490).

      (2) The focus on E. coli O157:H7 is understandable given its prominence in Alberta and the availability of historical data. However, a brief discussion on the potential applicability of the findings to non-O157 STEC serogroups, and the limitations therein, would be beneficial. Are there reasons to believe the transmission dynamics would be similar or different for other serogroups?

      We appreciate this comment and have expanded our discussion of relevance to non-O157 STEC (lines 452-460). Other authors have proposed that transmission dynamics differ, and studies of STEC risk factors, including our own, support this. However, there has been very little direct study of non-O157 transmission dynamics and there is even less cross-species genomic and metadata available for non-O157 isolates of concern.

      (3) The authors briefly mention the need for elucidating local transmission systems to inform management strategies. A more detailed discussion on specific public health interventions that could be targeted at the identified LPLs and their potential reservoirs would strengthen the paper's impact.

      We agree with the reviewer that this would be a good addition to the manuscript. The public health implications for control are several and extend to non-STEC reportable zoonotic enteric infections, such as Campylobacter and Salmonella. We have added a discussion of these (lines 460-465, 467-485).

      (4) Understanding the relationship between specific risk factors and E. coli O157:H7 infections is essential for developing effective prevention strategies. Have case-control or cohort studies been conducted to assess the correlation between identified risk factors and the incidence of E. coli O157:H7 infections? What methodologies were employed to control for potential confounders in these studies?

      Yes, there have been several case-control studies of reported cases. Many of these are referenced in the discussion in terms of the contribution of different sources to infection. As risk factors were not the focus of the current study, we believe a thorough discussion of the literature on the aspects of these various studies is beyond our scope. However, we have added some details on the risk factors themselves (lines 72-79).

      (5) The study's findings are noteworthy, particularly in the context of E. coli O157:H7 epidemiology. However, the extent to which these results can be replicated across different temporal and geographical settings remains an open question. It would be constructive for the authors to provide additional data that demonstrate the replication of their sampling and sequencing experiments under varied conditions. This would address concerns regarding the specificity of the observed patterns to the initial study's parameters.

      We appreciate the reviewer’s comment, as we are currently building on this analysis with an American dataset with different types of data available than were used in this study. Aligned with this work, we have added a comment on the adaptation of our method to other settings with different types of data (lines 448-450). We also added a sensitivity analysis to the manuscript simulating a different sampling approach (Suppl. Fig. S3), which should be informative to this question.

      Reviewer #2 (Recommendations For The Authors):

      Minor comments.

      (1) Figure 1: The figure is a critical visual representation of the study's findings and should be given prominent emphasis. It is essential that the key discoveries of the research are clearly depicted and explained in this visual format. The authors should ensure that Figure 1 is detailed and informative enough to stand out as a central piece of the study.

      Figure 1 is the diagram of sample numbers, locations, and corresponding analyses. We assume that the reviewer means to refer to Figure 2. Although the inclusion of >1,200 isolates makes the tree difficult to see in detail, we have made some modifications to make the findings clearer. First, we changed the clade coloration such that the only subclade differentiated is G(vi). We have removed the stx metadata ring to focus attention on the location and species of the isolates, as stx data are described in Table 1. Finally, we have added a sub-tree diagram of clade G(vi), colored by location. This makes clear the large sections of the subclade dominated by isolates from one location or another, and the limited areas where they overlap.

      (2) Figures 2 and 4: While these figures contribute to the presentation of the data, they appear to be somewhat rudimentary in their current form. The lack of detailed annotations regarding the clustering of different strains is a notable omission. I recommend that the authors refine these figures to include comprehensive labeling that clearly delineates the various bacterial clusters. Enhanced graphical representation with clear annotations will aid readers in better understanding the study's findings.

      We appreciate this suggestion. We have remade all trees generated by the BEAST 2 analyses in R, rather than FigTree. This has allowed us to annotate the trees with additional information on the LPLs and we believe provides a clearer picture of each LPL.

      (3) Supplemental Table S1: The supplemental tables are an excellent opportunity to showcase additional data and findings that support the study's conclusions. For Supplemental Table S1, it is recommended that the authors highlight the innovative aspects or novel discoveries presented in this table.

      Suppl. Table S1 shows the modeling specifications and priors used in the analyses. These decisions were not in and of themselves novel. The innovation in our methods is due to the development of the LPLs based on the trees resulting from the analyses detailed in Suppl. Table S1, as well as from the application of these models to E. coli O157:H7 for the first time. However, we understand the reviewer's point and have emphasized the importance of the results shown in Suppl. Table S2 (lines 391-395).

      (4) Line 35: "We assessed the role of persistent cross-species transmission systems in Alberta's E. coli O157:H7 epidemiology." change to "We assessed the impact of persistent cross-species transmission systems on the epidemiology of E. coli O157:H7 in Alberta."

      We have made this change.

      (5) To facilitate a deeper understanding of the core findings of the manuscript and to enable the development of effective response strategies, I suggest that the authors provide more information regarding the sequencing data used in the study. This information should at least include aspects such as data accessibility and quality control measures.

      We have included a Supplemental Data File that lists all isolates used in the analysis, and the QC measures are detailed in the Methods.

    1. eLife Assessment

      This work models reinforcement-learning experiments using a recurrent neural network. It examines if the detailed credit assignment necessary for back-propagation through time can be replaced with random feedback. In this useful study, the authors show that it yields a satisfactory approximation, but the evidence to support that it holds in general is incomplete. As only short temporal delays are used and the examples simulated are overly simple, the approximation would need to be tested on more complex tasks and with larger networks.

    2. Reviewer #1 (Public review):

      Summary:

      Can a plastic RNN serve as a basis function for learning to estimate value? In previous work this was shown to be the case, with a similar architecture to that proposed here. The learning rule in previous work was back-prop with an objective function that was the TD error function (delta) squared. Such a learning rule is non-local as the changes in weights within the RNN, and from inputs to the RNN, depend on the weights from the RNN to the output, which estimates value. This is non-local, and in addition, these weights themselves change over learning. The main idea in this paper is to examine if replacing the values of these non-local changing weights, used for credit assignment, with random fixed weights can still produce similar results to those obtained with complete bp. This random feedback approach is motivated by a similar approach used for deep feed-forward neural networks.

      This work shows that this random feedback in credit assignment performs well, but not as well as the precise gradient-based approach. When more constraints due to biological plausibility are imposed, performance degrades. These results are not surprising given previous results on random feedback. This work is incomplete because the delay times used were only a few time steps, and it is not clear how well random feedback would operate with longer delays. Additionally, the examples simulated with a single cue and a single reward are overly simplistic and the field should move beyond these exceptionally simple examples.

      Strengths:

      • The authors show that random feedback can approximate well a model trained with detailed credit assignment.

      • The authors simulate several experiments including some with probabilistic reward schedules and show results similar to those obtained with detailed credit assignments as well as in experiments.

      • The paper examines the impact of more biologically realistic learning rules and the results are still quite similar to the detailed back-prop model.

      Weaknesses:

      • The authors also show that an untrained RNN does not perform as well as the trained RNN. However, they never explain what they mean by an untrained RNN. It should be clearly explained. These results are actually surprising. An untrained RNN with enough units and sufficiently large variance of recurrent weights can have high dimensionality and generate a complete or nearly complete basis, though not orthonormal (e.g., Rajan & Abbott 2006). It should be possible to use such a basis to learn this simple classical conditioning paradigm. It would be useful to measure the dimensionality of network dynamics, in both trained and untrained RNNs.

      • The impact of the article is limited by using a network with discrete time-steps, and only a small number of time steps from stimulus to reward. What is the length of each time step? If it's on the order of the membrane time constant, then a few time steps are only tens of ms. In classical conditioning experiments, typical delays are on the order of hundreds of milliseconds to seconds. The authors should test if random feedback weights work as well for larger time spans. This can be done by simply using a much larger number of time steps.

      • In the section with more biologically constrained learning rules, while the output weights are restricted to only be positive (as well as the random feedback weights), the recurrent weights and weights from input to RNN are still bi-polar and can change signs during learning. Why is the constraint imposed only on the output weights? It seems reasonable that the whole setup will fail if the recurrent weights were only positive as in such a case most neurons will have very similar dynamics, and the network dimensionality would be very low. However, it is possible that only negative weights might work. It is unclear to me how to justify that bipolar weights that change sign are appropriate for the recurrent connections and inappropriate for the output connections. On the other hand, an RNN with excitatory and inhibitory neurons in which weight signs do not change could possibly work.

      • Like most papers in the field this work assumes a world composed of a single cue. In the real world there are many more cues than rewards, some cues are not associated with any rewards, and some are associated with other rewards or even punishments. In the simplest case, it would be useful to show that this network could actually work if there are additional distractor cues that appear at random either before the CS, or between the CS and US. There are good reasons to believe such distractor cues will be fatal for an untrained RNN, but might work with a trained RNN, either using BPTT or random feedback. Although this assumption is a common flaw in most work in the field, we should no longer ignore these slightly more realistic scenarios.

    3. Reviewer #2 (Public review):

      Summary:

      Tsurumi et al. show that recurrent neural networks can learn state and value representations in simple reinforcement learning tasks when trained with random feedback weights. The traditional method of learning for recurrent networks in such tasks (backpropagation through time) requires feedback weights which are a transposed copy of the feed-forward weights, a biologically implausible assumption. This manuscript builds on previous work regarding "random feedback alignment" and "value-RNNs", and extends them to a reinforcement learning context. The authors also demonstrate that certain non-negative constraints can enforce a "loose alignment" of feedback weights. The authors' results suggest that random feedback may be a powerful tool of learning in biological networks, even in reinforcement learning tasks.

      Strengths:

      The authors describe well the issues regarding biologically plausible learning in recurrent networks and in reinforcement learning tasks. They take care to propose networks which might be implemented in biological systems and compare their proposed learning rules to those already existing in literature. Further, they use small networks on relatively simple tasks, which allows for easier intuition into the learning dynamics.

      Weaknesses:

      The principles discovered by the authors in these smaller networks are not applied to deeper networks or more complicated tasks, so it remains unclear to what degree these methods can scale up, or can be used more generally.

    4. Reviewer #3 (Public review):

      Summary:

      The paper studies learning rules in a simple sigmoidal recurrent neural network setting. The recurrent network has a single layer of 10 to 40 units. It is first confirmed that feedback alignment (FA) can learn a value function in this setting. Then so-called bio-plausible constraints are added: (1) when value weights (readout) are non-negative, (2) when the activity is non-negative (normal sigmoid rather than downscaled between -0.5 and 0.5), (3) when the feedback weights are non-negative, (4) when the learning rule is revised to be monotonic: the weights are not downregulated. In the simple task considered, none of the four biological features appears to totally impair the learning.

      Strengths:

      (1) The learning rules are implemented in a low-level fashion of the form: (pre-synaptic-activity) x (post-synaptic-activity) x feedback x RPE, which is therefore interpretable in terms of quantities measurable in the wet lab.

      (2) I find that non-negative FA (FA with non negative c and w) is the most valuable theoretical insight of this paper: I understand why the alignment between w and c is automatically better at initialization.

      (3) The task choice is relevant since it connects with experimental settings of reward conditioning with possible plasticity measurements.

      Weaknesses:

      (4) The task is rather easy, so it's not clear that it really captures the computational gap that exists between FA (gradient-like learning) and a simpler learning rule like a delta rule: RPE x (pre-synaptic) x (post-synaptic). To check that the task is not too trivial, I suggest adding a control where the vector c is constant c_i=1.

      (5) Related to point 3), the main strength of this paper is to draw potential connection with experimental data. It would be good to highlight more concretely the prediction of the theory for experimental findings. (Ideally, what should be observed with non-negative FA that is not expected with FA or a delta rule (constant global feedback) ?).

      (6a) Random feedback with RNNs in RL has been studied in the past, so it is maybe worth giving some insights into how the results and the analyses compare to this previous line of work (for instance in this paper [1]). For instance, I am not very surprised that FA also works for value prediction with TD error. It is also expected from the literature that the RL + RNN + FA setting would scale to tasks that are more complex than the conditioning problem proposed here, so is there a more specific take-home message about non-negative FA? or benefits from this simpler toy task?

      (6b) Related to task complexity, it is not clear to me if non-negative value and feedback weights would generally scale to harder tasks. If the task is so simple that a global RPE signal is sufficient to learn (see 4 and 5), then it could be good to extend the task to find a substantial gap between: global RPE, non-negative FA, FA, BP. For a well chosen task, I expect to see a performance gap between any pair of these four learning rules. In the context of the present paper, this would be particularly interesting to study the failure mode of non-negative FA and the cases where it does perform as well as FA.

      (7) I find that the writing could be improved; it mostly feels more technical and difficult than it should. Here are some recommendations:

      (7a) For instance, the technical description of the task (CSC) is not fully described and requires background knowledge from other papers, which is not desirable.

      (7b) Also, the rationale for the added difficulty with the stochastic reward and new state is not well explained.

      (7c) In the technical description of the results I find that the text dives into descriptive comments of the figures, but high-level take-home messages would be helpful to guide the reader. I got a bit lost, although I feel that there is probably a lot of depth in these paragraphs.

      (8) Related to the writing issue and 5), I wish that "bio-plausibility" were not the only reason to study positive feedback and value weights. Is it possible to develop more specifically what is interesting about this positivity, and why? Is there an expected finding with non-negative FA in terms of model capability? Or perhaps a simpler, crisp take-home message communicating the experimental predictions to the community would be useful.

      (1) https://www.nature.com/articles/s41467-020-17236-y

    5. Author response:

      Reviewer #1 (Public review):

      Summary:

      Can a plastic RNN serve as a basis function for learning to estimate value? In previous work this was shown to be the case, with a similar architecture to that proposed here. The learning rule in previous work was back-prop with an objective function that was the TD error function (delta) squared. Such a learning rule is non-local as the changes in weights within the RNN, and from inputs to the RNN, depend on the weights from the RNN to the output, which estimates value. This is non-local, and in addition, these weights themselves change over learning. The main idea in this paper is to examine if replacing the values of these non-local changing weights, used for credit assignment, with random fixed weights can still produce similar results to those obtained with complete bp. This random feedback approach is motivated by a similar approach used for deep feed-forward neural networks.

      This work shows that this random feedback in credit assignment performs well, but not as well as the precise gradient-based approach. When more constraints due to biological plausibility are imposed, performance degrades. These results are not surprising given previous results on random feedback. This work is incomplete because the delay times used were only a few time steps, and it is not clear how well random feedback would operate with longer delays. Additionally, the examples simulated with a single cue and a single reward are overly simplistic and the field should move beyond these exceptionally simple examples.

      Strengths:

      • The authors show that random feedback can approximate well a model trained with detailed credit assignment.

      • The authors simulate several experiments including some with probabilistic reward schedules and show results similar to those obtained with detailed credit assignments as well as in experiments.

      • The paper examines the impact of more biologically realistic learning rules and the results are still quite similar to the detailed back-prop model.

      Weaknesses:

      • The authors also show that an untrained RNN does not perform as well as the trained RNN. However, they never explain what they mean by an untrained RNN. It should be clearly explained. These results are actually surprising. An untrained RNN with enough units and sufficiently large variance of recurrent weights can have high dimensionality and generate a complete or nearly complete basis, though not orthonormal (e.g., Rajan & Abbott 2006). It should be possible to use such a basis to learn this simple classical conditioning paradigm. It would be useful to measure the dimensionality of network dynamics, in both trained and untrained RNNs.

      Thank you for pointing out the lack of explanation about the untrained RNN. The untrained RNN in our simulations (except Fig. 6D/6E-gray-dotted) was a randomly initialized RNN (i.e., connection weights were drawn from a pseudo normal distribution) that was used as the initial RNN for training of value-RNNs. As you suggested, the performance of the untrained RNN indeed improved as the number of units increased (Fig. 2J), and its highest part was almost comparable to the highest performance of trained value-RNNs (Fig. 2I). In the revision we will show the dimensionality of network dynamics (as you have suggested), and the eigenvalue spectrum of the network.
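
      As a rough illustration of what a dimensionality measurement could look like, the sketch below computes the participation ratio of unit activity, (sum of covariance eigenvalues)^2 / (sum of squared eigenvalues). The choice of this particular measure is an assumption made here for illustration; the response does not state which measure will be used. The function name `participation_ratio` and the input layout (a list of time points, each a list of unit activities) are likewise hypothetical.

```python
def participation_ratio(activity):
    """Participation ratio of network activity (an assumed dimensionality measure).

    activity: list of time points, each a list of unit activities.
    Uses tr(C)^2 / tr(C^2), which equals (sum of eigenvalues)^2 divided by
    (sum of squared eigenvalues) for the covariance matrix C, so no
    eigendecomposition is needed.
    """
    n_time = len(activity)
    n_units = len(activity[0])
    # Mean activity of each unit over time.
    means = [sum(row[i] for row in activity) / n_time for i in range(n_units)]
    centered = [[row[i] - means[i] for i in range(n_units)] for row in activity]
    # Sample covariance matrix between units.
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n_time - 1) for j in range(n_units)]
        for i in range(n_units)
    ]
    trace = sum(cov[i][i] for i in range(n_units))
    # tr(C^2) equals the sum of squared entries because C is symmetric.
    trace_sq = sum(cov[i][j] ** 2 for i in range(n_units) for j in range(n_units))
    return trace ** 2 / trace_sq
```

      Under this measure, units with identical dynamics give a value of 1, and units with uncorrelated, equal-variance dynamics give a value equal to the number of units.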

      • The impact of the article is limited by using a network with discrete time-steps, and only a small number of time steps from stimulus to reward. What is the length of each time step? If it's on the order of the membrane time constant, then a few time steps are only tens of ms. In classical conditioning experiments, typical delays are on the order of hundreds of milliseconds to seconds. The authors should test if random feedback weights work as well for larger time spans. This can be done by simply using a much larger number of time steps.

      Thank you for pointing out this important issue, for which our explanation was lacking and our examination was insufficient. We do not consider that a single time step in our models corresponds to the neuronal membrane time constant. Rather, for the following reasons, we assume that the time step corresponds to several hundreds of milliseconds:

      - We assume that a single RNN unit corresponds to a small neuron population whose members intrinsically (for genetic/developmental reasons) share inputs/outputs and are mutually connected via excitatory collaterals.

      - Cortical activity is suggested to be sustained not only by fast synaptic transmission and spiking but also, even predominantly, by slower synaptic neurochemical dynamics (Mongillo et al., 2008, Science "Synaptic Theory of Working Memory" https://www.science.org/doi/10.1126/science.1150769).

      - In line with this theoretical suggestion, previous research examining excitatory interactions between pyramidal cells, to which one of us (the corresponding author Morita) contributed by conducting model fitting (Morishima, Morita, Kubota, Kawaguchi, 2011, J Neurosci, https://www.jneurosci.org/content/31/28/10380), showed that the mean recovery time constant from facilitation for recurrent excitation among one of the two types of cortico-striatal pyramidal cells was around 500 milliseconds.

      If a single time step corresponds to 500 milliseconds, three time steps from cue to reward in our simulations correspond to 1.5 sec, which matches the delay in the conditioning task used in Schultz et al. 1997 Science. Nevertheless, as you pointed out, it is necessary to examine whether our random feedback models can work for longer delays, and we will examine it in our revision.
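
      The conversion behind these numbers can be written out as a small sketch. The 500 ms step duration is the assumption stated in the response; the helper names `steps_to_seconds` and `seconds_to_steps` are hypothetical and only illustrate how a longer delay would translate into a number of model time steps.

```python
import math

STEP_MS = 500  # assumed duration of one model time step, in milliseconds


def steps_to_seconds(n_steps, step_ms=STEP_MS):
    """Convert a number of model time steps into a delay in seconds."""
    return n_steps * step_ms / 1000.0


def seconds_to_steps(delay_s, step_ms=STEP_MS):
    """Smallest number of time steps that covers a delay given in seconds."""
    return math.ceil(delay_s * 1000.0 / step_ms)


# Three steps from cue to reward, as in the simulations described above.
cue_to_reward_s = steps_to_seconds(3)
```

      Under the same assumption, a hypothetical 10-second cue-reward delay would need `seconds_to_steps(10)` time steps, that is, 20 steps.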

      • In the section with more biologically constrained learning rules, while the output weights are restricted to only be positive (as well as the random feedback weights), the recurrent weights and weights from input to RNN are still bi-polar and can change signs during learning. Why is the constraint imposed only on the output weights? It seems reasonable that the whole setup will fail if the recurrent weights were only positive as in such a case most neurons will have very similar dynamics, and the network dimensionality would be very low. However, it is possible that only negative weights might work. It is unclear to me how to justify that bipolar weights that change sign are appropriate for the recurrent connections and inappropriate for the output connections. On the other hand, an RNN with excitatory and inhibitory neurons in which weight signs do not change could possibly work.

      Our explanation and examination of this issue were insufficient; thank you for pointing this out and for your helpful suggestion. In the Discussion (Lines 507-510) of the original manuscript, we wrote: "Regarding the connectivity, in our models, recurrent/feed-forward connections could take both positive and negative values. This could be justified because there are both excitatory and inhibitory connections in the cortex and the net connection sign between two units can be positive or negative depending on whether excitation or inhibition exceeds the other." However, we admit that the meaning of this description was not clear, and more explicit modeling will be necessary, as you suggested.

      Therefore, in our revision, we will examine models in which inhibitory units (modeling fast-spiking (FS) GABAergic cells) are incorporated and neurons follow Dale’s law.

      • Like most papers in the field this work assumes a world composed of a single cue. In the real world there are many more cues than rewards, some cues are not associated with any rewards, and some are associated with other rewards or even punishments. In the simplest case, it would be useful to show that this network could actually work if there are additional distractor cues that appear at random either before the CS, or between the CS and US. There are good reasons to believe such distractor cues will be fatal for an untrained RNN, but might work with a trained RNN, either using BPTT or random feedback. Although this assumption is a common flaw in most work in the field, we should no longer ignore these slightly more realistic scenarios.

      Thank you very much for this insightful comment. In our revision, we will examine situations where there exist not only a reward-associated cue but also randomly appearing distractor cues.

      Reviewer #2 (Public review):

      Summary:

      Tsurumi et al. show that recurrent neural networks can learn state and value representations in simple reinforcement learning tasks when trained with random feedback weights. The traditional method of learning for recurrent networks in such tasks (backpropagation through time) requires feedback weights which are a transposed copy of the feed-forward weights, a biologically implausible assumption. This manuscript builds on previous work regarding "random feedback alignment" and "value-RNNs", and extends them to a reinforcement learning context. The authors also demonstrate that certain non-negative constraints can enforce a "loose alignment" of feedback weights. The authors' results suggest that random feedback may be a powerful tool of learning in biological networks, even in reinforcement learning tasks.

      Strengths:

      The authors describe well the issues regarding biologically plausible learning in recurrent networks and in reinforcement learning tasks. They take care to propose networks which might be implemented in biological systems and compare their proposed learning rules to those already existing in literature. Further, they use small networks on relatively simple tasks, which allows for easier intuition into the learning dynamics.

      Weaknesses:

      The principles discovered by the authors in these smaller networks are not applied to deeper networks or more complicated tasks, so it remains unclear to what degree these methods can scale up, or can be used more generally.

      In our revision, we will examine more biologically realistic models with excitatory and inhibitory units, as well as more complicated tasks with distractor cues. We will also consider whether/how the depth of networks can be increased, though we do not currently have a concrete idea on this last point. Thank you also for the detailed, insightful 'recommendations for authors'. We will also address them in our revision.

      Reviewer #3 (Public review):

      Summary:

      The paper studies learning rules in a simple sigmoidal recurrent neural network setting. The recurrent network has a single layer of 10 to 40 units. It is first confirmed that feedback alignment (FA) can learn a value function in this setting. Then so-called bio-plausible constraints are added: (1) when value weights (readout) are non-negative, (2) when the activity is non-negative (normal sigmoid rather than downscaled between -0.5 and 0.5), (3) when the feedback weights are non-negative, (4) when the learning rule is revised to be monotonic: the weights are not downregulated. In the simple task considered, none of the four biological features appears to totally impair the learning.

      Strengths:

      (1) The learning rules are implemented in a low-level fashion of the form: (pre-synaptic-activity) x (post-synaptic-activity) x feedback x RPE, which is therefore interpretable in terms of quantities measurable in the wet lab.

      (2) I find that non-negative FA (FA with non negative c and w) is the most valuable theoretical insight of this paper: I understand why the alignment between w and c is automatically better at initialization.

      (3) The task choice is relevant since it connects with experimental settings of reward conditioning with possible plasticity measurements.

      Weaknesses:

      (4) The task is rather easy, so it's not clear that it really captures the computational gap that exists between FA (gradient-like learning) and a simpler learning rule like a delta rule: RPE x (pre-synaptic) x (post-synaptic). To check that the task is not too trivial, I suggest adding a control where the vector c is constant c_i=1.

      Thank you for this insightful comment. We have realized that this is actually an issue that needs to be considered from multiple angles. A previous study by one of us (Wärnberg & Kumar, 2023 PNAS) assumed that DA represents a vector error rather than a scalar RPE, and thus homogeneous DA was considered as a negative control because it cannot represent a vector error other than in the direction of (1, 1, ..., 1). In contrast, the present work assumes that DA represents a scalar RPE, so homogeneous DA (i.e., constant feedback) would not be regarded as a failure mode, because it can actually represent a scalar RPE and FA to the direction of (1, 1, ..., 1) should in fact occur. This FA to (1, 1, ..., 1) may actually be interesting: it means that if the heterogeneity of DA inputs is not large and the feedback is not far from (1, 1, ..., 1), states are learned to be represented in such a way that simple summation of cortical neuronal activity approximates value, thereby potentially explaining why value is often correlated with regional activation (fMRI BOLD signal) of not only striatal but also cortical regions (which I have been considering as an unresolved mystery). On the other hand, the case with constant feedback is the same as the simple delta rule, as you pointed out, and what could then be obtained from the present analyses is that FA is actually occurring behind the successful operation of such a simple rule. In any case, we will make further examinations and considerations on this issue.
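
      The equivalence between constant feedback and the simple delta rule can be made concrete with a minimal sketch. It follows the reviewer's schematic form of the rule, (pre-synaptic activity) x (post-synaptic activity) x feedback x RPE, and is not the manuscript's exact update equation; the function names, the learning rate `lr`, and the example values are all hypothetical.

```python
def feedback_update(pre, post, feedback, rpe, lr=0.1):
    """Schematic update: dW[i][j] = lr * rpe * feedback[i] * post[i] * pre[j].

    `feedback` plays the role of the vector c, with one element per
    post-synaptic unit.
    """
    return [
        [lr * rpe * feedback[i] * post[i] * pre[j] for j in range(len(pre))]
        for i in range(len(post))
    ]


def delta_rule_update(pre, post, rpe, lr=0.1):
    """Simple delta rule: dW[i][j] = lr * rpe * post[i] * pre[j]."""
    return [
        [lr * rpe * post[i] * pre[j] for j in range(len(pre))]
        for i in range(len(post))
    ]


# Hypothetical activities and RPE, used only to compare the two rules.
pre = [0.2, 0.7, 0.1]
post = [0.5, 0.9]
rpe = 0.3
constant_c = [1.0] * len(post)  # the control suggested by the reviewer, c_i = 1
```

      With `constant_c`, `feedback_update(pre, post, constant_c, rpe)` returns the same matrix as `delta_rule_update(pre, post, rpe)`, whereas a heterogeneous feedback vector scales the update of each post-synaptic unit differently.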

      (5) Related to point 3), the main strength of this paper is to draw potential connection with experimental data. It would be good to highlight more concretely the prediction of the theory for experimental findings. (Ideally, what should be observed with non-negative FA that is not expected with FA or a delta rule (constant global feedback) ?).

      In response to this insightful comment, we considered concrete predictions of our models. In the FA model, the feedback vector c and the value-weight vector w are initially in a random (on average orthogonal) relationship and become gradually aligned, whereas in the non-negative model, the vectors c and w are loosely aligned from the beginning. We considered how the vectors c and w can be experimentally measured. Each element of the feedback vector c is multiplied with TD-RPE, modulating the degree of update in each pyramidal cell (more accurately, the pyramidal cell population that corresponds to a single RNN unit). Thus each element of c could be measured as the magnitude of the response of each pyramidal cell to DA stimulation. The element of the value-weight vector w corresponding to a given pyramidal cell could be measured, if the striatal neuron that receives input from that pyramidal cell can be identified (although this is technically demanding), as the magnitude of the response of the striatal neuron to activation of the pyramidal cell.

      Then, the abovementioned predictions can be tested by (i) identifying cortical, striatal, and VTA regions that are connected by the meso-cortico-limbic pathway and the cortico-striatal-VTA pathway, (ii) identifying pairs of cortical pyramidal cells and striatal neurons that are connected, (iii) measuring the responses of identified pyramidal cells to DA stimulation, as well as the responses of identified striatal neurons to activation of the connected pyramidal cells, and (iv) testing whether the DA->pyramidal responses and the pyramidal->striatal responses are associated across pyramidal cells, and whether such associations develop through learning. We will elaborate on this tentative idea, and also other ideas, in our revision.
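
      The contrast drawn above between random (on average orthogonal) and non-negative (loosely aligned) initial relationships of c and w can be illustrated with a minimal sketch. It does not implement the models in the manuscript and involves no learning. The entry distributions (zero-mean Gaussian for the signed case, uniform on [0, 1) for the non-negative case), the number of units, and the function names are all assumptions made for illustration.

```python
import math
import random


def cosine(c, w):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(ci * wi for ci, wi in zip(c, w))
    norm_c = math.sqrt(sum(ci * ci for ci in c))
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return dot / (norm_c * norm_w)


def mean_initial_cosine(n_units, n_trials, non_negative, seed=0):
    """Average cosine between freshly drawn c and w (no learning involved)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        if non_negative:
            # Hypothetical choice: uniform [0, 1) entries for both vectors.
            c = [rng.random() for _ in range(n_units)]
            w = [rng.random() for _ in range(n_units)]
        else:
            # Hypothetical choice: zero-mean Gaussian entries for both vectors.
            c = [rng.gauss(0.0, 1.0) for _ in range(n_units)]
            w = [rng.gauss(0.0, 1.0) for _ in range(n_units)]
        total += cosine(c, w)
    return total / n_trials


signed = mean_initial_cosine(n_units=20, n_trials=2000, non_negative=False)
nonneg = mean_initial_cosine(n_units=20, n_trials=2000, non_negative=True)
print(f"signed: {signed:.2f}, non-negative: {nonneg:.2f}")
```

      Under these assumptions the signed case averages close to zero, while the non-negative case starts well above zero, which is the sense in which the two vectors are "loosely aligned from the beginning". The same `cosine` function could in principle be applied to measured DA->pyramidal and pyramidal->striatal response magnitudes.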

      (6a) Random feedback with RNN in RL has been studied in the past, so it is maybe worth giving some insights into how the results and the analyses compare to this previous line of work (for instance in this paper [https://www.nature.com/articles/s41467-020-17236-y]). For instance, I am not very surprised that FA also works for value prediction with TD error. It is also expected from the literature that the RL + RNN + FA setting would scale to tasks that are more complex than the conditioning problem proposed here, so is there a more specific take-home message about non-negative FA? or benefits from this simpler toy task?

      In reply to this suggestion, we will explore how our results compare to the previous studies, including the paper [https://www.nature.com/articles/s41467-020-17236-y], and explore benefits of our models. At present, we can think of one possible direction. According to our results (Fig. 6E), under the non-negativity constraint, the model with random feedback and monotonic plasticity rule (bioVRNNrf) performed better, on average, than the model with backprop and non-monotonic plasticity rule (revVRNNbp) when the number of units was large, though the difference in the performance was not drastic. We will explore reasons for this, and examine if this also applies to cases with more realistic models, e.g., having separate excitatory and inhibitory units (as suggested by the other reviewer).

      (6b) Related to task complexity, it is not clear to me if non-negative value and feedback weights would generally scale to harder tasks. If the task is so simple that a global RPE signal is sufficient to learn (see 4 and 5), then it could be good to extend the task to find a substantial gap between: global RPE, non-negative FA, FA, BP. For a well chosen task, I expect to see a performance gap between any pair of these four learning rules. In the context of the present paper, this would be particularly interesting to study the failure mode of non-negative FA and the cases where it does perform as well as FA.

      In reply to this comment and also other reviewer's comment, we will examine the performance of the different models in more complex tasks, e.g., having distractor cues or longer delays. We will also see whether or not the better performance of bioVRNNrf than revVRNNbp mentioned in the previous point applies to the different tasks.

      (7) I find that the writing could be improved, it mostly feels more technical and difficult than it should. Here are some recommendations:

      (7a) for instance the technical description of the task (CSC) is not fully described and requires background knowledge from other paper which is not desirable.

      (7b) Also the rationale for the added difficulty with the stochastic reward and new state is not well explained.

      (7c) In the technical description of the results I find that the text dives into descriptive comments of the figures but high-level take home messages would be helpful to guide the reader. I got a bit lost, although I feel that there is probably a lot of depth in these paragraphs.

      Thank you for your helpful suggestions. We will thoroughly revise our writings.

      (8) Related to the writing issue and 5), I wished that "bio-plausibility" was not the only reason to study positive feedback and value weights. Is it possible to develop a bit more specifically what and why this positivity is interesting? Is there an expected finding with non-negative FA both in the model capability? or maybe there is a simpler and crisp take-home message to communicate the experimental predictions to the community would be useful?

      We will consider whether and how the non-negative constraints could have any benefits other than biological plausibility, in particular in theoretical aspects or in applications using neuromorphic hardware, while we will also elaborate the links to biology and make the model's predictions more concrete.

    1. eLife Assessment

      The current human tissue-based study provides convincing evidence correlating hippocampal expressions of RNA guanine-rich G-quadruplexes with aging and with Alzheimer's Disease presence and severity. The results are important and hold promise for deeper understanding of AD's pathogenesis and potential new therapeutic strategies.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #1 (Public review):

      This is an interesting manuscript where the authors systematically measure rG4 levels in brain samples at different ages of patients affected by AD. To the best of my knowledge this is the first time that BG4 staining is used in this context and the authors provide compelling evidence to show an association with BG4 staining and age or AD progression, which interestingly indicates that such RNA structure might play a role in regulating protein homeostasis as previously speculated. The methods used and the results reported seem robust and reproducible. There were two main things that needed addressing:

      (1) Usually in BG4 staining experiments to ensure that the signal detected is genuinely due to rG4 an RNase treatment experiment is performed. This does not have to be extended to all the samples presented but having a couple of controls where the authors observe loss of staining upon RNase treatment will be key to ensure with confidence that rG4s are detected under the experimental conditions. This is particularly relevant for these brain tissue samples where BG4 staining has never been performed before.

      (2) The authors have an association between rG4-formation and age/disease progression. They also observe distribution dependency of this, which is great. However, this is still an association which does not allow the model to be supported. This is not something that can be fixed with an easy experiment and it is what it is, but my point is that the narrative of the manuscript should be more fair and reflect the fact that, although interesting, what the authors are observing is a simple correlation. They should still go ahead and propose a model for it, but they should be more balanced in the conclusion and do not imply that this evidence is sufficient to demonstrate the proposed model. It is absolutely fine to refer to the literature and comment on the fact that similar observations have been reported and this is in line with those, but still this is not an ultimate demonstration.

      Comments on current version:

      The authors have now addressed my concerns.

    3. Reviewer #2 (Public review):

      RNA guanine-rich G-quadruplexes (rG4s) are non-canonical higher order nucleic acid structures that can form under physiological conditions. Interestingly, cellular stress is positively correlated with rG4 induction.

      In this study, the authors examined human hippocampal postmortem tissue for the formation of rG4s in aging and Alzheimer Disease (AD). rG4 immunostaining strongly increased in the hippocampus with both age and with AD severity. 21 cases were used in this study (age range 30-92).

      This immunostaining co-localized with hyper-phosphorylated tau immunostaining in neurons. The BG4 staining levels were also impacted by APOE status. rG4 structure was previously found to drive tau aggregation. Based on these observations, the authors propose a model of neurodegeneration in which chronic rG4 formation drives proteostasis collapse.

      This model is interesting, and would explain different observations (e.g., RNA is present in AD aggregates and rG4s can enhance protein oligomerization and tau aggregation).

      Main issue from the previous round of review:

      There is indeed a positive correlation between Braak stage severity and BG4 staining, but this correlation is relatively weak and borderline significant (R = 0.52, p value = 0.028). This is probably the main limitation of this study, which should be clearly acknowledged (together with a reminder that "correlation is not causality"). Related to this, there is no clear justification to exclude the four individuals in Fig 1d (without them R increases to 0.78). Please remove this statement. On the other hand, the difference based on APOE status is more striking.

      Comments on current version:

      The authors have made laudable efforts to address the criticisms I made in my evaluation of the original manuscript.

    4. Author response:

      Reviewer #1 (Evidence, reproducibility and clarity):

      This is an interesting manuscript where the authors systematically measure rG4 levels in brain samples at different ages of patients affected by AD. To the best of my knowledge this is the first time that BG4 staining is used in this context and the authors provide compelling evidence to show an association with BG4 staining and age or AD progression, which interestingly indicates that such RNA structure might play a role in regulating protein homeostasis as previously speculated. The methods used and the results reported seem robust and reproducible.

      In terms of the conclusions, however, I think that there are 2 main things that need addressing prior to publication:

      (1) Usually in BG4 staining experiments to ensure that the signal detected is genuinely due to rG4 an RNase treatment experiment is performed. This does not have to be extended to all the samples presented but having a couple of controls where the authors observe loss of staining upon RNase treatment will be key to ensure with confidence that rG4s are detected under the experimental conditions. This is particularly relevant for these brain tissue samples where BG4 staining has never been performed before.

      With what is now known about RNA rG4s and the recent reconciliation of the controversy on rG4 formation (Kharel, Nature Communications 2023), this experiment is no longer strictly required for demonstration of rG4 formation. Despite this change, we did attempt this experiment at the reviewer’s suggestion, but the controls were not successful, suggesting it may not be feasible with our fixing and staining conditions. That said, we agree that despite the G4 staining appearing primarily outside the nucleus, it would be helpful to have some direct indication of whether we were observing primarily RNA or DNA G4s, and so we performed an alternate experiment to determine this.

      In our previous submission, we had performed ribosomal RNA staining (Figure S7), and the staining patterns were similar to that of BG4, especially the punctate pattern near the nuclei. Therefore, we directly asked whether the BG4 was largely binding to rRNA and have now shown the resulting co-stain in Figure 3b. These results show that at least a large amount of the BG4 staining does arise from rG4s in ribosomes. At high magnification, we observe that the BG4 stains a subset of the ribosomes, consistent with previous observations of high rG4 levels in ribosomes both in vitro and in cells (Mestre-Fos, 2019 J Mol Biol, Mestre-Fos 2019 PLoS One, Mestre-Fos 2020 J Biol Chem), but this had never been demonstrated in tissue. This experiment has therefore both answered the primary question of whether we are primarily observing rG4s, as well as provided more detailed information on the cellular sublocalization of rG4 formation, and provided the first evidence of rG4 formation on ribosomes in tissue.

      (2) The authors have an association between rG4-formation and age/disease progression. They also observe distribution dependency of this, which is great. However, this is still an association which does not allow the model to be supported. This is not something that can be fixed with an easy experiment and it is what it is, but my point is that the narrative of the manuscript should be more fair and reflect the fact that, although interesting, what the authors are observing is a simple correlation. They should still go ahead and propose a model for it, but they should be more balanced in the conclusion and do not imply that this evidence is sufficient to demonstrate the proposed model. It is absolutely fine to refer to the literature and comment on the fact that similar observations have been reported and this is in line with those, but still this is not an ultimate demonstration.

      We agree that these are correlative studies (of necessity when studying human tissue), but recent experiments have shown that rG4s affect the aggregation of Tau in vitro – and we have now better clarified this in the text itself. We have now also been more careful in drawing causative conclusions as shown in the revised text.

      Minor point:

      (3) rG4s themselves have been shown to generate aggregates in ALS models in the absence of any protein (Ragueso et al. Nat Commun 2023). I think this is also important in the light of my comment on the model, could well be that these rG4s are causing aggregates themselves that act as nucleation point for the proteins as reported in the paper I mentioned. Providing a broader and more unbiased view of the current literature on the topic would be fair, rather than focusing on reports more in line with the model proposed.

      We agree and have modified the discussion and added a broader context, including the Ragueso report described above.

      Reviewer #1 (Significance):

      This is a significant novel study, as per my comments above. I believe that such a study will be of impact in the G4 and neurodegenerative fields. Providing that the authors can address the criticisms above, I strongly believe that this manuscript would be of value to the scientific community. The main strength is the novelty of the study (never done before) the main weakness is the lack of the RNase control at the moment and the slightly over interpretation of the findings (see comments above).

      Reviewer #2 (Evidence, reproducibility and clarity):

      RNA guanine-rich G-quadruplexes (rG4s) are non-canonical higher order nucleic acid structures that can form under physiological conditions. Interestingly, cellular stress is positively correlated with rG4 induction. In this study, the authors examined human hippocampal postmortem tissue for the formation of rG4s in aging and Alzheimer Disease (AD). rG4 immunostaining strongly increased in the hippocampus with both age and with AD severity. 21 cases were used in this study (age range 30-92). This immunostaining co-localized with hyper-phosphorylated tau immunostaining in neurons. The BG4 staining levels were also impacted by APOE status. rG4 structure was previously found to drive tau aggregation. Based on these observations, the authors propose a model of neurodegeneration in which chronic rG4 formation drives proteostasis collapse.

      This model is interesting, and would explain different observations (e.g., RNA is present in AD aggregates and rG4s can enhance protein oligomerization and tau aggregation).

      Main issue:

      There is indeed a positive correlation between Braak stage severity and BG4 staining, but this correlation is relatively weak and borderline significant (R = 0.52, p value = 0.028). This is probably the main limitation of this study, which should be clearly acknowledged (together with a reminder that "correlation is not causality").

      We believe that we had not explained this clearly enough in the text (based on the reviewer’s comment), as the correlation mentioned by the Reviewer was for the CA4 region only, and not the OML, which was substantially more correlated and statistically significant (Spearman R= 0.72, p = 0.00086). As a result, we believe this was a miscommunication that is rectified by the revised text:

      “In the OML, plotting BG4 percent area versus Braak stage demonstrated a strong correlation (Spearman R= 0.72) with highly significantly increased BG4 staining with higher Braak stages (p = 0.00086) (Fig. 2b).”
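
      Because Braak stage is an ordinal variable with many tied values, the Spearman coefficient quoted above is a rank-based measure. As a minimal sketch of how such a coefficient can be computed with tied ranks, the example below takes the Pearson correlation of average ranks. The numbers are hypothetical and are not data from the study, and the function names are illustrative only.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_r(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for illustration only, not data from the study.
braak_stage = [0, 1, 1, 2, 3, 3, 4, 5, 6, 6]
percent_area = [0.4, 0.9, 0.3, 1.1, 1.0, 2.2, 1.8, 2.9, 2.5, 3.4]
print(round(spearman_r(braak_stage, percent_area), 2))
```

      Computing the coefficient separately for each region (e.g., CA4 and OML) in this way would make explicit which correlation each reported R value refers to.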

      Related to this, there is no clear justification to exclude the four individuals in Fig 1d (without them R increases to 0.78). Please remove this statement. On the other hand, the difference based on APOE status is more striking.

      We did not mean to imply that deleting these outliers was correct, but merely were demonstrating that they were in fact outliers. To avoid this misinterpretation, we have now deleted the sentence in the Figure 1d caption mentioning the outliers.

      Minor suggestions

      - "BG4 immunostaining was in many cases localized in the cytoplasm near the nucleus in a punctate pattern". Define "many"

      This is seen in nearly every cell. The text has been altered accordingly, and the staining is now identified as ribosomes containing rG4s using the rRNA antibody (Fig. 3b).

      - Specify that MABE917 corresponds to the specific single-chain version of the BG4 antibody

      Yes, this is correct, and this clarification has been added to the manuscript.

      - Define PMI, Braak, CERAD (add a list of acronyms or insert these definitions in Fig 1b legend)

      These definitions have all been added when they first appear.

      - Fig 3: scale bar legend missing (50 micrometers?)

      This has been added, and the reviewer was correct that it was 50 micrometers.

      - Supplementary data Table 1: indicate target for all antibodies

      The target for each antibody has been added to supplementary Table 1.

      - Supplementary data Table 2: why give ages with different levels of precision? (e.g. 90.15 vs 63)

      We apologize for this oversight and have altered the ages to the same precision (whole years) in the figure.

      - Supplementary data Fig 1 X-axis legend: add "(nm)" after wavelength. Sequence can also be added in the legend. Why this one? Max/Min Wavelengths in the figure do not match indications in the experimental part. Not sure if that part is actually relevant for this study.

      The CD spectrum in Sup Fig 1 is the sequence that had previously been shown to aid in tau aggregation seeding, but had not been suspected by those authors to be a quadruplex. So we tested that here and showed it is a quadruplex, as described at the end of the introduction. We have added wording to the figure legend to clarify where its corresponding description in the main text can be found. We have also checked and corrected the wavelength and units.

      - Supplementary data Fig 7: Which ribosomal antibody was used?

      The details of this antibody have now been added to Supplementary Table 2 which lists all the antibodies used.

      Reviewer #2 (Significance):

      Provide a link between Alzheimer disease and RNA G-quadruplexes.

      Reviewer #3 (Evidence, reproducibility and clarity):

      This study investigated the formation of RNA G quadruplexes (rG4) in aging and AD in human hippocampal postmortem tissue. The rG4 immunostaining in the hippocampus increases strongly with age and with the severity of AD. Furthermore, rG4 is present in neurons with an accumulation of phosphorylated tau immunostaining.

      Major comments

      (1) The method used in this study is primarily immunostaining of BG4, and the results cannot be considered correct without additional data from more multifaceted analyses (biochemical analysis, RNA expression analysis, etc.).

      We respectfully disagree with the Reviewer’s assessment of the value of these experiments. The most relevant biochemical experiments at the cellular and molecular level showing the role of G4s in aggregation in general and Tau in particular have been done and are referenced in the text. The results here stand on their own and are highly novel and significant, as evaluated by both of the other reviewers. There has been no previous work demonstrating the presence of rG4s in human brain – either in controls or in patients with AD. AD is a complex condition that only occurs spontaneously in the human brain and no other species; because of this complexity, novel aspects are best first studied in human brain tissue using the methods employed here.

      (2) Overall, the quality of the stained images is poor, and detailed quantitative analysis using further high quality data is essential to conclude the authors' conclusions.

      We have again looked at our images and they are not of poor quality; they are confocal images taken at the recommended resolution of the confocal microscope. It is possible the poor quality came from pdf compression by the manuscript submission portal, which is beyond our control as they were uploaded at high resolution. These data were quantified by scientists who were blinded to the diagnosis of each case. The level of description on the detailed quantification is higher than we have observed in similar studies. We therefore disagree with the reviewer’s conclusion.

      Reviewer #3 (Significance):

      Overall, this study is not a deeply analyzed study. In addition, the authors of this study need further understanding regarding G4.

      It is also unclear why the reviewer believes that we do not have sufficient understanding of G4s, and would request that the reviewer instead provides specific comments regarding what is lacking in terms of knowledge on G4s, as we respectfully disagree with this judgement of our knowledge-base (see other G4 papers from the Horowitz lab, Begeman, 2020, Litberg 2023, Son, 2023 referenced below).

      Litberg TJ, Sannapureddi RKR, Huang Z, Son A, Sathyamoorthy B, Horowitz S. Why are G-quadruplexes good at preventing protein aggregation? RNA Biol. 2023 Jan;20(1):495-509. doi: 10.1080/15476286.2023.2228572.

      Son A, Huizar Cabral V, Huang Z, Litberg TJ, Horowitz S. G-quadruplexes rescuing protein folding. Proc Natl Acad Sci U S A. 2023 May 16;120(20):e2216308120. doi: 10.1073/pnas.2216308120.

      Guzman BB, Son A, Litberg TJ, Huang Z, Dominguez, Horowitz S. Emerging Roles for G-Quadruplexes in Proteostasis. FEBS J. 2022. doi: 10.1111/febs.16608.

      Begeman A, Son A, Litberg TJ, Wroblewski TH, Gehring T, Huizar Cabral V, Bourne J, Xuan Z, Horowitz S. G-Quadruplexes Act as Sequence Dependent Protein Chaperones. EMBO Reports. 2020 Sep 18;e49735. doi: 10.15252/embr.201949735.

    1. eLife Assessment

      The revised report provides valuable findings for the field, suggesting a relationship between CRF1 receptors, sociability deficits in morphine-treated male mice yet not females, and a potential mechanism involving oxytocin neurons in the paraventricular nucleus of the hypothalamus. Generally, the strength of evidence is solid in terms of the methods, data, and analyses. This work will be of interest to those interested in social behavior and addiction.

    2. Reviewer #1 (Public review):

      Summary:

      The use of antalarmin, a selective CRF1 receptor antagonist, prevents the deficits in sociability in (acutely) morphine-treated males, but not in females. In addition, cell attached experiments show a rescue to control levels of the morphine-induced increased firing in PVN neurons from morphine-treated males. Similar results are obtained in CRF receptor 1-/- male mice, confirming the involvement of CRF receptor 1-mediated signaling in both sociability deficits and neuronal firing changes in morphine-treated male mice.

      Strengths:

      In the revised version of the paper the authors respond to some of the reviewers' points with a new statistical analysis of behavioral data and a new discussion of previous literature.

      Weaknesses:

      Following reviewers' comments, the authors provided mechanistic insights of their findings with new experiments.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The use of antalarmin, a selective CRF1 receptor antagonist, prevents the deficits in sociability in (acutely) morphine-treated males, but not in females. In addition, cell-attached experiments show a rescue to control levels of the morphine-induced increased firing in PVN neurons from morphine-treated males. Similar results are obtained in CRF receptor 1-/- male mice, confirming the involvement of CRF receptor 1-mediated signaling in both sociability deficits and neuronal firing changes in morphine-treated male mice.

      Strengths:

      The experiments and analyses appear to be performed to a high standard, and the manuscript is well written and the data clearly presented. The main finding, that CRF-receptor plays a role in sociability deficits occurring after acute morphine administration, is an important contribution to the field.

      Weaknesses:

      The link between the effect of pharmacological and genetic modulation of CRF 1 receptor on sociability and on PVN neuronal firing, is less well supported by the data presented. No evidence of causality is provided.

      Major points:

      (1) The results of behavioral tests and the neural substrate are purely correlative. To establish causality, it would be important to selectively delete or re-express the CRF1 receptor sequence in the PVN. Re-expressing the CRF1 receptor in the PVN of male mice and testing them for social behavior and for neuronal firing would be the easier step in this direction.

      We agree with this comment and have acknowledged that further studies, such as genetic or pharmacological inactivation of CRF<sub>1</sub> receptors selectively in the paraventricular nucleus of the hypothalamus (PVN), are warranted to address this issue (page 17, line 25 to page 18, line 1).

      We would also like to mention that our manuscript title intentionally presented our findings separately without implying causality. Our idea was simply to pair the behavioral data to neural activity within a network of interest, i.e., the PVN CRF-oxytocin (OXY)/arginine-vasopressin (AVP) network, which is thought to play a critical role at the interface of substance use disorders and social behavior. Accordingly, we previously reported that genetic CRF<sub>2</sub> receptor deficiency reliably eliminated sociability deficits and hypothalamic OXY and AVP expression induced by cocaine withdrawal (Morisot et al., 2018). Thus, the present manuscript reliably shows that CRF<sub>1</sub> receptor-mediated effects of acute morphine administration upon social behavior are consistently mirrored by neural activity changes within the PVN, and particularly within its OXY<sup>+</sup>/AVP<sup>+</sup> neuronal populations. In addition, we demonstrate that the latter effects are sex-linked, which is in line with previous reports of sex-biased CRF<sub>1</sub> receptor roles in rodents (Rosinger et al., 2019; Valentino et al., 2013) and humans (Roy et al., 2018; Weber et al., 2016).

      (2) It would be interesting to discuss the relationship between morphine dose and CRF1 receptor expression.

      We are not aware of studies reporting CRF<sub>1</sub> receptor expression following acute morphine administration. However, repeated heroin self-administration was shown to increase CRF<sub>1</sub> receptor expression in the ventral tegmental area (VTA). We have mentioned the latter study in the present revised version of our manuscript at page 18, lines 1-2.

      (3) It would be important to show the expression levels of CRF1 receptors in PVN neurons in controls and morphine-treated mice, both males and females.

      We agree with this reviewer comment and, in the present version of the manuscript, have mentioned that examination of CRF<sub>1</sub> receptor expression in the PVN might help to understand the brain mechanisms underlying morphine effects upon social behavior (page 18, lines 2-6). Moreover, at page 15, lines 11-19 we have mentioned studies showing higher levels of the CRF<sub>1</sub> receptor in the PVN of adult (2 months) and old (20-24 months) male mice, as compared to adult and old female mice (Rosinger et al., 2019). Thus, differences in PVN CRF<sub>1</sub> receptor expression between male and female mice might underlie the sex-linked effects of CRF<sub>1</sub> receptor antagonism by antalarmin reported in our manuscript.

      (4) It would be important to discuss the mechanisms by which the CRF1 receptor controls the firing frequency of AVP+/OXY+ neurons in the PVN of male mice.

      Using the in situ hybridization technique, studies reported relatively low expression of the CRF<sub>1</sub> receptor in the PVN (Van Pett et al., 2000). However, more recent studies using genetic approaches identified a substantial population of CRF<sub>1</sub> receptor-expressing neurons within the PVN (Jiang et al., 2019, 2018). These CRF<sub>1</sub> receptor-expressing neurons are believed to respond to local CRF release and likely form bidirectional connections with both CRF and OXY+/AVP+ neurons (Jiang et al., 2019, 2018). Thus, one proposed mechanism of action is that morphine increases intra-PVN release of CRF, which may act on intra-PVN CRF<sub>1</sub> receptor-expressing neurons. The latter neurons might in turn influence the activity of PVN OXY+/AVP+ neurons, which largely project to the VTA and the bed nucleus of the stria terminalis (BNST) to modulate social behavior. Within this framework, pharmacological or genetic inactivation of CRF<sub>1</sub> receptors might deregulate the activity of intra-PVN CRF-OXY/AVP interactions and thus interfere with opiate-induced social behavior deficits. In particular, the latter phenomenon might be more pronounced in male mice since they express more CRF<sub>1</sub> receptor-positive neurons in the PVN, as compared to female mice (Rosinger et al., 2019). The putative mechanisms of action described herein are also mentioned at page 16, lines 12 to page 17, line 7 of the present revised version of the manuscript.

      Minor points:

      (1) The phase of the estrous cycles in which females are analyzed for both behavior and electrophysiology should be stated.

      The normal estrous cycle of laboratory mice is 4-5 days in length, and it is divided into four phases (proestrus, estrus, metestrus and diestrus). The three-chamber experiments were generally carried out over a 5-day period, thus spanning the entire estrous cycle. In particular, on each test day approximately the same number of mice was assigned to each experimental group. Thus, within each group the number of female mice tested in each phase of the estrous cycle was likely similar. Moreover, except for the firing frequency displayed by vehicle/morphine-treated mice, female and male mice showed similar variability in results, indicating a marginal role for the estrous cycle in the spread of data. We would also like to mention relatively recent studies indicating no significant difference over different phases of the estrous cycle in the social interaction test as well as in anxiety-like and anhedonia-like behavioral tests in C57BL/6J female mice (Zhao et al., 2021). Similar findings were also reported by other authors, who found no difference across the diestrus and estrus phases of the estrous cycle in C57BL/6J female mice tested in behavioral assays of anxiety-like behavior, depression-like behavior and social interaction (Zeng et al., 2023).

      A paragraph has been added to page 20, lines 1-9 of the present version of the manuscript to explain why we did not monitor the estrous cycle in female mice.

      (2) It would be important to show the statistical analysis between sexes.

      Following this reviewer’s comment, we examined the sociability ratio results by a three-way ANOVA with sex (males vs. females), pretreatment (vehicle vs. antalarmin) and treatment (saline vs. morphine) as between-subjects factors. The latter analysis revealed an almost significant sex X pretreatment X treatment interaction effect (F<sub>1,53</sub>=3.287, P=0.075), which did not allow for post-hoc individual group comparisons. Nevertheless, Newman-Keuls post-hoc comparisons revealed that male mice treated with antalarmin/morphine showed a higher sociability ratio than female mice treated with antalarmin/morphine (P<0.05). The latter statistical results have been added to the present revised version of the manuscript at page 7, lines 2-8.

      We also examined neuronal firing frequency by a three-way ANOVA with sex (males vs. females), pretreatment (vehicle vs. antalarmin) and treatment (saline vs. morphine) as between-subjects factors. Analysis of firing frequency of all of the recorded cells in C57BL/6J mice revealed a sex X pretreatment X treatment interaction effect (F<sub>1,195</sub>=4.765, P<0.05). Newman-Keuls post-hoc individual group comparisons revealed that male mice treated with vehicle/morphine showed higher firing frequency than all other male and female groups (P<0.0005). Moreover, male mice treated with antalarmin/morphine showed lower firing frequency than male mice treated with vehicle/morphine (P<0.0005). In sharp contrast, female mice treated with antalarmin/morphine did not differ from female mice treated with vehicle/morphine (P=0.914). The latter statistical results have been added to the present revised version of the manuscript at page 8, lines 4-12. Finally, similar results were obtained following the three-way ANOVA (sex X pretreatment X treatment) of firing frequency recorded in the subset of neurons co-expressing OXY and AVP (data not shown).

      Thus, sex-linked responses to morphine were detected also by three-way ANOVAs including sex as a variable. However, in the revised version of the manuscript we did not include novel figures combining the two sexes because it would have been largely redundant with the figures already reported, especially with Fig. 1D, Fig. 1G, Fig. 2B and Fig. 2D.
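      As an illustration of the kind of test described above, and not the analysis code used for the manuscript, the following minimal sketch computes the F statistic for a three-way interaction (sex X pretreatment X treatment) in a 2 X 2 X 2 between-subjects design. It assumes a balanced design (equal group sizes), which is a simplification; data with unequal group sizes would call for a general linear model in a statistics package. The function name, the 0/1 coding of the factor levels and all numbers are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
from itertools import product


def three_way_interaction_f(cells):
    """F statistic for the A x B x C interaction in a balanced 2x2x2 design.

    `cells` maps a tuple of three 0/1 factor levels to a list of observations.
    Every cell must hold the same number (>= 2) of observations.
    Returns (F, (df_effect, df_error)).
    """
    keys = list(product((0, 1), repeat=3))
    n = len(cells[keys[0]])
    if n < 2 or any(len(cells[k]) != n for k in keys):
        raise ValueError("need a balanced design with >= 2 observations per cell")

    means = {k: sum(cells[k]) / n for k in keys}

    # Interaction contrast on the cell means: each coefficient is the
    # product of the three factor signs (-1 for level 0, +1 for level 1).
    contrast = sum(
        (2 * a - 1) * (2 * b - 1) * (2 * c - 1) * means[(a, b, c)]
        for a, b, c in keys
    )
    ss_interaction = n * contrast ** 2 / 8  # 1 degree of freedom

    # Error term: pooled within-cell sum of squares.
    ss_error = sum((x - means[k]) ** 2 for k in keys for x in cells[k])
    df_error = 8 * (n - 1)

    f_stat = ss_interaction / (ss_error / df_error)
    return f_stat, (1, df_error)


# Hypothetical data, coded as (sex, pretreatment, treatment) with
# 0 = male / vehicle / saline and 1 = female / antalarmin / morphine.
example_cells = {k: [-1.0, 0.0, 1.0] for k in product((0, 1), repeat=3)}
example_cells[(0, 0, 1)] = [7.0, 8.0, 9.0]  # only male vehicle/morphine is elevated

f_stat, df = three_way_interaction_f(example_cells)
print(f"F({df[0]},{df[1]}) = {f_stat:.3f}")
```

      In this toy data set a single group (male, vehicle, morphine) departs from all the others, so the effect of morphine depends jointly on sex and pretreatment, which is what a three-way interaction captures.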

      Reviewer #2 (Public review):

      This manuscript reports a series of studies that sought to identify a biological basis for morphine-induced social deficits. This goal has important translational implications and is, at present, incompletely understood in the field. The extant literature points to changes in periventricular CRF and oxytocin neurons as critical substrates for morphine to alter social behavior. The experiments utilize mice, administered morphine prior to a sociability assay. Both male and female mice show reduced sociability in this procedure. Pretreatment with the CRF1 receptor antagonist, antalarmin, clearly abolished the morphine effect in males, and the data are compelling. Consistently, CRF1-/- male mice appeared to be spared of the effect of morphine (while wild-type and het mice had reduced sociability). The same experiment was reported as non-feasible in females due to the effect of dose on exploratory behavior per se. Seeking a neural correlate of the behavioral pharmacology, cell-attached recordings of PVN neurons were made in acute slices from mice pretreated with morphine or antalarmin. Morphine increased firing frequencies, and both antalarmin and CRF1-/- mice were spared of this effect. Increasing confidence that this is a CRF1-mediated effect, there is a gene deletion dose effect where hets had an intermediate response to morphine. In general, these experiments are well-designed and sufficiently powered to support the authors' inferences. A final experiment repeated the cell-attached recordings with later immunohistochemical verification of the recorded cells as oxytocin or vasopressin positive. Here the data are more nuanced. The majority of sampled cells were positive for both oxytocin and vasopressin. In cells obtained from males, morphine pretreatment increased firing in this population and was CRF1 dependent; in females, however, the effect of morphine was more modest without sensitivity to CRF1. Given that only ~8 cells were immunoreactive for oxytocin alone, it may be premature to attribute the changes in behavior and physiology strictly to oxytocinergic neurons.

      In sum, the data provide convincing behavioral pharmacological evidence and a regional (and possibly cellular) correlation of these effects suggesting that morphine leads to sociality deficits via CRF interacting with oxytocin in the hypothalamus. While this hypothesis remains plausible, the current data do not go so far as directly testing this mechanism in a site or cell-specific way.

      We agree with this reviewer’s comment and acknowledge that further studies are needed to better understand the neural substrates of CRF<sub>1</sub> receptor-mediated sociability deficits induced by morphine. This has been mentioned at page 17, line 25 to page 18, line 6 of the present revised version of the manuscript.

      With regard to the presentation of these data and their interpretation, the manuscript does not sufficiently draw a clear link between mu-opioid receptors, their action on CRF neurons of the PVN, and the synaptic connectivity to oxytocin neurons. Importantly, sex, cell, and site-specific variations in the CRF are well established (see Valentino & Bangasser) yet these are not reviewed nor are hypotheses regarding sex differences articulated at the outset. The manuscript would have more impact on the field if the implications of the sex-specific effects evident here were incorporated into a larger literature.

      At page 15, line 19 to page 16, line 2 of the present version of the manuscript, we have mentioned prior studies reporting differences in CRF<sub>1</sub> receptor signaling or cellular compartmentalization between male and female rodents (Bangasser et al., 2013, 2010). However, the latter studies were conducted in cortical or locus coeruleus brain tissues. Thus, more studies are needed to examine CRF<sub>1</sub> receptor signaling or cellular compartmentalization in the PVN and their relationship to the sex-linked results reported in our manuscript.

      With regards to the model proposed in the discussion, it seems that there is an assumption that ip morphine or antalarmin have specific effects on the PVN and that these mediate behavior - but this is impossible to assume and there are many meaningful alternatives (for example, both MOR and CRF modulation of the raphe or accumbens are worth exploration).

      We focused our discussion on PVN OXY/AVP systems because our electrophysiology studies examined neurons expressing OXY and/or AVP in this brain area. However, we understand that other brain areas/systems might mediate the effect of systemic administration of the CRF<sub>1</sub> receptor antagonist antalarmin or whole-body genetic disruption of the CRF<sub>1</sub> receptor upon morphine-induced social behavior deficits. For this reason, at page 16, line 12 to page 17, line 7 of the present version of the manuscript we have mentioned the possible involvement of BNST OXY or VTA dopamine systems in the CRF<sub>1</sub> receptor-mediated social behavior effects of morphine reported herein. Indeed, literature suggests important CRF-OXY and CRF-dopamine interactions in the BNST and the VTA, which might be relevant to the expression of social behavior. Nevertheless, to date the implication of the latter brain system interactions in social behavior alterations induced by substances of abuse remains to be elucidated.

      While it is up to the authors to conduct additional studies, a demonstration that the physiology findings are in fact specific to the PVN would greatly increase confidence that the pharmacology is localized here. Similarly, direct infusion of antalarmin to the PVN, or cell-specific manipulation of OT neurons (OT-cre mice with inhibitory dreadds) combined with morphine pre-exposure would really tie the correlative data together for a strong mechanistic interpretation.

      We agree with this reviewer’s comment that the suggested experiments would greatly increase the understanding of the brain mechanisms underlying the social behavior deficits induced by opiate substances. We have acknowledged this at page 17, line 25 to page 18, line 6.

      Because the work is framed as informing a clinical problem, the discussion might have increased impact if the authors describe how the acute effects of CRF1 antagonists and morphine might change as a result of repeated use or withdrawal.

      Prior studies reported behavioral and neuroendocrine (hypothalamus-pituitary-adrenal axis) effects of chronic systemic administration of CRF<sub>1</sub> receptor antagonists, such as R121919 and antalarmin (Ayala et al., 2004; Dong et al., 2018). However, to our knowledge, no studies have directly compared the behavioral effects of acute vs. repeated administration of CRF<sub>1</sub> receptor antagonists. We previously reported that acute administration of antalarmin increased the expression of somatic opiate withdrawal in mice, indicating that this compound is effective following withdrawal from repeated morphine administration (Papaleo et al., 2007). Nevertheless, further studies are needed to specifically address this reviewer’s comment.

      Reviewer #3 (Public review):

      Summary:

      In the current manuscript, Piccin et al. identify a role for CRF type 1 receptors in morphine-induced social deficits using a 3-chamber social interaction task in mice. They demonstrate that pre-treatment with a CRFR1 antagonist blocks morphine-induced social deficits in male, but not female, mice, and this is associated with the CRFR1 antagonist blocking morphine-induced increases in PVN neuronal excitability in male but not female mice. They followed up by using a transgenic CRFR1 knockout mouse line. CRFR1 genetic deletion also blocked morphine-induced social deficits, similar to the pharmacological approach, in male mice. This was also associated with morphine-induced increases in PVN neuronal excitability being blocked in CRFR1 knockout mice. Interestingly, they found that the pharmacological antagonism of the CRFR1 specifically blocked morphine-induced increases in the firing of oxytocin/AVP neurons in the PVN in male mice.

      Strengths:

      The authors used both male and female mice where possible and the studies were fairly well controlled. The authors provided sufficient methodological detail and detailed statistical information. They also examined measures of locomotion in all of the behavioral tasks to separate changes in sociability from overall changes in locomotion. The experiments were well thought out and well controlled. The use of both the pharmacological and genetic approaches provides converging lines of evidence for the role of CRFR1 in morphine-induced social deficits. Additionally, they have identified the PVN as a potential site of action for these CRFR1 effects.

      Weaknesses:

      While the authors included both sexes they analyzed them independently. This was done for simplicity's sake as they have multiple measures but there are several measures where the number of factors is reduced and the inclusion of sex as a factor would be possible.

      Please, see above our response to the same comment made by Reviewer 1.

      Additionally, single doses of both the CRFR1 antagonist and morphine are used within an experiment without justification for the doses. In fact, a lower dose of morphine was needed for the genetic CRFR1 mouse line. This would suggest that the dose of morphine being used is likely causing some aversion that may be more present in the females, as they have lower overall time in the ROI areas of both the object and the mouse following morphine exposure.

      The morphine dose was chosen based on our prior study showing that morphine (2.5 mg/kg) impaired sociability in male and female C57BL/6J mice, without affecting locomotor activity (Piccin et al., 2022). Also, the antalarmin dose (20 mg/kg) and the route of administration (per os) were chosen based on our prior studies demonstrating behavioral effects of this CRF<sub>1</sub> receptor antagonist administered per os (Contarino et al., 2017; Ingallinesi et al., 2012; Piccin and Contarino, 2020). This is now mentioned in the “materials and methods” section of the present revised version of the manuscript at page 23, lines 6-13. We also agree with this reviewer that female mice seemed more sensitive to morphine than male mice. Indeed, during the habituation phase of the three-chamber test female mice treated with morphine (2.5 mg/kg) spent less time in the ROIs containing the empty wire cages, as compared to saline-treated female mice (Fig. 1E). However, morphine did not affect locomotor activity in female mice (Fig. S1B), suggesting that social approach and ambulation are independent.

      As for the discussion, the authors do not sufficiently address why CRFR1 has an effect in males but not females and what might be driving that difference, or why male and female mice have different distribution of PVN cell types during the recordings.

      At page 15, line 11 to page 16, line 2, we have mentioned possible mechanisms that might underlie the sex-linked results reported in our manuscript. Moreover, at page 16, lines 6-9 we have mentioned a seminal review reporting sex-linked expression of PVN OXY and AVP in a variety of animal species that is similar to the present results. Nevertheless, as mentioned in the “discussion” section, further studies are needed to elucidate the neural substrates underlying sex-linked effects of opiate substances upon social behavior.

      Additionally, the authors attribute their effect to CRF and CRFR1 within the PVN but do not consider the role of extrahypothalamic CRF and CRFR1. While the PVN does contain the largest density of CRF neurons there are other CRF neurons, notably in the central amygdala and BNST, that have been shown to play important roles in the impact of stress on drug-related behavior. This also holds true for the expression of CRFR1 in other regions of the brain, including the VTA, which is important for drug-related behavior and social behavior. The treatments used in the current manuscript were systemic or brain-wide deletion of CRFR1. Therefore, the authors should consider that the effects could be outside the PVN.

      Although the present data suggest a role for PVN CRF<sub>1</sub>-OXY circuits, we are aware that they do not support a direct link between behavior and PVN CRF<sub>1</sub> receptors. Thus, at page 16, line 12 to page 17, line 7 of the present version of the manuscript we have mentioned some studies showing a role for PVN OXY, BNST OXY or VTA dopamine systems in social behavior. Interestingly, the latter brain systems are thought to interact with the CRF system. However, more studies are warranted to understand the implication of CRF-OXY or CRF-dopamine interactions in social behavior deficits induced by substances of abuse.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      I commend the authors on crafting a well-written and clear manuscript with excellent figures. Furthermore, the data analysis and rigor are quite high. I have a few suggestions in the order they appear in the manuscript:

      The introduction has a number of abrupt transitions. For example, the sentence beginning with "Besides," in paragraph 2 jumps from CRF to oxytocin and vasopressin without a transition or justification. In all, vasopressin may be better removed from the introduction. There is sufficient evidence in the literature to support the CRF-OT circuit that might mediate behavioral pharmacology and this should be clearly described in the introduction.

      We have added a sentence at page 3, lines 22-23 to introduce possible interactions of the CRF system with other brain systems implicated in social behavior. Also, in the “introduction” section both OXY and AVP systems are mentioned because our electrophysiology studies examined the effect of morphine upon the activity of OXY- and AVP-positive neurons.

      Our interest in the PVN CRF-OXY/AVP network also stems from previous findings from our laboratory showing that genetic inactivation of the CRF<sub>2</sub> receptor eliminated both sociability deficits and increased hypothalamic OXY and AVP expression associated with long-term cocaine withdrawal in male mice (Morisot et al., 2018). Moreover, evidence suggests the implication of AVP systems in opiate effects. In particular, pharmacological antagonism of AVP-V1b receptors decreased the acquisition of morphine-induced conditioned place preference in male C57BL/6N mice housed with morphine-treated mice (Bates et al., 2018).

      Throughout the manuscript, it seems that there is an assumption that ip morphine or antalarmin have specific effects on the PVN and that these mediate behavior - this is impossible to assume and there are many meaningful alternatives (for example, both MOR and CRF modulation of the raphe or accumbens are worth exploration). While it is up to the authors to conduct additional studies, a demonstration that the physiology findings are in fact specific to the PVN would greatly increase confidence that the pharmacology is localized here. Similarly, direct infusion of antalarmin to the PVN, or cell-specific manipulation of OT neurons (OT-cre mice with inhibitory dreadds) combined with morphine pre-exposure would really tie the correlative data together for a strong mechanistic interpretation.

      We agree that the suggested experiments would greatly increase the understanding of the brain mechanisms underlying the social behavior deficits induced by opiate substances. This has been acknowledged at page 17, line 25 to page 18, line 6 of the present version of the manuscript.

      Also in the introduction, the reference to shank3b mice is not the most direct evidence of oxytocin involvement in sociability. It may be helpful to point reviewers to studies with direct manipulation of these populations (Grinevich group, for example).

      At page 4, lines 4-6 of the “introduction” section, we have added a sentence to mention a seminal paper by the Grinevich group demonstrating an important role for OXY-expressing PVN parvocellular neurons in social behavior (Tang et al., 2020). Moreover, at page 4, lines 8-10 we have mentioned a recent study showing that targeted chemogenetic silencing of PVN OXY neurons in male rats impaired short- and long-term social recognition memory (Thirtamara Rajamani et al., 2024).

      It would be helpful in the figures to indicate which panels contain male or female data.

      The sex of the mice is mentioned above each panel of the main and supplemental figures, except for the studies with CRF<sub>1</sub> receptor-deficient mice wherein only experiments carried out with male mice were illustrated. In the latter case, the sex (male) of the mice is mentioned in the related legend.

      The discussion itself departs from the central data in a few ways - the passages suggesting that morphine produces a stress response and that CRF1 antagonists would block the stress state are highly speculative (although testable). The manuscript would have more impact if the sex-specific effects and alternative hypotheses were enhanced in the discussion.

      At page 16, line 12 to page 17, line 7 of the “discussion” section, we have suggested that interaction of the CRF system with other brain systems implicated in social behavior (i.e., OXY, dopamine) might underlie the sex-linked CRF<sub>1</sub> receptor-mediated effects of morphine reported in our manuscript. Also, at page 15, line 19 to page 16, line 2 we have mentioned studies showing sex-linked CRF<sub>1</sub> receptor signaling and cellular compartmentalization that might be relevant to the present findings. Finally, to further support the notion of morphine-induced PVN CRF activity, at page 15, lines 4-6 we have mentioned a study suggesting that activation of presynaptic mu-opioid receptors located on PVN GABA terminals might reduce GABA release (and related inhibitory effects) onto PVN CRF neurons (Wamsteeker Cusulin et al., 2013). Nevertheless, we believe that more work is needed to better understand the role for the CRF<sub>1</sub> receptor in opiate-induced stress responses and activity of OXY and dopamine systems implicated in social behavior.

      Reviewer #3 (Recommendations for the authors):

      (1) You should provide justification for the doses selected for treatments and the route of administration for the CRFR1 antagonist, especially for females.

      This has been added at page 23, lines 6-13 of the present version of the manuscript. In particular, the doses and routes of administration for morphine and antalarmin used in the present study were chosen based on previous work from our laboratory. Indeed, the intraperitoneal administration of morphine (2.5 mg/kg) impaired social behavior in male and female mice, without affecting locomotor activity (Piccin et al., 2022). Moreover, the oral route of administration for antalarmin was chosen for its translational relevance, as it could be easily employed in clinical trials assessing the therapeutic value of pharmacological CRF<sub>1</sub> receptor antagonists.

      (2) For the electrophysiology data you should include the number of cells per animal that were obtained. It appears that fewer cells from more females were obtained than in males and so the distribution of individual animals to the overall variance may be different between males and females.

      The number of cells examined and animals used in the electrophysiology experiments are reported above each panel of the related Figures 2, 3 and 4 as well as in the supplementary tables S1B and S1C. Overall, the number of cells examined in male and female mice was quite similar. Also, the number of male and female mice used was comparable. Standard errors of the mean (SEM) were quite similar across the different male and female groups (Fig. 2B and 2D), except for vehicle/morphine-treated male mice. Indeed, in the latter group a considerable number of cells displayed elevated firing responses to morphine, which accounted for the higher spread of the data. Accordingly, as mentioned above, the three-way ANOVA with sex (males vs. females), pretreatment (vehicle vs. antalarmin) and treatment (saline vs. morphine) as between-subjects factors revealed that male mice treated with vehicle/morphine showed higher firing frequency than all other male and female groups (P<0.0005). Finally, a similar pattern of firing frequency was observed also in neurons co-expressing OXY and AVP, wherein vehicle/morphine-treated male mice displayed higher SEM, as compared to all other male and female groups (Fig. 4C and 4F). Thus, except for vehicle/morphine-treated mice, distribution of the firing frequency data did not seem to be linked to the sex of the animal.

      (3) You should consider using a nested analysis for the slice electrophysiology data as that is more appropriate.

      We thank the reviewer for this suggestion. However, after careful consideration, we have decided to keep the current statistical analyses. In particular, given the relatively low variability of our data, we believe that the use of parametric ANOVA tests is appropriate. Moreover, additional details supporting our choice are provided just above in our response to comment #2.

      (4) While it makes sense to not want to directly compare male and female data that results in needing to run a 4-way ANOVA, there are many measures, such as sociability, firing rate, etc., that if including sex as a factor would result in running a 3-way ANOVA and would allow for direct comparison of male and female mice.

      Please, see above our response to the same comment made by Reviewer 1. Notably, the results of our new statistical analyses including sex as a variable further support sex-linked effects of the CRF<sub>1</sub> receptor antagonist antalarmin upon morphine-induced sociability deficits and PVN neuronal firing. Nevertheless, we would like to keep the figures illustrating our findings as they are, since they make the observed sex-linked results easy to detect. Finally, we hope that this reviewer agrees with our choice, which is consistent with the wording of the title (i.e., “in male mice”).

      (5) There are grammatical and phrasing issues throughout the manuscript and the manuscript would benefit from additional thorough editing.

      We appreciate this reviewer’s feedback. Thus, upon revising, we have carefully edited the manuscript with regard to possible grammatical and phrasing errors. We hope that our changes have made the manuscript clearer in order to facilitate readability by the audience.

      (6) The discussion should be edited to explain more clearly why the effect is present in male, but not female, mice. The discussion should also address why the distribution of cell types used in the electrophysiology recordings was different between males and females and whether the distribution of CRFR1 is different between males and females. Lastly, the authors need to include consideration of extrahypothalamic CRF and CRFR1 as a possible explanation for their effects. While they have PVN neuron recordings, the treatments that they used are brain-wide, and therefore it is possible that the critical actions of CRFR1 lie outside the PVN.

      At page 15, line 11 to page 16, line 2 of the “discussion” section, we have suggested several mechanisms that might underlie the sex-linked behavioral and brain effects of CRF<sub>1</sub> receptor antagonism reported in our manuscript. With regard to the distribution of cell types examined in the electrophysiology studies, at page 16, lines 6-9 we have mentioned a seminal review reporting sex-linked expression of PVN OXY and AVP in a variety of animal species that is similar to our results. Moreover, at page 18, lines 2-6 we mentioned that more studies are needed to examine PVN CRF<sub>1</sub> receptor expression in male and female animals, an issue that is still poorly understood. Finally, at page 16, line 12 to page 17, line 7 of the “discussion” section we also suggest that CRF<sub>1</sub> receptor-expressing brain areas other than the PVN, such as the BNST or the VTA, might contribute to the sex-linked effects of morphine reported in our manuscript. Thus, in agreement with this reviewer’s suggestion, in the present version of the manuscript we have further emphasized the possible implication of CRF<sub>1</sub> receptor-expressing extrahypothalamic brain areas in social behavior deficits induced by opiate substances.

      References

      Ayala AR, Pushkas J, Higley JD, Ronsaville D, Gold PW, Chrousos GP, Pacak K, Calis KA, Gerald M, Lindell S, Rice KC, Cizza G. 2004. Behavioral, adrenal, and sympathetic responses to long-term administration of an oral corticotropin-releasing hormone receptor antagonist in a primate stress paradigm. J Clin Endocrinol Metab 89:5729–5737. doi:10.1210/jc.2003-032170

      Bangasser DA, Curtis A, Reyes BAS, Bethea TT, Parastatidis I, Ischiropoulos H, Van Bockstaele EJ, Valentino RJ. 2010. Sex differences in corticotropin-releasing factor receptor signaling and trafficking: potential role in female vulnerability to stress-related psychopathology. Mol Psychiatry 15:877, 896–904. doi:10.1038/mp.2010.66

      Bangasser DA, Reyes BAS, Piel D, Garachh V, Zhang X-Y, Plona ZM, Van Bockstaele EJ, Beck SG, Valentino RJ. 2013. Increased vulnerability of the brain norepinephrine system of females to corticotropin-releasing factor overexpression. Mol Psychiatry 18:166–173. doi:10.1038/mp.2012.24

      Bates MLS, Hofford RS, Emery MA, Wellman PJ, Eitan S. 2018. The role of the vasopressin system and dopamine D1 receptors in the effects of social housing condition on morphine reward. Drug Alcohol Depend 188:113–118. doi:10.1016/j.drugalcdep.2018.03.021

      Contarino A, Kitchener P, Vallée M, Papaleo F, Piazza P-V. 2017. CRF1 receptor-deficiency increases cocaine reward. Neuropharmacology 117:41–48. doi:10.1016/j.neuropharm.2017.01.024

      Dong H, Keegan JM, Hong E, Gallardo C, Montalvo-Ortiz J, Wang B, Rice KC, Csernansky J. 2018. Corticotrophin releasing factor receptor 1 antagonists prevent chronic stress-induced behavioral changes and synapse loss in aged rats. Psychoneuroendocrinology 90:92–101. doi:10.1016/j.psyneuen.2018.02.013

      Ingallinesi M, Rouibi K, Le Moine C, Papaleo F, Contarino A. 2012. CRF2 receptor-deficiency eliminates opiate withdrawal distress without impairing stress coping. Mol Psychiatry 17:1283–1294. doi:10.1038/mp.2011.119

      Jiang Z, Rajamanickam S, Justice NJ. 2019. CRF signaling between neurons in the paraventricular nucleus of the hypothalamus (PVN) coordinates stress responses. Neurobiol Stress 11:100192. doi:10.1016/j.ynstr.2019.100192

      Jiang Z, Rajamanickam S, Justice NJ. 2018. Local Corticotropin-Releasing Factor Signaling in the Hypothalamic Paraventricular Nucleus. J Neurosci 38:1874–1890. doi:10.1523/JNEUROSCI.1492-17.2017

      Morisot N, Monier R, Le Moine C, Millan MJ, Contarino A. 2018. Corticotropin-releasing factor receptor 2-deficiency eliminates social behaviour deficits and vulnerability induced by cocaine. Br J Pharmacol 175:1504–1518. doi:10.1111/bph.14159

      Papaleo F, Kitchener P, Contarino A. 2007. Disruption of the CRF/CRF1 receptor stress system exacerbates the somatic signs of opiate withdrawal. Neuron 53:577–589. doi:10.1016/j.neuron.2007.01.022

      Piccin A, Contarino A. 2020. Sex-linked roles of the CRF1 and the CRF2 receptor in social behavior. J Neurosci Res 98:1561–1574. doi:10.1002/jnr.24629

      Piccin A, Courtand G, Contarino A. 2022. Morphine reduces the interest for natural rewards. Psychopharmacology (Berl) 239:2407–2419. doi:10.1007/s00213-022-06131-7

      Rosinger ZJ, Jacobskind JS, De Guzman RM, Justice NJ, Zuloaga DG. 2019. A sexually dimorphic distribution of corticotropin-releasing factor receptor 1 in the paraventricular hypothalamus. Neuroscience 409:195–203. doi:10.1016/j.neuroscience.2019.04.045

      Roy A, Laas K, Kurrikoff T, Reif A, Veidebaum T, Lesch K-P, Harro J. 2018. Family environment interacts with CRHR1 rs17689918 to predict mental health and behavioral outcomes. Prog Neuropsychopharmacol Biol Psychiatry 86:45–51. doi:10.1016/j.pnpbp.2018.05.004

      Tang Y, Benusiglio D, Lefevre A, Hilfiger L, Althammer F, Bludau A, Hagiwara D, Baudon A, Darbon P, Schimmer J, Kirchner MK, Roy RK, Wang S, Eliava M, Wagner S, Oberhuber M, Conzelmann KK, Schwarz M, Stern JE, Leng G, Neumann ID, Charlet A, Grinevich V. 2020. Social touch promotes interfemale communication via activation of parvocellular oxytocin neurons. Nat Neurosci 23:1125–1137. doi:10.1038/s41593-020-0674-y

      Thirtamara Rajamani K, Barbier M, Lefevre A, Niblo K, Cordero N, Netser S, Grinevich V, Wagner S, Harony-Nicolas H. 2024. Oxytocin activity in the paraventricular and supramammillary nuclei of the hypothalamus is essential for social recognition memory in rats. Mol Psychiatry 29:412–424. doi:10.1038/s41380-023-02336-0

      Valentino RJ, Van Bockstaele E, Bangasser D. 2013. Sex-specific cell signaling: the corticotropin-releasing factor receptor model. Trends Pharmacol Sci 34:437–444. doi:10.1016/j.tips.2013.06.004

      Van Pett K, Viau V, Bittencourt JC, Chan RK, Li HY, Arias C, Prins GS, Perrin M, Vale W, Sawchenko PE. 2000. Distribution of mRNAs encoding CRF receptors in brain and pituitary of rat and mouse. J Comp Neurol 428:191–212. doi:10.1002/1096-9861(20001211)428:2<191::aid-cne1>3.0.co;2-u

      Wamsteeker Cusulin JI, Füzesi T, Inoue W, Bains JS. 2013. Glucocorticoid feedback uncovers retrograde opioid signaling at hypothalamic synapses. Nat Neurosci 16:596–604. doi:10.1038/nn.3374

      Weber H, Richter J, Straube B, Lueken U, Domschke K, Schartner C, Klauke B, Baumann C, Pané-Farré C, Jacob CP, Scholz C-J, Zwanzger P, Lang T, Fehm L, Jansen A, Konrad C, Fydrich T, Wittmann A, Pfleiderer B, Ströhle A, Gerlach AL, Alpers GW, Arolt V, Pauli P, Wittchen H-U, Kent L, Hamm A, Kircher T, Deckert J, Reif A. 2016. Allelic variation in CRHR1 predisposes to panic disorder: evidence for biased fear processing. Mol Psychiatry 21:813–822. doi:10.1038/mp.2015.125

      Zeng P-Y, Tsai Y-H, Lee C-L, Ma Y-K, Kuo T-H. 2023. Minimal influence of estrous cycle on studies of female mouse behaviors. Front Mol Neurosci 16:1146109. doi:10.3389/fnmol.2023.1146109

      Zhao W, Li Q, Ma Y, Wang Z, Fan B, Zhai X, Hu M, Wang Q, Zhang M, Zhang C, Qin Y, Sha S, Gan Z, Ye F, Xia Y, Zhang G, Yang L, Zou S, Xu Z, Xia S, Yu Y, Abdul M, Yang J-X, Cao J-L, Zhou F, Zhang H. 2021. Behaviors Related to Psychiatric Disorders and Pain Perception in C57BL/6J Mice During Different Phases of Estrous Cycle. Front Neurosci 15:650793. doi:10.3389/fnins.2021.650793

    1. eLife Assessment

      This fundamental study illuminates the dynamics of BRAF in its monomeric and dimeric forms, both in the absence and presence of inhibitors, through a convincing combination of traditional experiments and sophisticated computational analyses. By revealing novel insights into the selectivity and cooperative processes of BRAF inhibitors, it holds significant promise for the development of future therapeutics, particularly against mutant isoforms in cancer. Overall, these findings will be of great interest to structural biologists, medicinal chemists, and pharmacologists.

    2. Reviewer #1 (Public Review):

      Summary:

      This manuscript from Clayton and co-authors aims to clarify the molecular mechanism of BRAF dimer selectivity. Indeed, first-generation BRAF inhibitors, targeting monomeric BRAFV600E, are ineffective in treating resistant dimeric BRAF isoforms. Here, the authors employed molecular dynamics simulations to study the conformational dynamics of monomeric and dimeric BRAF, in the presence and absence of inhibitors. Multi-microsecond MD simulations showed an inward shift of the αC helix in the BRAFV600E mutant dimer. This helped identify a hydrogen bond between the inhibitors and the BRAF residue Glu501 as critical for dimer compatibility. The stability of the aforementioned interaction seems to be important to distinguish between dimer-selective and equipotent inhibitors.

      Strengths:

      The study is overall valuable and robust. The authors used the recently developed particle mesh Ewald constant pH molecular dynamics, a state-of-the-art method, to investigate the correct histidine protonation considering the dynamics of the protein. Then, multi-microsecond simulations showed differences in the flexibility of the αC helix and DFG motif. The dimerization restricts the αC position in the inward conformation, in agreement with the result that dimer-compatible inhibitors are able to stabilize the αC-in state. Notably, the MD simulations were used to study the interactions between the inhibitors and the protein, suggesting a critical role for a hydrogen bond with Glu501. Finally, simulations of a mixed state of BRAF (one protomer bound to the inhibitor and the other apo) indicate that the ability to stabilize the inward αC state of the apo protomer could be at the basis of the positive cooperativity of PHI1.

    3. Reviewer #2 (Public Review):

      Summary:

      The authors employ molecular dynamics simulations to understand the selectivity of FDA-approved inhibitors within dimeric and monomeric BRAF species. Through these comprehensive simulations, they shed light on the selectivity of BRAF inhibitors by delineating the main structural changes occurring during dimerization and inhibitor action. Notably, they identify two pivotal elements in this process: the movement and conformational changes involving the alpha-C helix and the formation of a hydrogen bond involving the Glu-501 residue. These findings find support in the analyses of various structures crystallized from dimers and co-crystallized monomers in the presence of inhibitors. The elucidation of this mechanism holds significant potential for advancing our understanding of kinase signalling and the development of future BRAF inhibitor drugs.

      Strengths:

      The authors employ a diverse array of computational techniques to characterize the binding sites and interactions between inhibitors and the active site of BRAF in both dimeric and monomeric forms. They combine traditional and advanced molecular dynamics simulation techniques such as CpHMD (All-atom continuous constant pH molecular dynamics) to provide mechanistic explanations. Additionally, the paper introduces methods for identifying and characterizing the formation of the hydrogen bond involving the Glu501 residue without the need for extensive molecular dynamics simulations. This approach facilitates the rapid identification of future BRAF inhibitor candidates.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):

      Comment 1: This manuscript from Clayton and co-authors, entitled “Mechanism of dimer selectivity and binding cooperativity of BRAF inhibitors”, aims to clarify the molecular mechanism of BRAF dimer selectivity. Indeed, first-generation BRAF inhibitors, targeting monomeric BRAFV600E, are ineffective in treating resistant dimeric BRAF isoforms. Here, the authors employed molecular dynamics simulations to study the conformational dynamics of monomeric and dimeric BRAF, in the presence and absence of inhibitors. Multi-microsecond MD simulations showed an inward shift of the αC helix in the BRAFV600E mutant dimer. This helped in identifying a hydrogen bond between the inhibitors and the BRAF residue Glu501 as critical for dimer compatibility. The stability of the aforementioned interaction seems to be important to distinguish between dimer-selective and equipotent inhibitors.

      The study is overall valuable and robust. The authors used the recently developed particle mesh Ewald constant pH molecular dynamics, a state-of-the-art method, to investigate the correct histidine protonation considering the dynamics of the protein. Then, multi-microsecond simulations showed differences in the flexibility of the αC helix and DFG motif. The dimerization restricts the αC position in the inward conformation, in agreement with the result that dimer-compatible inhibitors can stabilize the αC-in state. Noteworthy, the MD simulations were used to study the interactions between the inhibitors and the protein, suggesting a critical role for a hydrogen bond with Glu501. Finally, simulations of a mixed state of BRAF (one protomer bound to the inhibitor and the other apo) indicate that the ability to stabilize the inward αC state of the apo protomer could be at the basis of the positive cooperativity of PHI1.

      We thank the reviewer for the positive evaluation of our work.

      Comment 2a: Regarding the analyses of the mixed state simulations, the DFG dihedral probability densities for the apo protomer (Fig. 5a right) are highly overlapping. It is not convincing that a slight shift can support the conclusion that the binding in one protomer is enough to shift the DFG motif outward allosterically. Moreover, the DFG dihedral time-series for the apo protomer (Supplementary Figure 9) clearly shows that the measured quantities are affected by significant fluctuations and poor consistency between the three replicates. The apo protomer of the mixed state simulations could be affected by the same problem that the authors pointed out in the case of the apo dimer simulations, where the amount of sampling is insufficient to model the DFG-out/-in transition properly.

      While the reviewer is correct that there are large fluctuations in the DFG pseudo dihedral over the course of the apo simulations, these fluctuations occur primarily in the first 2 µs of the simulations, which were removed from our analysis. The reviewer is also correct that these simulations do not sufficiently model the DFG-out/-in transition; however, a full transition is not necessary for our analysis, as we are only interested in the shift of the DFG pseudo dihedral. As to the reviewer’s comment on the overlapping DFG distributions, we agree that the difference is very subtle. We revised the text.

      On page 9, second paragraph from the bottom:

      “While PHI1 or LY binding clearly perturbs the αC helix of the opposite apo protomer, the effect on the DFG conformation is less clear when comparing the DFG dihedral distribution of the apo protomer in the PHI1 or LY-mixed dimer with that of the apo dimer (blue, orange, and grey, Figure 5a right). All three distributions are broad, covering a range of 160-330°. It appears that, relative to the apo dimer, the DFG of the apo protomer in the PHI1-mixed dimer is slightly shifted to the right, whereas that of the LY-mixed dimer is slightly shifted to the left; however, these differences are very subtle and warrant further investigation in future studies.”
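
      The comparison described above (discarding the first 2 µs, then asking whether one dihedral distribution is shifted relative to another) can be illustrated with a minimal sketch. This is not the authors' analysis code; the function names, the input arrays, and the use of a circular mean as the summary statistic are all assumptions made for illustration.

```python
import numpy as np

def discard_equilibration(times_us, values, cutoff_us=2.0):
    """Keep only frames recorded at or after cutoff_us (hypothetical helper)."""
    times_us = np.asarray(times_us, dtype=float)
    values = np.asarray(values, dtype=float)
    return values[times_us >= cutoff_us]

def circular_mean_deg(angles_deg):
    """Circular mean of angles given in degrees, returned in [0, 360)."""
    rad = np.deg2rad(np.asarray(angles_deg, dtype=float))
    mean = np.arctan2(np.sin(rad).mean(), np.cos(rad).mean())
    return float(np.rad2deg(mean) % 360.0)

def mean_shift_deg(reference_deg, other_deg):
    """Signed difference of circular means (other minus reference), in [-180, 180)."""
    diff = circular_mean_deg(other_deg) - circular_mean_deg(reference_deg)
    return (diff + 180.0) % 360.0 - 180.0
```

      A circular mean is used because a dihedral angle wraps around at 360°. For distributions as broad as those described, a single summary number hides most of the overlap, so it would complement the full histograms rather than replace them.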

      Comment 2b: There is similar concern with the Lys483-Glu501 salt bridge measured for the apo protomers of the mixed simulations. As it can be observed from the probabilities bar plot (Fig. 5a middle), the standard deviation is too high to support a significant role for this interaction in the allosteric modulation of the apo protomer.

      As for the salt bridge, the fluctuation in the apo dimer and LY-mixed dimer is indeed large, and together with the lower average probability suggests that the salt bridge is weaker, which is consistent with the αC helix moving outward. To clarify this, we revised the text.

      On page 9, second paragraph from the bottom:

      “Consistent with the inward shift of the αC helix, the Glu501–Lys483 salt bridge has a lower average probability and a larger fluctuation in the apo dimer and the apo protomer of the LY-mixed dimer, as compared to the apo protomer of the PHI1-mixed dimer.”
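
      As a rough illustration of the quantity under discussion (an average salt-bridge probability and its fluctuation across replicates), the following sketch computes a per-replicate occupancy and its spread. The 4.0 Å distance cutoff, the function names, and the input layout are assumptions; the criterion actually used in the manuscript is not stated here.

```python
import numpy as np

def salt_bridge_occupancy(distances_angstrom, cutoff=4.0):
    """Fraction of frames in which the donor-acceptor distance is within cutoff.

    The 4.0 angstrom cutoff is an assumed, commonly used value.
    """
    d = np.asarray(distances_angstrom, dtype=float)
    return float((d <= cutoff).mean())

def summarize_replicates(replicate_distances, cutoff=4.0):
    """Mean and sample standard deviation of occupancy across replicates."""
    occ = np.array([salt_bridge_occupancy(d, cutoff) for d in replicate_distances])
    ddof = 1 if occ.size > 1 else 0  # sample std needs at least two replicates
    return float(occ.mean()), float(occ.std(ddof=ddof))
```

      With only three replicates the standard deviation is itself poorly determined, which is one reason a large error bar of this kind is difficult to interpret.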

      Reviewer #2 (Public review):

      Comment 1: The authors employ molecular dynamics simulations to understand the selectivity of FDA-approved inhibitors within dimeric and monomeric BRAF species. Through these comprehensive simulations, they shed light on the selectivity of BRAF inhibitors by delineating the main structural changes occurring during dimerization and inhibitor action. Notably, they identify two pivotal elements in this process: the movement and conformational changes involving the alpha-C helix and the formation of a hydrogen bond involving the Glu-501 residue. These findings find support in the analyses of various structures crystallized from dimers and co-crystallized monomers in the presence of inhibitors. The elucidation of this mechanism holds significant potential for advancing our understanding of kinase signalling and the development of future BRAF inhibitor drugs.

      The authors employ a diverse array of computational techniques to characterize the binding sites and interactions between inhibitors and the active site of BRAF in both dimeric and monomeric forms. They combine traditional and advanced molecular dynamics simulation techniques such as CpHMD (all-atom continuous constant pH molecular dynamics) to provide mechanistic explanations. Additionally, the paper introduces methods for identifying and characterizing the formation of the hydrogen bond involving the Glu501 residue without the need for extensive molecular dynamics simulations. This approach facilitates the rapid identification of future BRAF inhibitor candidates.

      We thank the reviewer for the positive evaluation of our work.

      Comment 2: Although the use of molecular dynamics yields crucial structural insights and outlines a mechanism to elucidate dimer selectivity and cooperativity in these systems, the authors could consider adopting free energy methods to estimate the values of hydrogen bond energies and hydrophobic interactions, thereby enhancing the depth of their analysis.

      As mentioned in our previous response, current free energy methods are capable of giving accurate estimates of the relative binding free energies of similar ligands; however, accurate calculations of the absolute free energies of hydrogen bond and hydrophobic interactions are not feasible yet. Thus, we decided not to pursue the calculations.

      Reviewer #1 (Recommendations to author):

      Comment 1: It would be useful to cite all supplementary figures in the main text (where relevant). In the present version, only Supplementary Figures 2, 3, and 4 are cited in the main text.

      This was an oversight; supplementary figures 5 through 9 are now cited in the text, to point to the time-series of the quantity discussed. We note that supplementary figures 10 and 11 show the time-series of the root mean squared deviation (RMSD) of each protomer in all monomeric and dimeric simulations; these quantities are not discussed in the manuscript but are provided for further insight.

      Comment 2: It is unclear whether the present data could support a direct involvement of the DFG movement in the allosteric mechanism proposed. The same argument applies to the Lys483-Glu501 interaction in the apo protomer of the mixed state simulations. The current simulation data could only support a different stabilization of the αC-helix position. The authors should either remove/tone down the claim or extend the simulations to sample a “converged” distribution of the DFG dihedral and the Lys483-Glu501 salt bridge of the apo protomers.

      We agree that the DFG change in the apo protomer of the PHI1-mixed dimer is very subtle (see our response and revision to comment 2); however, the allosteric involvement of DFG is clearly demonstrated in Figure 5 (right panel in 5a and 5b). We compare three states: apo protomer in the mixed dimer, PHI1-bound protomer in the mixed dimer, and holo dimer (i.e., with two PHI1). Binding of the first PHI1 restricts the DFG conformation to the larger DFG dihedrals (blue curves in the top and bottom right panels). This effect (DFG outward and more restricted) is even stronger when the second PHI1 binds, locking the DFG in both protomers to a narrow dihedral range of 270–330° (green and blue curves in Figure 5b, right panel). These are allosteric effects, demonstrating that the second PHI1 binding induces a conformational change of the DFG in the first protomer. This is why in Figure 6, the DFG of the PHI1-bound protomer in the mixed dimer is labeled as “almost out”, while the DFG in the holo dimer is labeled as “fully out”.

      The effect of the second PHI1 on the DFG of the first protomer is consistent with its effect on the αC helix position, in which case the second PHI1 induces an inward movement of the αC of the first protomer (illustrated as “fully in” in the schematic Figure 6). Through the αC movement, the salt-bridge strength is affected, as we discussed in our response and revision to Reviewer’s comment 2a. To clarify these points, we revised the discussion of Figure 5. We made the x-axis range of the DFG dihedral distributions the same between the top and bottom panels in Figure 5. To remove the claim of priming effect on DFG, we revised Figure 6.

      Page 10, Figure 5:

      We made the x-axis range of the DFG dihedral distributions on the top and bottom panels the same to facilitate comparison.

      Page 11, second and third paragraphs:

      “Consistent with the change in the DFG conformation between the holo (two inhibitor) and apo dimers (Figure 3c,3f), DFG is rigidified upon binding of the first inhibitor, as evident from the narrower DFG dihedral distribution of the PHI1 or LY-bound protomer in the mixed dimer (Figure 5b right) compared to the apo protomer in the mixed dimer (Figure 5a right). Importantly, the DFG dihedral is right shifted in the occupied vs. apo protomer, demonstrating that the inhibitor pushes the DFG outward.”

      “Consistent with the effect of the second PHI1 on the αC position of the first PHI1-bound protomer, binding of the second PHI1 shifts the peak of the DFG distribution for both protomers further outward, as shown by the 30° larger DFG pseudo dihedral in the holo dimer relative to the mixed dimer (green and blue in Figure 5b right; Supplementary Figures 6,9). In contrast, there is no significant difference in the DFG pseudo dihedral between the LY-mixed and holo dimers. These data suggest that while the binding of the first PHI1 pushes the DFG outward, binding of the second PHI1 has an allosteric effect, shifting the DFG of the opposite protomer further outward.”

      On page 12, the last paragraph of Conclusion, we remove the claim of the priming effect for DFG:

      “The first PHI1 binding in the BRAF<sup>V600E</sup> dimer restricts the motion of the αC helix and DFG, shifting them slightly inward and outward, respectively (Figure 6, bottom right panel). Intriguingly, the first PHI1 binding primes the apo protomer by making the αC more favorable for binding, i.e., shifting the αC inward (Figure 6, bottom right panel). Importantly, upon binding the second PHI1, the αC helix is shifted further inward and the DFG is shifted further outward in both protomers.”

      On page 13, Figure 6:

      We removed the label “slightly outward” for DFG.

      Comment 3: An alternative approach could be using enhanced sampling methods to enhance the diffusion along these coordinates.

      We thank the reviewer for bringing up this point. While the allostery and cooperativity effects are apparent from our simulation data, we agree that enhanced sampling methods in principle could be used to further converge the conformational sampling; however, these approaches face significant challenges. First, the BRAF dimer is weakly associated, with the αC helix forming a part of the dimer interface. Enhanced sampling of the αC helix would likely result in dimer dissociation. On the other hand, simply using RMSD as a reaction coordinate or progress variable would not necessarily enhance the motion of the αC helix, DFG, or activation loop, which are all coupled. Second, our extensive simulations of a monomer kinase with metadynamics demonstrated that the kinase conformation becomes distorted when a biasing potential is placed to enhance the motion of DFG. This is likely because the other parts of the protein do not have enough time to relax to accommodate the conformational change. To our knowledge, this aspect has not been discussed in the current metadynamics literature, which focuses on the free energy differences and (local) conformational changes along the reaction coordinate. To clarify these points, we added a discussion.

      Page 6, end of the first paragraph:

      “We note that enhanced sampling methods were not used due to several challenges. First, the BRAF dimer is weakly associated, with αC helix forming a part of the dimer interface (Figure 1a). Enhanced sampling (particularly of αC helix) would likely lead to dimer dissociation. Second, biased sampling methods such as metadynamics may lead to unrealistic conformational states due to the slow relaxation of some parts of the protein to accommodate the conformational change directed by the reaction coordinate. For example, our unpublished metadynamics simulations of a monomer kinase showed that enhancing the DFG conformational change resulted in distortion of the kinase structure.”

      We thank the reviewers again for their valuable comments. We believe our revision has further elevated the quality of the manuscript.

    1. eLife Assessment

      This study presents valuable evidence of sex differences in oxycodone relapse-related behavior in rats and provides insight into associated synaptic plasticity in the paraventricular thalamus to the nucleus accumbens shell (PVT-NAcSh) circuit. The report reveals that females show heightened cue-induced oxycodone seeking compared to males after 14 days – but not 1 day – of abstinence; however, an increase in synaptic strength from the PVT inputs to the NAcSh was observed in both males and females at 14 days of abstinence. Therefore, whereas the behavioral data and much of the electrophysiology data are solid, the link between them is incomplete. Further investigation of the functional role of the PVT-NAcSh pathway in the observed sex differences in oxycodone relapse and examination of input and cell-type specificity of synaptic alterations would greatly strengthen this study.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript by Alonso-Caraballo et al. is a novel piece of work that examines the impact of oxycodone self-administration on neural plasticity within the paraventricular thalamic (PVT) to nucleus accumbens shell (Shell) pathway - two regions shown to play a key role in cue-induced drug seeking on their own - and whether this plasticity varies based on abstinence period and biological sex.

      Strengths:

      The authors show that a clinically relevant long-access model of opioid self-administration promotes dependence and acute withdrawal in both male and female rats. During subsequent cue-induced relapse tests at 1 or 14 days following the conclusion of self-administration, data show that while both males and females demonstrate drug-seeking behavior at both time points, females show a further elevation in responding on day 14 versus day 1 which is not observed in the males. When accounting for past work showing elevations in drug-seeking in males after 30 days, these data indicate that craving-induced relapse for opioids may develop faster and may be more pronounced in females compared to males.

      These behavioral findings were paralleled by the use of ex vivo acute slice electrophysiology and circuit-specific ex vivo optogenetics to examine the impact of oxycodone self-administration on synaptic strength within the paraventricular thalamus (PVT) to nucleus accumbens shell (NAcSh) pathway(s). Data support a time-dependent but sex-independent strengthening of glutamatergic signaling at PVT-to-NAcSh medium spiny neurons (MSNs) that is present only following a relapse test after 14 days of abstinence, in both males and females, providing the first evidence that opioid self-administration and/or cue-induced drug-seeking augments this pathway. Using an extensive set of physiological measures, the authors show that this increased synaptic strength reflects an upregulation of presynaptic release probability. Further, this upregulation of excitatory signaling aligned temporally with an increase in MSN excitability, as assessed by increases in action potential firing frequency. Finally, the authors provide the first evidence that, similar to other inputs to the NAcSh, PVT projections innervate both MSNs and local interneurons, promoting a GABA-A-specific feedforward inhibitory circuit. Interestingly, unlike direct excitatory inputs to MSNs, ostensibly no changes were observed within this feedforward circuit, highlighting a selective enhancement of excitatory drive and output of MSNs with protracted abstinence.

      Overall, these data highlight a potential role for heightened synaptic strength within the PVT-NAcSh pathway in cue-induced relapse behavior during protracted abstinence and identify a potential therapeutic target during abstinence to reduce relapse risk in abstaining individuals.

      Weaknesses:

      Overall, the experimental approach and data provided appear rigorous, support the authors' overall conclusions, and achieve their goal of understanding how opioid self-administration impacts synaptic strength within the PVT-NAcSh pathway. Although not undermining these data, there are a few potential weaknesses that reduce the impact of the work. For example, the inability to directly assess whether cue-induced drug-seeking is in fact augmented compared to daily intake during self-administration in the maintenance phase only permits the authors to denote that re-exposure to cues and the context is sufficient to promote active lever pressing, without demonstrating whether seeking behavior is in fact elevated further during a cue test. This is notably understandable, as drug-available sessions were 6 hours versus a 1-hour relapse test. Importantly, it is clearly demonstrated that drug seeking is higher on average in female rats after 14 days versus 1 day.

      With regard to the interpretation of electrophysiology findings, the lack of an abstinence-only group does not permit the authors to parse out whether observed increases in synaptic strength (or the lack thereof) reflect abstinence or an interaction between abstinence period and re-exposure to the operant chamber, as slices were taken 30-45 min post relapse test. While much literature has shown that drug-induced adaptations in the NAc require a post-drug period for plasticity to measurably emerge, studies have also shown that re-exposure to heroin-associated cues following abstinence seemingly "reverses" increases in cell excitability in prelimbic-NAc pyramidal neurons (Kokane et al., 2023) and that morphine-induced increases in synaptic strength in the NAc shell can be depotentiated by drug re-exposure - an effect also observed with cocaine re-exposure (Madayag et al., 2019). Notably, the presence of an effect at 14 days but not 1 day supports the likelihood that the relapse test does not in fact influence the plasticity within the PVT-NAcSh circuit.

      While the lack of effect on AMPAR:NMDAR ratio and rectification indices does support the notion that enhanced EPSC amplitudes in input-output curves reflect neither a change in AMPAR subunit expression (i.e., increased GluA2-lacking receptors that exhibit inward rectification at depolarized potentials) nor a change in postsynaptic sensitivity to glutamate, without direct assessment of AMPAR-specific and NMDAR-specific input-output curves it does not definitively exclude the possibility that both AMPA and NMDA receptor currents are being upregulated, thus masking an observable change in postsynaptic strength.

      Overall, these findings provide novel insight into how the PVT-NAcSh pathway is altered by opioid self-administration and whether this is unique based on abstinence period and sex. Importantly, these were the primary objectives stated by the authors. Data highlight a potential role for the observed adaptations in relapse behavior and identify a potential therapeutic target during abstinence to reduce relapse risk in abstaining individuals. However, it should be noted that no causal link is demonstrated without experiments to reduce/prevent relapse.

    3. Reviewer #2 (Public review):

      This is an interesting paper from Alonso-Caraballo and colleagues that examines the influence of opioid use, abstinence, and sex on paraventricular thalamus (PVT) to nucleus accumbens shell (NAcSh) medium spiny neuron circuit physiology. The authors first find that prolonged abstinence from extended access to oxycodone self-administration leads to profoundly increased cue-induced reinstatement in females. Next, they find that prolonged abstinence increased PVT-NAcSh MSN synaptic strength, an effect that was likely due to presynaptic adaptation (paired-pulse ratio was decreased in both sexes).
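
      For readers less familiar with the measure, the paired-pulse ratio mentioned above is conventionally the amplitude of the second evoked response divided by that of the first; a lower ratio is commonly read as higher presynaptic release probability. A minimal sketch follows, with a hypothetical function name and made-up argument names, and it is not taken from the manuscript's analysis.

```python
def paired_pulse_ratio(first_amplitude, second_amplitude):
    """Second response amplitude divided by the first.

    Absolute values are used so that inward currents recorded as
    negative numbers give a positive ratio.
    """
    if first_amplitude == 0:
        raise ValueError("first response amplitude must be non-zero")
    return abs(second_amplitude) / abs(first_amplitude)
```

      A ratio below 1 indicates paired-pulse depression and a ratio above 1 indicates facilitation.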

      While this paper is certainly interesting, and well-written, and the experiments seem to be well performed, the behavioral and physiological effects observed are somewhat divorced. Specifically, what accounts for the heightened relapse in females? Since no opioid-related sex differences were observed in PVT-NAcSh neurophysiology, it is unclear how the behavioral and neurophysiological data fit together. Furthermore, the lack of functional manipulation of PVT-NAcSh circuitry leaves one to wonder if this circuit is even important for the behavior that the authors are measuring. I would be more positive about this study if the authors were able to resolve either of the two issues noted above.

      I also noted more moderate weaknesses that the authors should consider:

      (1) There are insufficient animals in some cases. For example, in Figure 4, the Male Saline 14-day abstinence group (n = 3 rats) has less than half the excitability of the Male Saline 1-day abstinence group (n = 7 rats). This is likely due to variance between animals and, possibly, oversampling. Thus, more rats need to be added to the 14-day abstinence group. Additionally, the range of n neurons/rat should be reported for each experiment to assure readers that oversampling from single animals is not occurring.

      (2) The IPSC data, for example in Figure 4, is one of the more novel experiments in the manuscript. However, it is quite challenging to see the difference between males and females, saline and oxycodone, at low stimulation intensities within the graph. Authors should expand this so that reviewers/readers can see those data, especially considering other work suggesting that PVT synaptic input onto select NAc interneurons is disrupted following opioid self-administration. Additional comment: It's also interesting that the IPSC amplitude seems to be maximal at ~2mW of light, whereas ~11 mW is required to evoke maximal EPSC amplitude. It would be interesting to know the authors' thoughts on why this may be.

      (3) There is an inadequate description of what has been done to date on the PVT-NAc projection regarding opioid withdrawal, seeking, disinhibition, and the effects on synaptic physiology therein. For example, a critical paper, Keyes et al., 2020 Neuron, is not cited. Additionally, Paniccia et al., 2024 Neuron is inaccurately cited and insufficiently described. Both manuscripts should be described in some detail within the introduction, and the findings should be accurately contextualized within the broader circuit within the discussion.

      (4) Related to the above, the authors should provide a more comprehensive description of how PVT synapses onto cell-type specific neurons in the NAc which expands beyond MSNs, especially considering that PVT has been shown to influence drug/opioid seeking through the innervation of NAc neurons that are not MSNs. For example, see PMIDs 33947849, 36369508, 28973852, 38141605.
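
      Regarding point (1), the requested neurons-per-rat range and a per-animal summary can be sketched as follows. The record layout (one `(rat_id, value)` pair per recorded neuron), the function name, and the choice to average within each animal are illustrative assumptions, not the authors' pipeline.

```python
from collections import defaultdict
from statistics import mean

def per_animal_summary(records):
    """Summarize neuron-level measurements by animal.

    records: non-empty iterable of (rat_id, value) pairs, one per neuron.
    """
    by_rat = defaultdict(list)
    for rat_id, value in records:
        by_rat[rat_id].append(value)
    counts = [len(values) for values in by_rat.values()]
    return {
        "n_rats": len(by_rat),
        "neurons_per_rat_range": (min(counts), max(counts)),
        # One mean per animal, so no single rat dominates the group statistic.
        "animal_means": {rat: mean(values) for rat, values in by_rat.items()},
    }
```

      Reporting the range alongside group statistics computed on per-animal means makes it easier to judge whether a difference between groups is driven by a few heavily sampled rats.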

    4. Reviewer #3 (Public review):

      Summary:

      In this paper, Alonso-Caraballo et al. investigate sex-specific differences in oxycodone self-administration, withdrawal, and relapse behaviors in rats, as well as associated synaptic plasticity in the paraventricular thalamus to nucleus accumbens shell (PVT-NAcSh) circuit. The authors employ a combination of behavioral paradigms and ex vivo electrophysiology to examine how acute (1-day) and prolonged (14-day) abstinence from oxycodone self-administration affect cue-induced drug-seeking and synaptic transmission in male and female rats. Their findings reveal that while both sexes show similar oxycodone self-administration and acute withdrawal symptoms, females exhibit enhanced cue-induced relapse after prolonged abstinence. Furthermore, they show that prolonged abstinence is associated with increased synaptic strength in the PVT-NAcSh circuit (reduced paired-pulse ratio) and enhanced intrinsic excitability of NAcSh medium spiny neurons in both sexes. This study provides important insights into the sex-specific neural adaptations that may underlie vulnerability to opioid relapse and highlights the PVT-NAcSh circuit as a potential target for therapeutic interventions. However, although this study is well designed, no sex differences were observed in the synaptic activity within this pathway that could explain increased oxycodone seeking in females versus male rats. Additional experiments could strengthen the results and help clarify synaptic mechanisms underpinning behavioral sex differences.

      Strengths:

      The study exhibits several strengths. It provides a comprehensive behavioral analysis of oxycodone self-administration, withdrawal, and cue-induced relapse in both male and female rats at different time points (acute vs. protracted withdrawal) offering valuable insights into sex-specific differences (i.e., increased oxycodone seeking in females over time but not males). The authors examine synaptic plasticity in the PVT-NAcSh circuit at different abstinence time points, integrating behavioral and electrophysiological data to link circuit adaptations with relapse behaviors, although no sex differences in the electrophysiological parameters examined were evident. The investigation of intrinsic excitability changes in NAcSh medium spiny neurons further enhances the study's depth. Overall, the well-designed experiments provide important insights into the neural adaptations that may underlie vulnerability to opioid relapse, highlighting the PVT-NAcSh circuit as a potential target for therapeutic interventions in opioid use disorder.

      Weaknesses:

      Despite its strengths, the study has several notable limitations. A key weakness is the lack of observed sex differences in synaptic activity within the PVT-NAcSh pathway that could explain the behavioral results. The authors' failure to differentiate between D1 and D2 medium spiny neurons (MSNs) in the nucleus accumbens represents a missed opportunity to identify potential sex-specific differences at the cellular level, although they do discuss reasons for this omission. The only significant synaptic change observed - reduced paired-pulse ratio indicating increased synaptic strength - occurs in both males and females, failing to explain the sex-specific behavioral differences. Furthermore, the investigation of intrinsic excitability in NAc MSNs adds complexity to data interpretation, as the authors neither differentiate between D1 and D2 MSNs nor confirm that recorded neurons receive direct inputs from the PVT. This assumption potentially confounds the results. Overall, while the study provides valuable insights, additional experiments targeting specific cell populations and more detailed synaptic analyses are needed to elucidate the mechanisms underlying the observed behavioral sex differences in opioid relapse vulnerability.
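
      For readers unfamiliar with the measure discussed above, the following is a minimal sketch of how a paired-pulse ratio (PPR) is typically computed: the mean amplitude of the second evoked response divided by the mean amplitude of the first. A lower PPR is conventionally read as a higher presynaptic release probability. The function name and all amplitudes are hypothetical illustrations and are not taken from the manuscript under review.

```python
from statistics import mean

def paired_pulse_ratio(first_amplitudes, second_amplitudes):
    """Return PPR: mean second-response amplitude over mean first-response amplitude."""
    if len(first_amplitudes) != len(second_amplitudes):
        raise ValueError("each sweep needs both a first and a second response")
    if not first_amplitudes:
        raise ValueError("at least one sweep is required")
    # Use absolute values so inward (negative) currents give a positive ratio.
    first = mean(abs(a) for a in first_amplitudes)
    second = mean(abs(a) for a in second_amplitudes)
    if first == 0:
        raise ValueError("mean first-response amplitude is zero")
    return second / first

# Hypothetical EPSC amplitudes in pA; NOT data from the study.
control = paired_pulse_ratio([-100.0, -110.0, -90.0], [-130.0, -140.0, -120.0])
abstinent = paired_pulse_ratio([-150.0, -160.0, -140.0], [-135.0, -145.0, -125.0])
print(f"control PPR = {control:.2f}, abstinence PPR = {abstinent:.2f}")
```

      Because a ratio of this kind is computed per cell, it cannot by itself distinguish D1 from D2 MSNs, which is the limitation the reviewer raises.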

    1. eLife Assessment

      Although others have proposed that OHC electromotility subserves cochlear amplification by acting as a "fluid pump", and evidence for this has been found using electrical stimulation of excised cochleae, this important study substantially advances our understanding of cochlear homeostasis. This is the first report to test the pumping effect in vivo and consider its implications for cochlear homeostasis and drug delivery. The manuscript provides convincing evidence for OHC-based fluid flow within the cochlea.

    2. Reviewer #1 (Public review):

      Summary:

      The authors test the "OHC-fluid-pump" hypothesis by assaying the rates of kainic acid dispersal both in quiet and in cochleae stimulated by sounds of different levels and spectral content. The main result is that sound (and thus, presumably, OHC contractions and expansions) results in faster transport along the duct. OHC involvement is corroborated using salicylate, which yielded results similar to silence. Especially interesting is the fact that some stimuli (e.g., tones) seem to provide better/faster pumping than others (e.g., noise), ostensibly due to the phase profile of the resulting cochlear traveling-wave response.

      Strengths:

      The experiments appear well controlled and the results are novel and interesting. Some elegant cochlear modeling that includes coupling between the organ of Corti and the surrounding fluid as well as advective flow supports the proposed mechanism.

      The current limitations and future directions of the study, including possible experimental tests, extensions of the modeling work, and practical applications to drug delivery, are thoughtfully discussed.

    3. Reviewer #2 (Public review):

      Although recent cochlear micromechanical measurements in living animals have shown that outer hair cells drive broadband vibration of the reticular lamina, the role of this vibration in cochlear fluid circulation remains unclear. The authors hypothesized that motile outer hair cells facilitate cochlear fluid circulation. To test this, they investigated the effects of acoustic stimuli and salicylate on kainic acid-induced changes in the cochlear nucleus activities. The results reveal that low-frequency tones accelerate the effect of kainic acid, while salicylate reduces the impact of acoustic stimuli, indicating that outer hair cells actively drive cochlear fluid circulation.

      The major strengths of this study lie in its high significance and the synergistic use of both electrophysiological recording and computational modeling. Recent in vivo observations of the broadband reticular lamina vibration challenge the traditional view of frequency-specific cochlear amplification. Furthermore, there is currently no effective noninvasive method to deliver drugs or genes to the cochlea. This study addresses these important questions by observing outer hair cells' roles in the cochlear transport of kainic acid. The authors utilized a well-established electrophysiological method to produce valuable new data and a custom-developed computational model to enhance the interpretation of their experimental results.

      The authors successfully validated their hypothesis, showing through the experimental and modeling results that active outer hair cells enhance cochlear fluid circulation in the living cochlea.

      These findings have significant implications for advancing our understanding of cochlear amplification and offer promising clinical applications for treating hearing loss by accelerating cochlear drug delivery.

    4. Reviewer #3 (Public review):

      Summary:

      This study reveals that sound exposure enhances drug delivery to the cochlea through the non-selective action of outer hair cells. The efficiency of sound-facilitated drug delivery is reduced when outer hair cell motility is inhibited. Additionally, low-frequency tones were found to be more effective than broadband noise for targeting substances to the cochlear apex. Computational model simulations support these findings.

      Strengths:

      The study provides compelling evidence that the broad action of outer hair cells is crucial for cochlear fluid circulation, offering a novel perspective on their function beyond frequency-selective amplification. Furthermore, these results could offer potential strategies for targeting and optimizing drug delivery throughout the cochlear spiral.

    5. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors test the "OHC-fluid-pump" hypothesis by assaying the rates of kainic acid dispersal both in quiet and in cochleae stimulated by sounds of different levels and spectral content. The main result is that sound (and thus, presumably, OHC contractions and expansions) results in faster transport along the duct. OHC involvement is corroborated using salicylate, which yielded results similar to silence. Especially interesting is the fact that some stimuli (e.g., tones) seem to provide better/faster pumping than others (e.g., noise), ostensibly due to the phase profile of the resulting cochlear traveling-wave response.

      Strengths:

      The experiments appear well controlled and the results are novel and interesting. Some elegant cochlear modeling that includes coupling between the organ of Corti and the surrounding fluid as well as advective flow supports the proposed mechanism.

      The current limitations and future directions of the study, including possible experimental tests, extensions of the modeling work, and practical applications to drug delivery, are thoughtfully discussed.

      Weaknesses:

      Although the authors provide compelling evidence that OHC motility can usefully pump fluid, their claim (last sentence of the Abstract) that wideband OHC motility (i.e., motility in the "tail" region of the traveling wave) evolved for the purposes of circulating fluid---rather than emerging, say, as a happy by-product of OHC motility that evolved for other reasons---seems too strong.

      We adjusted our tone to be less assertive.

      Our measurements and simulations coherently suggest that active outer hair cells in the tail region of cochlear traveling waves drive cochlear fluid circulation.

      Reviewer #2 (Public review):

      Although recent cochlear micromechanical measurements in living animals have shown that outer hair cells drive broadband vibration of the reticular lamina, the role of this vibration in cochlear fluid circulation remains unknown. The authors hypothesized that motile outer hair cells may facilitate cochlear fluid circulation. To test this hypothesis, they investigated the effects of acoustic stimuli and salicylate, an outer hair cell motility blocker, on kainic acid-induced changes in the cochlear nucleus activities. The results demonstrated that acoustic stimuli reduced the latency of the kainic acid effect, with low-frequency tones being more effective than broadband noise. Salicylate reduced the effect of acoustic stimuli on kainic acid-induced changes. The authors also developed a computational model to provide a physical framework for interpreting experimental results. Their combined experimental and simulated results indicate that broadband outer hair cell action serves to drive cochlear fluid circulation.

      The major strengths of this study lie in its high significance and the synergistic use of electrophysiological recording of the cochlear nucleus responses alongside computational modeling. Cochlear outer hair cells have long been believed to be responsible for the exceptional sensitivity, sharp tuning, and huge dynamic range of mammalian hearing. However, recent observations of the broadband reticular lamina vibration contradict the widely accepted view of frequency-specific cochlear amplification. Furthermore, there is currently no effective noninvasive method to deliver drugs or genes to the cochlea, a crucial need for treating sensorineural hearing loss, one of the most common auditory disorders. This study addresses these important questions by observing outer hair cells' roles in the cochlear transport of kainic acid. The well-established electrophysiological method used to record cochlear nucleus responses produced valuable new data, and the custom-developed computational model greatly enhanced the interpretation of the experimental results.

      The authors successfully tested their hypothesis, with both the experimental and modeling results supporting the conclusion that active outer hair cells can enhance cochlear fluid circulation in the living cochlea.

      The findings from this study can potentially be applied for treating sensorineural hearing loss and advance our understanding of how outer hair cells contribute to cochlear amplification and normal hearing.

      Reviewer #3 (Public review):

      Summary:

      This study reveals that sound exposure enhances drug delivery to the cochlea through the nonselective action of outer hair cells. The efficiency of sound-facilitated drug delivery is reduced when outer hair cell motility is inhibited. Additionally, low-frequency tones were found to be more effective than broadband noise for targeting substances to the cochlear apex. Computational model simulations support these findings.

      Strengths:

      The study provides compelling evidence that the broad action of outer hair cells is crucial for cochlear fluid circulation, offering a novel perspective on their function beyond frequency-selective amplification. Furthermore, these results could offer potential strategies for targeting and optimizing drug delivery throughout the cochlear spiral.

      Weaknesses:

      The primary weakness of this paper lies in the surgical procedure used for drug administration through the round window. Opening the cochlea can alter intracochlear pressure and disrupt the traveling wave from sound, a key factor influencing outer hair cell activity. However, the authors do not provide sufficient details on how they managed this issue during surgery. Additionally, the introduction section needs further development to better explain the background and emphasize the significance of the work.

      Comments on revisions:

      Thank you for addressing the comments and concerns. The authors have responded to all points thoroughly and clarified them well. However, please incorporate the key points from the responses to the comments (Introduction ((3), (5)) and Results ((5))) into the manuscript. While the explanations in the response letter are reasonable, the current descriptions in the manuscript may limit the reader's understanding. Expanding on these points in the Introduction, Results, or Discussion sections would enhance clarity and comprehensiveness.

      Introduction (3): As inner-ear fluid homeostasis is maintained locally, longitudinal electro-chemical gradients, including the endocochlear potential, may vary along the cochlear length (Schulte and Schmiedt 1992; Sadanaga and Morimitsu 1995; Hirose and Liberman 2003).

      Introduction (5): We do not want to distract the readers from the primary message by discussing different drug delivery methods into the inner ear. This paper is regarding active outer hair cells’ new role as the title suggests. An extensive discussion of drug delivery can confuse the theme of this work.

      Results (5): High frequencies were not tested because they would not affect drug delivery to the apex of the cochlea (i.e., the traveling waves stop near the CF location.)

    1. eLife Assessment

      This valuable manuscript presents a thorough analysis of the evolution of Major Histocompatibility Complex gene families across Primates. A key strength of this analysis is the use of state-of-the-art phylogenetic methods to estimate rates of gene gain and loss, but estimates of gene loss may suffer from the issue of genes entirely or partially missing from genome assemblies represented in the public databases used, given the notorious difficulty of properly assembling MHC gene regions. Overall the evidence provided is still convincing, but the manuscript may benefit from discussing approaches that can address the issue of entirely or partially missing genes, in particular how the use of long reads to completely re-assemble complex loci might improve the assessment of the complex evolutionary processes at play in MHC gene families.

    2. Reviewer #1 (Public review):

      Summary:

      The Major Histocompatibility Complex (MHC) region is a collection of numerous genes involved in both innate and adaptive immunity. MHC genes are famed for their role in rapid evolution and extensive polymorphism in a variety of vertebrates. This paper presents a summary of gene-level gain and loss of orthologs and paralogs within MHC across the diversity of primates, using publicly available data.

      Strengths:

      This paper provides a strong case that MHC genes are rapidly gained (by paralog duplication) and lost over millions of years of macroevolution. The authors are able to identify MHC loci by homology across species, and from this infer gene duplications and losses using phylogenetic analyses. There is a remarkable amount of genic turnover, summarized in Figure 6 and Figure 7, either of which might be a future textbook figure of immune gene family evolution. The authors draw on state-of-the-art phylogenetic methods, and their inferences are robust insofar as the data might be complete enough to draw such conclusions.

      Weaknesses:

      One concern about the present work is that it relies on public databases to draw inferences about gene loss, which is potentially risky if the publicly available sequence data are incomplete. To say, for example, that a particular MHC gene copy is absent in a taxon (e.g., Class I locus F absent in Guenons according to Figure 1), we need to trust that its absence from the available databases is an accurate reflection of its absence in the genome of the actual organisms. This may be a safe assumption, but it rests on the completeness of genome assembly (and gene annotations?) or people uploading relevant data. This reviewer would have been far more comfortable had the authors engaged in some active spot-checking, doing the lab work to try to confirm absences at least for some loci and some species. Without this, a reader is left to wonder whether gene loss is simply reflecting imperfect databases, which then undercuts confidence in estimates of rates of gene loss.

      Some context is useful for comparing rates of gene turnover in MHC, to other loci. Changing gene copy numbers, duplications, and loss of duplicates, are common it seems across many loci and many organisms; is MHC exceptional in this regard, or merely behaving like any moderately large gene family? I would very much have liked to see comparable analyses done for other gene families (immune, like TLRs, or non-immune), and quantitative comparisons of evolutionary rates between MHC versus other genes. Does MHC gene composition evolve any faster than a random gene family? At present readers may be tempted to infer this, but evidence is not provided.

      While on the topic of making comparisons, the authors make a few statements about relative rates. For instance, lines 447-8 compare gene topology of classical versus non-classical genes; and line 450 states that classical genes experience more turnover. But there are no quantitative values given to these rates to provide numerical comparisons, nor confidence intervals provided (these are needed, given that they are estimates), nor formal statistical comparisons to confirm our confidence that rates differ between types of genes.

      More broadly, the paper uses sophisticated phylogenetic methods, but without taking advantage of macroevolutionary comparative methods that allow model-based estimation of macroevolutionary rates. I found the lack of quantitative measurements of rates of gene gain/loss to be a weakness of the present version of the paper, and something that should be readily remedied. When claiming that MHC Class I genes "turn over rapidly" (line 476) - what does rapidly mean? How rapidly? How does that compare to rates of genetic turnover at other families? Quantitative statements should be supported by quantitative estimates (and their confidence intervals).
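
      As a rough illustration of the kind of quantitative statement requested above, the sketch below turns an event count and a total branch length into a rate with an approximate 95% confidence interval, and compares two such rates with a Wald test on the log rate ratio. It assumes that gain/loss events arise as a Poisson process along the tree, which ignores phylogenetic uncertainty and is an assumption of this example, not the authors' method. The function names and all counts are hypothetical.

```python
import math

def rate_with_ci(events, branch_length_myr, z=1.96):
    """Poisson rate (events per Myr) with an approximate 95% CI built on the log scale."""
    if events <= 0 or branch_length_myr <= 0:
        raise ValueError("need a positive event count and branch length")
    rate = events / branch_length_myr
    half_width = z / math.sqrt(events)  # SE of log(rate) is 1/sqrt(events)
    return rate, rate * math.exp(-half_width), rate * math.exp(half_width)

def rate_ratio_test(events_a, length_a, events_b, length_b):
    """Wald z-test on the log rate ratio; returns (ratio, z, two-sided p)."""
    if min(events_a, events_b) <= 0 or min(length_a, length_b) <= 0:
        raise ValueError("need positive event counts and branch lengths")
    ratio = (events_a / length_a) / (events_b / length_b)
    se = math.sqrt(1 / events_a + 1 / events_b)
    z = math.log(ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return ratio, z, p

# Hypothetical counts; NOT values from the manuscript.
classical = rate_with_ci(events=30, branch_length_myr=300.0)
non_classical = rate_with_ci(events=10, branch_length_myr=300.0)
ratio, z_score, p_value = rate_ratio_test(30, 300.0, 10, 300.0)
print("classical rate and CI:", classical)
print("non-classical rate and CI:", non_classical)
print("rate ratio, z, p:", ratio, z_score, p_value)
```

      A model-based comparative analysis would replace the raw counts with estimates inferred on the tree, but the reporting format (rate, interval, formal comparison) would be the same.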

      The authors refer to 'shared function of the MHC across species' (e.g. line 22); while this is likely true, they are not here presenting any functional data to confirm this, nor can they rule out neofunctionalization or subfunctionalization of gene duplicates. There is evidence in other vertebrates (e.g., cod) of MHC evolving appreciably altered functions, so one may not safely assume the function of a locus is static over long macroevolutionary periods, although that would be a plausible assumption at first glance.

    3. Reviewer #2 (Public review):

      Summary:

      The authors aim to provide a comprehensive understanding of the evolutionary history of the Major Histocompatibility Complex (MHC) gene family across primate species. Specifically, they sought to:

      (1) Analyze the evolutionary patterns of MHC genes and pseudogenes across the entire primate order, spanning 60 million years of evolution.

      (2) Build gene and allele trees to compare the evolutionary rates of MHC Class I and Class II genes, with a focus on identifying which genes have evolved rapidly and which have remained stable.

      (3) Investigate the role of often-overlooked pseudogenes in reconstructing evolutionary events, especially within the Class I region.

      (4) Highlight how different primate species use varied MHC genes, haplotypes, and genetic variation to mount successful immune responses, despite the shared function of the MHC across species.

      (5) Fill gaps in the current understanding of MHC evolution by taking a broader, multi-species perspective using (a) phylogenomic analytical computing methods such as Beast2, Geneconv, BLAST, and the much larger computing capacities that have been developed and made available to researchers over the past few decades, (b) literature review for gene content and arrangement, and genomic rearrangements via haplotype comparisons.

      (6) The authors' overall conclusions based on their analyses and results are that 'different species employ different genes, haplotypes, and patterns of variation to achieve a successful immune response'.

      Strengths:

      Essentially, much of the information presented in this paper is already well-known in the MHC field of genomic and genetic research, with few new conclusions and with insufficient respect to past studies. Nevertheless, while MHC evolution is a well-studied area, this paper potentially adds some originality through its comprehensive, cross-species evolutionary analysis of primates, focus on pseudogenes and the modern, large-scale methods employed. Its originality lies in its broad evolutionary scope of the primate order among mammals with solid methodological and phylogenetic analyses.

      The main strengths of this study are the use of large publicly available databases for primate MHC sequences, the intensive computing involved, the phylogenetic tool Beast2 to create multigene Bayesian phylogenetic trees using sequences from all genes and species, separated into Class I and Class II groups to provide a backbone of broad relationships to investigate subtrees, and the presentation of various subtrees as species and gene trees in an attempt to elucidate the unique gene duplications within the different species. The study provides some additional insights with summaries of MHC reference genomes and haplotypes in the context of a literature review to identify the gene content and haplotypes known to be present in different primate species. The phylogenetic overlays or ideograms (Figures 6 and 7) in part show the complexity of the evolution and organisation of the primate MHC genes via the orthologous and paralogous gene and species pathways progressively from the poorly-studied NWM, across a few moderately studied ape species, to the better-studied human MHC genes and haplotypes.

      Weaknesses:

      The title 'The Primate Major Histocompatibility Complex: An Illustrative Example of Gene Family Evolution' suggests that the paper will explore how the Major Histocompatibility Complex (MHC) in primates serves as a model for understanding gene family evolution. The term 'Illustrative Example' in the title would be appropriate if the paper aimed to use the primate Major Histocompatibility Complex (MHC) as a clear and representative case to demonstrate broader principles of gene family evolution. That is, the MHC gene family is not just one instance of gene family evolution but serves as a well-studied, insightful example that can highlight key mechanisms and concepts applicable to other gene families. However, this is not the case; this paper only covers specific details of primate MHC evolution without drawing broader lessons for any other gene families. So, the term 'Illustrative Example' is too broad or generalizing. In this case, a term like 'Case Study' or simply 'Example' would be more suitable. Perhaps, 'An Example of Gene Family Diversity' would be more precise. Also, an explanation or 'reminder' is suggested that this study is not about the origins of the MHC genes from the earliest jawed vertebrates per se (~600 mya), but it is an extension within a subspecies set that has emerged relatively late (~60 mya) in the evolutionary divergent pathways of the MHC genes, systems, and various vertebrate species.

      Phylogenomics. Particular weaknesses in this study are the limitations and problems associated with providing phylogenetic gene and species trees to try and solve the complex issue of the molecular mechanisms involved with imperfect gene duplications, losses, and rearrangements in a complex genomic region such as the MHC that is involved in various effects on the response and regulation of the immune system. A particular deficiency is drawing conclusions based on a single exon of the genes. Different exons present different trees. Which are the more reliable? Why were introns not included in the analyses? The authors attempt to overcome these limitations by including genomic haplotype analysis, duplication models, and the supporting or contradictory information available in previous publications. They succeed in part with this multidisciplinary approach, but much is missed because of biased literature selection. The authors should include a paragraph about the benefits and limitations of the software that they have chosen for their analysis, and perhaps suggest some alternative tools that they might have tried comparatively. How were problems with Bayesian phylogeny such as computational intensity, choosing probabilities, choosing particular exons for analysis, assumptions of evolutionary models, rates of evolution, systemic bias, and absence of structural and functional information addressed and controlled for in this study?

      Gene families as haplotypes. In the Introduction, the MHC is referred to as a 'gene family', and in paragraph 2, it is described as being united by the 'MHC fold', despite exhibiting 'very diverse functions'. However, the MHC region is more accurately described as a multigene region containing diverse, haplotype-specific Conserved Polymorphic Sequences, many of which are likely to be regulatory rather than protein-coding. These regulatory elements are essential for controlling the expression of multiple MHC-related products, such as TNF and complement proteins, a relationship demonstrated over 30 years ago. Non-MHC fold loci such as TNF, complement, POU5F1, lncRNA, TRIM genes, LTA, LTB, NFkBIL1, etc, are present across all MHC haplotypes and play significant roles in regulation. Evolutionary selection must act on genotypes, considering both paternal and maternal haplotypes, rather than on individual genes alone. While it is valuable to compile databases for public use, their utility is diminished if they perpetuate outdated theories like the 'birth-and-death model'. The inclusion of prior information or assumptions used in a statistical or computational model, typically in Bayesian analysis, is commendable, but they should be based on genotypic data rather than older models. A more robust approach would consider the imperfect duplication of segments, the history of their conservation, and the functional differences in inheritance patterns. Additionally, the MHC should be examined as a genomic region, with ancestral haplotypes and sequence changes or rearrangements serving as key indicators of human evolution after the 'Out of Africa' migration, and with disease susceptibility providing a measurable outcome. There are more than 7000 different HLA-B and -C alleles at each locus, which suggests that there are many thousands of human HLA haplotypes to study. In this regard, the studies by Dawkins et al (1999 Immunol Rev 167,275), Shiina et al. (2006 Genetics 173,1555) on human MHC gene diversity and disease hitchhiking (haplotypes), and Sznarkowska et al. (2020 Cancers 12,1155) on the complex regulatory networks governing MHC expression, both in terms of immune transcription factor binding sites and regulatory non-coding RNAs, should be examined in greater detail, particularly in the context of MHC gene allelic diversity and locus organization in humans and other primates.

      Diversifying and/or concerted evolution. Both this and past studies highlight that diversifying or balancing selection is the dominant force in MHC evolution. This is primarily because the extreme polymorphism observed in MHC genes is advantageous for populations in terms of pathogen defence. Diversification increases the range of peptides that can be presented to T cells, enhancing the immune response. The peptide-binding regions of MHC genes are highly variable, and this variability is maintained through selection for immune function, especially in the face of rapidly evolving pathogens. In contrast, concerted evolution, which typically involves the homogenization of gene duplicates through processes like gene conversion or unequal crossing-over, seems to play a minimal role in MHC evolution. Although gene duplication events have occurred in the MHC region leading to the expansion of gene families, the resulting paralogs often undergo divergent evolution rather than being kept similar or homozygous by concerted evolution. Therefore, unlike gene families such as ribosomal RNA genes or histone genes, where concerted evolution leads to highly similar copies, MHC genes display much higher levels of allelic and functional diversification. Each MHC gene copy tends to evolve independently after duplication, acquiring unique polymorphisms that enhance the repertoire of antigen presentation, rather than undergoing homogenization through gene conversion. Also, in some populations with high polymorphism or genetic drift, allele frequencies may become similar over time without the influence of gene conversion. This similarity can be mistaken for gene conversion when it is simply due to neutral evolution or drift, particularly in small populations or bottlenecked species. Moreover, gene conversion might contribute to greater diversity by creating hybrids or mosaics between different MHC genes. In this regard, can the authors indicate what percentage of the gene numbers in their study have been homogenised by gene conversion compared to those that have been diversified by gene conversion?

      Duplication models. The phylogenetic overlays or ideograms (Figures 6 and 7) show considerable imperfect multigene duplications, losses, and rearrangements, but the paper's Discussion provides no in-depth consideration of the various multigenic models or mechanisms that can be used to explain the occurrence of such events. How do their duplication models compare to those proposed by others? For example, their text simply says on line 292, 'the proposed series of events is not always consistent with phylogenetic data'. How, why, when? Duplication models for the generation and extension of the human MHC class I genes as duplicons (extended gene or segmental genomic structures) by parsimonious imperfect tandem duplications with deletions and rearrangements in the alpha, beta, and kappa blocks were already formulated in the late 1990s and extended to the rhesus macaque in 2004 based on genomic haplotypic sequences. These studies were based on genomic sequences (genes, pseudogenes, retroelements), dot plot matrix comparisons, and phylogenetic analyses of gene and retroelement sequences using computer programs. It already was noted or proposed in these earlier 1999 studies that (1) the ancestor of HLA-P(90)/-T(16)/W(80) represented an old lineage separate from the other HLA class I genes in the alpha block, (2) HLA-U(21) is a duplicated fragment of HLA-A, (3) HLA-F and HLA-V(75) are among the earliest (progenitor) genes or outgroups within the alpha block, (4) distinct Alu and L1 retroelement sequences adjoining HLA-L(30), and HLA-N genomic segments (duplicons) in the kappa block are closely related to those in the HLA-B and HLA-C in the beta block; suggesting an inverted duplication and transposition of the HLA genes and retroelements between the beta and kappa regions. None of these prior human studies were referenced by Fortier and Pritchard in their paper. How does their human MHC class I gene duplication model (Fig. 6) such as gene duplication numbers and turnovers differ from those previously proposed and described by Kulski et al (1997 JME 45,599), (1999 JME 49,84), (2000 JME 50,510), Dawkins et al (1999 Immunol Rev 167,275), and Gaudieri et al (1999 GR 9,541)? Is this a case of reinventing the wheel?

      Results. The results are presented as new findings, whereas most if not all of the results' significance and importance already have been discussed in various other publications. Therefore, the authors might do better to combine the results and discussion into a single section with appropriate citations to previously published findings presented among their results for comparison. Do the trees and subsets differ from previous publications, albeit that they might have fewer comparative examples and samples than the present preprint? Alternatively, the results and discussion could be combined and presented as a review of the field, which would make more sense and be more honest than the current format of essentially rehashing old data.

      Minor corrections:

      (1) Abstract, line 19: 'modern methods'. Too general. What modern methods?

      (2) Abstract, line 25: 'look into [primate] MHC evolution.' The analysis is on the primate MHC genes, not on the entire vertebrate MHC evolution with a gene collection from sharks to humans. The non-primate MHC genes are often differently organised and structurally evolved in comparison to primate MHC.

      (3) Introduction, line 113. 'In a companion paper (Fortier and Pritchard, 2024)' This paper appears to be unpublished. If it's unpublished, it should not be referenced.

      (4) Figures 1 and 2. Use the term 'gene symbols' (circle, square, triangle, inverted triangle, diamond) or 'gene markers' instead of 'points'. 'Asterisks "within symbols" indicate new information.'

      (5) Figures. A variety of colours have been applied for visualisation. However, some coloured texts are so light in colour that they are difficult to read against a white background. Could darker colours or black be used for all or most texts?

      (6) Results, line 135. '(Fortier and Pritchard, 2024)' This paper appears to be unpublished. If it's unpublished, it should not be referenced.

      (7) Results, lines 152 to 153, 164, 165, etc. 'Points with an asterisk'. Use the term 'gene symbols' (circle, square, triangle, inverted triangle, diamond) or 'gene markers' instead of 'points'. A point is a small dot such as those used in data points for plotting graphs .... The figures are so small that the asterisks in the circles, squares, triangles, etc, look like points (dots) and the points/asterisks terminology that is used is very confusing visually.

      (8) Line 178 (BEA, 2024) is not listed alphabetically in the References.

      (9) Lines 188-190. 'NWM MHC-G does not group with ape/OWM MHC-G, instead falling outside of the clade containing ape/OWM MHC-A, -G, -J and -K.' This is not surprising given that MHC-A, -G, -J, and -K are paralogs of each other and that some of them, especially in NWM have diverged over time from the paralogs and/or orthologs and might be closer to one paralog than another and not be an actual ortholog of OWM, apes or humans.

      (10) Line 249. Gene conversion: This is recombination between two different genes where a portion of the genes are exchanged with one another so that different portions of the gene can group within one or other of the two gene clades. Alternatively, the gene has been annotated incorrectly if the gene does not group within either of the two alternative clades. Another possibility is that one or two nucleotide mutations have occurred without a recombination resulting in a mistaken interpretation or conclusion of a recombination event. What measures are taken to avoid false-positive conclusions? How many MHC gene conversion (recombination) events have occurred according to the authors' estimates?

      (11) Lines 284-286. 'The Class I MHC region is further divided into three polymorphic blocks-alpha, beta, and kappa blocks-that each contains MHC genes but are separated by well-conserved non-MHC genes.' The MHC class I region was first designated into conserved polymorphic duplication blocks, alpha and beta by Dawkins et al (1999 Immunol Rev 167,275), and kappa by Kulski et al (2002 Immunol Rev 190,95), and should be acknowledged (cited) accordingly.

      (12) Lines 285-286. 'The majority of the Class I genes are located in the alpha-block, which in humans includes 12 MHC genes and pseudogenes.' This is not strictly correct for many other species, because the majority of class I genes might be in the beta block of new and old-world monkeys, and the authors haven't provided respective counts of duplication numbers to show otherwise. The alpha block in some non-primate mammalian species such as pigs, rats, and mice has no MHC class I genes or only a few. Most MHC class I genes in non-primate mammalian species are found in other regions. For example, see Ando et al (2005 Immunogenetics 57,864) for the pig alpha, beta, and kappa regions in the MHC class I region. There are no pig MHC genes in the alpha block.

      (13) Line 297 to 299. 'The alpha-block also contains a large number of repetitive elements and gene fragments belonging to other gene families, and their specific repeating pattern in humans led to the conclusion that the region was formed by successive block duplications (Shiina et al., 1999).' There are different models for successive block duplications in the alpha block and some are more parsimonious based on imperfect multigenic segmental duplications (Kulski et al 1999, 2000) than others (Shiina et al., 1999). In this regard, Kulski et al (1999, 2000) also used duplicated repetitive elements neighbouring MHC genes to support their phylogenetic analyses and multigenic segmental duplication models. For comparison, can the authors indicate how many duplications and deletions they have in their models for each species?

      (14) Lines 315-315. 'Ours is the first work to show that MHC-U is actually an MHC-A-related gene fragment.' This sentence should be deleted. Other researchers had already inferred that MHC-U is actually an MHC-A-related gene fragment more than 25 years ago (Kulski et al 1999, 2000) when the MHC-U was originally named MHC-21.

      (15) Lines 361-362. 'Notably, our work has revealed that MHC-V is an old fragment.' This is not a new finding or hypothesis. Previous phylogenetic analysis and gene duplication modelling had already inferred HLA-V (formerly HLA-75) to be an old fragment (Kulski et al 1999, 2000).

      (16) Line 431-433. 'the Class II genes have been largely stable across the mammals, although we do see some lineage-specific expansions and contractions (Figure 2 and Figure 2-gure Supplement 2).' Please provide one or two references to support this statement. Is 'gure' a typo?

      (17) Line 437. 'We discovered far more "specific" events in Class I, while "broad-scale" events were predominant in Class II.' Please define the difference between 'specific' and 'broad-scale'.

      Lines 450-451. 'This shows that classical genes experience more turnover and are more often affected by long-term balancing selection or convergent evolution.' Is balancing selection a form of divergent evolution that is different from convergent evolution? Please explain in more detail how and why balancing selection or convergent evolution affects classical and nonclassical genes differently.

      References. Some references in the supplementary materials such as Alvarez (1997), Daza-Vamenta (2004), Rojo (2005), Aarnink (2014), Kulski (2022), and others are missing from the Reference list. Please check that all the references in the text and the supplementary materials are listed correctly and alphabetically.

    4. Reviewer #3 (Public review):

      Summary:

      The article provides the most comprehensive overview of primate MHC class I and class II genes to date, combining published data with an exploration of the available genome assemblies in a coherent phylogenetic framework and formulating new hypotheses about the evolution of the primate MHC genomic region.

      Strengths:

      I think this is a solid piece of work that will be the reference for years to come, at least until population-scale haplotype-resolved whole-genome resequencing of any mammalian species becomes standard. The work is timely because there is an obvious need to move beyond short amplicon-based polymorphism surveys and classical comparative genomic studies. The paper is data-rich and the approach taken by the authors, i.e. an integrative phylogeny of all MHC genes within a given class across species and the inclusion of often ignored pseudogenes, makes a lot of sense. The focus on primates is a good idea because of the wealth of genomic and, in some cases, functional data, and the relatively densely populated phylogenetic tree facilitates the reconstruction of rapid evolutionary events, providing insights into the mechanisms of MHC evolution. Appendices 1-2 may seem unusual at first glance, but I found them helpful in distilling the information that the authors consider essential, thus reducing the need for the reader to wade through a vast amount of literature. Appendix 3 is an extremely valuable companion in navigating the maze of primate MHC genes and associated terminology.

      Weaknesses:

      I have not identified major weaknesses and my comments are mostly requests for clarification and justification of some methodological choices.

    5. Author response:

      We thank the three anonymous reviewers who took the time to read and evaluate our work. We look forward to submitting a revised version of the manuscript that addresses their comments.

      We agree with the reviewers that missing genes and incomplete genome assemblies can be challenges when trying to make interspecies comparisons in a complex and repetitive region like the MHC. Our revised manuscript will include more discussion of this topic, and we look forward to future work on this region that considers the next generation of complete telomere-to-telomere genomes with long-read sequencing.

      Repeating this analysis with other gene families—immune and non-immune—is a great idea. While outside of the scope of this work, this will provide many opportunities for comparison and help tease apart the features that make this family unique.

      We also point readers to our companion paper, Ancient Trans-Species Polymorphism at the Major Histocompatibility Complex in Primates, which tackles different (but related) questions about long-term balancing selection in the primate MHC and also summarizes relevant past work in the area. This second paper addresses some questions raised by reviewers here.

    1. eLife Assessment

      The authors present new expression analysis software (TEKRABber) to help analyze expression correlations between transposable elements (TEs) and KRAB zinc finger (KRAB-ZNF) genes in experimentally validated datasets. The authors use this method to decipher the regulatory networks of KRAB-ZNFs and TEs during human brain evolution and in Alzheimer's disease. The direction of the work is important, with potentially significant interest from others looking for a tool for correlative gene expression analysis across individual genomes and species. However, identified biases and shortcomings in the current analysis pipeline could lead to an unacceptable number of false positive and negative signals and thus impact the conclusions, leaving this work in its current form incomplete.

    2. Reviewer #1 (Public review):

      The authors present their new bioinformatic tool called TEKRABber, and use it to correlate expression between KRAB ZNFs and TEs across different brain tissues, and across species. While the aims of the authors are clear and there would be significant interest from other researchers in the field for a program that can do such correlative gene expression analysis across individual genomes and species, the presented approach and work display significant shortcomings. In the current state of the analysis pipeline, the biases and shortcomings mentioned below, for which I have seen no proof that they are accounted for by the authors, are severely impacting the presented results and conclusions. It is therefore essential that the points below are addressed, involving significant changes in the TEKRABber program as well as the analysis pipeline, to prevent the identification of false positive and negative signals, that would severely affect the conclusions one can raise about the analysis.

      My main concerns are provided below:

      One important shortcoming of the biocomputational approach is that most TEs are not actually expressed, and others (Alus) are not a proxy of the activity of the TE class at all. I will explain: While specific TE classes can act as (species-specific) promoters for genes (such as LTRs) or are expressed as TE derived transcripts (LINEs, SVAs), the majority of other older TE classes do not have such behavior and are either neutral to the genome or may have some enhancer activity (as mapped in the program they refer to, 'TEffectR'). A big focus is on Alus, but Alus contribute to a transcriptome in a different way too: They often become part of transcripts due to alternative splicing. As such, the presence of Alu derived transcripts is not a proxy for the expression/activity of the Alu class, but rather a result of some Alus being part of gene transcripts (see also next point). The bottom line is that the TEKRABber software/approach is heavily prone to picking up both false positives (TEs being part of transcribed loci) and false negatives (TEs not producing any transcripts at all), which has a big implication for how reads from TEs as done in this study should be interpreted: The TE expression used to correlate the KRAB ZNF expression is simply not representing the species-specific influences of TEs that the authors are after.

      With the strategy as described, a lot of TE expression is misinterpreted: TEs can be part of gene-derived transcripts due to alternative splicing (often happens for Alus) or as a result of the TE being present in an inefficiently spliced out intron (happens a lot) which leads to TE-derived reads as a result of that TE being part of that intron, rather than that TE being actively expressed. As a result, the data as analysed is not reliably indicating the expression of TEs (as the authors intend to) and should be filtered for any reads that are coming from the above scenarios: These reads have nothing to do with KRAB ZNF control, and are not representing actively expressed TEs and therefore should be removed. Given that from my lab's experience in the brain (and other) tissues, the proportion of RNA sequencing reads that are actually derived from active TEs is a stark minority compared to reads derived from TEs that happen to be in any of the many transcribed loci, applying this filtering is expected to have a huge impact on the results and conclusions of this study.

      Another potential problem that I don't see addressed is that due to the high level of similarity of the many hundreds of KRAB ZNF genes in primates and the reads derived from them, and the inaccurate annotations of many KZNFs in non-human genomes, the expression data derived from RNA-seq datasets cannot be simply used to plot KZNF expression values, without significant work and manual curation to safeguard proper cross species ortholog-annotation: The work of Thomas and Schneider (2011) has studied this in great detail but genome-assemblies of non-human primates tend to be highly inaccurate in appointing the right ortholog of human ZNF genes. The problem becomes even bigger when RNA-sequencing reads are analyzed: RNA-sequencing reads from a human ZNF that emerged in great apes by duplication from an older parental gene (we have a decent number of those in the human genome) may be mapped to that older parental gene in Macaque genome: So, the expression of human-specific ZNF-B, that derived from the parental ZNF-A, is likely to be compared in their DESeq to the expression of ZNF-A in Macaque RNA-seq data. In other words, without a significant amount of manual curation, the DE-seq analysis is prone to lead to false comparisons which make the strategy and KRABber software approach described highly biased and unreliable.

      There is no doubt that there are differences in expression and activity of KRAB-ZNFs and TEs respectively that may have had important evolutionary consequences. However, because all of the network analyses in this paper rely on the analyses of RNA-seq data and the processing through the TE-KRABber software with the shortcomings and potential biases that I mentioned above, I need to emphasize that the results and conclusions are likely to be significantly different if the appropriate measures are taken to get more accurate and curated TE and KRAB ZNF expression data.

      Finally, there are some minor but important notes I want to share:

      The association with certain variations in ZNF genes with neurological disorders such as AD, as reported in the introduction is not entirely convincing without further functional support. Such associations could merely happen by chance, given the high number of ZNF genes in the human genome and the high chance that variations in these loci happen to associate with certain disease-associated traits. So using these associations as an argument that changes in TEs and KRAB ZNF networks are important for diseases like AD should be used with much more caution.

      There are a number of papers where KRAB ZNF and TE expression are analysed in parallel in human brain tissues. So the novelty of that aspect of the presented study may be limited.

    3. Reviewer #2 (Public review):

      Summary:

      The aim was to decipher the regulatory networks of KRAB-ZNFs and TEs that have changed during human brain evolution and in Alzheimer's disease.

      Strengths:

      This solid study presents a valuable analysis and successfully confirms previous assumptions, but also goes beyond the current state of the art.

      Weaknesses:

      The design of the analysis needs to be slightly modified and a more in-depth analysis of the positive correlation cases would be beneficial. Some of the conclusions need to be reinterpreted.

    4. Author response:

      Reviewer #1 (Public review):

      The authors present their new bioinformatic tool called TEKRABber, and use it to correlate expression between KRAB ZNFs and TEs across different brain tissues, and across species. While the aims of the authors are clear and there would be significant interest from other researchers in the field for a program that can do such correlative gene expression analysis across individual genomes and species, the presented approach and work display significant shortcomings. In the current state of the analysis pipeline, the biases and shortcomings mentioned below, for which I have seen no proof that they are accounted for by the authors, are severely impacting the presented results and conclusions. It is therefore essential that the points below are addressed, involving significant changes in the TEKRABber program as well as the analysis pipeline, to prevent the identification of false positive and negative signals, that would severely affect the conclusions one can raise about the analysis.

      Thank you very much for the insightful review of our manuscript.

      My main concerns are provided below:

      (1) One important shortcoming of the biocomputational approach is that most TEs are not actually expressed, and others (Alus) are not a proxy of the activity of the TE class at all. I will explain: While specific TE classes can act as (species-specific) promoters for genes (such as LTRs) or are expressed as TE derived transcripts (LINEs, SVAs), the majority of other older TE classes do not have such behavior and are either neutral to the genome or may have some enhancer activity (as mapped in the program they refer to, 'TEffectR'). A big focus is on Alus, but Alus contribute to a transcriptome in a different way too: They often become part of transcripts due to alternative splicing. As such, the presence of Alu derived transcripts is not a proxy for the expression/activity of the Alu class, but rather a result of some Alus being part of gene transcripts (see also next point). The bottom line is that the TEKRABber software/approach is heavily prone to picking up both false positives (TEs being part of transcribed loci) and false negatives (TEs not producing any transcripts at all), which has a big implication for how reads from TEs as done in this study should be interpreted: The TE expression used to correlate the KRAB ZNF expression is simply not representing the species-specific influences of TEs that the authors are after.

      With the strategy as described, a lot of TE expression is misinterpreted: TEs can be part of gene-derived transcripts due to alternative splicing (often happens for Alus) or as a result of the TE being present in an inefficiently spliced out intron (happens a lot) which leads to TE-derived reads as a result of that TE being part of that intron, rather than that TE being actively expressed. As a result, the data as analysed is not reliably indicating the expression of TEs (as the authors intend to) and should be filtered for any reads that are coming from the above scenarios: These reads have nothing to do with KRAB ZNF control, and are not representing actively expressed TEs and therefore should be removed. Given that from my lab's experience in the brain (and other) tissues, the proportion of RNA sequencing reads that are actually derived from active TEs is a stark minority compared to reads derived from TEs that happen to be in any of the many transcribed loci, applying this filtering is expected to have a huge impact on the results and conclusions of this study.

      We sincerely thank the reviewer for highlighting the potential issues of false positives and negatives in TE quantification. The reviewer provided valuable examples of how different TE classes, such as Alus, LTRs, LINEs, and SVAs, exhibit distinct behaviors in the genome. To our knowledge, specific tools like ERVmap (Tokuyama et al., 2018), which annotates ERVs, and LtrDetector (Joseph et al., 2019), which uses k-mer distributions to quantify LTRs, could indeed enhance precision by treating specific TE classes individually. We acknowledge that such approaches may yield more accurate results and appreciate the suggestion.

      In our study, we used TEtranscripts (Jin et al., 2015) prior to TEKRABber. TEtranscripts applies the Expectation Maximization (EM) algorithm to assign ambiguous reads in the following steps. Uniquely mapped reads are first assigned to genes, and reads overlapping genes and TEs are assigned to TEs only if they do not uniquely match an annotated gene. The remaining ambiguous reads are distributed based on EM iterations. While this approach may not be as specialized as the latest tools for specific TE classes, it provides a general overview of TE activity. TEtranscripts outputs subfamily-level TE expression data, which we used as input for TEKRABber to perform downstream analyses such as differential expression and correlation studies.
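      The EM step described above can be illustrated with a toy sketch. This is not the TEtranscripts implementation; the function name `em_assign`, the input layout (one tuple of candidate subfamilies per read), and the fixed iteration count are assumptions made only for illustration. It shows the general idea of splitting each ambiguous read across its candidate TE subfamilies in proportion to the current abundance estimates, then re-estimating those abundances.

```python
from collections import defaultdict


def em_assign(reads, n_iter=50):
    """Fractionally assign reads to TE subfamilies with a toy EM loop.

    reads: list of tuples; each tuple lists the subfamilies one read maps to
           (a 1-tuple is a uniquely mapped read). Hypothetical input layout.
    Returns a dict of estimated fractional read counts per subfamily.
    """
    subfamilies = sorted({s for candidates in reads for s in candidates})
    # Start from uniform relative abundances.
    abundance = {s: 1.0 / len(subfamilies) for s in subfamilies}
    counts = {}
    for _ in range(n_iter):
        counts = defaultdict(float)
        # E-step: split each read across its candidates in proportion
        # to the current abundance estimates.
        for candidates in reads:
            total = sum(abundance[s] for s in candidates)
            for s in candidates:
                counts[s] += abundance[s] / total
        # M-step: re-estimate relative abundances from fractional counts.
        n_reads = sum(counts.values())
        abundance = {s: counts[s] / n_reads for s in subfamilies}
    return dict(counts)


# Two unique AluY reads, one unique L1HS read, one read ambiguous between both.
toy_reads = [("AluY",), ("AluY",), ("AluY", "L1HS"), ("L1HS",)]
toy_counts = em_assign(toy_reads)
```

      In this toy input the ambiguous read is shared unevenly, with the larger share going to the subfamily that has more unique support, while the total number of reads is preserved.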

      We understand the importance of adapting tools to specific research objectives, including focusing on particular TE classes. TEKRABber is designed not to refine TE quantification at the mapping stage but to flexibly handle outputs from various TE quantification tools. It accepts raw TE counts as input in the form of dataframes, enabling diverse analytical pipelines. In the revised version of our manuscript, we will emphasize this distinction in the discussion and provide examples of how TEKRABber can integrate with other tools to enhance specificity and accuracy.
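      As a rough illustration of the kind of downstream correlation step mentioned in this response, the sketch below correlates per-sample expression of KRAB-ZNF genes with per-sample expression of TE subfamilies. It does not use or reproduce the TEKRABber API; the function names, the dict-of-lists input layout, the placeholder identifiers, and the choice of Pearson correlation are all assumptions for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlate_pairs(gene_expr, te_expr):
    """Correlate every gene with every TE subfamily across samples.

    gene_expr, te_expr: dicts mapping a name to a list of per-sample values,
    with samples in the same order in both tables (hypothetical layout).
    """
    return {
        (gene, te): pearson(gene_values, te_values)
        for gene, gene_values in gene_expr.items()
        for te, te_values in te_expr.items()
    }


# Placeholder identifiers and values, not data from the study.
gene_expr = {"ZNF_A": [1.0, 2.0, 3.0, 4.0]}
te_expr = {"TE_X": [8.0, 6.0, 4.0, 2.0], "TE_Y": [1.0, 2.0, 3.0, 4.0]}
pair_corr = correlate_pairs(gene_expr, te_expr)
```

      A real analysis would also need normalisation across samples and species and a correction for multiple testing, neither of which is shown here.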

      (2) Another potential problem that I don't see addressed is that due to the high level of similarity of the many hundreds of KRAB ZNF genes in primates and the reads derived from them, and the inaccurate annotations of many KZNFs in non-human genomes, the expression data derived from RNA-seq datasets cannot be simply used to plot KZNF expression values, without significant work and manual curation to safeguard proper cross species ortholog-annotation: The work of Thomas and Schneider (2011) has studied this in great detail but genome-assemblies of non-human primates tend to be highly inaccurate in appointing the right ortholog of human ZNF genes. The problem becomes even bigger when RNA-sequencing reads are analyzed: RNA-sequencing reads from a human ZNF that emerged in great apes by duplication from an older parental gene (we have a decent number of those in the human genome) may be mapped to that older parental gene in Macaque genome: So, the expression of human-specific ZNF-B, that derived from the parental ZNF-A, is likely to be compared in their DESeq to the expression of ZNF-A in Macaque RNA-seq data. In other words, without a significant amount of manual curation, the DE-seq analysis is prone to lead to false comparisons which make the strategy and KRABber software approach described highly biased and unreliable.

      There is no doubt that there are differences in expression and activity of KRAB-ZNFs and TEs respectively that may have had important evolutionary consequences. However, because all of the network analyses in this paper rely on the analyses of RNA-seq data and the processing through the TE-KRABber software with the shortcomings and potential biases that I mentioned above, I need to emphasize that the results and conclusions are likely to be significantly different if the appropriate measures are taken to get more accurate and curated TE and KRAB ZNF expression data.

      We thank the reviewer for raising the important issue of accurately annotating the expanded repertoire of KRAB-ZNFs in primates, particularly the challenges of cross-species orthology and potential biases in RNA-seq data analysis. Indeed, we have also addressed this challenge in some of our previous papers (Nowick et al., 2010, Nowick et al., 2011 and Jovanovic et al., 2021).

      In the revised manuscript, we will include more details about our two-step strategy to ensure accurate KRAB-ZNF ortholog assignments. First, we employed the Gene Order Conservation (GOC) score from Ensembl BioMart as a primary filter, selecting only one-to-one orthologs with a GOC score above 75% across primates. This threshold, recommended in Ensembl’s ortholog quality control guidelines, ensures high-confidence orthology relationships (http://www.ensembl.org/info/genome/compara/Ortholog_qc_manual.html#goc).

      Second, we incorporated data from Jovanovic et al. (2021), which independently validated KRAB-ZNF orthologs across 27 primate genomes. This additional layer of validation allowed us to refine our dataset, resulting in the identification of 337 orthologous KRAB-ZNFs for differential expression analysis (Figure S2).
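      The two-step selection described above can be sketched as a simple filter. This is an illustrative reading of the text, not the authors' code: the record fields (`gene`, `species`, `orthology_type`, `goc_score`), the placeholder gene identifiers, and the strict comparison against the threshold are assumptions, and whether a score of exactly 75 should pass is not stated in the response.

```python
def filter_orthologs(records, validated_ids, min_goc=75):
    """Two-step ortholog selection (hypothetical field names).

    Step 1: keep genes whose orthologs are one-to-one and have a GOC score
            above min_goc in every species listed for that gene.
    Step 2: keep only genes also present in an independently validated set.
    """
    passed = {}
    for rec in records:
        ok = (
            rec["orthology_type"] == "ortholog_one2one"
            and rec["goc_score"] > min_goc  # strict ">" is an assumption
        )
        # A gene passes only if all of its per-species records pass.
        passed[rec["gene"]] = passed.get(rec["gene"], True) and ok
    step1 = {gene for gene, ok in passed.items() if ok}
    return sorted(step1 & set(validated_ids))


# Placeholder records, not data from the study.
records = [
    {"gene": "ZNF_A", "species": "chimpanzee", "orthology_type": "ortholog_one2one", "goc_score": 100},
    {"gene": "ZNF_A", "species": "macaque", "orthology_type": "ortholog_one2one", "goc_score": 50},
    {"gene": "ZNF_B", "species": "chimpanzee", "orthology_type": "ortholog_one2one", "goc_score": 100},
    {"gene": "ZNF_B", "species": "macaque", "orthology_type": "ortholog_one2one", "goc_score": 100},
    {"gene": "ZNF_C", "species": "chimpanzee", "orthology_type": "ortholog_one2many", "goc_score": 100},
    {"gene": "ZNF_D", "species": "chimpanzee", "orthology_type": "ortholog_one2one", "goc_score": 100},
]
validated = {"ZNF_A", "ZNF_B", "ZNF_C"}
kept = filter_orthologs(records, validated)
```

      Requiring every per-species record to pass is one way to read "across primates"; a looser rule, such as passing in a majority of species, would retain more genes.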

      We acknowledge that different annotation methods or criteria may yield different orthologs for some genes. However, we believe that this combination provides a robust starting point for addressing the challenges raised, while we remain open to additional refinements in future analyses.

      (3) The association with certain variations in ZNF genes with neurological disorders such as AD, as reported in the introduction is not entirely convincing without further functional support. Such associations could merely happen by chance, given the high number of ZNF genes in the human genome and the high chance that variations in these loci happen to associate with certain disease-associated traits. So using these associations as an argument that changes in TEs and KRAB ZNF networks are important for diseases like AD should be used with much more caution.

      There are a number of papers where KRAB ZNF and TE expression are analysed in parallel in human brain tissues. So the novelty of that aspect of the presented study may be limited.

      We fully acknowledge the concern that, given the large number of KRAB-ZNFs and their inherent variability, some associations with AD or other neurological disorders could occur by chance. This highlights the importance of additional functional studies to validate the causal role of KRAB-ZNF and TE interactions in disease contexts. While previous studies have indeed analyzed KRAB-ZNF and TE expression in human brain tissues, our study seeks to expand on this foundation by incorporating interspecies comparisons across primates. This approach enabled us to identify TE:KRAB-ZNF pairs that are uniquely present in healthy human brains, which may provide insights into their potential evolutionary significance and relevance to diseases like AD.

      In addition to analyzing RNA-seq data (GSE127898 and syn5550404), we have cross-validated our findings using ChIP-exo data for 159 KRAB-ZNF proteins and their TE binding regions in human (Imbeault et al., 2017). This allowed us to identify specific binding events between KRAB-ZNF and TE pairs, providing further support for the observed associations. We agree with the reviewer that additional experimental validations, such as functional studies, are critical to further establish the role of KRAB-ZNF and TE networks in AD. We hope that future research can build upon our findings to explore these associations in greater detail.

      Reviewer #2 (Public review):

      Summary:

      The aim was to decipher the regulatory networks of KRAB-ZNFs and TEs that have changed during human brain evolution and in Alzheimer's disease.

      Strengths:

      This solid study presents a valuable analysis and successfully confirms previous assumptions, but also goes beyond the current state of the art.

      Weaknesses:

      The design of the analysis needs to be slightly modified and a more in-depth analysis of the positive correlation cases would be beneficial. Some of the conclusions need to be reinterpreted.

      We sincerely thank the reviewer for the thoughtful summary, positive evaluation of our study, and constructive feedback. We appreciate the recognition of the strengths in our analysis and the valuable suggestions for improving its design and interpretation.

      We would like to briefly comment on the suggested modifications to the design here, and will provide a detailed point-by-point review later with our revised manuscript.

      The reviewer recommended considering a more recent timepoint, such as less than 25 million years ago (mya), to define the "evolutionary young group" of KRAB-ZNF genes and TEs when discussing the arms-race theory. This is indeed a valuable perspective, as the TE repressing functions by KRAB-ZNF proteins may have evolved more recently than the split between Old World Monkeys (OWM) and New World Monkeys (NWM) at 44.2 mya we used.

      Our rationale for selecting 44.2 mya is based on certain primate-specific TEs such as the Alu subfamilies, which emerged after the rise of Simiiformes and have been used in phylogenetic studies (Xing et al., 2007 and Williams et al., 2010). This timeframe allowed us to investigate the potential co-evolution of KRAB-ZNFs and TEs in species that emerged after the OWM-NWM split (e.g., the humans, chimpanzees, bonobos, and macaques used for this study). However, focusing only on KRAB-ZNFs and TEs younger than 25 million years would limit the analysis to just 9 KRAB-ZNFs and 92 TEs expressed in our datasets. While we will not conduct a reanalysis using this more recent timepoint, we will integrate the recommendation into the discussion section of the revised manuscript.

      Furthermore, we greatly appreciate the reviewer's detailed insights and suggestions for refining specific descriptions and interpretations in our manuscript. We will address these points in the revised version to ensure the content is presented with greater precision and clarity.

      Once again, we thank both reviewers for their valuable feedback, which provides significant input for strengthening our study.

    1. eLife Assessment

      This study provides useful insights into the ways in which germinal center B cell metabolism, particularly lipid metabolism, affects cellular responses. The authors use sophisticated mouse models to demonstrate that ether lipids are relevant for B cell homeostasis and efficient humoral responses. Although the data were collected from in vitro and in vivo experiments and analyzed using solid and validated methodology, more careful experiments and extensive revision of the manuscript will be required to strengthen the authors' conclusions.

    2. Reviewer #1 (Public review):

      In this manuscript, Hoon Cho et al. present a novel investigation into the role of PexRAP, an intermediary in ether lipid biosynthesis, in B cell function, particularly during the Germinal Center (GC) reaction. The authors profile lipid composition in activated B cells both in vitro and in vivo, revealing the significance of PexRAP. Using a combination of animal models and imaging mass spectrometry, they demonstrate that PexRAP is specifically required in B cells. They further establish that its activity is critical upon antigen encounter, shaping B cell survival during the GC reaction.

      Mechanistically, they show that ether lipid synthesis is necessary to modulate reactive oxygen species (ROS) levels and prevent membrane peroxidation.

      Highlights of the Manuscript:

      The authors perform exhaustive imaging mass spectrometry (IMS) analyses of B cells, including GC B cells, to explore ether lipid metabolism during the humoral response. This approach is particularly noteworthy given the challenge of limited cell availability in GC reactions, which often hampers metabolomic studies. IMS proves to be a valuable tool in overcoming this limitation, allowing detailed exploration of GC metabolism.

      The data presented is highly relevant, especially in light of recent studies suggesting a pivotal role for lipid metabolism in GC B cells. While these studies primarily focus on mitochondrial function, this manuscript uniquely investigates peroxisomes, which are linked to mitochondria and contribute to fatty acid oxidation (FAO). By extending the study of lipid metabolism beyond mitochondria to include peroxisomes, the authors add a critical dimension to our understanding of B cell biology.

      Additionally, the metabolic plasticity of B cells poses challenges for studying metabolism, as genetic deletions from the beginning of B cell development often result in compensatory adaptations. To address this, the authors employ an acute loss-of-function approach using two conditional, cell-type-specific gene inactivation mouse models: one targeting B cells after the establishment of a pre-immune B cell population (Dhrs7b^f/f, huCD20-CreERT2) and the other during the GC reaction (Dhrs7b^f/f; S1pr2-CreERT2). This strategy is elegant and well-suited to studying the role of metabolism in B cell activation.

      Overall, this manuscript is a significant contribution to the field, providing robust evidence for the fundamental role of lipid metabolism during the GC reaction and unveiling a novel function for peroxisomes in B cells. However, several major points need to be addressed:

      Major Comments:

      Figures 1 and 2

      The authors conclude, based on the results from these two figures, that PexRAP promotes the homeostatic maintenance and proliferation of B cells. In this section, the authors first use a tamoxifen-inducible full Dhrs7b knockout (KO) and afterwards Dhrs7bΔ/Δ-B model to specifically characterize the role of this molecule in B cells. They characterize the B and T cell compartments using flow cytometry (FACS) and examine the establishment of the GC reaction using FACS and immunofluorescence. They conclude that B cell numbers are reduced, and the GC reaction is defective upon stimulation, showing a reduction in the total percentage of GC cells, particularly in the light zone (LZ).

      The analysis of the steady-state B cell compartment should also be improved. This includes a more detailed characterization of MZ and B1 populations, given the role of lipid metabolism and lipid peroxidation in these subtypes.

      Suggestions for Improvement:

      - B Cell compartment characterization: A deeper characterization of the B cell compartment in non-immunized mice is needed, including analysis of Marginal Zone (MZ) maturation and a more detailed examination of the B1 compartment. This is especially important given the role of specific lipid metabolism in these cell types. The phenotyping of the B cell compartment should also include an analysis of immunoglobulin levels on the membrane, considering the impact of lipids on membrane composition.

      - GC Response Analysis Upon Immunization: The GC response characterization should include additional data on the T cell compartment, specifically the presence and function of Tfh cells. In Fig. 1H, the distribution of the LZ appears strikingly different. However, the authors have not addressed this in the text. A more thorough characterization of centroblasts and centrocytes using CXCR4 and CD86 markers is needed.

      The gating strategy used to characterize GC cells (GL7+CD95+ in IgD− cells) is suboptimal. A more robust analysis of GC cells should be performed in total B220+CD138− cells.

      - The authors claim that Dhrs7b supports the homeostatic maintenance of quiescent B cells in vivo and promotes effective proliferation. This conclusion is primarily based on experiments where CTV-labeled PexRAP-deficient B cells were adoptively transferred into μMT mice (Fig. 2D-F). However, we recommend reviewing the flow plots of CTV in Fig. 2E, as they appear out of scale. More importantly, the low recovery of PexRAP-deficient B cells post-adoptive transfer weakens the robustness of the results and is insufficient to conclusively support the role of PexRAP in B cell proliferation in vivo.

      - In vitro stimulation experiments: These experiments need improvement. The authors have used anti-CD40 and BAFF for B cell stimulation; however, it would be beneficial to also include anti-IgM in the stimulation cocktail. In Fig. 2G, CTV plots do not show clear defects in proliferation, yet the authors quantify the percentage of cells with more than three divisions. These plots should clearly display the gating strategy. Additionally, details about histogram normalization and potential defects in cell numbers are missing. A more in-depth analysis of apoptosis is also required to determine whether the observed defects are due to impaired proliferation or reduced survival.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, Cho et al. investigate the role of ether lipid biosynthesis in B cell biology, particularly focusing on GC B cells, by inducible deletion of PexRAP, an enzyme responsible for the synthesis of ether lipids.

      Strengths:

      Overall, the data are well-presented, the paper is well-written and provides valuable mechanistic insights into the importance of PexRAP enzyme in GC B cell proliferation.

      Weaknesses:

      More detailed mechanisms of the impaired GC B cell proliferation by PexRAP deficiency remain to be further investigated. In addition, there are minor issues with the interpretation of the data which might cause confusion for the readers.

    4. Author response:

      eLife Assessment

      This study provides useful insights into the ways in which germinal center B cell metabolism, particularly lipid metabolism, affects cellular responses. The authors use sophisticated mouse models to demonstrate that ether lipids are relevant for B cell homeostasis and efficient humoral responses. Although the data were collected from in vitro and in vivo experiments and analyzed using solid and validated methodology, more careful experiments and extensive revision of the manuscript will be required to strengthen the authors' conclusions.

      In addition to praise for the eLife system and transparency (public posting of the reviews; along with an opportunity to address them), we are grateful for the decision of the Editors to select this submission for in-depth peer review and to the referees for the thoughtful and constructive comments.

      In overview, we mostly agree with the specific comments and evaluation of strengths of what the work adds as well as with indications of limitations and caveats that apply to the breadth of conclusions. One can view these as a combination of weaknesses, of instances of reading more into the work than what it says, and of important future directions opened up by the findings we report. Regarding the positives, we appreciate the reviewers' appraisal that our work unveils a novel mechanism in which the peroxisomal enzyme PexRAP mediates B cell intrinsic ether lipid synthesis and promotes a humoral immune response. We are gratified by a recognition that a main contribution of the work is to show that a spatial lipidomic analysis can set the stage for discovery of new molecular processes in biology that are supported by using 2-dimensional imaging mass spectrometry techniques and cell type specific conditional knockout mouse models.

      By and large, the technical issues are items we will strive to improve. Ultimately, an over-arching issue in research publications in this epoch is captured by the questions "when is enough enough?" and "what, or how much, advance will be broadly important in moving biological and biomedical research forward?" It appears that one limitation troubling the reviews centers on whether the mechanism of increased ROS and multi-modal death, supported most by the in vitro evidence, applies to germinal center B cells in situ, or whether the decrease in GC instead reflects a mechanism that mostly applies to pre-GC clonal amplification (or recruitment into GC). Overall, we agree that this leap could benefit from additional evidence, but as resources ended we leave that question for the future, other than the finding that S1pr2-CreERT2-driven deletion leads to fewer GC B cells. While we strove to be very careful in framing such a connection as an inference in the posted manuscript, we will revisit the matter by rechecking the wording when revising the text after trying to get some specific evidence.

      In the more granular part of this provisional response (below), we will outline our plan prompted by the reviewers but also comment on a few points of disagreement or refinement (longer and more detailed explanation). The plan includes more detailed analysis of B cell compartments, surface level of immunoglobulin, Tfh cell population, a refinement of GC B cell markers, and the ex vivo GC B cell analysis for ROS, proliferation, and cell death. We will also edit the text to provide more detailed information and clarify our interpretation to prevent confusion about our results. At a practical level, some evidence likely is technologically impractical, and an unfortunate determinant is the lack of further sponsored funding for further work. The detailed point-by-point response to the reviewers' comments is below.

      Public Reviews:

      Reviewer #1 (Public review):

      In this manuscript, Sung Hoon Cho et al. present a novel investigation into the role of PexRAP, an intermediary in ether lipid biosynthesis, in B cell function, particularly during the Germinal Center (GC) reaction. The authors profile lipid composition in activated B cells both in vitro and in vivo, revealing the significance of PexRAP. Using a combination of animal models and imaging mass spectrometry, they demonstrate that PexRAP is specifically required in B cells. They further establish that its activity is critical upon antigen encounter, shaping B cell survival during the GC reaction.

      Mechanistically, they show that ether lipid synthesis is necessary to modulate reactive oxygen species (ROS) levels and prevent membrane peroxidation.

      Highlights of the Manuscript:

      The authors perform exhaustive imaging mass spectrometry (IMS) analyses of B cells, including GC B cells, to explore ether lipid metabolism during the humoral response. This approach is particularly noteworthy given the challenge of limited cell availability in GC reactions, which often hampers metabolomic studies. IMS proves to be a valuable tool in overcoming this limitation, allowing detailed exploration of GC metabolism.

      The data presented is highly relevant, especially in light of recent studies suggesting a pivotal role for lipid metabolism in GC B cells. While these studies primarily focus on mitochondrial function, this manuscript uniquely investigates peroxisomes, which are linked to mitochondria and contribute to fatty acid oxidation (FAO). By extending the study of lipid metabolism beyond mitochondria to include peroxisomes, the authors add a critical dimension to our understanding of B cell biology.

      Additionally, the metabolic plasticity of B cells poses challenges for studying metabolism, as genetic deletions from the beginning of B cell development often result in compensatory adaptations. To address this, the authors employ an acute loss-of-function approach using two conditional, cell-type-specific gene inactivation mouse models: one targeting B cells after the establishment of a pre-immune B cell population (Dhrs7b^f/f, huCD20-CreERT2) and the other during the GC reaction (Dhrs7b^f/f; S1pr2-CreERT2). This strategy is elegant and well-suited to studying the role of metabolism in B cell activation.

      Overall, this manuscript is a significant contribution to the field, providing robust evidence for the fundamental role of lipid metabolism during the GC reaction and unveiling a novel function for peroxisomes in B cells.

      We appreciate these positive reactions and response, and agree with the overview and summary of the paper's approaches and strengths.

      However, several major points need to be addressed:

      Major Comments:

      Figures 1 and 2

      The authors conclude, based on the results from these two figures, that PexRAP promotes the homeostatic maintenance and proliferation of B cells. In this section, the authors first use a tamoxifen-inducible full Dhrs7b knockout (KO) and afterwards Dhrs7bΔ/Δ-B model to specifically characterize the role of this molecule in B cells. They characterize the B and T cell compartments using flow cytometry (FACS) and examine the establishment of the GC reaction using FACS and immunofluorescence. They conclude that B cell numbers are reduced, and the GC reaction is defective upon stimulation, showing a reduction in the total percentage of GC cells, particularly in the light zone (LZ).

      The analysis of the steady-state B cell compartment should also be improved. This includes a more detailed characterization of MZ and B1 populations, given the role of lipid metabolism and lipid peroxidation in these subtypes.

      Suggestions for Improvement:

      B Cell compartment characterization: A deeper characterization of the B cell compartment in non-immunized mice is needed, including analysis of Marginal Zone (MZ) maturation and a more detailed examination of the B1 compartment. This is especially important given the role of specific lipid metabolism in these cell types. The phenotyping of the B cell compartment should also include an analysis of immunoglobulin levels on the membrane, considering the impact of lipids on membrane composition.

      Although the manuscript is focused on post-ontogenic B cell regulation in Ab responses, we believe we will be able to polish a revised manuscript through addition of results of analyses suggested by this point in the review: measurement of surface IgM on and phenotyping of various B cell subsets, including MZB and B1 B cells, to extend the data in Supplemental Fig 1H and I. Depending on the level of support, new immunization experiments to score Tfh and analyze a few of their functional molecules as part of a B cell paper may be feasible.  

      - GC Response Analysis Upon Immunization: The GC response characterization should include additional data on the T cell compartment, specifically the presence and function of Tfh cells. In Fig. 1H, the distribution of the LZ appears strikingly different. However, the authors have not addressed this in the text. A more thorough characterization of centroblasts and centrocytes using CXCR4 and CD86 markers is needed.

      The gating strategy used to characterize GC cells (GL7+CD95+ in IgD− cells) is suboptimal. A more robust analysis of GC cells should be performed in total B220+CD138− cells.

      We first want to apologize for the mislabeling of LZ and DZ in Fig 1H. The greenish-yellow colored region (GL7<sup>+</sup> CD35<sup>+</sup>) indicates the DZ and the cyan-colored region (GL7<sup>+</sup> CD35<sup>+</sup>) indicates the LZ.

      As a technical note, we experienced high background noise with GL7 staining uniquely with PexRAP deficient (Dhrs7b<sup>f/f</sup>; Rosa26-CreER<sup>T2</sup>) mice (i.e., not WT control mice). The high background noise of GL7 staining was not observed in B cell specific KO of PexRAP (Dhrs7b<sup>f/f</sup>; huCD20-CreER<sup>T2</sup>). Two formal possibilities to account for this staining issue would be if either the expression of the GL7 epitope were repressed by PexRAP or the proper positioning of GL7<sup>+</sup> cells in germinal center region were defective in PexRAP-deficient mice (e.g., due to an effect on positioning cues from cell types other than B cells). In a revised manuscript, we will fix the labeling error and further discuss the GL7 issue, while taking care not to be thought to conclude that there is a positioning problem or derepression of GL7 (an activation antigen on T cells as well as B cells).

      While the gating strategy for an overall population of GC B cells is fairly standard even in the current literature, the question about using CD138 staining to exclude early plasmablasts (i.e., analyze B220<sup>+</sup> CD138<sup>neg</sup> vs B220<sup>+</sup> CD138<sup>+</sup>) is interesting. In addition, some papers like to use GL7<sup>+</sup> CD38<sup>neg</sup> for GC B cells instead of GL7<sup>+</sup> Fas (CD95)<sup>+</sup>, and we thank the reviewer for suggesting the analysis of centroblasts and centrocytes. For the revision, we will try to secure resources to revisit the immunizations and analyze them for these other facets of GC B cells (including CXCR4/CD86) and for their GL7<sup>+</sup> CD38<sup>neg</sup>, B220<sup>+</sup> CD138<sup>-</sup>, and B220<sup>+</sup> CD138<sup>+</sup> cell populations.

      We agree that comparison of the Rosa26-CreERT2 results to those with B cell-specific loss-of-function raises a tantalizing possibility that Tfh cells also are influenced by PexRAP. Although the manuscript is focused on post-ontogenic B cell regulation in Ab responses, we hope to add new immunization experiments that score Tfh cells and analyze a few of their functional molecules to this B cell paper, depending on the ability to wheedle enough support / fiscal resources.

      - The authors claim that Dhrs7b supports the homeostatic maintenance of quiescent B cells in vivo and promotes effective proliferation. This conclusion is primarily based on experiments where CTV-labeled PexRAP-deficient B cells were adoptively transferred into μMT mice (Fig. 2D-F). However, we recommend reviewing the flow plots of CTV in Fig. 2E, as they appear out of scale. More importantly, the low recovery of PexRAP-deficient B cells post-adoptive transfer weakens the robustness of the results and is insufficient to conclusively support the role of PexRAP in B cell proliferation in vivo.

      In the revision, we will edit the text and try to adjust the digitized cytometry data to allow more dynamic range to the right side of the upper panels in Fig. 2E, and otherwise to improve the presentation of the in vivo CTV result. However, we feel impelled to push back respectfully on some of the concern raised here. First, it seems to gloss over the presentation of multiple facets of evidence. The conclusion about maintenance derives primarily from Fig. 2C, which shows a rapid, statistically significant decrease in B cell numbers (extending the finding of Fig. 1D, a more substantial decrease after a somewhat longer period). As noted in the text, the rate of de novo B cell production does not suffice to explain the magnitude of the decrease.

      In terms of proliferation, we will improve presentation of the Methods but the bottom line is that the recovery efficiency is not bad (compared to prior published work) inasmuch as transferred B cells do not uniformly home to spleen. In a setting where BAFF is in ample supply in vivo, we transferred equal numbers of cells that were equally labeled with CTV and counted B cells. The CTV result might be affected by the lower recovery of PexRAP-deficient B cells, although in general the frequencies of the CTV<sup>low</sup> divided population are not changed very much. However, it is precisely because of the pitfalls of in vivo analyses that we included complementary data with survival and proliferation in vitro. The proliferation was attenuated in PexRAP-deficient B cells in vitro; this evidence supports the conclusion that proliferation of PexRAP knockout B cells is reduced. It is likely that PexRAP-deficient B cells also have a defect in viability in vivo, as we observed a reduced B cell number in PexRAP-deficient mice. As the reviewer noticed, the presence of a defect in cycling does, in the transfer experiments, limit the ability to interpret a lower yield of B cell population after adoptive transfer into µMT recipient mice as evidence pertaining to death rates. We will edit the text of the revision with these points in mind.

      - In vitro stimulation experiments: These experiments need improvement. The authors have used anti-CD40 and BAFF for B cell stimulation; however, it would be beneficial to also include anti-IgM in the stimulation cocktail. In Fig. 2G, CTV plots do not show clear defects in proliferation, yet the authors quantify the percentage of cells with more than three divisions. These plots should clearly display the gating strategy. Additionally, details about histogram normalization and potential defects in cell numbers are missing. A more in-depth analysis of apoptosis is also required to determine whether the observed defects are due to impaired proliferation or reduced survival.

      As suggested by the reviewer, testing additional forms of B cell activation can help explore the generality (or lack thereof) of findings. We plan to test anti-IgM stimulation together with anti-CD40 + BAFF as well as anti-IgM + TLR7/8, and add the data to a revised and final manuscript.

      With regards to Fig. 2G (and 2H), in the revised manuscript we will refine the presentation (add a demonstration of the gating, and explicate histogram normalization of FlowJo).

      It is an interesting issue in bioscience, but in our presentation 'representative data' really are pretty representative, so a senior author is reminded of a comment Tak Mak made about a reduction (of proliferation, if memory serves) to 0.7 x control. [His point in a comment to referees at a symposium related that to a salary reduction by 30% :) A mathematical alternative is to point out that across four rounds of division for WT cells, a reduction to 0.7x efficiency at each cycle means about 1/4 as many progeny.] 
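      The arithmetic in the bracketed remark can be checked directly. The following is a minimal sketch, assuming (as the remark does) that a 0.7x efficiency applies uniformly at each of four rounds of division; the function name `relative_progeny` is illustrative and does not come from the manuscript.

```python
def relative_progeny(efficiency_per_round: float, rounds: int) -> float:
    """Progeny relative to control when every round yields
    `efficiency_per_round` times the control output (assumed uniform)."""
    return efficiency_per_round ** rounds


if __name__ == "__main__":
    # 0.7 ** 4 is about 0.24, i.e. roughly one quarter of control progeny.
    print(round(relative_progeny(0.7, 4), 4))  # 0.2401
```

      Under this simplifying assumption, the compounding across rounds, rather than the per-round deficit itself, produces the roughly four-fold difference.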

      We will try to edit the revision (Methods, Legends, Results, Discussion) to address better the points of the last two sentences of the comment, and improve the details that could assist in replication or comparisons (e.g., if someone develops a PexRAP inhibitor as potential therapeutic).

      For the present, please note that the cell numbers at the end of the cultures are currently shown in Fig 2, panel I. Analogous culture results are shown in Fig 8, panels I, J, albeit with harvesting at day 5 instead of day 4. So, a difference of ≥ 3x needs to be explained. As noted above, a division efficiency reduced to 0.7x normal might account for such a decrease, but in practice the data of Fig. 2I show that the number of PexRAP-deficient B cells at day 4 is similar to the number plated before activation, and yet there has been a reasonable number of divisions. So cell numbers in the culture of mutant B cells are constant because cycling is active but decreased and insufficient to allow increased numbers ("proliferation" in the true sense) as programmed death is increased. In line with this evidence, Fig 8G-H document higher death rates [i.e., frequencies of cleaved caspase3<sup>+</sup> cells and Annexin V<sup>+</sup> cells] of PexRAP-deficient B cells compared to controls. Thus, the in vitro data lead to the conclusion that both decreased division rates and increased death operate after this form of stimulation.
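      The argument that cell numbers can stay flat while cells still divide can be illustrated with a toy expected-value model. This is only a sketch under stated assumptions: the per-round rule (a cell dies with probability `p_die`, and a survivor divides into two with probability `p_divide`) and every parameter value below are hypothetical, chosen for illustration and not fitted to any data in the manuscript.

```python
def expected_fold_change(p_divide: float, p_die: float, rounds: int) -> float:
    """Expected cell number relative to input after `rounds` rounds.

    Per round, a cell dies with probability p_die; a surviving cell
    divides into two with probability p_divide, otherwise it persists.
    """
    per_round = (1.0 - p_die) * (1.0 + p_divide)
    return per_round ** rounds


# Hypothetical parameters, for illustration only (not measured values).
control = expected_fold_change(p_divide=0.9, p_die=0.05, rounds=4)
mutant = expected_fold_change(p_divide=0.6, p_die=0.375, rounds=4)

print(f"control fold change: {control:.2f}")
print(f"mutant fold change:  {mutant:.2f}")
```

      In this toy setting the mutant population divides in every round yet does not expand, because the gain from division is offset by death, which is the qualitative pattern the response describes.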

      An inference is that this is the case in vivo as well - note that recoveries differed by ~3x (Fig. 2D), and the decrease in divisions (presentation of which will be improved) was meaningful but of lesser magnitude (Fig. 2E, F).  

      Reviewer #2 (Public review):

      Summary:

      In this study, Cho et al. investigate the role of ether lipid biosynthesis in B cell biology, particularly focusing on GC B cells, by inducible deletion of PexRAP, an enzyme responsible for the synthesis of ether lipids.

      Strengths:

      Overall, the data are well-presented, the paper is well-written and provides valuable mechanistic insights into the importance of PexRAP enzyme in GC B cell proliferation.

      We appreciate this positive response and agree with the overview and summary of the paper's approaches and strengths.

      Weaknesses:

      More detailed mechanisms of the impaired GC B cell proliferation by PexRAP deficiency remain to be further investigated. In addition, there are minor issues with the interpretation of the data which might cause confusion for the readers.

      Issues about contributions of cell cycling and divisions on the one hand, and susceptibility to death on the other, were discussed above, amplifying on the current manuscript text. The aggregate data support a model in which both processes are impacted for mature B cells in general, and mechanistically the evidence and work focus on the increased ROS and modes of death. Although the data in Fig. 7 do provide evidence that GC B cells themselves are affected, we agree that resource limitations had militated against developing further evidence about cycling specifically for GC B cells. We hope to obtain sufficient data from some specific analysis of proliferation in vivo (e.g., Ki67 or BrdU) as well as ROS and death ex vivo when harvesting new samples from mice immunized to analyze GC B cells for CXCR4/CD86, CD38, CD138 as indicated by Reviewer 1. As suggested by Reviewer 2, we will further discuss the possible mechanism(s) by which proliferation of PexRAP-deficient B cells is impaired. We also will edit the text of a revision where needed to enhance clarity of data interpretation - at a minimum, to be very clear that caution is warranted in assuming that GC B cells will exhibit the same mechanisms as cultured, in vitro-stimulated B cells.

    1. eLife Assessment

      This study is important, advancing our understanding of how humans adapt to uncertainty in dynamic environments by investigating the interplay between two types of uncertainty: volatility (systematic changes in outcomes) and noise (random variability in outcomes). Using an innovative experimental task, reinforcement learning (RL) models, and Bayesian Observer Models (BOM), the authors demonstrate that humans exhibit approximate rationality, often misattributing noise as volatility and adopting suboptimal learning rates in noisy conditions. The evidence is compelling, supported by a well-designed experimental task that independently manipulates noise and volatility, robust behavioral data, and computational modeling; the inclusion of BOM lesioning and physiological validation through pupillometry provides a nuanced understanding of suboptimal human learning. While the study could benefit from expanding the model space (e.g., by including latent state models) and offering greater clarity in task instructions and raw behavioral data, these limitations do not undermine the strength of the findings.

    2. Reviewer #1 (Public review):

      Summary:

      The authors present an interesting study using RL and Bayesian modelling to examine differences in learning rate adaptation in conditions of high and low volatility and noise respectively. Through "lesioning" an optimal Bayesian model, they reveal that an apparently suboptimal adaptation of learning rates results from incorrectly detecting volatility in the environment when it is not in fact present.

      Strengths:

      The experimental task used is cleverly designed and does a good job of manipulating both volatility and noise. The modelling approach takes an interesting and creative approach to understanding the source of apparently suboptimal adaptation of learning rates to noise, through carefully "lesioning" an optimal Bayesian model to determine which components are responsible for this behaviour.

      Weaknesses:

      The study has few substantial weaknesses; the data and modelling both appear robust and informative, and it tackles an interesting question. The model space could potentially have been expanded, particularly with regard to the inclusion of alternative strategies such as those that estimate latent states and adapt learning accordingly.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, the authors aimed to investigate how humans learn and adapt their behavior in dynamic environments characterized by two distinct types of uncertainty: volatility (systematic changes in outcomes) and noise (random variability in outcomes). Specifically, they sought to understand how participants adjust their learning rates in response to changes in these forms of uncertainty.

      To achieve this, the authors employed a two-step approach:

      (1) Reinforcement Learning (RL) Model: They first used an RL model to fit participants' behavior, revealing that the learning rate was context-dependent. In other words, it varied based on the levels of volatility and noise. However, the RL model showed that participants misattributed noise as volatility, leading to higher learning rates in noisy conditions, where the optimal strategy would be to be less sensitive to random fluctuations.

      (2) Bayesian Observer Model (BOM): To better account for this context dependency, they introduced a Bayesian Observer Model (BOM), which models how an ideal Bayesian learner would update their beliefs about environmental uncertainty. They found that a degraded version of the BOM, where the agent had a coarser representation of noise compared to volatility, best fit the participants' behavior. This suggested that participants were not fully distinguishing between noise and volatility, instead treating noise as volatility and adjusting their learning rates accordingly.
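      To make the role of the learning rate concrete, the following is a minimal sketch of a delta-rule (Rescorla-Wagner style) update. It assumes a scalar estimate and a fixed learning rate; the exact RL model used in the reviewed paper is not specified in this summary, so the function names, the simulated outcome sequence, and all parameter values are illustrative assumptions. The simulated environment is noisy but not volatile (the mean never changes), which is the condition in which a lower learning rate is described as optimal.

```python
import random


def delta_rule(outcomes, learning_rate, initial_estimate=0.0):
    """Return successive estimates under
    estimate += learning_rate * (outcome - estimate)."""
    estimate = initial_estimate
    estimates = []
    for outcome in outcomes:
        prediction_error = outcome - estimate
        estimate += learning_rate * prediction_error
        estimates.append(estimate)
    return estimates


def mean_squared_error(estimates, target):
    """Average squared distance between the estimates and the true mean."""
    return sum((e - target) ** 2 for e in estimates) / len(estimates)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
true_mean = 50.0        # hypothetical stable outcome mean
# Noisy, non-volatile environment: Gaussian noise around a constant mean.
outcomes = [rng.gauss(true_mean, 10.0) for _ in range(2000)]

for lr in (0.1, 0.8):
    # Drop the first 200 trials so the comparison reflects steady-state tracking.
    tail = delta_rule(outcomes, lr, initial_estimate=true_mean)[200:]
    print(lr, round(mean_squared_error(tail, true_mean), 1))
```

      In this sketch the higher learning rate yields a larger tracking error because the estimate chases random fluctuations, which is the sense in which treating noise as volatility is suboptimal.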

      The authors also aimed to use pupillometry data (measuring pupil dilation) as a physiological marker to arbitrate between models and understand how participants' internal representations of uncertainty influenced both their behavior and physiological responses. Their objective was to explore whether the BOM could explain not just behavioral choices but also these physiological responses, thereby providing stronger evidence for the model's validity.

      Overall, the study sought to reconcile approximate rationality in human learning by showing that participants still follow a Bayesian-like learning process, but with simplified internal models that lead to suboptimal decisions in noisy environments.

      Strengths:

      The generative model presented in the study is both innovative and insightful. The authors first employ a Reinforcement Learning (RL) model to fit participants' behavior, revealing that the learning rate is context-dependent: specifically, it varies based on the levels of volatility and noise in the task. They then introduce a Bayesian Observer Model (BOM) to account for this context dependency, ultimately finding that a degraded BOM, in which the agent has a coarser representation of noise compared to volatility, provides the best fit for the participants' behavior. This suggests that participants do not fully distinguish between noise and volatility, leading to the misattribution of noise as volatility. Consequently, participants adopt higher learning rates even in noisy contexts, where an optimal strategy would involve being less sensitive to new information (i.e., using lower learning rates). This finding highlights a rational but approximate learning process, as described in the paper.

      Weaknesses:

      While the RL and Bayesian models both successfully predict behavior, it remains unclear how to fully reconcile the two approaches. The RL model captures behavior in terms of a fixed or context-dependent learning rate, while the BOM provides a more nuanced account with dynamic updates based on volatility and noise. Both models can predict actions when fit appropriately, but the pupillometry data offers a promising avenue to arbitrate between the models. However, the current study does not provide a direct comparison between the RL framework and the Bayesian model in terms of how well they explain the pupillometry data. It would be valuable to see whether the RL model can also account for physiological markers of learning, such as pupil responses, or if the BOM offers a unique advantage in this regard. A comparison of the two models using pupillometry data could strengthen the argument for the BOM's superiority, as currently, the possibility that RL models could explain the physiological data remains unexplored.

      The model comparison between the Bayesian Observer Model and the self-defined degraded internal model could be further enhanced. Since different assumptions about the internal model's structure lead to varying levels of model complexity, using a formal criterion such as Bayesian Information Criterion (BIC) or Akaike Information Criterion (AIC) would allow for a more rigorous comparison of model fit. Including such comparisons would ensure that the degraded BOM is not simply favored due to its flexibility or higher complexity, but rather because it genuinely captures the participants' behavioral and physiological data better than alternative models. This would also help address concerns about overfitting and provide a clearer justification for using the degraded BOM over other potential models.
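
      As an illustration of the comparison suggested above, a minimal sketch of computing AIC and BIC for two candidate models. The model names, log-likelihoods, parameter counts, and trial count are made-up placeholders, not values from the study.

```python
import math


def aic(log_lik: float, k: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * k - 2 * log_lik


def bic(log_lik: float, k: int, n: int) -> float:
    """Bayesian Information Criterion: k ln(n) - 2 ln(L)."""
    return k * math.log(n) - 2 * log_lik


# Hypothetical fits: maximised log-likelihood and number of free parameters.
fits = {
    "full_BOM": {"log_lik": -512.0, "k": 4},
    "degraded_BOM": {"log_lik": -505.0, "k": 5},
}
n_trials = 400  # hypothetical number of observations per fit

scores = {
    name: {
        "AIC": aic(f["log_lik"], f["k"]),
        "BIC": bic(f["log_lik"], f["k"], n_trials),
    }
    for name, f in fits.items()
}

# Lower is better for both criteria.
best_by_aic = min(scores, key=lambda m: scores[m]["AIC"])
best_by_bic = min(scores, key=lambda m: scores[m]["BIC"])
```

      Because BIC penalises each extra parameter by ln(n) rather than 2, it is the stricter of the two criteria whenever n exceeds about 7 observations, so agreement between them is a useful robustness check.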

    4. Author response:

      Reviewer #1 (Public review):

      Summary:

      The authors present an interesting study using RL and Bayesian modelling to examine differences in learning rate adaptation in conditions of high and low volatility and noise respectively. Through "lesioning" an optimal Bayesian model, they reveal that an apparently suboptimal adaptation of learning rates results from incorrectly detecting volatility in the environment when it is not in fact present.

      Strengths:

      The experimental task used is cleverly designed and does a good job of manipulating both volatility and noise. The modelling approach takes an interesting and creative approach to understanding the source of apparently suboptimal adaptation of learning rates to noise, through carefully "lesioning" an optimal Bayesian model to determine which components are responsible for this behaviour.

      We thank the reviewer for this assessment.

      Weaknesses:

      The study has few substantial weaknesses; the data and modelling both appear robust and informative, and it tackles an interesting question. The model space could potentially have been expanded, particularly with regard to the inclusion of alternative strategies such as those that estimate latent states and adapt learning accordingly.

      We thank the reviewer for this suggestion. We agree that it would be interesting to assess the ability of alternative models to reproduce the sub-optimal choices of participants in this study. The Bayesian Observer Model described in the paper is a form of Hierarchical Gaussian Filter, so we will assess the performance of a different class of models that are able to track uncertainty: RL-based models that capture changes in uncertainty (the Kalman filter, and the model described by Cochran and Cisler, PLoS Comput Biol 2019). We will assess the ability of the models to recapitulate the core behaviour of participants (in terms of learning rate adaptation) and, if possible, assess their ability to account for the pupillometry response.
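
      To illustrate why a Kalman filter is a natural comparison model, here is a minimal sketch of a scalar Kalman filter in which the gain acts as the learning rate. This is a textbook formulation with fixed, assumed volatility and noise variances, not the implementation planned for the paper; all function and parameter names are hypothetical.

```python
def kalman_step(mean, var, outcome, volatility_var, noise_var):
    """One update of a scalar Kalman filter tracking a drifting mean.

    volatility_var: assumed variance of the random walk of the hidden mean.
    noise_var: assumed variance of the observation noise.
    """
    prior_var = var + volatility_var            # uncertainty grows with volatility
    gain = prior_var / (prior_var + noise_var)  # effective learning rate
    new_mean = mean + gain * (outcome - mean)   # delta-rule update scaled by the gain
    new_var = (1.0 - gain) * prior_var
    return new_mean, new_var, gain


def steady_state_gain(volatility_var, noise_var, n_steps=200):
    """Iterate the variance recursion until the gain settles."""
    mean, var, gain = 0.0, 1.0, 0.0
    for _ in range(n_steps):
        # The gain does not depend on the outcomes, so a constant outcome suffices.
        mean, var, gain = kalman_step(mean, var, 0.0, volatility_var, noise_var)
    return gain


baseline = steady_state_gain(volatility_var=1.0, noise_var=1.0)
more_volatile = steady_state_gain(volatility_var=4.0, noise_var=1.0)
more_noisy = steady_state_gain(volatility_var=1.0, noise_var=4.0)
```

      In this sketch the gain rises with volatility and falls with noise, which is the normative pattern against which the participants' raised learning rates under noise are judged.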

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors aimed to investigate how humans learn and adapt their behavior in dynamic environments characterized by two distinct types of uncertainty: volatility (systematic changes in outcomes) and noise (random variability in outcomes). Specifically, they sought to understand how participants adjust their learning rates in response to changes in these forms of uncertainty.

      To achieve this, the authors employed a two-step approach:

      (1) Reinforcement Learning (RL) Model: They first used an RL model to fit participants' behavior, revealing that the learning rate was context-dependent. In other words, it varied based on the levels of volatility and noise. However, the RL model showed that participants misattributed noise as volatility, leading to higher learning rates in noisy conditions, where the optimal strategy would be to be less sensitive to random fluctuations.

      (2) Bayesian Observer Model (BOM): To better account for this context dependency, they introduced a Bayesian Observer Model (BOM), which models how an ideal Bayesian learner would update their beliefs about environmental uncertainty. They found that a degraded version of the BOM, where the agent had a coarser representation of noise compared to volatility, best fit the participants' behavior. This suggested that participants were not fully distinguishing between noise and volatility, instead treating noise as volatility and adjusting their learning rates accordingly.

      The authors also aimed to use pupillometry data (measuring pupil dilation) as a physiological marker to arbitrate between models and understand how participants' internal representations of uncertainty influenced both their behavior and physiological responses. Their objective was to explore whether the BOM could explain not just behavioral choices but also these physiological responses, thereby providing stronger evidence for the model's validity.

      Overall, the study sought to reconcile approximate rationality in human learning by showing that participants still follow a Bayesian-like learning process, but with simplified internal models that lead to suboptimal decisions in noisy environments.

      Strengths:

      The generative model presented in the study is both innovative and insightful. The authors first employ a Reinforcement Learning (RL) model to fit participants' behavior, revealing that the learning rate is context-dependent: specifically, it varies based on the levels of volatility and noise in the task. They then introduce a Bayesian Observer Model (BOM) to account for this context dependency, ultimately finding that a degraded BOM, in which the agent has a coarser representation of noise compared to volatility, provides the best fit for the participants' behavior. This suggests that participants do not fully distinguish between noise and volatility, leading to the misattribution of noise as volatility. Consequently, participants adopt higher learning rates even in noisy contexts, where an optimal strategy would involve being less sensitive to new information (i.e., using lower learning rates). This finding highlights a rational but approximate learning process, as described in the paper.

      We thank the reviewer for their assessment of the paper.

      Weaknesses:

      While the RL and Bayesian models both successfully predict behavior, it remains unclear how to fully reconcile the two approaches. The RL model captures behavior in terms of a fixed or context-dependent learning rate, while the BOM provides a more nuanced account with dynamic updates based on volatility and noise. Both models can predict actions when fit appropriately, but the pupillometry data offers a promising avenue to arbitrate between the models. However, the current study does not provide a direct comparison between the RL framework and the Bayesian model in terms of how well they explain the pupillometry data. It would be valuable to see whether the RL model can also account for physiological markers of learning, such as pupil responses, or if the BOM offers a unique advantage in this regard. A comparison of the two models using pupillometry data could strengthen the argument for the BOM's superiority, as currently, the possibility that RL models could explain the physiological data remains unexplored.

      We thank the reviewer for this suggestion. In the current version of the paper, we use an extremely simple reinforcement learning model only to measure the learning rate in each task block (as this is the key behavioural metric we are interested in). As the reviewer highlights, this simple model doesn’t estimate uncertainty or adapt to it. Given this, we don’t think we can directly compare this model to the Bayesian Observer Model. For example, in the current analysis of the pupillometry data we classify individual trials based on the BOM’s estimate of uncertainty and show that participants adapt their learning rate as expected to the reclassified trials; this analysis would not be possible with our current RL model. However, there are more complex RL-based models that do estimate uncertainty (as discussed above in response to Reviewer #1) and so may be compared more directly to the BOM. We will attempt to apply these models to our task data and describe their ability to account for participant behaviour and physiological response, as suggested by the Reviewer.

      The model comparison between the Bayesian Observer Model and the self-defined degraded internal model could be further enhanced. Since different assumptions about the internal model's structure lead to varying levels of model complexity, using a formal criterion such as Bayesian Information Criterion (BIC) or Akaike Information Criterion (AIC) would allow for a more rigorous comparison of model fit. Including such comparisons would ensure that the degraded BOM is not simply favored due to its flexibility or higher complexity, but rather because it genuinely captures the participants' behavioral and physiological data better than alternative models. This would also help address concerns about overfitting and provide a clearer justification for using the degraded BOM over other potential models.

      Thank you, we will add this.

    1. eLife Assessment

      This important study provides insights into the neurodevelopmental trajectories of structural and functional connectivity gradients in the human brain and their potential associations with behaviour and psychopathology. While certain aspects of the methodology are rigorous, the evidence supporting the findings is currently incomplete and would benefit from additional sensitivity analyses to evaluate methodological choices supporting the findings. This study will be of interest to neuroscientists interested in understanding functional connectivity across development.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, the authors advance our understanding of neurodevelopmental changes in the brain's structural and functional connectivity, as well as their coupling. The paper presents evidence of alterations in and stability of the principal organizational gradients of structure and function across development (age) and contrasts them between neurotypical and neurodivergent individuals. The authors further extend their findings by exploring links with graph theory measures of brain connectivity and indices of nodal structure-function coupling. Finally, the developmental shifts in structural and functional brain organization are examined for potential associations with cognitive and psychopathological markers. The results suggest that structure-function coupling, both brain-wide and within specific functional networks, is associated with certain cognitive dimensions but not with measures of psychopathology.

      Strengths:

      This manuscript makes a significant contribution to the field by synthesizing previous research while offering novel insights into the developmental trajectories of brain organization. A key strength of this study lies in its integration of both structural and functional connectivity data, providing a comprehensive view of brain changes throughout development. The authors present findings that challenge earlier reports of shifts in principal gradients during late childhood and early adolescence (e.g., Dong et al., 2021; Xia et al., 2022), underscoring an important inconsistency that could have broader implications for our understanding of developmental brain reorganization. The introduction and discussion sections are well-crafted, offering a thorough review of relevant prior studies and effectively situating the current findings within the broader context of the literature. Additionally, the study design and methodology are detailed and adhere to recommended best practices, demonstrating a commendable level of rigor in the formulation of the study and its various assessments.

      Weaknesses:

      Despite these strengths, I think there are aspects of the manuscript that would benefit from further refinement. Below is detailed feedback and suggestions provided point-by-point.

      Lack of Sensitivity Analyses for some Key Methodological Decisions:

      Certain methodological choices in this manuscript diverge from approaches used in previous works. In these cases, I recommend the following: (i) The authors could provide a clear and detailed justification for these deviations from established methods, and (ii) supplementary sensitivity analyses could be included to ensure the robustness of the findings, demonstrating that the results are not driven primarily by these methodological changes. Below, I outline the main areas where such evaluations are needed:

      - Use of Communicability Matrices for Structural Connectivity Gradients: The authors chose to construct structural connectivity gradients using communicability matrices, arguing that diffusion map embedding "requires a smooth, fully connected matrix." However, by definition, the creation of the affinity matrix already involves smoothing and ensures full connectedness. I recommend that the authors include an analysis of what happens when the communicability matrix step is omitted. This sensitivity test is crucial, as it would help determine whether the main findings hold under a simpler construction of the affinity matrix. If the results significantly change, it could indicate that the observations are sensitive to this design choice, thereby raising concerns about the robustness of the conclusions. Additionally, if the concern is related to the large range of weights in the raw structural connectivity (SC) matrix, a more conventional approach is to apply a log-transformation to the SC weights (e.g., log(1 + SC_ij)), which may yield a more reliable affinity matrix without the need for communicability measures.

      - Individual-Level Gradients vs. Group-Level Gradients: Unlike previous studies that examined alterations in principal gradients (e.g., Xia et al., 2022; Dong et al., 2021), this manuscript focuses on gradients derived directly from individual-level data. In contrast, earlier works have typically computed gradients based on grouped data, such as using a moving window of individuals based on age (Xia et al.) or evaluating two distinct age groups (Dong et al.). I believe it is essential to assess the sensitivity of the findings to this methodological choice. Such an evaluation could clarify whether the observed discrepancies with previous reports are due to true biological differences or simply a result of different analytical strategies.

      - Procrustes Transformation: It is unclear why the authors opted to include a Procrustes transformation in this analysis, especially given that previous related studies (e.g., Dong et al.) did not apply this step. I believe it is crucial to evaluate whether this methodological choice influences the results, particularly in the context of developmental changes in organizational gradients. Specifically, the Procrustes transformation may maximize alignment to the group-level gradients, potentially masking individual-level differences. This could result in a reordering of the gradients (e.g., swapping the first and second gradients), which might obscure true developmental alterations. It would be informative to include an analysis showing the impact of performing vs. omitting the Procrustes transformation, as this could help clarify whether the observed effects are robust or an artifact of the alignment procedure. (Please also refer to my comment on adding a subplot to Figure 1)

      - SC-FC Coupling Metric: The approach used to quantify nodal SC-FC coupling in this study appears to deviate from previously established methods in the field. The manuscript describes coupling as the "Spearman-rank correlation between Euclidean distances between each node and all others within structural and functional manifolds," but this description is unclear and lacks sufficient detail. Furthermore, this differs from what is typically referred to as SC-FC coupling in the literature. For instance, the cited study by Park et al. (2022) utilizes a multiple linear regression framework, where communicability, Euclidean distance, and shortest path length are independent variables predicting functional connectivity (FC), with the adjusted R-squared score serving as the coupling index for each node. On the other hand, the Baum et al. (2020) study, also cited, uses Spearman correlation, but between raw structural connectivity (SC) and FC values. If the authors opt to introduce a novel coupling metric, it is essential to demonstrate its similarity to these previous indices. I recommend providing an analysis (supplementary) showing the correlation between their chosen metric and those used in previous studies (e.g., the adjusted R-squared scores from Park et al. or the SC-FC correlation from Baum et al.). Furthermore, if the metrics are not similar and results are sensitive to this alternative metric, it raises concerns about the robustness of the findings. A sensitivity analysis would therefore be helpful (in case the novel coupling metric is not similar to previous ones) to determine whether the reported effects hold true across different coupling indices.
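
      For concreteness, one possible reading of the manuscript's coupling description can be sketched as follows. This is an interpretation, not the authors' code: it assumes each node has coordinates in a low-dimensional structural and functional manifold, and all function names and the toy coordinates are hypothetical.

```python
import math


def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def nodal_coupling(sc_coords, fc_coords):
    """Per-node Spearman correlation between distance profiles in two manifolds."""
    n = len(sc_coords)
    coupling = []
    for i in range(n):
        d_sc = [math.dist(sc_coords[i], sc_coords[j]) for j in range(n) if j != i]
        d_fc = [math.dist(fc_coords[i], fc_coords[j]) for j in range(n) if j != i]
        coupling.append(spearman(d_sc, d_fc))
    return coupling


# Toy 3-D manifold coordinates for five nodes (hypothetical values).
sc = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (0, 0, 3), (1, 1, 1)]
fc_scaled = [tuple(2 * v for v in p) for p in sc]  # same geometry, rescaled
coupling = nodal_coupling(sc, fc_scaled)
```

      Under this reading the metric depends only on the rank order of distances, so it is insensitive to a uniform rescaling of either manifold; this is one reason it need not track the regression-based or raw SC-FC correlation indices.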

      Methodological ambiguity/lack of clarity in the description of certain evaluation steps:

      Some aspects of the manuscript's methodological descriptions are ambiguous, making it challenging for future readers to fully reproduce the analyses based on the information provided. I believe the following sections would benefit from additional detail and clarification:

      - Computation of Manifold Eccentricity: The description of how eccentricity was computed (both in the results and methods sections) is unclear and may be problematic. The main ambiguity lies in how the group manifold origin was defined or computed. Specifically:

      (1) In the results section, it appears that separate manifold origins were calculated for the NKI and CALM groups, suggesting a dataset-specific approach.

      (2) Conversely, the methods section implies that a single manifold origin was obtained by somehow combining the group origins across the three datasets, which seems contradictory.

      Moreover, including neurodivergent individuals in defining the central group manifold origin is conceptually problematic. Given that neurodivergent participants might exhibit atypical brain organization (as suggested by Fig. 1), this inclusion could skew the definition of what should represent a typical or normative brain manifold. A more appropriate approach might involve constructing the group manifold origin using only the neurotypical participants from both the NKI and CALM datasets. Given the reported similarity between group-level manifolds of neurotypical individuals in CALM and NKI, it would be reasonable to expect that this combined origin should be close to the origin computed within neurotypical samples of either NKI or CALM. As a sanity check, I recommend reporting the distance of the combined neurotypical manifold origin to the centers of the neurotypical manifolds in each dataset. Moreover, if the manifold origin was constructed while utilizing all samples (including neurodivergent samples), I think this needs to be reconsidered.

      - Computation of SC-FC coupling: As noted in a previous comment, the explanation of this procedure is vague. The description lacks detail on the specific steps taken and differs from previous standard approaches in the field. I suggest clarifying the methodology and comparing with previous SC-FC coupling metrics.

      - Performing Procrustes transformation: The brief explanation in the first paragraph of page 30 does not provide enough information about the procedure or its justification. Since the Procrustes transformation alters the shape of individual gradients, it could artificially inflate consistency across development. I recommend including a rationale for using the Procrustes transformation and conducting a sensitivity analysis to assess its impact on the findings. Additionally, clarifying how exactly the transformation was applied to align gradients across hemispheres, individuals, and/or datasets would help resolve ambiguity.
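
      The sanity check on manifold origins recommended above could look like the following sketch. It assumes that eccentricity is the Euclidean distance of each node from the manifold origin and that the origin is the centroid of the group-level node coordinates; both are assumptions about the manuscript's definition, and the coordinates are toy values.

```python
import math


def centroid(points):
    """Mean position of a set of nodes in manifold space."""
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dims))


def eccentricity(node_coords, origin):
    """Euclidean distance of every node from the chosen manifold origin."""
    return [math.dist(p, origin) for p in node_coords]


# Hypothetical neurotypical group-level manifolds (4 nodes, 3 gradients each).
nki = [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, -2.0, 0.0)]
calm = [(-1.0, 0.2, 0.0), (1.0, 0.2, 0.0), (0.0, 2.2, 0.0), (0.0, -1.8, 0.0)]

origin_nki = centroid(nki)
origin_calm = centroid(calm)
origin_combined = centroid(nki + calm)  # neurotypical participants only

# Distance of the combined origin from each dataset-specific origin.
origin_offsets = {
    "NKI": math.dist(origin_combined, origin_nki),
    "CALM": math.dist(origin_combined, origin_calm),
}

ecc_nki = eccentricity(nki, origin_combined)
```

      Small offsets relative to the typical eccentricity values would indicate that the choice between dataset-specific and combined origins is unlikely to drive the results.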

      Insufficient Supporting Evaluations for Certain Claims:

      There are instances where additional analyses are necessary to substantiate the claims made in the manuscript. Without these evaluations, some conclusions may be premature or potentially misleading. I believe the following points need further analysis or, alternatively, adjustments to the claims:

      - Evaluating the Consistency of Gradients Across Development: The results shown in Fig. 1.e are used as evidence suggesting that gradients are consistent across ages. However, I believe additional analyses are required to identify potential sources of the observed inconsistency compared to previous works. The claim that the principal gradient explains a similar degree of variance across ages does not necessarily imply that the spatial structure of the gradient remains stable. The observed variance explanation is hence not enough to ascertain inconsistency with findings from Dong et al., as the spatial configuration of gradients may still change over time. Moreover, the introduction of the Procrustes transformation (not used by Dong et al.) further ambiguates the cause of this inconsistency. I suggest the following additional analyses to strengthen this claim: (1) Alignment to Group-Level Gradients: Assess how much of the variance in individual FC matrices is explained by each of the group-level gradients (G1, G2, and G3, for both FC and SC). This analysis could be visualized similarly to Fig. 1.e, with age on the x-axis and variance explained on the y-axis. If the explained variance varies as a function of age, it may indicate that the gradients are not as consistent as currently suggested. (2) For each individual's gradients (G1, G2, and G3, separately for FC and SC, without Procrustes transformation), evaluate their spatial similarity to the corresponding group-level gradients using a similarity metric (e.g., correlation coefficient). High spatial similarity, without a Procrustes transformation, would support the claim of stable gradient structures across development. On the other hand, if the similarities alter during development (e.g. such that at a certain age, individual G1 is less similar to group G1) this would contradict the stability of gradients during development. These additional analyses could potentially be included as additional panels in Fig. 1. In case significant deviations are observed, it might help refine the interpretation of the results and provide a more nuanced understanding of developmental changes in gradient organization.

      - Prediction vs. Association Analysis: The term "prediction" is used throughout the manuscript to describe what appear to be in-sample association tests. This terminology may be misleading, as prediction generally implies an out-of-sample evaluation where models trained on a subset of data are tested on a separate, unseen dataset. If the goal of the analyses is to assess associations rather than make true predictions, I recommend refraining from using the term "prediction" and instead clarifying the nature of the analysis. Alternatively, if prediction is indeed the intended aim (which would be more compelling), I suggest conducting the evaluations using a k-fold cross-validation framework. This would involve training the Generalized Additive Mixed Models (GAMMs) on a portion of the data and testing their predictive accuracy on a held-out sample (i.e., different individuals). Additionally, the current design appears to focus on predicting SC-FC coupling using cognitive or pathological dimensions. This is contrary to the more conventional approach of predicting behavioral or pathological outcomes from brain markers like coupling. Could the authors clarify why this reverse direction of analysis was chosen? Understanding this choice is crucial, as it impacts the interpretation and potential implications of the findings.
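
      The held-out evaluation suggested above requires that all scans from one participant fall in the same fold, so that the test set contains different individuals. A minimal sketch of such a split is given below; the participant identifiers are hypothetical, and fitting the GAMMs themselves is outside its scope.

```python
def participant_folds(participant_ids, k):
    """Yield (train_idx, test_idx) pairs with no participant in both sets.

    participant_ids: one entry per scan/row; repeated IDs are repeat sessions.
    """
    unique_ids = sorted(set(participant_ids))
    fold_of = {pid: i % k for i, pid in enumerate(unique_ids)}
    for fold in range(k):
        test_idx = [i for i, pid in enumerate(participant_ids) if fold_of[pid] == fold]
        train_idx = [i for i, pid in enumerate(participant_ids) if fold_of[pid] != fold]
        yield train_idx, test_idx


# Hypothetical longitudinal sample: p1 and p3 were scanned twice.
ids = ["p1", "p1", "p2", "p3", "p3", "p4"]
splits = list(participant_folds(ids, k=2))

for train_idx, test_idx in splits:
    train_people = {ids[i] for i in train_idx}
    test_people = {ids[i] for i in test_idx}
    # A model would be fit on train_idx and scored on test_idx here.
    assert train_people.isdisjoint(test_people)
```

      Splitting by row instead of by participant would leak repeat sessions across folds and overstate out-of-sample accuracy.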

      Methodological considerations

      - In typical applications of diffusion map embedding, sparsification (e.g., retaining only the top 10% of the strongest connections) is often employed at the vertex-level resolution to ensure computational feasibility. However, since the present study performs the embedding at the level of 200 brain regions (a considerably coarser resolution), this step may not be necessary or justifiable. Specifically, for FC, it might be more appropriate to retain all positive connections rather than applying sparsification, which could inadvertently eliminate valuable information about lower-strength connections. Whereas for SC, as the values are strictly non-negative, retaining all connections should be feasible and would provide a more complete representation of the structural connectivity patterns. Given this, it would be helpful if the authors could clarify why they chose to include sparsification despite the coarser regional resolution, and whether they considered this alternative approach (using all available positive connections for FC and all non-zero values for SC). It would be interesting if the authors could provide their thoughts on whether the decision to run evaluations at the resolution of brain regions could itself impact the functional and structural manifolds, their alteration with age, and/or their stability (in contrast to Dong et al. which tested alterations in high-resolution gradients).
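
      The two thresholding options contrasted above can be sketched as follows. This is only an illustration on a single toy connectivity row; row-wise thresholding at a fixed fraction is an assumption about how the sparsification was applied, and the function names are hypothetical.

```python
def sparsify_rows(matrix, keep_fraction):
    """Keep the strongest connections in each row and zero out the rest.

    Values tied with the threshold are all kept, so a row can retain slightly
    more than keep_fraction of its entries.
    """
    sparse = []
    for row in matrix:
        n_keep = max(1, round(len(row) * keep_fraction))
        threshold = sorted(row, reverse=True)[n_keep - 1]
        sparse.append([v if v >= threshold else 0.0 for v in row])
    return sparse


def keep_positive(matrix):
    """Alternative: retain every positive connection, drop zero and negative ones."""
    return [[v if v > 0 else 0.0 for v in row] for row in matrix]


# One toy FC row with ten connections (hypothetical correlation values).
fc = [[0.9, 0.1, -0.2, 0.5, 0.3, 0.0, 0.2, 0.4, 0.6, 0.7]]

top_10_percent = sparsify_rows(fc, keep_fraction=0.1)
all_positive = keep_positive(fc)

n_kept_sparse = sum(v != 0.0 for v in top_10_percent[0])
n_kept_positive = sum(v != 0.0 for v in all_positive[0])
```

      At a 200-region resolution the second option is computationally cheap, so comparing gradients derived from both matrices would be a direct sensitivity test.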

      The Issue of Abstraction and Benefits of the Gradient-Based View:

      - The manuscript interprets the eccentricity findings as reflecting changes along the segregation-integration spectrum. Given this, it is unclear why a more straightforward analysis using established graph-theory measures of segregation-integration was not pursued instead. Mapping gradients and computing eccentricity adds layers of abstraction and complexity. If similar interpretations can be derived directly from simpler graph metrics, what additional insights does the gradient-based framework offer? While the manuscript argues that this approach provides "a more unifying account of cortical reorganization," it is not evident why this abstraction is necessary or advantageous over traditional graph metrics. Clarifying these benefits would strengthen the rationale for using this method.

    3. Reviewer #2 (Public review):

      Summary:

      This study aims to show how structural and functional brain organization develops during childhood and adolescence using two large neuroimaging datasets. It addresses whether core principles of brain organization are stable across development, how they change over time, and how these changes relate to cognition and psychopathology. The study finds that brain organization is established early and remains stable but undergoes gradual refinement, particularly in higher-order networks. Structural-functional coupling is linked to better working memory but shows no clear relationship with psychopathology.

      Strengths:

      This study effectively integrates two different modalities (structural and functional) to identify shared patterns. It is supported by a relatively large dataset, which enhances its value and robustness.

      Weaknesses:

      General Comments:

      - The introduction is overly long and includes numerous examples that can distract readers unfamiliar with the topic from the main research questions.

      - While the methods are thorough, it is not always clear whether the optimal approaches were chosen for each step, considering the available data.

      Detailed Comments:

      - The use of COMBAT may have excluded extreme participants from both datasets, which could explain the lack of correlations found with psychopathology.

      - Some differences in developmental trajectories between CALM and NKI (e.g., Figure 4d) are not explained. Are these differences expected, or do they suggest underlying factors that require further investigation?

      - There is no discussion of whether the stable patterns of brain organization could result from preprocessing choices or summarizing data to the mean. This should be addressed to rule out methodological artifacts.

    1. eLife Assessment

      This is a useful study that adds new data to how different DAG pools influence cellular signaling, and dissects how the enzyme Dip2 modulates the minor lipid signaling DAG pool, which is distinct from the DAG pool utilized in membrane biosynthesis. The paper presents solid evidence on how different DAG pools influence cellular signaling.

    2. Reviewer #1 (Public review):

      Summary:

      The study dissects distinct pools of diacylglycerol (DAG), continuing a line of research on the central concept that there is a major lipid metabolism DAG pool in cells, but also a smaller signaling DAG pool. It tests the hypothesis that the second pool is regulated by Dip2, which influences Pkc1 signaling. The group shows that stressed yeast increase the specific DAG species C36:0 and C36:1, and proposes that this promotes Pkc1 activation via Pkc1 binding C36:0. The study also examines how perturbing the lipid metabolism DAG pool via various deletions, such as lro1, dga1, and pah1 deletion, impacts DAG and stress signaling. Overall this is an interesting study that adds new data to how different DAG pools influence cellular signaling.

      Strengths:

      The study nicely combined lipidomic profiling with stress signaling biochemistry and yeast growth assays.

      Weaknesses:

      One suggestion to improve the study is to examine the spatial organization of Dip2 within cells, and how this impacts its ability to modulate DAG pools. Dip2 has previously been proposed to function at mitochondria-vacuole contacts (Mondal 2022). Examining how Dip2 localization is impacted when different DAG pools are manipulated, such as by deletion of Pah1 (also suggested to work at yeast contact sites such as the nucleus-vacuole junction), or by Lro1 or Dga1 deletion, would broaden the scope of the study.

    3. Reviewer #2 (Public review):

      Summary:

      The authors use yeast genetics, lipidomic and biochemical approaches to demonstrate that the DAG isoforms (36:0 and 36:1) can specifically activate PKC. Further, these DAG isoforms originate from PI and PI(4,5)P2. The authors propose that Psi1-Plc1-Dip2 functions to maintain a normal level of specific DAG species to modulate PKC signalling.

      Strengths:

      Data from yeast genetics are clear and strong. The concept is potentially interesting and novel.

      Weaknesses:

      More evidence is needed to support the central hypothesis. The authors may consider the following:

      (1) Figure 2: the authors should show/examine C36:1 DAG. Also, some structural evidence would be highly useful here. What is the structural basis for the assertion that the PKC C1 domain can only be activated by C36:0/1 DAG but not other DAGs? This is a critical conclusion of this work and clear evidence is needed.

      (2) Does Dip2 colocalize with Plc1 or Pkc1? Does Dip2 reach the plasma membrane upon Plc activation?

    4. Author response:

      Reviewer #1 (Public review):

      Summary:

      The study dissects distinct pools of diacylglycerol (DAG), continuing a line of research on the central concept that there is a major lipid metabolism DAG pool in cells, but also a smaller signaling DAG pool. It tests the hypothesis that the second pool is regulated by Dip2, which influences Pkc1 signaling. The group shows that stressed yeast increase the specific DAG species C36:0 and C36:1, and proposes that this promotes Pkc1 activation via Pkc1 binding C36:0. The study also examines how perturbing the lipid metabolism DAG pool via various deletions such as lro1, dga1, and pah1 deletion impacts DAG and stress signaling. Overall this is an interesting study that adds new data to how different DAG pools influence cellular signaling.

      Strengths:

      The study nicely combined lipidomic profiling with stress signaling biochemistry and yeast growth assays.

      We thank the reviewer for finding this study of interest and appreciating our multi-pronged approach to test our hypothesis that a distinct pool of DAGs regulated by Dip2 activates PKC signalling.

      Weaknesses:

      One suggestion to improve the study is to examine the spatial organization of Dip2 within cells, and how this impacts its ability to modulate DAG pools. Dip2 has previously been proposed to function at mitochondria-vacuole contacts (Mondal 2022). Examining how Dip2 localization is impacted when different DAG pools are manipulated, such as by deletion of Pah1 (also suggested to work at yeast contact sites such as the nucleus-vacuole junction), or by Lro1 or Dga1 deletion, would broaden the scope of the study.

      We thank the reviewer for the valuable suggestions regarding the spatial organization of Dip2 in cells under the influence of different DAG pools. As suggested, we will probe the localization of Dip2 in the absence of Pah1. We will also trace the localization of Dip2 in LRO1 and DGA1 deletion strains, where bulk DAGs accumulate, and present the data in the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      The authors use yeast genetics, lipidomic and biochemical approaches to demonstrate that the DAG isoforms (36:0 and 36:1) can specifically activate PKC. Further, these DAG isoforms originate from PI and PI(4,5)P2. The authors propose that Psi1-Plc1-Dip2 functions to maintain a normal level of specific DAG species to modulate PKC signalling.

      Strengths:

      Data from yeast genetics are clear and strong. The concept is potentially interesting and novel.

      We would like to thank the reviewer for the positive comments on our work. We are happy to know that the reviewer finds the study novel and interesting.

      Weaknesses:

      More evidence is needed to support the central hypothesis. The authors may consider the following:

      (1) Figure 2: the authors should show/examine C36:1 DAG. Also, some structural evidence would be highly useful here. What is the structural basis for the assertion that the PKC C1 domain can only be activated by C36:0/1 DAG but not other DAGs? This is a critical conclusion of this work and clear evidence is needed.

      We agree with the reviewer that activation of PKC by C36:0 and C36:1 DAGs is a critical conclusion of our work. While we understand that there is no obvious structural explanation as to how the DAG-binding C1 domain of PKC attains its acyl chain specificity for DAGs, our conclusion that yeast Pkc1 is selective for C36:0 and C36:1 DAGs is supported by a combination of robust in vitro and in vivo data:

      1. In Vitro Evidence: The liposome binding assays demonstrate that the Pkc1 C1 domain only binds the selective DAG and does not interact with bulk DAGs.

      2. In Vivo Evidence: Lipidomic analyses of wild-type cells subjected to cell wall stress reveal increased levels of C36:0 and C36:1 DAGs, while levels of bulk DAGs remain unaffected. This clearly parallels the Dip2 knockout scenario in which the levels of the same set of DAGs go up and Pkc1 gets hyperactivated.

      These findings collectively indicate that Pkc1 neither binds nor is activated by bulk DAGs, reinforcing its specificity for C36:0 and C36:1 DAGs. It is also further corroborated by DGA1 and LRO1 knockouts wherein the increase of the bulk DAGs does not result in a significant increase in Pkc1 signalling.

      Moreover, elucidating the structural basis of this selectivity would require a specific DAG-bound C1 domain structure of Pkc1, which is difficult owing to the flexibility of the longer acyl chains present in C36:0 and C36:1 DAGs. Furthermore, capturing the full-length Pkc1 structure that might provide deeper insights has been challenging for several other groups for a long time. Additionally, we believe that the DAG selectivity by Pkc1 is more of a membrane-associated phenomenon wherein these DAGs might create a specific microdomain or a particular curvature which are required for Pkc1’s ability to bind DAG followed by activation. Investigating this would require extensive structural and biophysical studies, which are beyond the scope of the current work but are planned for future research.

      (2) Does Dip2 colocalize with Plc1 or Pkc1? Does Dip2 reach the plasma membrane upon Plc activation?

      Thank you for your questions regarding the colocalization and potential translocation of Dip2 upon Plc1 or Pkc1 activation.

      In the wild-type scenario, Dip2 does not colocalize with Pkc1. Dip2 predominantly localizes to the mitochondria and mitochondria-vacuole contact sites, while Pkc1 is found in the cytosol, plasma membrane and bud site. Moreover, the localization of Plc1 has not yet been studied in yeast and therefore we currently lack data on the colocalisation of Dip2 and Plc1.

      However, to investigate whether Dip2 translocates to the plasma membrane under conditions requiring Plc1 or Pkc1 activation, we plan to probe the localization of Dip2 under cell wall stress conditions. This would provide a better understanding of the spatial crosstalk between Dip2 and Pkc1. We will include the results in the revised manuscript.

    1. eLife Assessment

      This manuscript describes valuable findings regarding the expression pattern of orexin receptors in the midbrain and how manipulating this system influences several behaviors, such as context-induced locomotor activity and exploration. The overall strength of evidence - which includes anatomical, viral manipulation studies, and brain imaging - is solid and broadly substantiates claims in the paper. However, there are several areas in which the conclusions are only partially supported by the combination of methods used. These results have implications for understanding the neural underpinnings of reward and will be of interest to neuroscientists and cognitive scientists with an interest in the neurobiology of reward.

    2. Reviewer #1 (Public review):

      In this manuscript, the role of orexin receptors in dopamine transmission is studied. It extends previous findings suggesting an interplay between these two systems in regulating behaviour by first characterizing the expression of orexin receptors in the midbrain and then disrupting orexin transmission in dopaminergic neurons by deleting its predominant receptor, OX1R (Ox1R fl/fl, Dat-Cre tg/wt mice). Electrophysiological and calcium imaging data suggest that orexin A acutely and directly stimulates SN and VTA dopaminergic neurons but does not seem to induce c-Fos expression. Behavioral effects of depleting OX1R from dopaminergic neurons include enhanced novelty-induced locomotion and exploration, relative to littermate controls (Ox1R fl/fl, Dat-Cre wt/wt). However, no difference between groups is observed in tests that measure reward processing, anxiety, and energy homeostasis. To test whether the depletion of OX1R alters overall orexin-triggered activation across the brain, PET imaging is used in OX1R∆DAT knockout and control mice. This analysis reveals that several regions show higher neuronal activation after orexin injection in OX1R∆DAT mice, but the authors focus their follow-up study on the dorsal bed nucleus of the stria terminalis (BNST) and lateral paragigantocellular nucleus (LPGi). Dopaminergic inputs and expression of dopamine receptors type-1 and -2 (DRD1 & DRD2) are assessed and compared to control demonstrating a moderate decrease in DRD1 and DRD2 expression in the BNST of OX1R∆DAT mice and unaltered expression of DRD2, with absence of DRD1 expression in LPGi of both groups. Overall, this study is valuable for the information it provides on orexin receptor expression and function in behaviour, as well as for the new tools it generated for the specific study of this receptor in dopaminergic circuits.

      Strengths:

      The use of a transgenic line that lacks OX1R in dopamine-transporter expressing neurons is a strong approach to dissect the direct role of orexin in modulating dopamine signaling in the brain. The battery of behavioral assays used to study this line provides valuable information for researchers interested in the interplay between dopamine and orexin systems and their role in animal physiology.

      Weaknesses:

      This study falls short in providing evidence for an anatomical substrate and mechanism underlying the altered behavior observed in mice lacking orexin receptor subtype 1 in dopaminergic neurons. How orexin transmission in dopaminergic neurons regulates the expression of postsynaptic dopamine receptors (as observed in the BNST of OX1R∆DAT mice) is an intriguing question not addressed in this study. An important aspect not investigated in this study is whether the disruption of orexin activity affects dopamine release in target areas.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript examines expression of orexin receptors in midbrain - with a focus on dopamine neurons - and uses several fairly sophisticated manipulation techniques to explore the role of this peptide neurotransmitter in reward-related behaviors. Specifically, in situ hybridization is used to show that substantia nigra dopamine neurons predominantly express orexin receptor 1 subtype and then go on to delete this receptor in dopamine transporter-expressing neurons using a transgenic strategy. Ex vivo calcium imaging of midbrain neurons is used to show that, in the absence of this receptor, orexin is no longer able to excite dopamine neurons of the substantia nigra.

      The authors proceed to use this same model to study the effect of orexin receptor 1 deletion on a series of behavioral tests, namely, novelty-induced locomotion and exploration, anxiety-related behavior, preference for sweet solutions, cocaine-induced conditioned place preference, and energy metabolism. Of these, the most consistent effects are seen in the tests of novelty-induced locomotion and exploration in which the mice with orexin 1 receptor deletion are observed to show greater levels of exploration, relative to wild-type, when placed in a novel environment, an effect that is augmented after icv administration of orexin.

      In the final part of the paper, the authors use PET imaging to compare brain-wide activity patterns in the mutant mice compared to wildtype. They find differences in several areas both under control conditions (i.e., after injection of saline) as well as after injection of orexin. They focus in on changes in dorsal bed nucleus of stria terminalis (dBNST) and the lateral paragigantocellular nucleus (LPGi) and perform analysis of the dopaminergic projections to these areas. They provide anatomical evidence that these regions are innervated by dopamine fibers from midbrain, are activated by orexin in control, but not mutant mice, and that dopamine receptors are present. They also show changes in receptor expression in the transgenic mice. Thus, they argue these anatomical data support the hypothesis that behavioral effects of orexin receptor 1 deletion in dopamine neurons are due to changes in dopamine signaling in these areas.

      Strengths:

      Understanding how orexin interacts with the dopamine system is an important question and this paper contains several novel findings along these lines. Specifically:

      (1) Distribution of orexin receptor subtypes in VTA and SN is explored thoroughly.

      (2) Use of the genetic model that knocks out a specific orexin receptor subtype from dopamine-transporter-expressing neurons is a useful model and helps to narrow down the behavioral significance of this interaction.

      (3) PET studies showing how central administration of orexin evokes dopamine release across the brain is intriguing, especially since two key areas are pursued - BNST and LPGi - where the dopamine projection is not as well described/understood.

      Weaknesses:

      The role of the orexin-dopamine interaction is not explored in enough detail. The manuscript presents several related findings, but the combination of anatomy and manipulation studies do not quite tell a cogent story. Ideally, one would like to see the authors focus on a specific behavioral parameter and show that one of their final target areas (dBNST or LPGi) was responsible or at least correlated with this behavioral readout. In addition, the authors' working model for how they think orexin-dopamine interactions contribute to behavior under normal physiological conditions is not well-described.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review): 

      In this manuscript, the role of orexin receptors in dopamine transmission is studied. It extends previous findings suggesting an interplay of these two systems in regulating behaviour by first characterising the expression of orexin receptors in the midbrain and then disrupting orexin transmission in dopaminergic neurons by deleting its predominant receptor, OX1R (Ox1R fl/fl, Dat-Cre tg/wt mice). Electrophysiological and calcium imaging data suggest that orexin A acutely and directly stimulates SN and VTA dopaminergic neurons, but does not seem to induce c-Fos expression. Behavioural effects of depleting OX1R from dopaminergic neurons include enhanced novelty-induced locomotion and exploration, relative to littermate controls (Ox1R fl/fl, Dat-Cre wt/wt). However, no difference between groups is observed in tests that measure reward processing, anxiety, and energy homeostasis. To test whether depletion of OX1R alters overall orexin-triggered activation across the brain, PET imaging is used in OX1R∆DAT knockout and control mice. This analysis reveals that several regions show a higher neuronal activation after orexin injection in OX1R∆DAT mice, but the authors focus their follow-up study on the dorsal bed nucleus of the stria terminalis (BNST) and lateral paragigantocellular nucleus (LPGi). Dopaminergic inputs and expression of dopamine receptors type-1 and -2 (DRD1 & DRD2) are assessed and compared to control, demonstrating a moderate decrease of DRD1 and DRD2 expression in the BNST of OX1R∆DAT mice and unaltered expression of DRD2, with absence of DRD1 expression in the LPGi of both groups. Overall, this study is valuable for the information it provides on orexin receptor expression and function on behaviour and for the new tools it generated for the specific study of this receptor in dopaminergic circuits.

      Strengths: 

      The use of a transgenic line that lacks OX1R in dopamine-transporter expressing neurons is a strong approach to dissect the direct role of orexin in modulating dopamine signalling in the brain. The battery of behavioural assays to study this line provides a valuable source of information for researchers interested in the role of orexin in animal physiology. 

      We thank the reviewer for summarizing the importance and significance of our study. 

      Weaknesses: 

      This study falls short in providing evidence for an anatomical substrate of the altered behaviour observed in mice lacking orexin receptor subtype 1 in dopaminergic neurons. How orexin transmission in dopaminergic neurons regulates the expression of postsynaptic dopamine receptors (as observed in BNST of OX1R<sup>∆DAT</sup> mice) is an intriguing question poorly discussed. Whether disruption of orexin activity alters dopamine release in target areas is an important point not addressed. 

      We identified dopaminergic fibers and dopamine receptors in the dBNST and LPGi, suggesting an anatomical basis for dopamine neurons to regulate neural activity and receptor expression levels in these areas. PET imaging and c-Fos staining revealed that Ox1R signaling in dopaminergic cells regulates neuronal activity in dBNST and LPGi. The expression levels of Th were unchanged in both regions. Dopamine receptor 2 (DRD2), but not DRD1, is expressed in LPGi. The deletion of Ox1R in DAT-expressing cells did not affect DRD2 expression in LPGi. The expression levels of DRD1 and DRD2 were decreased or showed a tendency to decrease in dBNST.

      We included the comments in the discussion in this revised manuscript (lines 308-312): ‘The expression levels of Th were not altered in dBNST or LPGi by Ox1R deletion in dopaminergic neurons. It remains unclear whether dopamine release is affected in these regions. It is possible that either the dopaminergic regulation of neuronal activity or the changes in dopamine release could lead to the decreased expression of dopamine receptors in dBNST.’

      Reviewer #2 (Public review): 

      Summary: 

      This manuscript examines expression of orexin receptors in midbrain - with a focus on dopamine neurons - and uses several fairly sophisticated manipulation techniques to explore the role of this peptide neurotransmitter in reward-related behaviors. Specifically, in situ hybridization is used to show that dopamine neurons predominantly express orexin receptor 1 subtype and then go on to delete this receptor in dopamine transporter-expressing neurons using a transgenic strategy. Ex vivo calcium imaging of midbrain neurons is used to show that, in the absence of this receptor, orexin is no longer able to excite dopamine neurons of the substantia nigra.

      The authors proceed to use this same model to study the effect of orexin receptor 1 deletion on a series of behavioral tests, namely, novelty-induced locomotion and exploration, anxiety-related behavior, preference for sweet solutions, cocaine-induced conditioned place preference, and energy metabolism. Of these, the most consistent effects are seen in the tests of novelty-induced locomotion and exploration in which the mice with orexin 1 receptor deletion are observed to show greater levels of exploration, relative to wild-type, when placed in a novel environment, an effect that is augmented after icv administration of orexin. 

      In the final part of the paper, the authors use PET imaging to compare brain-wide activity patterns in the mutant mice compared to wildtype. They find differences in several areas both under control conditions (i.e., after injection of saline) as well as after injection of orexin. They focus in on changes in dorsal bed nucleus of stria terminalis (dBNST) and the lateral paragigantocellular nucleus (LPGi) and perform analysis of the dopaminergic projections to these areas. They provide anatomical evidence that these regions are innervated by dopamine fibers from midbrain, are activated by orexin in control, but not mutant mice, and that dopamine receptors are present. Thus, they argue these anatomical data support the hypothesis that behavioral effects of orexin receptor 1 deletion in dopamine neurons are due to changes in dopamine signaling in these areas.

      Strengths: 

      Understanding how orexin interacts with the dopamine system is an important question and this paper contains several novel findings along these lines. Specifically:

      (1) Distribution of orexin receptor subtypes in VTA and SN is explored thoroughly.

      (2) Use of the genetic model that knocks out a specific orexin receptor subtype from dopamine-transporter-expressing neurons is a useful model and helps to narrow down the behavioral significance of this interaction.

      (3) PET studies showing how central administration of orexin evokes dopamine release across the brain is intriguing, especially since two key areas are pursued - BNST and LPGi - where the dopamine projection is not as well described/understood.

      We thank the reviewer for summarizing the importance and significance of our study. 

      Weaknesses: 

      The role of the orexin-dopamine interaction is not explored in enough detail. The manuscript presents several related findings, but the combination of anatomy and manipulation studies do not quite tell a cogent story. Ideally, one would like to see the authors focus on a specific behavioral parameter and show that one of their final target areas (dBNST or LPGi) was responsible or at least correlated with this behavioral readout. 

      We agree that exploring the orexin-dopamine interactions in more detail and focusing on the behavioral impact of their final target areas (e.g., dBNST or LPGi), would provide valuable data. While we are very interested in pursuing these studies, the aim of the present manuscript is to provide an overview of the behavioral roles of orexin-dopamine interaction and to propose some promising downstream pathways in a relatively broad and systematic manner. 

      In many places in the Results, insufficient explanation and statistical reporting is provided. Throughout the Results - especially in the section on behavior although not restricted to this part - statements are made without statistical tests presented to back up the claims, e.g., "Compared to controls, Ox1R<sup>ΔDAT</sup> mice did not show significant changes in spontaneous locomotor activity in home cages" (L143) and "In a hole-board test, female Ox1R<sup>ΔDAT</sup> mice showed increased nose pokes into the holes in early (1st and 2nd) sessions compared to control mice" (L151). In other places, ANOVAs are mentioned but full results including main effects and interactions are not described in detail, e.g., in F3-S3, only a single p-value is presented and it is difficult to know if this is the interaction term or a post hoc test (L205). These and all other statements need statistics included in the text as support. Addition of these statistical details was also requested by the editor.

      We submitted all our source data as Excel spreadsheets to eLife during our first-round revision, and the full statistics, such as main effects and interactions, are presented alongside the source data in the respective spreadsheets. We thank the reviewer for pointing out our lack of clarity in the manuscript. In this revised manuscript, we included the statistical details of ANOVAs mentioned above in the figure legends. In the figure legends, we also explained that the full statistics were provided alongside the source data in the supplementary materials.

      In the presentation of reward processing this is particularly important as no statistical tests are shown to demonstrate that controls show a cocaine-induced preference or a sucrose preference. Here, one option would be to perform one-sample t-tests showing that the data were different to zero (no preference). As it is, the claim that "Both of the control and Ox1RΔDAT groups showed a preference for cocaine injection" is not yet statistically supported. 

      We thank the reviewer for the suggestions. We have added the one-sample t-test results in this revised manuscript (Figure 2–figure supplement 4, lines 171 - 183). 
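
      The one-sample t-test discussed in this exchange can be sketched as follows. This is a minimal illustration only, not the authors' analysis: the function name `one_sample_t` and the input scores are hypothetical, and the test is computed against a reference value of zero (no preference) as the reviewer proposed. Only the test statistic and degrees of freedom are computed here; a p-value would come from the t distribution with those degrees of freedom, for example via `scipy.stats.ttest_1samp` if SciPy is available.

```python
import math
import statistics


def one_sample_t(values, popmean=0.0):
    """Return (t statistic, degrees of freedom) for a one-sample t-test.

    Tests whether the mean of `values` differs from `popmean`
    (0.0 corresponds to "no preference" in a preference score).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD, n - 1 denominator
    if sd == 0:
        raise ValueError("zero variance: t statistic is undefined")
    standard_error = sd / math.sqrt(n)
    return (mean - popmean) / standard_error, n - 1


# Made-up placeholder scores for illustration; not data from the study.
example_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
t_stat, df = one_sample_t(example_scores, popmean=0.0)
print(f"t({df}) = {t_stat:.3f}")
```

      Reporting the statistic together with its degrees of freedom, as in the printed line, matches the editor's request for test statistics and df alongside exact p-values.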

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors): 

      Can the authors comment on overlap between DAT and Ox1R in brain areas outside VTA/SN? Is there any? 

      We only focused on the expression patterns of orexin receptors in VTA/SN, and we did not examine other brain regions. Additionally, little is known from the literature about the expression of Ox1R in DAT-expressing cells in brain areas outside VTA/SN. Further analysis is necessary to answer this question. We have added the comment in our discussion (lines 243 - 344).

      For the Ca2+ imaging experiment, it is unclear to me why the authors do not show all the neurons (almost 160 in total) and just select 5 neurons to show for each condition. 

      Heat maps of all recorded neurons are now shown in Figure 1—figure supplement 4.

      There are other claims that still require a statistical justification to be included in addition to the passages on behavior mentioned above, e.g., "Increasing the orexin A concentration to 300 nM further increased [Ca2+]i" (L118). 

      Authors should ensure that all such claims are either presented with a statistical test or are phrased differently, e.g. "Visual inspection of data suggested that there was a further increase...". In addition, when an ANOVA is conducted, full results including main effects and interactions should be described. 

      We now emphasize our statement that 100 nM orexin A already significantly increased [Ca<sup>2+</sup>]i levels (lines 117 - 118).

      We submitted all our source data as Excel spreadsheets to eLife during our first-round revision, and the full statistics, such as main effects and interactions, are presented alongside the source data in the respective spreadsheets. For clarity, we chose to include only the key statistical information in the main text and figures. We thank the reviewer for pointing this out. In this revised manuscript, we have emphasized in each figure legend: ‘Source data and full statistics are provided in the supplementary materials’.

      Typos in figure captions  

      F2-S1 - spontanous 

      F3-S2 - intrest 

      We apologize for the typos. We have corrected them in this revised manuscript.

      Editor's note: 

      Should you choose to revise your manuscript, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05. 

      We submitted all our source data as Excel spreadsheets to eLife during our first-round revision, and the full statistics, such as test statistics, df and 95% confidence intervals, are presented alongside the source data in the respective spreadsheets. We thank the editor for the note. In this revised manuscript, we have included more statistical information in the main text and figure legends (see our response to reviewer #2). In the figure legends, we also explained that the full statistics were provided alongside the source data in the supplementary materials. In addition, we uploaded the source data and full statistics to bioRxiv before uploading this revised manuscript to eLife.

    1. eLife Assessment

      This valuable study suggests that the dosage compensation complex and m6A act in a feedback loop in Drosophila melanogaster. The study provides integrated analyses of RNA sequencing and mapping data of the m6A RNA modification in the context of unbalanced genomes, which suggests that m6A modification status may influence H4K16Ac deposition through regulation of the acetyltransferase MOF. However, it is not clear whether this regulation is directly or indirectly related to m6A regulation. The evidence is considered incomplete due to technical concerns, as quantitative assessments were made using non-quantitative methods.

    2. Reviewer #1 (Public review):

      Summary:

      This study sought to reveal the potential roles of m6A RNA methylation in gene dosage regulatory mechanisms, particularly in the context of aneuploid genomes in Drosophila. Specifically, this work looked at the relationships between expression of m6A regulatory factors, RNA methylation status, classical and inverse dosage effects, and dosage compensation. Using RNA sequencing and m6A mapping experiments, an in-depth analysis was performed to reveal changes in m6A status and expression changes across multiple aneuploid Drosophila models. The authors propose that m6A methylation regulates MOF and, in turn, deposition of H4K16Ac, critical regulators of gene dosage in the context of genomic imbalance.

      Strengths:

      This study seeks to address an interesting question with respect to gene dosage regulation and the possible roles of m6A in that process. Previous work has linked m6A to X-inactivation in humans through the Xist lncRNA, and to the regulation of Sxl in flies. This study seeks to broaden that understanding beyond these specific contexts to more broadly understand how m6A impacts imbalanced genomes in other contexts.

      Weaknesses:

      The methods being used, particularly for analysis of m6A at both the bulk and transcript-specific level, are not sufficiently specific or quantitative to be able to confidently draw the conclusions the authors seek to make. MeRIP m6A mapping experiments can be very valuable, but differential methylation is difficult to assess when changes are small (as they often are, in this study but also in m6A studies more broadly). For instance, based on the data presented and the methods described, it is not clear that the statement that "expression levels at m6A sites in aneuploidies are significantly higher than that in wildtype" is supported. In my initial review I pointed out that MeRIP experiments are not quantitative and can be difficult to interpret when small changes are present. The data as presented still show only RPKM in IP samples, and the text alludes to changes in IP enrichment that are significant but the data do not appear to have been included in the figure. Concerns about the bulk-level m6A measurements also remain, as the new data showing m6A levels in mRNA show changes that are even smaller than those initially demonstrated in total RNA. Yet the data are still presented as significant, biologically relevant changes. The conclusions about mRNA m6A levels are not strengthened by these measurements.

    3. Reviewer #2 (Public review):

      Summary:

      The authors have tested effects of partial- or whole-chromosome aneuploidy on the m6A RNA modification in Drosophila. The data reveal that overall m6A levels trend up but that the number of sites found by meRIP-seq trend down, which seems to suggest that aneuploidy causes a subset of sites to become hyper-methylated. Subsequent bioinformatic analysis of other published datasets establish correlations between activity of the H4K16 acetyltransferase dosage compensation complex (DCC) and expression of m6A components and m6A abundance, suggesting that DCC and m6A can act in a feedback loop. Western blots confirm that Msl2 and MOF alleles alter levels of Mettl3 complex components, but the underlying mechanism remains undefined.

      Strengths:

      • Thorough bioinformatic analysis of their data

      • Incorporation of other published datasets that enhances scope and rigor

      • Finds trends that suggest that a chromosome counting mechanism can control m6A, as fits with pub data that the Sxl mRNA is m6A modified in XX females and not XY males

      • Provides preliminary evidence that this counting mechanism may be due to DCC effects on expression of m6A components.

      Weaknesses:

      • The linkage between H4K16 machinery and m6A levels on specific sites remains unclear in this revision.

      • The paper relies on m6A comparisons across tissues and developmental stages, which introduces some uncertainty about where and when the DCC-m6A loop acts.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study sought to reveal the potential roles of m6A RNA methylation in gene dosage regulatory mechanisms, particularly in the context of aneuploid genomes in Drosophila. Specifically, this work looked at the relationships between the expression of m6A regulatory factors, RNA methylation status, classical and inverse dosage effects, and dosage compensation. Using RNA sequencing and m6A mapping experiments, an in-depth analysis was performed to reveal changes in m6A status and expression changes across multiple aneuploid Drosophila models. The authors propose that m6A methylation regulates MOF and, in turn, deposition of H4K16Ac, critical regulators of gene dosage in the context of genomic imbalance.

      Strengths:

      This study seeks to address an interesting question with respect to gene dosage regulation and the possible roles of m6A in that process. Previous work has linked m6A to X-inactivation in humans through the Xist lncRNA, and to the regulation of the Sxl in flies. This study seeks to broaden that understanding beyond these specific contexts to more broadly understand how m6A impacts imbalanced genomes in other contexts.

      Weaknesses:

      The methods being used particularly for analysis of m6A at both the bulk and transcript-specific level are not sufficiently specific or quantitative to be able to confidently draw the conclusions the authors seek to make. MeRIP m6A mapping experiments can be very valuable, but differential methylation is difficult to assess when changes are small (as they often are, in this study but also m6A studies more broadly). For instance, based on the data presented and the methods described, it is not clear that the statement that "expression levels at m6A sites in aneuploidies are significantly higher than that in wildtype" is supported. MeRIP experiments are not quantitative, and since there are far fewer peaks in aneuploidies, it stands to reason that more antibody binding sites may be available to enrich those fewer peaks to a larger extent. But based on the data as presented (figure 2D) this conclusion was drawn from RPKM in IP samples, which may not fully account for changing transcript abundances in absolute (expression level changes) and relative (proportion of transcripts in input RNA sample) terms.

      Methylated RNA immunoprecipitation followed by sequencing (MeRIP-seq) is a commonly used strategy for genome-wide mapping of m6A modification. This method uses an anti-m6A antibody to immunoprecipitate RNA fragments, which results in selective enrichment of methylated RNA. The RNA fragments are then subjected to deep sequencing, and the regions enriched in the immunoprecipitate relative to input samples are identified as m6A peaks using a peak-calling algorithm. We identified m6A peaks in different samples with the exomePeak2 program and determined common m6A peaks for each genotype based on the intersection of biological replicates. Figure 2D shows the RPM values of m6A peaks in MeRIP samples for each genotype, indicating that the levels of reads in the m6A peak regions were significantly higher in the aneuploid IP samples than in wildtypes. When the enrichment of IP samples relative to Input samples (RPM.IP/RPM.Input) was taken into account, the values for all three aneuploidies were still significantly higher than those of the wildtypes (Mann-Whitney U test, p-values < 0.001). This analysis does not address changes in transcript abundance; rather, from the MeRIP perspective, it shows that relatively more m6A-modified reads map to the m6A peaks in aneuploidies than in wildtypes. We hope this provides a possible explanation for why the change in the number of m6A peaks is not consistent with the overall m6A abundance trend. We have added the IP/Input results to the main text and revised the description in the manuscript to make it more precise and reduce possible misunderstandings.
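
      The IP/Input comparison described in this response can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the function names, the pseudocount, and the toy RPM values are all assumptions, and the p-value uses the normal approximation with tie correction (no continuity correction), so it will differ slightly from an exact test on small samples.

```python
import math

def enrichment(ip_rpm, input_rpm, pseudocount=1.0):
    """Per-peak IP/Input ratio. The pseudocount is an assumption added here
    to avoid division by zero; the response does not mention one."""
    return [(ip + pseudocount) / (inp + pseudocount)
            for ip, inp in zip(ip_rpm, input_rpm)]

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).
    Returns (U statistic for x, p-value)."""
    n1, n2 = len(x), len(y)
    combined = list(x) + list(y)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    tie_counts = {}
    for v in combined:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    tie_term = sum(c ** 3 - c for c in tie_counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))

# Toy RPM values, made up for illustration (not taken from the study).
wildtype = enrichment([10.0, 12.0, 9.0, 11.0], [5.0, 6.0, 5.0, 5.5])
aneuploid = enrichment([20.0, 25.0, 18.0, 22.0], [5.0, 6.0, 5.0, 5.5])
u_stat, p_value = mann_whitney_u(aneuploid, wildtype)
print(f"U = {u_stat}, two-sided p = {p_value:.4f}")
```

      Normalizing IP signal by Input in this way addresses only part of Reviewer #1's concern: the ratio corrects for transcript abundance within a sample, but it does not make MeRIP enrichment quantitative across samples with different numbers of peaks.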

      The bulk-level m6A measurements as performed here also cannot effectively support these conclusions, as they are measured in total RNA. The focus of the work is mRNA m6A regulators, but m6A levels measured from total RNA samples will not reflect mRNA m6A levels as there are other abundant RNAs that contain m6A (including rRNA). As a result, conclusions about mRNA m6A levels from these measurements are not supported.

      According to published articles, m6A levels of mRNA or total RNA can be detected by different methods (such as mass spectrometry, 2D thin-layer chromatography, etc.) in Drosophila cells or tissues [1-3]. We used the EpiQuik m6A RNA Methylation Quantification Kit, which is suitable for detecting m6A methylation status directly using total RNA isolated from any species such as mammals, plants, fungi, bacteria, and viruses. This kit has previously been used by researchers to detect the m6A/A ratio in total RNA [4, 5] or purified mRNA [6] from different species. Our pre-experiments showed that the enrichment of mRNA from total RNA did not appear to significantly affect the results of the detection of m6A levels.

      We extracted and purified mRNA from the heads of control and MSL2 transgenic Drosophila to verify our conclusion. mRNA was isolated from total RNA using the Dynabeads mRNA purification kit (Invitrogen, Carlsbad, CA, USA, 61006). The abundance of m6A modification was higher in mRNA than in total RNA (Figure 7E,F; Figure 7—figure supplement 1G,H). Relative to control Drosophila, the changes in m6A abundance in mRNA and total RNA of MSL2 transgenic Drosophila were essentially the same. These results support the conclusions in our manuscript. In the MSL2 knockdown Drosophila, the m6A modification levels on mRNA mirrored those observed on total RNA, exhibiting a significant downregulation (Figure 7E; Figure 7—figure supplement 1G). The only difference is that no substantial difference in m6A abundance on mRNA was detected between MSL2-overexpressing females and control Drosophila (Figure 7F; Figure 7—figure supplement 1H). This suggests that m6A modification in RNA types other than mRNA (e.g., lncRNA, rRNA) is not necessarily meaningless, which is a direction for future research. We will also add a discussion of this issue to the manuscript.

      (1) Lence T, et al. (2016) m6A modulates neuronal functions and sex determination in Drosophila. Nature 540(7632):242-247.

      (2) Haussmann IU, et al. (2016) m(6)A potentiates Sxl alternative pre-mRNA splicing for robust Drosophila sex determination. Nature 540(7632):301-304.

      (3) Kan L, et al. (2017) The m(6)A pathway facilitates sex determination in Drosophila. Nat Commun 8:15737.

      (4) Zhu C, et al. (2023) RNA Methylome Reveals the m(6)A-mediated Regulation of Flavor Metabolites in Tea Leaves under Solar-withering. Genomics Proteomics Bioinformatics 21(4):769-787.

      (5) Song H, et al. (2021) METTL3-mediated m(6)A RNA methylation promotes the anti-tumour immunity of natural killer cells. Nat Commun 12(1):5522.

      (6) Yin H, et al. (2021) RNA m6A methylation orchestrates cancer growth and metastasis via macrophage reprogramming. Nat Commun 12(1):1394.

      Reviewer #2 (Public Review):

      Summary:

      The authors have tested the effects of partial- or whole-chromosome aneuploidy on the m6A RNA modification in Drosophila. The data reveal that overall m6A levels trend up but that the number of sites found by meRIP-seq trend down, which seems to suggest that aneuploidy causes a subset of sites to become hyper-methylated. Subsequent bioinformatic analysis of other published datasets establish correlations between the activity of the H4K16 acetyltransferase dosage compensation complex (DCC) and the expression of m6A components and m6A abundance, suggesting that DCC and m6A can act in a feedback loop on each other. Overall, this paper uses bioinformatic trends to generate a candidate model of feedback between DCC and m6A. It would be improved by functional studies that validate the effect in vivo.

      Strengths:

      • Thorough bioinformatic analysis of their data.

      • Incorporation of other published datasets that enhance scope and rigor.

      • Finds trends that suggest that a chromosome counting mechanism can control m6A, as fits with pub data that the Sxl mRNA is m6A modified in XX females and not XY males.

      • Suggests this counting mechanism may be due to the effect of chromatin-dependent effects on the expression of m6A components.

      Weaknesses:

      • The linkage between H4K16 machinery and m6A is indirect and based on bioinformatic trends with little follow-up to test the mechanistic bases of these trends.

      Western blots were performed to detect H4K16Ac in Ythdc1 knockdown Drosophila and control Drosophila. Quantitative analysis demonstrated that H4K16Ac levels changed significantly in Ythdc1 knockdown Drosophila. Combined with the results of polytene chromosome immunostaining in third instar larvae, we found that Ythdc1 affects the levels of H4K16Ac in a tissue- and developmental stage-specific manner. This specificity may be associated with the nonuniformity and heterogeneity of RNA m6A modification, encompassing tissue specificity, developmental specificity, the different numbers of m6A sites in one transcript, the different proportions of methylated transcripts, et cetera [1-3].

      In addition, we found a set of ChIP-seq data (GSE109901) of H4K16ac in female and male Drosophila larvae in a public database and analyzed whether H4K16ac is directly associated with m6A regulator genes. ChIP-seq is a standard method for studying transcription factor binding and histone modification that uses efficient and specific antibodies for immunoprecipitation. The results showed that there were H4K16ac peaks at the 5' region of the gene encoding the m6A reader Ythdc1 in both males and females. In addition, most of the genomic sites where the other m6A regulator genes are located are acetylated at H4K16 in both sexes, except that Ime4 shows sexual dimorphism and contains an H4K16ac peak only in females. These results indicate that the m6A regulator genes themselves are acetylated at H4K16, so there is a direct relationship between H4K16ac and m6A regulators. We have added this content to the text.

      Our analysis of experimental outcomes and public sequencing data has shed light on the interaction of the m6A reader protein Ythdc1 with H4K16Ac. We appreciate your interest in the complex interplay between H4K16Ac and m6A modifications. We acknowledge the intricacy of this interaction and concur that it merits further investigation, potentially supported by additional experiments.

      The currently submitted manuscript focuses mainly on the role of RNA m6A modification in genomes experiencing imbalance, and we will explore this complex interplay in subsequent work.

      (1) Meyer, K. D., et al. (2012). Comprehensive analysis of mRNA methylation reveals enrichment in 3' UTRs and near stop codons. Cell, 149(7), 1635-1646.

      (2) Meyer, K. D., & Jaffrey, S. R. (2014). The dynamic epitranscriptome: N6-methyladenosine and gene expression control. Nature Reviews: Molecular Cell Biology, 15(5), 313-326.

      (3) Zaccara, S., Ries, R. J., & Jaffrey, S. R. (2019). Reading, writing and erasing mRNA methylation. Nature Reviews: Molecular Cell Biology, 20(10), 608-624.

      • The paper lacks sufficient in vivo validation of the effects of DCC alleles on m6A and vice versa. For example, is the Ythdc1 genomic locus a direct target of the DCC component Msl-2? (see Figure 7).

      To study whether the Ythdc1 genomic locus is a direct target of a DCC component, we first analyzed a published MSL2 ChIP-seq dataset from Drosophila (GSE58768). Since MSL2 is only expressed in males under normal conditions, this dataset is from male Drosophila. According to the results, the majority (99.1%) of MSL2 peaks are located on the X chromosome, while there are few MSL2 peaks on other chromosomes. This is consistent with the fact that MSL2 is enriched on the X chromosome in male Drosophila [1, 2]. The Ythdc1 gene is located on chromosome 3L, and there is no MSL2 peak near it. Similarly, the other m6A regulator genes are not X-linked and have no MSL2 peak. We then analyzed the MOF ChIP-seq data (GSE58768) of male Drosophila. We found that 61.6% of MOF peaks were located on the X chromosome, which was also expected [3, 4]. Although there are more MOF peaks than MSL2 peaks on autosomes, MOF peaks are absent from the autosomal m6A regulator genes. Therefore, at present, there is no evidence that the gene loci of m6A regulators are direct targets of the DCC components MSL2 and MOF, which may be because most MSL2 and MOF are tethered to the X chromosome by the MSL complex under physiological conditions. Whether there are other direct or indirect interactions between Ythdc1 and MSL2 is an issue worthy of further study.
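
      The two checks described in this response (the share of ChIP-seq peaks on the X chromosome, and whether any peak lies near a gene of interest) can be sketched as below. This is only an illustration under stated assumptions: peaks are represented as simple (chromosome, start, end) tuples, the helper names are hypothetical, and all coordinates are invented placeholders rather than values from GSE58768 or the Ythdc1 annotation.

```python
from collections import Counter

def fraction_on_chromosome(peaks, chrom):
    """Share of peaks located on `chrom`.
    peaks: list of (chromosome, start, end) tuples."""
    if not peaks:
        return 0.0
    counts = Counter(p[0] for p in peaks)
    return counts[chrom] / len(peaks)

def peaks_near_locus(peaks, chrom, start, end, flank=0):
    """Peaks on `chrom` overlapping the interval [start - flank, end + flank].
    `flank` (in bp) is a hypothetical window for promoter-proximal peaks."""
    lo, hi = start - flank, end + flank
    return [p for p in peaks if p[0] == chrom and p[1] <= hi and p[2] >= lo]

# Invented example peaks; real input would be parsed from a peak file.
toy_peaks = [
    ("X", 100, 200),
    ("X", 500, 600),
    ("X", 900, 1000),
    ("3L", 1000, 1100),
]

# Placeholder coordinates for a gene on 3L (not the real Ythdc1 position).
gene_chrom, gene_start, gene_end = "3L", 5000, 8000

print(f"Fraction of peaks on X: {fraction_on_chromosome(toy_peaks, 'X'):.3f}")
print("Peaks within 2 kb of the gene:",
      peaks_near_locus(toy_peaks, gene_chrom, gene_start, gene_end, flank=2000))
```

      The choice of flanking window matters for a claim such as "no peak near it", so stating the window used alongside the result would make the conclusion easier to evaluate.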

      (1) Bashaw GJ & Baker BS (1995) The msl-2 dosage compensation gene of Drosophila encodes a putative DNA-binding protein whose expression is sex specifically regulated by Sex-lethal. Development 121(10):3245-3258.

      (2) Kelley RL, et al. (1995) Expression of msl-2 causes assembly of dosage compensation regulators on the X chromosomes and female lethality in Drosophila. Cell 81(6):867-877.

      (3) Kind J, et al. (2008) Genome-wide analysis reveals MOF as a key regulator of dosage compensation and gene expression in Drosophila. Cell 133(5):813-828.

      (4) Conrad T, et al. (2012) The MOF chromobarrel domain controls genome-wide H4K16 acetylation and spreading of the MSL complex. Dev Cell 22(3):610-624.

      Quite a bit of technical detail is omitted from the main text, making it difficult for the reader to interpret outcomes.

      (1) Please add the tissues to the labels in Figure 1D.

      Figure 1D shows the subcellular localization of FISH probe signals in Drosophila embryos. Arrowheads indicate the foci of probe signals. The corresponding tissue types are (1) blastoderm nuclei; (2) yolk plasm and pole cells; (3) brain and midgut; (4) salivary gland and midgut; (5) blastoderm nuclei and yolk cortex; (6) blastoderm nuclei and pole cells; (7) blastoderm nuclei and yolk cortex; (8) germ band. We have added these to the manuscript.

      (2) In the main text, please provide detail on the source tissues used for meRIP; was it whole larvae? adult heads? Most published datasets are from S2 cells or adult heads and comparing m6A across tissues and developmental stages could introduce quite a bit of variability, even in wt samples. This issue seems to be what the authors discuss in lines 197-199.

      In this article, the material used to perform MeRIP-seq was the whole third instar larvae. Because trisomy 2L and metafemale Drosophila died before developing into adults, it was not possible to use the heads of adults for MeRIP-seq detection of aneuploidy. For other experiments described here, the m6A abundance was measured using whole larvae or adult heads; material used for RT-qPCR analysis was whole larvae, larval brains, or adult heads; Drosophila embryos at different developmental stages were used for fluorescence in situ hybridization (FISH) experiments. We provide a detailed description of the experimental material for each assay in the manuscript.

      (3) In the main text, please identify the technique used to measure "total m6A/A" in Fig 2A. I assume it is mass spec.

      We used the EpiQuik m6A RNA Methylation Quantification Kit (Colorimetric) (Epigentek, NY, USA, Cat # P-9005) to measure the m6A/A ratio in RNA samples. This kit is commercially available for quantification of m6A RNA methylation; it uses a colorimetric assay with easy-to-follow steps for convenience and speed, and is suitable for detecting m6A methylation status directly using total RNA isolated from any species such as mammals, plants, fungi, bacteria, and viruses.

      (4) Line 190-191: the text describes annotating m6A sites by "nearest gene" which is confusing. The sites are mapped in RNAs, so the authors must unambiguously know the identity of the gene/transcript, right?

      When m6A peaks are annotated using the R package ChIPseeker, the output includes two items: "genomic annotation" and "nearest gene annotation". "Genomic annotation" tells us which genomic features the peak is annotated to, such as 5’UTR, 3’UTR, exon, etc. "Nearest gene annotation" indicates which specific gene/transcript the peak is matched to. We modified the description in the main text to make it easier to understand.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      While I believe this study aims to address a very interesting question and demonstrates intriguing evidence suggesting a role for m6A in unbalanced genomes, technical limitations in the methods being used limited my confidence in the overall conclusions. In addition, some of the analyses seemed to distract a bit from the main question of the work, which made thoroughly reading and reviewing the work challenging at times due to the length and lack of cohesion. Some specific points and suggestions are detailed below.

      (1) Some specific points/recommendations for the bulk m6A measurements: for Figure 2A, the authors refer to m6A/A ratio in the text, but based on the methods section and axis labels in Figure 2A (as well as other figures), it may represent m6A% in total RNA. The authors should just clarify which one it is and make the text and figures consistent. The methods description also seems to specify that m6A is quantified in total RNA, and yet the factors being discussed (Ime4, Ythdc1, etc) are associated with m6A in mRNA. Since m6A is present in non-mRNAs (including highly abundant rRNAs), m6A analysis of total RNA may be masking some of the effects due to the relatively low abundance of mRNA relative to rRNA. It is possible that the above point contributes to the discrepancy between the overall m6A abundance in aneuploidies and the changing methylase expression levels (which does seem to correlate better with m6A sequencing data). On a related note, though the authors suggest in Figures 7E and F that m6A level changes are different in males and females, the levels and trends of m6A% in these panels seem quite similar, and the absence or presence of statistical significance seems driven by higher variation (larger error bars) in the measurements in 7F (and again effects may be masked if total RNA is being quantified). This may be a very addressable issue, as m6A analysis of mRNA-enriched samples should be feasible, and in fact, may show clearer changes to better support the authors' conclusions.

      Thank you for your helpful comments.

      As suggested, the abundance of m6A on mRNA was measured (Figure 7E, F). Total RNA was extracted from the heads of the control and MSL2 transgenic Drosophila, and mRNA was isolated using the Dynabeads mRNA purification kit (Invitrogen, Carlsbad, CA, USA, 61006). 300-600 ng of mRNA can be purified from 40 μg of total RNA (200-300 heads per sample). We used the EpiQuik m6A RNA Methylation Quantification Kit (Colorimetric) (Epigentek, NY, USA, Cat # P-9005) to measure the abundance of m6A in mRNA samples (200 ng). The results obtained by this method represent the m6A/A ratio (%), which is also written as m6A% in the user guide of the kit. We made corresponding revisions in the main text and figures to make them consistent.

      The abundance of m6A modification was higher in mRNA than in total RNA, which includes multiple RNA types such as mRNA, lncRNA, and rRNA (Figure 7E,F; Figure 7—figure supplement 1G,H). Consistently, in the MSL2 knockdown Drosophila, the m6A modification levels on mRNA mirrored those observed on total RNA, exhibiting a significant downregulation (Figure 7E; Figure 7—figure supplement 1G). In contrast, no substantial difference in the m6A abundance on mRNA was detected between MSL2-overexpressing Drosophila and the control Drosophila (Figure 7F; Figure 7—figure supplement 1H). The differences in m6A abundance between males and females were not statistically significant (Figure 7E,F), prompting us to revise the manuscript.

      (2) The analyses in Figures 5 and 6 describe a lot of different comparisons derived from these datasets, and while there seem to be many interesting new hypotheses to be tested, the authors do not make any definitive conclusions from these analyses. These figures also seem to diverge a bit from the main conclusion of the work, and from this reviewer's perspective made it more difficult to read and review the work. Overall streamlining the narrative may help readers appreciate the main conclusions of the work (though this is of course up to the authors' discretion).

      As indicated in Figure 5, the results demonstrated a sexually dimorphic role of m6A modification in the regulation of gene expression in aneuploid Drosophila, suggesting its potential involvement in the gene regulatory network through interactions with dosage-sensitive regulators. Furthermore, Figure 6 illustrated the intricate interplay between RNA m6A modification, gene expression, and alternative splicing under genomic imbalance, with RNA splicing being more intimately associated with m6A methylation than gene transcription itself.

      This manuscript also discussed the correlation between methylation status and classical dosage effects, dosage compensation effects, and inverse dosage effects. We have initially demonstrated that RNA m6A methylation could influence dosage-dependent gene regulation via multiple avenues, such as interactions with dosage-sensitive modifiers, alternative splicing mechanisms, the MSL complex, and other related processes. Indeed, our study primarily utilizes m6A methylated RNA immunoprecipitation sequencing (MeRIP-Seq) to comprehensively investigate the role of RNA m6A modification in genomes experiencing imbalance. We agree that more specific and in-depth research on these factors will be instrumental in elucidating the precise mechanisms by which m6A modification regulates expression in unbalanced genomes, which we acknowledge as a significant avenue for our future research.

      We are grateful for your suggestions and, should it be necessary, we could shorten the manuscript by removing or condensing some of the data analysis and description to give the central theme more prominence.

      Reviewer #2 (Recommendations For The Authors):

      Overall, please provide enough technical detail in the main text so that the reader understands what was done, and does not have to repeatedly dig into figure legends and materials and methods to understand each data statement.

      Thank you for your suggestions. We have added some technical details to the manuscript and made some modifications as suggested.

    1. eLife Assessment

      This work presents important findings of a modulatory effect of yohimbine, an alpha2-adrenergic antagonist that raises noradrenaline levels, on the reconsolidation of emotionally neutral word-picture pairs, depending on the hippocampal and cortical reactivation during retrieval. The evidence supporting the main conclusions is convincing, with an elegant design combining fMRI and psychopharmacology. The work will be of broad interest to researchers working on memory.

    2. Reviewer #1 (Public review):

      Summary:

      How reconsolidation works - particularly in humans - remains largely unknown. With an elegant, 3-day design, combining fMRI and psychopharmacology, the authors provide evidence for a certain role for noradrenaline in the reconsolidation of memory for neutral stimuli. All memory tasks were performed in the context of fMRI scanning, with additional resting state acquisitions performed before and after recall testing on Day 2. On Day 1, 3 groups of healthy participants encoded word-picture associates (with pictures being either scenes or objects) and then performed an immediate cued recall task to presentation of the word (answering is the word old or new, and was it paired with a scene or an object). On Day 2, the cued recall task was repeated using half of the stimulus set words encoded on Day 1 (only old words were presented, with subjects required to indicate prior scene vs object pairing). This test was immediately preceded by the oral administration of placebo, cortisol, or yohimbine (to raise noradrenaline levels) depending on group assignment. On Day 3, all words presented on Day 1 were presented. As expected, on Day 3, memory was significantly enhanced for associations that were cued and successfully retrieved on Day 2 compared to uncued associations. However, for associative d', there was no Cued × Group interaction nor a main effect of Group, i.e., on the standard measure of memory performance, post-retrieval drug presence on Day 2 did not affect memory reconsolidation. As further evidence for a null result, fMRI univariate analyses showed no Cued × Group interactions in whole-brain or ROI activity.

      Strengths:

      There are some aspects of this study that I find impressive. The study is well-designed and the fMRI analysis methodology innovative and sound. The authors have made meticulous and thorough physiological measurements, and assays of mood, throughout the experiment. By doing so, they have overcome, to a considerable extent, the difficulties inherent in timing of human oral drug delivery in reconsolidation tasks, where it is difficult to have drug present in the immediate recall period without affecting recall itself. This is beautifully shown in Fig. 3. I also think that having some neurobiological assay of memory reactivation when studying reconsolidation in humans is critical, and the authors provide this. While multi-voxel patterns of hemodynamic responses are, in my view, very difficult to equate with an "engram", these patterns do have something to do with memory.

      Weaknesses:

      I have major issues regarding the behavioral results and the framing of the manuscript:

      (1) To arrive at group differences in memory performance, the authors performed median splitting of Day 3 trials by short and long reaction times during memory cueing on Day 2, as they took this as a putative measure of high/low levels of memory reactivation. Associative category hits on Day 3 showed a Group by Day 2 Reaction time (short, long) interaction, with post-hocs showing (according to the text) worse memory for short Day 2 RTs in the yohimbine group. These post-hocs should be corrected for multiple comparisons, as the result is not what would be predicted (see point 2). My primary issue here is that we are not given RT data for each group, nor is the median splitting procedure described in the methods. Was this across all groups, or within groups? Are short RTs in the yohimbine group any different from short RTs in the other two groups? Unfortunately, we are not given Day 2 picture category memory levels or reaction times for each group. This is relevant because (as given in Supplemental Table S1) memory performance (d´) for the Yohimbine group on Day 1 immediate testing is (roughly speaking) 20% lower than the other 2 groups (independently of whether the pairs will be presented again the following day). I appreciate that this is not significant in a group x performance ANOVA but how does this relate to later memory performance? What were the group-specific RTs on Day 1? So, before the reader goes into the fMRI results, there are questions regarding the supposed drug-induced changes in behavior. Indeed, in the discussion, there is repeated mention of subsequent memory impairment produced by yohimbine but the nature of the impairment is not clear.

      This weakness was satisfactorily addressed in one revision round. As RT data are often not normally distributed, were they transformed prior to entry into linear models?
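
      The question raised in point (1), whether the median split was computed across all groups or within each group, affects which trials count as "short" or "long". A minimal sketch of the two options follows; the function names and the reaction times are hypothetical and serve only to show how the labels can diverge when one group responds more slowly overall. It is not the procedure used in the manuscript.

```python
from statistics import median

def label_trials(rts, cutoff):
    """Label each reaction time as 'short' (<= cutoff) or 'long' (> cutoff)."""
    return ["short" if rt <= cutoff else "long" for rt in rts]

def pooled_split(rt_by_group):
    """Median split with a single cutoff computed across all groups."""
    cutoff = median(rt for rts in rt_by_group.values() for rt in rts)
    return {group: label_trials(rts, cutoff) for group, rts in rt_by_group.items()}

def within_group_split(rt_by_group):
    """Median split with a separate cutoff for each group."""
    return {group: label_trials(rts, median(rts)) for group, rts in rt_by_group.items()}

# Invented Day 2 reaction times in seconds (not data from the study).
toy_rts = {
    "placebo": [1.0, 1.2, 1.4, 1.6],
    "yohimbine": [1.4, 1.6, 1.8, 2.0],
}

print("Pooled cutoff:      ", pooled_split(toy_rts))
print("Within-group cutoff:", within_group_split(toy_rts))
```

      With a pooled cutoff, a slower group contributes fewer "short" trials, so a Group by Reaction time interaction could partly reflect baseline speed differences; a within-group cutoff balances trial counts but makes "short" mean different absolute times in each group.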

      (2) The authors should be clearer as to what their original hypotheses were, and why they did the experiment. Despite being a complex literature, I would have thought the hypotheses would be reconsolidation impairment by cortisol and enhancement by yohimbine. Here it is relevant to point out that - only when the reader gets to the Methods section - there is mention of a paper published by this group in 2024. In this publication, the authors used the same study design but administered a stress manipulation after Day 2 cued recall, instead of a pharmacological one. They did not find a difference in associative hit rate between stress and control groups, but - similar to the current manuscript - reported that post-retrieval stress disrupts subsequent remembering (Day 3 performance) depending on neural memory reinstatement during reactivation (specifically driven by the hippocampus and its correlation with neocortical areas).

      Instead of using these results, and other human studies, to motivate the current work, reference is made to a recent animal study: Line 169 "Building on recent findings in rodents (Khalaf et al. 2018), we hypothesized that the effects of post-retrieval noradrenergic and glucocorticoid activation would critically depend on the reinstatement of the neural event representation during retrieval". It is difficult to follow that a rodent study using contextual fear conditioning and examining single neuron activity to remote fear recall and extinction would be relevant enough to motivate a hypothesis for a human psychopharmacological study on emotionally neutral paired associates.

      Minor comments

      - Related to Major issue 2. In the introduction, it would be helpful to be specific about the type of memory being probed in the different studies referenced (episodic vs conditioning). For the former, please make it clear whether stimuli to be remembered were emotional or neutral, and for which stimulus class drug effects were observed. This is particularly important given that in the first paragraph you describe memory reactivation in the context of traumatic memories via mention of PTSD. It would also be helpful to know to which species you refer. For example, in line 115, "timing of drug administration..." a rodent and a human study are cited.

      This weakness was addressed in one revision round, resulting in an excellent introduction, highlighting the importance of studying post-retrieval effects for memory researchers and healthcare workers.

    3. Reviewer #2 (Public review):

      Summary:

      The authors aimed to investigate how noradrenergic and glucocorticoid activity after retrieval influence subsequent memory recall with a 24-hour interval, by using a controlled three-day fMRI study involving pharmacological manipulation. They found that noradrenergic activity after retrieval selectively impairs subsequent memory recall, depending on hippocampal and cortical reactivation during retrieval.

      Overall, there are several significant strengths of this well-written manuscript.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      How reconsolidation works - particularly in humans - remains largely unknown. With an elegant, 3-day design, combining fMRI and psychopharmacology, the authors provide evidence for a certain role for noradrenaline in the reconsolidation of memory for neutral stimuli. All memory tasks were performed in the context of fMRI scanning, with additional resting-state acquisitions performed before and after recall testing on Day 2. On Day 1, 3 groups of healthy participants encoded word-picture associates (with pictures being either scenes or objects) and then performed an immediate cued recall task to presentation of the word (answering is the word old or new, and whether it was paired with a scene or an object). On Day 2, the cued recall task was repeated using half of the stimulus set words encoded on Day 1 (only old words were presented, with subjects required to indicate prior scene vs object pairing). This test was immediately preceded by the oral administration of placebo, cortisol, or yohimbine (to raise noradrenaline levels) depending on group assignment. On Day 3, all words presented on Day 1 were presented. As expected, on Day 3, memory was significantly enhanced for associations that were cued and successfully retrieved on Day 2 compared to uncued associations. However, for associative d', there was no Cued × Group interaction nor a main effect of Group, i.e., on the standard measure of memory performance, post-retrieval drug presence on Day 2 did not affect memory reconsolidation. As further evidence for a null result, fMRI univariate analyses showed no Cued × Group interactions in whole-brain or ROI activity.

      Strengths:

      There are some aspects of this study that I find impressive. The study is well-designed and the fMRI analysis methodology is innovative and sound. The authors have made meticulous and thorough physiological measurements, and assays of mood, throughout the experiment. By doing so, they have overcome, to a considerable extent, the difficulties inherent in the timing of human oral drug delivery in reconsolidation tasks, where it is difficult to have the drug present in the immediate recall period without affecting recall itself. This is beautifully shown in Figure 3. I also think that having some neurobiological assay of memory reactivation when studying reconsolidation in humans is critical, and the authors provide this. While multi-voxel patterns of hemodynamic responses are, in my view, very difficult to equate with an "engram", these patterns do have something to do with memory.

      We thank the reviewer for considering aspects of our work impressive, the study to be well-designed, and the methodology to be innovative and sound.

      Weaknesses:

      I have major issues regarding the behavioral results and the framing of the manuscript.

      (1) To arrive at group differences in memory performance, the authors performed median splitting of Day 3 trials by short and long reaction times during memory cueing on Day 2, as they took this as a putative measure of high/low levels of memory reactivation. Associative category hits on Day 3 showed a Group by Day 2 Reaction time (short, long) interaction, with post-hocs showing (according to the text) worse memory for short Day 2 RTs in the Yohimbine group. These post-hocs should be corrected for multiple comparisons, as the result is not what would be predicted (see point 2). My primary issue here is that we are not given RT data for each group, nor is the median splitting procedure described in the methods. Was this across all groups, or within groups? Are short RTs in the yohimbine group any different from short RTs in the other two groups? Unfortunately, we are not given Day 2 picture category memory levels or reaction times for each group. This is relevant because (as given in Supplemental Table S1) memory performance (d´) for the Yohimbine group on Day 1 immediate testing is (roughly speaking) 20% lower than the other 2 groups (independently of whether the pairs will be presented again the following day). I appreciate that this is not significant in a group x performance ANOVA but how does this relate to later memory performance? What were the group-specific RTs on Day 1? So, before the reader goes into the fMRI results, there are questions regarding the supposed drug-induced changes in behavior. Indeed, in the discussion, there is repeated mention of subsequent memory impairment produced by yohimbine but the nature of the impairment is not clear.

      Thank you for the opportunity to clarify these important issues.

      Reaction times are well-established proxies (correlates) of memory strength and memory confidence in previous research, as they reflect cognitive processes involved in retrieving information. Faster reaction times indicate stronger mnemonic evidence and higher confidence in the accuracy of a memory decision, while slower responses suggest weaker evidence and decision uncertainty or doubt. This relationship is supported by an extensive literature (e.g., Starns 2021; Robinson et al., 1997; Ratcliff & Murdock, 1976; amongst others). Importantly, distinguishing between high and low confidence choices in a memory task serves the purpose of differentiating between particularly strong memory evidence (e.g., in associative cued recall, when remembering is particularly vivid) and weaker memory evidence. Separating low from high confidence responses based on participants’ reaction times was especially important in the current analyses, because previous research demonstrates that reaction times during cued recall tasks inversely correlate with hippocampal involvement (Heinbockel et al., 2024; Gagnon et al., 2019) and that stress effects on human memory may be particularly pronounced for high-confidence memories (Gagnon et al., 2019).

      In response to Reviewer 1’s comments, we have elaborated on our rationale for the distinction between short and long reaction times in the introduction, results, and methods. Please see page 4, lines 144 to 148:

      “We distinguished between responses with short and long reaction times indicative of high and low confidence responses because previous research showed that reaction times are inversely correlated with hippocampal memory involvement(58-60) and memory strength(61,62), and that high confidence memories associated with short reaction times may be particularly sensitive to stress effects(63).”

      On page 13, lines 520 to 523:

      “Reaction times in the Day 2 Memory cueing task revealed a trial-specific gradient in reactivation strength. Thus, we turned to single-trial analyses, differentiating Day 3 trials by short and long reaction times during memory cueing on Day 2 (median split), indicative of high vs. low memory confidence(58–60) and hippocampal reactivation(26,63).”

      And on page 26, lines 1046 to 1053:

      “Reaction times serve as a proxy for memory confidence and memory strength, with faster responses reflecting higher confidence/strength and slower responses suggesting greater uncertainty/weaker memory. The association between reaction times and memory confidence has been established by previous research(58–60), suggesting that the distinction between high from low confidence responses differentiates vividly recalled associations from decisions based on weaker memory evidence. Reaction times are further linked to hippocampal activity during recall tasks(26,53), and stress effects on memory are particularly pronounced for high-confidence memories(53).”

      With respect to behavioral data reporting, we agree that the critical median-split procedure was not sufficiently clear in the original manuscript. We elaborate on this important aspect of the analysis now on page 26, lines 1053 to 1057:

      “We conducted a median-split within each participant to categorize trials as fast vs. slow reaction time trials during Day 2 memory cueing. We conducted this split on the participant- and not group-level because there is substantial inter-individual variability in overall reaction times. This approach also results in an equal number of trials in the low and high confidence conditions.”
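      To illustrate the participant-level split described above, the following is a minimal sketch rather than our analysis code. The field names (`participant`, `rt`), the example reaction times, and the tie rule (a trial with an RT equal to the participant's median counts as fast) are assumptions made for illustration only.

```python
from collections import defaultdict
from statistics import median


def median_split_within_participant(trials):
    """Label each trial 'fast' or 'slow' relative to its own participant's median RT.

    `trials` is a list of dicts with the hypothetical keys 'participant' and 'rt'.
    Assumption: an RT exactly equal to the participant's median is labelled 'fast'.
    """
    rts_by_participant = defaultdict(list)
    for trial in trials:
        rts_by_participant[trial["participant"]].append(trial["rt"])

    # One median per participant, so overall speed differences between
    # participants do not determine which bin a trial falls into.
    medians = {p: median(rts) for p, rts in rts_by_participant.items()}

    return [
        {**trial, "rt_bin": "fast" if trial["rt"] <= medians[trial["participant"]] else "slow"}
        for trial in trials
    ]


# Made-up reaction times (seconds): participant s02 is slower overall than s01.
example_trials = [
    {"participant": "s01", "rt": 0.8},
    {"participant": "s01", "rt": 1.0},
    {"participant": "s01", "rt": 1.4},
    {"participant": "s01", "rt": 1.6},
    {"participant": "s02", "rt": 2.0},
    {"participant": "s02", "rt": 2.2},
    {"participant": "s02", "rt": 2.6},
    {"participant": "s02", "rt": 3.0},
]

labelled_trials = median_split_within_participant(example_trials)
```

      In this toy example, each participant contributes two fast and two slow trials. A single median computed across both participants would instead place every trial of the slower participant in the slow bin, which is the confound the participant-level split avoids.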

      We completely agree that the relevant post-hoc test should be corrected for multiple comparisons. Please note that all reported post-hoc tests had already been Bonferroni-corrected. We now clarify this by explicitly referring to corrected p-values (P<sub>corr</sub>) and indicate in the methods that P<sub>corr</sub> refers to Bonferroni-corrected p-values (please see page 25, lines 1036 to 1038).
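      For readers less familiar with the correction, a minimal sketch of how a Bonferroni-corrected p-value is obtained follows. The function name and the example p-values are hypothetical and are not taken from our data.

```python
def bonferroni_correct(p_values):
    """Return Bonferroni-corrected p-values.

    Each uncorrected p-value is multiplied by the number of tests in the
    family and capped at 1.0.
    """
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


# Hypothetical uncorrected p-values from three post-hoc comparisons.
uncorrected = [0.01, 0.04, 0.60]
corrected = bonferroni_correct(uncorrected)
```

      With three comparisons, an uncorrected p-value of .04 no longer falls below .05 after correction, and values that would exceed 1 are capped at 1.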

      We further agree that for a comprehensive overview of the behaviour in terms of memory performance and RTs, these data need to be provided for each group and experimental day. Therefore, we have now extended Supplementary Table S1 to include descriptive indices of memory performance (hits, dprime) and RTs for each group for each day. Moreover, we now report ANOVAs for reaction times for each of the experimental days in the main text.

      The ANOVA for Day 1 is now reported on page 6, lines 200 to 204: “To test for potential group differences in reaction times for correctly remembered associations on Day 1, we fit a linear model including the factors Group and Cueing. Critically, we did not observe a significant Group x Cueing interaction, suggesting no RT difference between groups for later cued and not cued items (F(2,58) = 1.41, P = .258, η<sup>2</sup> = 0.01; Supplemental Table S1).”

      The ANOVA for Day 2 is now reported on page 7, lines 243 to 248: “To test for potential group differences in reaction times for correctly remembered associations on Day 2, we fit a linear model including the factors Group and Reaction time (slow/fast) following the subject specific median split. The model did not reveal any main effect or interaction including the factor Group (all Ps > .535; Supplemental Table S1), indicating that there was no RT difference between groups, nor between low and high RT trials in the groups.”

      The ANOVA for Day 3 is reported on page 13 lines 487 to 494: “To test for potential group differences in reaction times for correctly remembered associations on Day 3 we fit a linear model including the factors Group and Cueing. This model did not reveal any main effect or interaction including the factor Group (all Ps > .267), indicating that there was no average RT difference between groups. As expected we observed a main effect of the factor Cueing, indicating a significant difference of reaction times across groups between trials that were successfully cued and those not cued on Day 2 (F(2,58) = 153.07, P < .001, η<sup>2</sup> = 0.22; Supplemental Table S1).”

      (2) The authors should be clearer as to what their original hypotheses were, and why they did the experiment. Despite being a complex literature, I would have thought the hypotheses would be reconsolidation impairment by cortisol and enhancement by yohimbine. Here it is relevant to point out that - only when the reader gets to the Methods section - there is mention of a paper published by this group in 2024. In this publication, the authors used the same study design but administered a stress manipulation after Day 2 cued recall, instead of a pharmacological one. They did not find a difference in associative hit rate between stress and control groups, but - similar to the current manuscript - reported that post-retrieval stress disrupts subsequent remembering (Day 3 performance) depending on neural memory reinstatement during reactivation (specifically driven by the hippocampus and its correlation with neocortical areas).

      Instead of using these results, and other human studies, to motivate the current work, reference is made to a recent animal study: Line 169 "Building on recent findings in rodents (Khalaf et al. 2018), we hypothesized that the effects of post-retrieval noradrenergic and glucocorticoid activation would critically depend on the reinstatement of the neural event representation during retrieval". It is difficult to follow that a rodent study using contextual fear conditioning and examining single neuron activity to remote fear recall and extinction would be relevant enough to motivate a hypothesis for a human psychopharmacological study on emotionally neutral paired associates.

      We agree that our recent publication utilizing a very similar experimental design including three days is highly relevant in the context of the current study and we now refer to this recent study earlier in our manuscript. Please see page 3, lines 89 to 94:  

      “Recently, we showed a detrimental impact of post-retrieval stress on subsequent memory that was contingent upon reinstatement dynamics in the Hippocampus, VTC and PCC during memory reactivation(26). While this study provided initial insights into the potential brain mechanisms involved in the effects of post-retrieval stress on subsequent memory, the underlying neuroendocrine mechanisms remained elusive.”

      Moreover, we explicitly state our hypothesis regarding the neural mechanism, with reference to our recent work, on page 5, lines 166 to 169:

      “Building on our recent findings in humans(26) as well as current insights from rodents(47), we hypothesized that the effects of post-retrieval noradrenergic and glucocorticoid activation would critically depend on the reinstatement of the neural event representation during retrieval.”

      Concerning the potential direction of the effects of post-retrieval cortisol and noradrenaline, the literature is indeed mixed with partially contradicting results, which made it, in our view, difficult to derive a clear hypothesis of potentially opposite effects of cortisol and yohimbine. We summarize the relevant evidence in the introduction on pages 3 to 4, lines 100 to 113:

      “Some studies, using emotional recognition memory or fear conditioning in healthy humans, suggest enhancing effects of post-retrieval glucocorticoids on subsequent memory(30,31). However, rodent studies on neutral recognition memory(21), fear conditioning(32), as well as evidence from humans on episodic recognition memory(33) report impairing effects of glucocorticoid receptor activation on post-retrieval memory dynamics. For noradrenaline, post-retrieval blockade of noradrenergic activity impairs putative reconsolidation or future memory accessibility in human fear conditioning(34), as well as drug (alcohol) memory(35) and spatial memory in rodents(36). However, this effect is not consistently observed in human studies on fear conditioning(40), speaking anxiety(37), inhibitory avoidance(39), traumatic mental imagination (PTSD patients)(38), and might depend on the arousal state of the individual(21) or the exact timing of drug administration as suggested by studies in humans(41) and rodents(42). Thus, while there is evidence that glucocorticoid and noradrenergic activation after retrieval can affect subsequent memory, the direction of these effects remains elusive.”

      In addition to these reviewer comments and in response to the eLife assessment, we would like to emphasize that the present findings are in our view not only relevant for a subfield but may be of considerable interest for researchers from various fields, beyond experimental memory research, including Neurobiology, Psychiatry, Clinical Psychology, Educational Psychology, or Law Psychology. We highlight the relevance of the topic and our findings now more explicitly in the introduction and discussion. Please see page 3:

      “The dynamics of memory after retrieval, whether through reconsolidation of the original trace or interference with retrieval-related traces, have fundamental implications for educational settings, eyewitness testimony, or mental disorders(5,11,12). In clinical contexts, post-retrieval changes of memory might offer a unique opportunity to retrospectively modify or render less accessible unwanted memories, such as those associated with posttraumatic stress disorder (PTSD) or anxiety disorders(13–15). Given these potential far reaching implications, understanding the mechanisms underlying post-retrieval dynamics of memory is essential.”

      On page 17:

      “Upon their retrieval, memories can become sensitive to modification(1,2). Such post-retrieval changes in memory may be fundamental for adaptation to volatile environments and have critical implications for eyewitness testimony, clinical or educational contexts(5,11–15). Yet, the brain mechanisms involved in the dynamics of memory after retrieval are largely unknown, especially in humans.”

      And on page 19:

      “Beyond their theoretical relevance, these findings may have relevant implications for attempts to employ post-retrieval manipulations to modify unwanted memories in anxiety disorders or PTSD(97,98). Specifically, the present findings suggest that such interventions may be particularly promising if combined with cognitive or brain stimulation techniques ensuring a sufficient memory reactivation.”

      Reviewer #1 (Recommendations for the authors):

      (1) Related to major issue 2 in the Public Review. In the introduction, it would be helpful to be specific about the type of memory being probed in the different studies referenced (episodic vs conditioning). For the former, please make it clear whether stimuli to be remembered were emotional or neutral, and for which stimulus class drug effects were observed. This is particularly important given that in the first paragraph, you describe memory reactivation in the context of traumatic memories via mention of PTSD. It would also be helpful to know to which species you refer. For example, in line 115, "timing of drug administration..." a rodent and a human study are cited.

      We completely agree that these aspects are important. We have therefore rewritten the corresponding paragraph in the introduction to clarify the type of memory probed, the emotionality of the stimuli and the species tested. Please see pages 3 to 4, lines 100 to 113:

      “Some studies, using emotional recognition memory or fear conditioning in healthy humans, suggest enhancing effects of post-retrieval glucocorticoids on subsequent memory(30,31). However, rodent studies on neutral recognition memory(21), fear conditioning(32), as well as evidence from humans on episodic recognition memory(33) report impairing effects of glucocorticoid receptor activation on post-retrieval memory dynamics. For noradrenaline, post-retrieval blockade of noradrenergic activity impairs putative reconsolidation or future memory accessibility in human fear conditioning(34), as well as drug (alcohol) memory(35) and spatial memory in rodents(36). However, this effect is not consistently observed in human studies on fear conditioning(40), speaking anxiety(37), inhibitory avoidance(39), traumatic mental imagination (PTSD patients)(38), and might depend on the arousal state of the individual(21) or the exact timing of drug administration as suggested by studies in humans(41) and rodents(42). Thus, while there is evidence that glucocorticoid and noradrenergic activation after retrieval can affect subsequent memory, the direction of these effects remains elusive.”

      (2) The Bos 2014 reference appears incorrect. I think you mean the Frontiers paper of the same year.

      Thank you for noticing this mistake, which has been corrected.

      (3) Line 734 "The study employed a fully crossed, placebo-controlled, double-blind, between-subjects design". What is a fully crossed design?

      A fully-crossed design refers to studies in which all possible combinations of multiple between-subjects factors are implemented. However, because the factor reactivation/cueing was manipulated within-subject in the present study and there is only one between-subjects factor (group/drug), “fully-crossed” may be misleading here. We removed it from the manuscript.

      (4) Supplemental Table S3. Are these ordered in terms of significance? A t- or Z-value for each cluster (either of the peak or a summed value) would be helpful.

      We agree that the ordering of the clusters was not clearly described. In the revised Supplemental Table S3, we have now added a column with the cluster-peak specific T-values and added an explanation in the table caption: “Depicted clusters are ordered by cluster-peak T-values.”

      (5) Please provide the requested memory performance and reaction time data, and relevant group comparisons.

      In response to general comment #1 above, we now provide all relevant accuracy and reaction time data for all groups and experimental days in the revised Supplemental Table S1. Moreover, we now report the relevant group comparisons in the main text on page 6, lines 200 to 204, on page 7, lines 243 to 248, and on page 13, lines 487 to 494.

      (6) Please rewrite the introduction with specific hypotheses, mention your recent results published in Science Advances, and attend to suggestions made in the first comment above.

      We have rewritten parts of the introduction to make the link to our recent publication clearer and to clarify the types of memories and species tested, as suggested by the reviewer (please see pages 3 to 4, lines 100 to 113). Moreover, we explicitly state our hypothesis regarding the neural mechanism on page 5, lines 166 to 169:

      “Building on our recent findings in humans(26) as well as current insights from rodents(47), we hypothesized that the effects of post-retrieval noradrenergic and glucocorticoid activation would critically depend on the reinstatement of the neural event representation during retrieval.”

      In terms of the direction of the potential cortisol and yohimbine effects, we have elaborated on the relevant literature, which in our view does not allow a clear prediction regarding the nature of the drug effects. We have made this explicit by stating that “… while there is evidence that glucocorticoid and noradrenergic activation after retrieval can affect subsequent memory, the direction of these effects remains elusive.” (please see page 4, lines 111 to 113). It would be, in our view, inappropriate to retrospectively add another, more specific “hypothesis”.

      Reviewer #2 (Public review):

      Summary:

      The authors aimed to investigate how noradrenergic and glucocorticoid activity after retrieval influence subsequent memory recall with a 24-hour interval, by using a controlled three-day fMRI study involving pharmacological manipulation. They found that noradrenergic activity after retrieval selectively impairs subsequent memory recall, depending on hippocampal and cortical reactivation during retrieval.

      Overall, there are several significant strengths of this well-written manuscript.

      Strengths:

      (1) The study is methodologically rigorous, employing a well-structured three-day experimental design that includes fMRI imaging, pharmacological interventions, and controlled memory tests.

      (2) The use of pharmacological agents (i.e., hydrocortisone and yohimbine) to manipulate glucocorticoid and noradrenergic activity is a significant strength.

      (3) The clear distinction between online and offline neural reactivation using MVPA and RSA approaches provides valuable insights into how memory dynamics are influenced by noradrenergic and glucocorticoid activity distinctly.

      We thank the reviewer for these very positive and encouraging remarks.

      Weaknesses:

      (1) One potential limitation is the reliance on distinct pharmacodynamics of hydrocortisone and yohimbine, which may complicate the interpretation of the results.

      We agree that the pharmacodynamics of hydrocortisone and yohimbine are different. However, we took these pharmacodynamics into account when designing the experiment and have made an effort to accurately track the indicators for noradrenergic arousal and glucocorticoids across the experiment. As shown in Figure 2, these indicators confirm that both drugs are active within the time window of approximately 40-90 minutes after reactivation. This time window corresponds to the proposed reconsolidation window, which is assumed to open around 10 minutes post-reactivation and to remain open for a few hours (approximately 90 minutes; Monfils & Holmes, 2018; Lee et al., 2017; Monfils et al., 2009).

      We have now acknowledged the distinct pharmacodynamics of hydrocortisone and yohimbine on page 21, lines 845 to 847: “We note that yohimbine and hydrocortisone follow distinct pharmacodynamics(104,105), yet selected the administration timing to ensure that both substances are active within the relevant post-retrieval time window.”

      In the results section, on page 11, lines 437 to 439, we further emphasize this differential dynamic: “Our data demonstrate that, despite the distinct pharmacodynamics of CORT and YOH, both substances are active within the time window that is critical for potential reconsolidation effects(3,4,43).”

      (2) Another point related above, individual differences in pharmacological responses, physiological and cortisol measures may contribute to memory recall on Day 3.

      The administered drugs elicit a pronounced adrenergic and glucocorticoid response, respectively. Specifically, the cortisol levels reached by 20 mg of hydrocortisone correspond to those observed after a significant stressor exposure. Moreover, individual variation in stress system activation following drug intake tends to be less pronounced than in response to a natural stressor. Nevertheless, we fully agree that individual factors, such as metabolism or body weight, can influence the drug's action.

      We therefore re-analysed the reported Day 3 models, now including individual measures of baseline-to-peak changes in cortisol and systolic blood pressure, respectively. We report these additional analyses in the supplement and refer the interested reader to these analyses on page 15, lines 580 to 586:

      “As individual factors, such as metabolism or body weight, can influence the drug's action, we ran an additional analysis in which we included individual (baseline-to-peak) differences in salivary cortisol and (systolic) blood pressure, respectively. This analysis did not show any group by baseline-to-peak difference interaction suggesting that the observed memory effects were mainly driven by the pharmacological intervention group per se and less by individual variation in responses to the drug (see Supplemental Results).”

      And in the Supplemental Results:

      “To account for individual differences in cortisol responses after pill intake, we fit additional GLMMs predicting Day 3 subsequent memory of cued and correct trials including the factors Individual baseline-to-peak cortisol and Group. Doing so allowed us to account for variation in Day 3 performance, which might have resulted from within-group variation in cortisol responses, in particular in the CORT group. Importantly, none of the models predicting Day 3 memory performance by Day 2 cortisol-increase and Group, median-split RTs (high/low), hippocampal activity and RTs, or hippocampal activity and VTC category reinstatement revealed a significant group x baseline-to-peak cortisol interaction (all Ps > .122). These results suggest that inter-individual differences in cortisol responses did not have a significant impact on subsequent memory, beyond the influence of group per se. The same analyses were repeated for systolic blood pressure employing GLMMs predicting Day 3 subsequent memory of cued and correct trials including the factors Individual baseline-to-peak systolic blood pressure and Group to account for variation in Day 3 performance, which might have resulted from within-group variation in blood pressure response, in particular in the YOH group. While the model predicting Day 3 memory performance revealed a significant Individual baseline-to-peak systolic blood pressure × Group × median-split RTs (high/low) interaction (β = -0.05 ± 0.02, z = -2.04, P = .041, R<sup>2</sup><sub>conditional</sub> = 0.01), post-hoc slope tests, however, did not show any significant difference between groups (all P<sub>Corr</sub> > .329). The remaining models including hippocampal activity and RTs, or hippocampal activity and VTC category reinstatement did not reveal a significant Group × Individual baseline-to-peak systolic blood pressure interaction (all Ps > .101). These results suggest that inter-individual differences in systolic blood pressure responses did not have a significant impact on subsequent memory, beyond the influence of group per se.”

      Although we acknowledge that our study may not have been sufficiently powered for an analysis of individual differences, these data suggest that our memory effects were mainly driven by the pharmacological intervention group per se and less by individual variation in responses. It is to be noted, however, that all participants of the respective groups showed a pronounced increase in cortisol concentrations (on average > 1000% in the CORT group) and autonomic arousal (on average > 10% in the YOH group), respectively. These increases appeared to be sufficient to drive the observed memory effects, irrespective of some individual variation in the magnitude of the response.
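      As an illustration of the baseline-to-peak measure entered into these models, a minimal sketch follows. It assumes that the first sample is the pre-intake baseline and that the peak is the maximum of all later samples; the function name and the example values are hypothetical and do not reproduce our data or our exact computation.

```python
def baseline_to_peak(samples):
    """Return (absolute change, percent change) from baseline to peak.

    Assumptions: samples[0] is the pre-intake baseline, and the peak is the
    maximum of all later samples. At least two samples are required and the
    baseline must be non-zero for the percent change to be defined.
    """
    if len(samples) < 2:
        raise ValueError("need a baseline and at least one later sample")
    baseline = samples[0]
    peak = max(samples[1:])
    absolute_change = peak - baseline
    percent_change = 100.0 * absolute_change / baseline
    return absolute_change, percent_change


# Hypothetical salivary cortisol time course (arbitrary units) for one participant.
cortisol_samples = [5.0, 20.0, 60.0, 40.0]
absolute_change, percent_change = baseline_to_peak(cortisol_samples)
```

      One value per participant computed in this way can then be entered as a continuous predictor alongside Group.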

      (3) Median-splitting approach for reaction times and hippocampal activity should better be justified.

      Reaction times are well-established proxies (correlates) of memory strength and memory confidence in previous research, as they reflect cognitive processes involved in retrieving information. Faster reaction times indicate stronger mnemonic evidence and higher confidence in the accuracy of a memory decision, while slower responses suggest weaker evidence and decision uncertainty or doubt. This relationship is supported by an extensive literature (e.g., Starns 2021; Robinson et al., 1997; Ratcliff & Murdock, 1976; amongst others). Importantly, distinguishing between high and low confidence choices in a memory task serves the purpose of differentiating between particularly strong memory evidence (e.g., in associative cued recall, when remembering is particularly vivid) and weaker memory evidence. Separating low from high confidence responses based on participants’ reaction times was especially important in the current analyses, because previous research demonstrates that reaction times during cued recall tasks inversely correlate with hippocampal involvement (Heinbockel et al., 2024; Gagnon et al., 2019) and that stress effects on human memory may be particularly pronounced for high-confidence memories (Gagnon et al., 2019).

      In response to the Reviewer’s comments, we have elaborated on our rationale for the distinction between short and long reaction times in the introduction, results, and methods. Please see page 4, lines 144 to 148:

      “We distinguished between responses with short and long reaction times indicative of high and low confidence responses because previous research showed that reaction times are inversely correlated with hippocampal memory involvement(58–60) and memory strength(61,62), and that high confidence memories associated with short reaction times may be particularly sensitive to stress effects(63).”

      On page 13, lines 520 to 523:

      “Reaction times in the Day 2 Memory cueing task revealed a trial-specific gradient in reactivation strength. Thus, we turned to single-trial analyses, differentiating Day 3 trials by short and long reaction times during memory cueing on Day 2 (median split), indicative of high vs. low memory confidence(58–60) and hippocampal reactivation(26,63).”

      And on page 26, lines 1046 to 1053:

      “Reaction times serve as a proxy for memory confidence and memory strength, with faster responses reflecting higher confidence/strength and slower responses suggesting greater uncertainty/weaker memory. The association between reaction times and memory confidence has been established by previous research(58–60), suggesting that the distinction between high from low confidence responses differentiates vividly recalled associations from decisions based on weaker memory evidence. Reaction times are further linked to hippocampal activity during recall tasks(26,53), and stress effects on memory are particularly pronounced for high-confidence memories(53).”

      We agree that the critical median-split procedure was not sufficiently clear in the original manuscript. We elaborate on this important aspect of the analysis now on page 26, lines 1053 to 1057:

      “We conducted a median-split within each participant to categorize trials as slow vs. fast reaction time trials during Day 2 memory cueing. We chose to conduct this split on the participant- and not group-level because there is substantial inter-individual variability in overall reaction times and to retain an equal number of trials in the low and high confidence conditions.”
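The participant-level split described above can be illustrated with a minimal sketch. It is not the authors' analysis code: the data layout (a list of `(participant_id, reaction_time)` tuples), the function name, and the handling of ties at the median are all assumptions made for illustration.

```python
from statistics import median


def median_split_by_participant(trials):
    """Label each trial 'fast' or 'slow' relative to its own participant's median RT.

    `trials` is a list of (participant_id, reaction_time) tuples (hypothetical layout).
    """
    # Collect reaction times separately for every participant.
    rts_by_participant = {}
    for pid, rt in trials:
        rts_by_participant.setdefault(pid, []).append(rt)

    # One median per participant, not one median for the whole group.
    medians = {pid: median(rts) for pid, rts in rts_by_participant.items()}

    # Ties at the median are assigned to 'fast' here; this is an arbitrary choice.
    return [(pid, rt, "fast" if rt <= medians[pid] else "slow") for pid, rt in trials]


# Toy data: participant p2 is slower overall than participant p1.
trials = [("p1", 0.8), ("p1", 1.0), ("p1", 1.4), ("p1", 2.0),
          ("p2", 2.0), ("p2", 2.4), ("p2", 3.0), ("p2", 3.6)]
labels = median_split_by_participant(trials)
```

In this toy example each participant contributes two fast and two slow trials, even though the fast trials of p2 are slower than the slow trials of p1. A single group-level median would instead label nearly all p1 trials as fast and most p2 trials as slow, which is the imbalance the participant-level split avoids.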

      In addition to these reviewer comments and in response to the eLife assessment, we would like to emphasize that the present findings are in our view not only relevant for a subfield but may be of considerable interest for researchers from various fields, beyond experimental memory research, including Neurobiology, Psychiatry, Clinical Psychology, Educational Psychology, or Law Psychology. We highlight the relevance of the topic and our findings now more explicitly in the introduction and discussion. Please see page 3:

      “The dynamics of memory after retrieval, whether through reconsolidation of the original trace or interference with retrieval-related traces, have fundamental implications for educational settings, eyewitness testimony, or mental disorders(5,11,12). In clinical contexts, post-retrieval changes of memory might offer a unique opportunity to retrospectively modify or render less accessible unwanted memories, such as those associated with posttraumatic stress disorder (PTSD) or anxiety disorders(13–15). Given these potential far-reaching implications, understanding the mechanisms underlying post-retrieval dynamics of memory is essential.”

      On page 17:

      “Upon their retrieval, memories can become sensitive to modification(1,2). Such post-retrieval changes in memory may be fundamental for adaptation to volatile environments and have critical implications for eyewitness testimony, clinical or educational contexts(5,11–15). Yet, the brain mechanisms involved in the dynamics of memory after retrieval are largely unknown, especially in humans.”

      And on page 19:

      “Beyond their theoretical relevance, these findings may have relevant implications for attempts to employ post-retrieval manipulations to modify unwanted memories in anxiety disorders or PTSD(97,98). Specifically, the present findings suggest that such interventions may be particularly promising if combined with cognitive or brain stimulation techniques ensuring a sufficient memory reactivation.”

      Reviewer #2 (Recommendations for the authors):

      My comments and/or questions for the authors to improve this well-written manuscript.

      (1) This study identifies the modulatory role of the hippocampus and VTC in the effects of norepinephrine on subsequent memory. Are there functional interactions between these ROIs and other brain regions that could be wise to consider for a more comprehensive understanding of the underlying neural mechanisms?

      We agree that functional interactions of hippocampus and VTC and other regions that were active during Day 2 memory cueing are relevant for our understanding of the underlying mechanisms. We therefore now performed connectivity analyses using general psycho-physiological interaction analysis (gPPI; as implemented in SPM) and report the results of this analysis on page 16, lines 635 to 644, and added Supplemental Table S4 including gPPI statistics.

      “We conducted general psycho-physiological interaction analysis (gPPI) analyses on the Day 2 memory cueing task (remembered – forgotten), which revealed that successful cueing was accompanied by significant functional connectivity between the left hippocampus, VTC, PCC and MPFC (see Supplemental Table S4). However, using these connectivity estimates to predict Day 3 subsequent memory performance (dprime) via regression did not reveal any significant Group × Connectivity interactions, indicating that the pharmacological manipulation (i.e. noradrenergic stimulation) did not modulate subsequent memory based on functional connectivity during memory cueing (all P<sub>Corr</sub> > .228). The same pattern of results was observed when including single trial beta estimates from multiple ROIs during memory cueing to predict Day 3 memory (all interaction effects P<sub>Corr</sub> > .288).”

      (2) In theory, noradrenergic activity would have a profound impact on activity in widespread brain regions that are closely related to memory function. It would be interesting to know other possible effects beyond the hippocampus and VTC.

      We agree and included in our analysis additional ROIs beyond the HC and VTC; we now report these explorative results on page 16, lines 616 to 633:

      “Beyond hippocampal and VTC activity during memory cueing (Day 2), we exploratively reanalysed the GLMMs predicting Day 3 memory performance including the PCC, which was relevant during memory cueing in the current study and in our previous work(26). Predicting Day 3 memory performance by the factors Group and Single trial beta activity during memory cueing in the PCC did not reveal a significant interaction (P<sub>Corr</sub> = 1); adding the factor Reaction time to the model also did not result in a significant interaction (P<sub>Corr</sub> = 1). We also included the Medial Prefrontal Cortex (MPFC) to predict Day 3 memory performance, as the MPFC has been shown to be sensitive to noradrenergic modulation in previous work(75). Predicting Day 3 memory performance by the factors Group and Single trial beta activity during memory cueing in the MPFC did not reveal a significant interaction (P<sub>Corr</sub> = 1); adding the factor Reaction time to the model also did not result in a significant interaction (P<sub>Corr</sub> = 1), which indicates that the MPFC was not modulated by either pharmacological intervention. Finally, we investigated memory cueing from all remaining ROIs that were significantly activated during the Day 2 memory cueing task (Day 2 whole-brain analysis; correct-incorrect; Supplemental Table S3). We again fit GLMMs predicting Day 3 memory performance by the factors Group and Single trial beta activity during memory cueing. Again, we did not observe any significant interaction effect in any of the ROIs (all interaction P<sub>Corr</sub> > .060) and these results did not change when adding the factor Reaction time to the respective models (all P<sub>Corr</sub> > .075).”

      (3) There are substantial individual differences in pharmacological responses, physiological and cortisol measures, as shown in Figure 3A&B. If such individual differences are taken into account, are there any potential effects on subsequent recall on Day 3 pertaining to the hydrocortisone group?

      In response to this comment (and the General comment #1 of this reviewer), we re-analysed the reported Day 3 models, now including individual measures of baseline-to-peak changes in cortisol and systolic blood pressure, respectively. We report these additional analyses in the supplement and refer the interested reader to them on page 15, lines 580 to 586:

      “As individual factors, such as metabolism or body weight, can influence the drug's action, we ran an additional analysis in which we included individual (baseline-to-peak) differences in salivary cortisol and (systolic) blood pressure, respectively. This analysis did not show any group by baseline-to-peak difference interaction suggesting that the observed memory effects were mainly driven by the pharmacological intervention group per se and less by individual variation in responses to the drug (see Supplemental Results).”

      And in the Supplemental Results:

      “To account for individual differences in cortisol responses after pill intake, we fit additional GLMMs predicting Day 3 subsequent memory of cued and correct trials including the factors Individual baseline-to-peak cortisol and Group. Doing so allowed us to account for variation in Day 3 performance, which might have resulted from within-group variation in cortisol responses, in particular in the CORT group. Importantly, none of the models predicting Day 3 memory performance by Day 2 cortisol-increase and Group, median-split RTs (high/low), hippocampal activity and RTs, or hippocampal activity and VTC category reinstatement revealed a significant group x baseline-to-peak cortisol interaction (all Ps > .122). These results suggest that inter-individual differences in cortisol responses did not have a significant impact on subsequent memory, beyond the influence of group per se. The same analyses were repeated for systolic blood pressure employing GLMMs predicting Day 3 subsequent memory of cued and correct trials including the factors Individual baseline-to-peak systolic blood pressure and Group to account for variation in Day 3 performance, which might have resulted from within-group variation in blood pressure response, in particular in the YOH group. While the model predicting Day 3 memory performance revealed a significant Individual baseline-to-peak systolic blood pressure × Group × median-split RTs (high/low) interaction (β = -0.05 ± 0.02, z = -2.04, P = .041, R<sup>2</sup><sub>conditional</sub> = 0.01), post-hoc slope tests, however, did not show any significant difference between groups (all P<sub>Corr</sub> > .329). The remaining models including hippocampal activity and RTs, or hippocampal activity and VTC category reinstatement did not reveal a significant Group × Individual baseline-to-peak systolic blood pressure interaction (all Ps > .101). 
      These results suggest that inter-individual differences in systolic blood pressure responses did not have a significant impact on subsequent memory, beyond the influence of group per se.”

      (4) Median-splitting approach for reaction times and hippocampal activity should better be justified.

      Reaction times are well-established proxies (correlates) of memory strength and memory confidence in previous research, as they reflect cognitive processes involved in retrieving information. Faster reaction times indicate stronger mnemonic evidence and higher confidence in the accuracy of a memory decision, while slower responses suggest weaker evidence and decision uncertainty or doubt. This relationship is supported by an extensive literature (e.g., Starns 2021; Robinson et al., 1997; Ratcliff & Murdock, 1976; amongst others). Importantly, distinguishing between high and low confidence choices in a memory task serves the purpose of differentiating between particularly strong memory evidence (e.g., in associative cued recall, when remembering is particularly vivid) and weaker memory evidence. Separating low from high confidence responses based on participants’ reaction times was especially important in the current analyses, because previous research demonstrates that reaction times during cued recall tasks inversely correlate with hippocampal involvement (Heinbockel et al., 2024; Gagnon et al., 2019) and that stress effects on human memory may be particularly pronounced for high-confidence memories (Gagnon et al., 2019).

      In response to the Reviewer comments, we have elaborated on our rationale for the distinction between short and long reaction times in the introduction, results, and methods. Please see page 4, lines 144 to 148:

      “We distinguished between responses with short and long reaction times indicative of high and low confidence responses because previous research showed that reaction times are inversely correlated with hippocampal memory involvement(58–60) and memory strength(61,62), and that high confidence memories associated with short reaction times may be particularly sensitive to stress effects(63).”

      On page 13, lines 520 to 523:

      “Reaction times in the Day 2 Memory cueing task revealed a trial-specific gradient in reactivation strength. Thus, we turned to single-trial analyses, differentiating Day 3 trials by short and long reaction times during memory cueing on Day 2 (median split), indicative of high vs. low memory confidence(58–60) and hippocampal reactivation(26,63).”

      And on page 26, lines 1046 to 1053:

      “Reaction times serve as a proxy for memory confidence and memory strength, with faster responses reflecting higher confidence/strength and slower responses suggesting greater uncertainty/weaker memory. The association between reaction times and memory confidence has been established by previous research(58–60), suggesting that the distinction between high from low confidence responses differentiates vividly recalled associations from decisions based on weaker memory evidence. Reaction times are further linked to hippocampal activity during recall tasks(26,53), and stress effects on memory are particularly pronounced for high-confidence memories(53).”

      Minor comments:

      (5) Please include the full names of key abbreviations in the figure legends, such as "ass.cat.hit", among others.

      We now include the full names of key abbreviations in all figure legends (e.g., ass.cat.hit = associative category hit).

      (6) Please introduce various metrics used in the study to aid readers in better understanding the measurements they utilized.

      We agree that various measures that were included in our analyses had not been described clearly enough before, especially concerning the multivariate analyses. We therefore added short explanations across the results section.

      Page 8, lines 279 to 280: “Classifier accuracy is derived from the sum of correct predictions the trained classifier made in the test-set, relative to the total amount of predictions.”

      Page 8, lines 290 to 292:  “Neural reinstatement reflects the extent to which a neural activity pattern (i.e., for objects) that was present during encoding is reactivated during retrieval (e.g., memory cueing).”

      Page 8, lines 299 to 301:  “The logits here reflect the log-transformed trial-wise probability of a pattern either representing a scene or an object.”

      Page 10, lines 378 to 380:  “Beyond category-level reinstatement, we assessed event-level memory trace reinstatement from initial encoding (Day 1) to memory cueing (Day 2), via RSA, correlating neural patterns in each region (hippocampus, VTC, and PCC) across days.”
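Two of the metrics defined above lend themselves to a small worked example: classifier accuracy as the share of correct test-set predictions, and pattern similarity as a correlation between two activity patterns, which is the basic building block of RSA. The sketch below is illustrative only; the function names and toy inputs are hypothetical, and the use of a Pearson correlation is an assumption, since the excerpts do not state which similarity measure the authors used.

```python
from math import sqrt


def classifier_accuracy(predicted, actual):
    """Fraction of test-set predictions that match the true labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def pattern_similarity(pattern_a, pattern_b):
    """Pearson correlation between two equally long activity patterns."""
    n = len(pattern_a)
    mean_a = sum(pattern_a) / n
    mean_b = sum(pattern_b) / n
    covariance = sum((a - mean_a) * (b - mean_b) for a, b in zip(pattern_a, pattern_b))
    var_a = sum((a - mean_a) ** 2 for a in pattern_a)
    var_b = sum((b - mean_b) ** 2 for b in pattern_b)
    return covariance / sqrt(var_a * var_b)


# Three of four toy test trials are classified correctly.
accuracy = classifier_accuracy(
    ["scene", "object", "object", "scene"],
    ["scene", "object", "scene", "scene"],
)

# A pattern that is a scaled copy of another correlates perfectly with it.
similarity = pattern_similarity([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```

In an event-level analysis of the kind described, a similarity value would be computed for each item between its encoding pattern and its cueing pattern within a region, rather than for a single pair of vectors as shown here.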

      (7) Please explain what the different colors represent in Figures 5B and 5C to avoid confusion. It would be good to indicate significant differences in the figures if applicable.

      We have now added line legends to the figure and to its caption to clarify what exactly is depicted. We also added asterisks to mark significant differences.

      References:

      Monfils, M. H., Cowansage, K. K., Klann, E., & LeDoux, J. E. (2009). Extinction-reconsolidation boundaries: key to persistent attenuation of fear memories. Science, 324(5929), 951-955.

      Monfils, M. H., & Holmes, E. A. (2018). Memory boundaries: opening a window inspired by reconsolidation to treat anxiety, trauma-related, and addiction disorders. The Lancet Psychiatry, 5(12), 1032-1042.

      Lee, J. L. C., Nader, K., & Schiller, D. (2017). An update on memory reconsolidation updating. Trends in Cognitive Sciences, 21, 531-545.

      Radley, J. J., Williams, B., & Sawchenko, P. E. (2008). Noradrenergic innervation of the dorsal medial prefrontal cortex modulates hypothalamo-pituitary-adrenal responses to acute emotional stress. Journal of Neuroscience, 28(22), 5806-5816.

      Heinbockel, H., Wagner, A. D., & Schwabe, L. (2024). Post-retrieval stress impairs subsequent memory depending on hippocampal memory trace reinstatement during reactivation. Science Advances, 10(18), eadm7504.

    1. eLife Assessment

      This study provides compelling evidence for functional subpopulations of β-cells responsible for Ca2+ signal initiation and maintenance using novel three-dimensional light sheet microscopy imaging and analysis of pancreatic islets. The findings are important as they help decode mechanistic underpinnings of islet calcium oscillations and the resulting pulsatile insulin secretion. The work will be of general interest to cell biologists and particular interest to islet biologists.

    2. Reviewer #1 (Public review):

      Summary:

      Jin, Briggs and colleagues use light sheet imaging to reconstruct the islet three-dimensional Ca2+ network. The authors find that early/late responding (leader) cells are dynamic over time, and located at the islet periphery. By contrast, highly connected or hub cells are stable, and located toward the islet center. Suggesting that the two subpopulations are differentially regulated by fuel input, glucokinase activation only influences leader cell phenotype, whereas hubs remain stable.

      Strengths:

      The studies are novel in providing the first three-dimensional snapshot of the beta cell functional network, as well as determining the localization of some of the different subpopulations identified to date. The studies also provide some consensus as to the origin, stability and role of such subpopulations in islet function.

      Weaknesses:

      Experiments with metabolic enzyme activators do not take into account the influence of cell viability on the observed Ca2+ network data. Limitations of the imaging approach used need to be recognised and evaluated/discussed.

      Comments on revisions:

      The authors have addressed the majority of the points raised.

    3. Reviewer #2 (Public review):

      The manuscript by Erli Jin and Jennifer Briggs et al. utilizes light sheet microscopy to image islet beta cell calcium oscillations in 3D and determine where beta cell populations are located that begin and coordinate glucose-stimulated calcium oscillations. The light sheet technique allowed clear 3D mapping of beta cell calcium responses to glucose, glucokinase activation, and pyruvate kinase activation. The manuscript finds that synchronized beta-cells are found at the islet center, that leader beta cells showing the first calcium responses are located on the islet periphery, that glucokinase activation helped maintain beta cells that lead calcium responses, and that pyruvate kinase activation primarily increases islet calcium oscillation frequency. The study is well-designed, contains a significant amount of high quality data, and the conclusions are largely supported by the results.

      Comments on revisions:

      The manuscript by Erli Jin et al. has been improved with the revisions, which have addressed my previous concerns. The manuscript significantly improves the mechanistic underpinnings of islet calcium oscillations and resulting pulsatile insulin secretion.

    4. Reviewer #3 (Public review):

      Summary:

      Jin, Briggs et al. made use of light-sheet 3D imaging and data analysis to assess the collective network activity in isolated mouse islets. The major advantage of using whole islet imaging, despite compromising on a speed of acquisition, is that it provides a complete description of the network, while 2D networks are only an approximation of the islet network. In static-incubation conditions, excluding the effects of perfusion, they assessed two subpopulations of beta cells and their spatial consistency and metabolic dependence.

      Strengths:

      The authors confirmed that coordinated Ca2+ oscillations are important for glycemic control. In addition, they definitively disproved the role of individual privileged cells, which were suggested to lead or coordinate Ca²⁺ oscillations. They provided evidence for differential regional stability, confirming the previously described stochastic nature of the beta cells that act as strongly connected hubs as well as beta cells in initiating regions (doi.org/10.1103/PhysRevLett.127.168101). This has not been a surprise to the reviewer.

      The fact that islet cores contain beta cells that are more active and more coordinated has also been readily observed in high-frequency 2D recordings (e.g. DOI: 10.2337/db22-0952), suggesting that the high-speed capture of fast activity can partially compensate for incomplete topological information.

      They also found an increased metabolic sensitivity of mantle regions of an islet with subpopulation of beta cells with a high probability of leading the islet activity and which can be entrained by fuel input. They discuss a potential role of alpha/delta cell interaction, however relative lack of beta cells in the islet border region could also be a factor contributing to less connectivity and higher excitability.

      The Methods section contains a useful series of direct instructions on how to approach fast 3D imaging with currently available hardware and software.

      The Discussion is clear and includes most of the issues regarding the interpretation of the presented results.

      Taken together it is a strong technical paper to demonstrate the stochasticity regarding the functions subpopulations of beta cells in the islets may have and how less well-resolved approaches (both missing spatial resolution as well as missing temporal resolution) led us to jump to unjustified conclusions regarding the fixed roles of individual beta cells within an islet.

      Weaknesses:

      There are a few relevant issues that need to be addressed.

      (1) The study is not internally consistent regarding the Results section. In the text the authors discuss changes in membrane potential (which was not measured in this study), while in the figures they exclusively describe Ca2+ oscillations (which were measured). Examples are on lines 149, 150, 153, 154, 263... It is recommended that the silent and active phases in the Results section describe processes actually measured in this study, as shown in Fig. 6A.

      (2) There are in fact no radially oriented networks in the core of an islet (l. 130, Fig. 4) apart from the fact that every hub has somewhat radially oriented edges. For radiality to have some general meaning, the normalized distance from the geometric center would need to be lower than 0.4. The networks are centrally located, which does not change the major conclusions of the study.

      (3) The study would profit from acknowledging that Ca2+ influx is not a sole mechanism to drive insulin secretion and that KATP channels are not the sole target sensitive to changes in the cytosolic (global or local) ADP and ATP concentration or that there is an absolute concentration-dependence of these ligands on KATP channels. The relatively small conductance changes that have been found associated to active and silent phases (closing and opening of the KATP channels as interpreted by the authors, respectively, doi: 10.1152/ajpendo.00046.2013) and should be due to metabolic factors, could be also associated to desensitization of KATP channels to ATP due to the increase in cytosolic Ca2+ changes after intracellular Ca2+ flux (DOI: 10.1210/endo.143.2.8625) as they have been found to operate also at time scales, significantly faster (DOI: 10.2337/db22-0952) than reported before (refs. 21,22). Metabolic changes influence intracellular Ca2+ flux as well.

      (4) There is no explanation for why KL divergence is so different between the pre-test regional consistency of the islets used to test the vehicle compared to those where GKa and PKa have been tested.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Jin, Briggs, and colleagues use light sheet imaging to reconstruct the islet three-dimensional Ca2+ network. The authors find that early/late responding (leader) cells are dynamic over time, and located at the islet periphery. By contrast, highly connected or hub cells are stable and located toward the islet center. Suggesting that the two subpopulations are differentially regulated by fuel input, glucokinase activation only influences leader cell phenotype, whereas hubs remain stable.

      Strengths:

      The studies are novel in providing the first three-dimensional snapshot of the beta cell functional network, as well as determining the localization of some of the different subpopulations identified to date. The studies also provide some consensus as to the origin, stability, and role of such subpopulations in islet function.

      We thank the reviewers for their positive assessment.

      Weaknesses:

      Experiments with metabolic enzyme activators do not take into account the influence of cell viability on the observed Ca2+ network data. Limitations of the imaging approach used need to be recognized and evaluated/discussed.

      We worked very hard to make sure the islets remained stable and healthy over the duration of the imaging time course. We imaged the islet in 3D and observed that all beta cells displayed glucose-dependent oscillations, which can only arise from functioning cells. From the raw calcium traces (displayed in the figures) we observed no detectable loss of signal over 60 min of continuous imaging regardless of drug treatment; this is because the laser excitation is below the bleach threshold for GCaMP6s, and it is bleaching that generates phototoxicity. To demonstrate this clearly, we performed a bleach test using 6x laser power; in this case calcium amplitude dropped 30% over 60 min of imaging, but islet calcium oscillatory behavior was preserved. Light-sheet is well documented to be 1000x more gentle than other optical sectioning techniques, which is why it was chosen for this application.

      Regarding the limitations of the imaging approach, we recognize that studying islets ex vivo is necessarily performed in the absence of native surrounding tissue, as highlighted in the discussion.

      Reviewer #2 (Public Review):

      The manuscript by Erli Jin, Jennifer Briggs et al. utilizes light sheet microscopy to image islet beta cell calcium oscillations in 3D and determine where beta cell populations are located that begin and coordinate glucose-stimulated calcium oscillations. The light sheet technique allowed clear 3D mapping of beta cell calcium responses to glucose, glucokinase activation, and pyruvate kinase activation. The manuscript finds that synchronized beta-cells are found at the islet center, that leader beta cells showing the first calcium responses are located on the islet periphery, that glucokinase activation helped maintain beta cells that lead calcium responses, and that pyruvate kinase activation primarily increases islet calcium oscillation frequency. The study is well-designed, contains a significant amount of high-quality data, and the conclusions are largely supported by the results.

      It has recently been shown that beta cells within islets containing intact vasculature (such as those in a pancreatic slice) show different calcium responses compared to isolated islets (such as that shown in PMID: 35559734). It would be important to include some discussion about the potential in vitro artifacts in calcium that arise following islet isolation (this could be included in the discussion about the limitations of the study).

      Although isolated islets reproduce the slow oscillatory calcium behavior observed in vivo, we agree that missing elements such as blood flow, cholinergic innervation, and surrounding tissues may each impact islet calcium responses. Pancreatic regional blood flow also links the endocrine and exocrine signaling which can directly influence the behavior of beta cells. We have highlighted some of these issues in the discussion “In addition to α-cells, vasculature may also impact islet Ca2+ responses, and may induce additional heterogeneity in vivo.” (see line 375, Ref. 46).

      Reviewer #3 (Public Review):

      Summary:

      Jin, Briggs et al. made use of light-sheet 3D imaging and data analysis to assess the collective network activity in isolated mouse islets. The major advantage of using whole islet imaging, despite compromising on the speed of acquisition, is that it provides a complete description of the network, while 2D networks are only an approximation of the islet network. In static-incubation conditions, excluding the effects of perfusion, they assessed two subpopulations of beta cells and their spatial consistency and metabolic dependence.

      Strengths:

      The authors confirmed that coordinated Ca2+ oscillations are important for glycemic control. In addition, they definitively disproved the role of individual privileged cells, which were suggested to lead or coordinate Ca²⁺ oscillations. They provided evidence for differential regional stability, confirming the previously described stochastic nature of the beta cells that act as strongly connected hubs as well as beta cells in initiating regions (doi.org/10.1103/PhysRevLett.127.168101).

      The fact that islet cores contain beta cells that are more active and more coordinated has also been readily observed in high-frequency 2D recordings (e.g. DOI: 10.2337/db22-0952), suggesting that the high-speed capture of fast activity can partially compensate for incomplete topological information.

      They also found an increased metabolic sensitivity of mantle regions of an islet with a subpopulation of beta cells with a high probability of leading the islet activity which can be entrained by fuel input. They discuss a potential role of alpha/delta cell interaction, however relative lack of beta cells in the islet border region could also be a factor contributing to less connectivity and higher excitability.

      The Methods section contains a useful series of direct instructions on how to approach fast 3D imaging with currently available hardware and software.

      The Discussion is clear and includes most of the issues regarding the interpretation of the presented results.

      Some issues concerning inconsistencies between data presented and statements made as well as statistical analysis need to be addressed.

      Taken together it is a strong technical paper to demonstrate the stochasticity regarding the functions subpopulations of beta cells in the islets may have and how less well-resolved approaches (both missing spatial resolution as well as missing temporal resolution) led us to jump to unjustified conclusions regarding the fixed roles of individual beta cells within an islet.

      We thank the reviewers for the comments on the many strengths of the manuscript and address the specific critiques below.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Essential revisions:

      (1) How useful is GK activation as a subpopulation-level perturbation, given that all beta cells would be affected? Previous studies by the authors have shown that GK gradients likely dictate subpopulation behaviour, so the concern here is that GK activation across all cells might mask the influence of such gradients i.e. a U-shaped effect. Also, does the GK activator differentially penetrate the islet such that first responders/leaders are more vulnerable than hubs?

      As we previously published, non-saturating concentrations of GK activator (as used here) have the same effect on calcium oscillations as raising glucose (PMID:33147484). In other words, the activator boosts the activity of the endogenous GK. To the second point, recent ex vivo islet studies (PMID: 28380380) document the islet penetration of a fluorescent glucose analogue within seconds even under static conditions, and in our study the islet calcium oscillations reached steady state, so we are not concerned about drug penetration. The real limitation with any drug study in the islet is that non-beta cells are also activated; this limitation is included in the discussion along with the recommendation that genetic tools are needed to assess the effect of GK activation in the various endocrine subpopulations.

      An additional concern with the GK activation experiment is that GK activation might push beta cells into a more stressed state such that they are more susceptible to phototoxicity. Although the authors state that photobleaching is low, they provide no data to support such a statement. Given the long duration of imaging and acquisition rate, phototoxicity might be more of an issue, especially with GK activation. Some further analysis (e.g. apoptosis) would be useful here to exclude an effect of beta cell viability versus GK activation on the observed phenotype of the different subpopulations.

      Acute GK activation (for 30min) does not stress the islet; the drug has the same effect as raising glucose (PMID: 33147484). To determine whether photobleaching was impacted by GK activation, we examined the peak of consecutive oscillations in response to vehicle and GK activator. The average photobleaching was less than 2% of the calcium fluorescence over 30min of continuous imaging. Furthermore, GKa activation did not significantly increase photobleaching (see Author response image 1). 

      Author response image 1.

      To the reviewer’s second point, apoptosis cannot occur on the timescale of the drug treatment (30min), and raw calcium traces are included showing that all beta cells display oscillatory behavior throughout the course of the experiment.
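      The photobleaching estimate described in this response (comparing the peaks of consecutive oscillations) can be illustrated with a minimal sketch. This is not the authors' analysis code: the function name `percent_photobleaching` and the peak amplitudes are hypothetical, and the sketch assumes photobleaching is summarized as the percent drop from the first to the last oscillation peak.

```python
def percent_photobleaching(peak_values):
    """Percent loss of peak fluorescence between the first and last oscillation."""
    first, last = peak_values[0], peak_values[-1]
    return 100.0 * (first - last) / first


# Hypothetical peak amplitudes of consecutive oscillations (arbitrary units),
# invented for illustration only.
vehicle_peaks = [1.000, 0.996, 0.991, 0.987]
gka_peaks = [1.000, 0.995, 0.990, 0.985]

vehicle_loss = percent_photobleaching(vehicle_peaks)
gka_loss = percent_photobleaching(gka_peaks)
print(f"vehicle: {vehicle_loss:.1f}%  GK activator: {gka_loss:.1f}%")
```

      Comparing the two percentages across islets is one simple way to check whether a drug treatment changes the bleaching rate.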

      (2) The authors show that glucokinase activation increases the duration of islet calcium oscillations and in some islets (3 of 15 islets) causes "a Ca2+ plateau." The authors indicate that "Glucokinase, as the 'glucose sensor' for the β-cell, controls the input of glucose carbons into glycolysis, and opens KATP channels." It would be nice to have some experimental evidence that the change in oscillation rate caused by the glucokinase activator is due to KATP activation. This could be accomplished by treating islets with subthreshold KATP activators (e.g., diazoxide) or subthreshold KATP inhibitors (e.g., tolbutamide).

      The statement that glucokinase activation opens KATP channels was a typo; glucose metabolism closes KATP channels by raising the ATP/ADP ratio. We now include additional citations that document the relationship between GK and KATP and the oscillatory behavior. See Ref 22 (PMID: 33147484) and Ref 34 (PMID: 33147484).

      The manuscript finds that "Early phase cells were maintained to a greater degree upon GKa application." Yet GKa is proposed to activate KATP. Some discussion about how the early phase is maintained in cell populations by GKa activation in the context of KATP activity would be useful.

      As discussed above, we meant to say that GKa will close KATP and apologize for the confusion. As we mentioned in the discussion, early phase cells are most likely maintained to a great degree following GK activation as a result of an enhanced GK gradient and a reduced effect of stochastic alpha cell input.

      (3) Membrane potential depolarization precedes calcium channel activation and subsequent calcium entry. In many cases, electrical coupling across beta cells happens on millisecond timescale. It would be good to confirm that the calcium is showing the same time scale in terms of elevation following beta cell membrane potential depolarization. One concern is that the islet beta cells could be depolarizing at the same speed and lagging in terms of calcium channel activation and calcium entry.

      We thank the reviewer for making this point, which is almost certainly true, particularly since plasma membrane calcium influx is not the sole source of intracellular calcium. Previously published “simultaneous” recordings of Vm and calcium show their same phase relationship but do not have sufficient time resolution to capture depolarization of each cell. A quantification of phase lag would require the field to generate mice with voltage sensors expressed in beta cells; these tools are not yet available.  

      A related issue: in the text, the authors discuss changes in membrane potential (not measured in this study), while in the figures they exclusively describe Ca2+ oscillations (which were measured). Examples are on lines 149, 150, 153, 154, 263. It is recommended that the silent and active phases in the Results section describe processes actually measured in this study as shown in 6A.

      To clarify, we did not use the term ‘membrane potential’ anywhere in the manuscript. We do sometimes refer to calcium influx as a proxy for membrane depolarization; we think this is valid given the abundant evidence that these processes are interdependent in beta cells.

      (4) It would be good to include the timing of the phases of calcium entry. When was the beta cell calcium entry monitored for the response time? Were the response times between the late and early phases consistent for each oscillation? It looks as if the start of the calcium upstroke was similar for many beta cells (such as for the Figure 2I traces). It would be nice to include a shorter time duration graph of calcium oscillation traces right when the upstroke starts. This would allow the community to observe the differences in the start time of calcium entry. 

      We agree this is an important point. We now include an inset showing the expanded time scale of the calcium upstroke in Fig.2I. The response time spread between early and late phase cells is now shown in Fig.7F (and in Author response image 2). We also quantified the coefficient of variation in the response time spread (0 = no variation and 1 = maximal variation) and found no significant differences between metabolic activators (Author response image 2). 

      Author response image 2.
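      The coefficient of variation mentioned above is the standard deviation divided by the mean. A minimal sketch follows, assuming the sample standard deviation is used and using hypothetical response-time spreads; the name `coefficient_of_variation` and the values are illustrative and not taken from the manuscript.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (dimensionless)."""
    return stdev(values) / mean(values)


# Hypothetical response-time spreads (seconds) for consecutive oscillations.
spreads_s = [2.0, 2.5, 1.5, 2.0]
cv = coefficient_of_variation(spreads_s)
print(f"CV = {cv:.3f}")
```

      A value of 0 means every oscillation has the same spread; larger values mean the spread differs more from one oscillation to the next.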

      Also, for most of the GCaMP6s traces shown, the authors indicate that they are plotted as F/F0. However, this normalization (F/F0) is not done for the actual traces shown. For example, Figure 2D shows the traces starting from what looks to be 0 to 0.3 F/F0, but the traces for an F/F0 group should all start at 1. Please change this for all representative oscillations so the start of calcium entry for example traces all line up.

      This has been corrected in Fig. 2D, I and Fig. 3B. Also, Fig. 6 should be labeled F, not F/F0.
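      The normalization the reviewer asks for can be sketched as follows. The sketch assumes F0 is the mean of the first few frames of each trace (one common convention; the baseline definition used in the manuscript is not stated here), and the name `normalize_to_baseline` and the raw values are hypothetical.

```python
def normalize_to_baseline(trace, baseline_frames=3):
    """Divide a raw fluorescence trace F by its baseline F0.

    F0 is taken as the mean of the first `baseline_frames` samples,
    which is an assumption made for this illustration.
    """
    f0 = sum(trace[:baseline_frames]) / baseline_frames
    return [f / f0 for f in trace]


# Hypothetical raw fluorescence values (arbitrary units).
raw_trace = [200.0, 200.0, 200.0, 230.0, 260.0, 240.0, 210.0]
normalized = normalize_to_baseline(raw_trace)
print(normalized)
```

      With this convention every trace starts near 1, so the onset of the calcium upstroke lines up across cells.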

      Reviewer #1 (Recommendations for the authors):

      (1) Line 53: "Silencing the electrical activity of these hub cells with optogenetics was found to abolish the coordination within that plane of the islet". The authors should acknowledge that studies also showed that beta cell transcription factor (Pdx1/Mafa) dosage was important for hub cell phenotype and islet function.

      Thank you, this reference to Nasteska et al. (PMID: 33514698, Ref. 16) has been added to the discussion.

      (2) Light sheet imaging is used to image the 3D islet volume. Whilst speed is undoubtedly an advantage of this technique, axial resolution is ~1.1 µm over 4 µm z-step size. How confident are the authors that single nuclei can be reliably identified given their ~6 µm size in a beta cell (e.g. do some elongated nuclei appear, which could be "doublets")?

      The axial resolution of 1.1 µm exceeds the resolution needed for the Nyquist criterion (i.e. sampling every 2-3 µm). As a practical matter, it is not possible to double-count nuclei because the software will exclude nuclei that occupy the same volume. Only a very elongated nucleus (>10 µm) would be double-counted and this does not occur.

      (3) The authors discuss the advantages of the light sheet imaging approach used, including speed and phototoxicity. Some more balance is needed here since other approaches such as two-photon excitation achieve similar speeds with much better axial resolution (see dozens of neural circuit studies).

      We are careful to point out that two-photon excitation has better axial resolution, better tissue penetration, and often higher speeds (kHz using linescans); however, these neuronal studies are limited to the cells in a few planes, and the laser power is orders of magnitude higher than with lightsheet. For this reason, two-photon imaging has not been used to image islet calcium in three dimensions. The bottom line is that lightsheet trades axial resolution for gentle volumetric imaging.

      (4) Line 340: "Laser ablation or optogenetic inactivation of these early phase cells would be predicted to have little impact on islet function, as suggested previously by electrophysiological studies in which surface β-cells have been voltage-clamped with no impact on β-cell oscillations". This statement is slightly ambiguous since the authors showed in their previous studies that laser ablation of first responder cells/leaders was able to influence the Ca2+ network. Do the authors mean that laser ablation would only temporarily influence islet function before another cell picked up the role of a first responder/leader? As written, the sentence seems to imply that first responders/leaders are unimportant for the islet function.

      We intended to imply that the oscillatory system is sufficiently robust that a new cell takes over when leader cells are ablated. We also cite Korosak et al. (PMID:34723613, Ref. 40) and Dwulet et al. (PMID: 33939712, Ref. 15) to make this point, although to clarify we are not examining first responders in this study.

      (5) Line 369: "In contrast with leader cells, we found that the highly synchronized cells are both spatially and temporally stable." The sentence needs qualifying: what would spatiotemporal stability be expected to confer on such a subpopulation?

      We believe that the spatiotemporal stability of highly synchronized cells is a consequence of beta cells in the center of the islet lacking the stochastic input of nearby alpha cells; we raise this point in the discussion: “The preponderance of α-cells on the periphery of mouse islets, which influence β-cell oscillation frequency, would be expected to disrupt β-cell synchronization on the periphery and stabilize it in the islet center – which is precisely the pattern of network activity we observed.” (see line 372). 

      (6) Line 370: "However, in conflict with the description of hub cells as intermingled with other cells throughout the islet, the location of such cells in 3D space is close to the center." The study by Johnston et al did not have the axial resolution to exclude that some cells might have been grouped together.

      We agree and have included the reviewer’s comment in the text (See line 384); that’s an important reason for conducting this 3D study.  

      (7) Line 380: "One explanation may be that paracrine communication within the islet determines which region of cells will show high or low degree. For example, more peripheral cells that are in contact with nearby δ-cells may show some suppression in their Ca2+ dynamics, and thus reduced synchronization." A potentially exciting future study. Should however probably cite DOI s41467-022-31373-6 here.

      We thank the reviewer for their input. This reference to Ren et al. (PMID:35764654) was previously included as Ref. 42 (now Ref. 45)

      Reviewer #3 (Recommendations for the authors):

      (1) There are in fact no radially oriented networks in the core of an islet (l. 130, Figure 4) apart from the fact that every hub has somewhat radially oriented edges. For radiality to have some general meaning, the normalized distance from the geometric center would need to be lower than 0.4. The networks are centrally located, which does not change the major conclusions of the study.

      Thank you for pointing out this imprecise language. We did not intend to imply that the functional network is orientated radially. We corrected the text (see line 131, 145) to indicate that the cells with high and low synchronization are distributed in a radial pattern. 

      (2) The study would benefit from acknowledging that Ca2+ influx is not a sole mechanism to drive insulin secretion and that KATP channels are not the sole target sensitive to changes in the cytosolic (global or local) ADP and ATP concentration or that there is an absolute concentration-dependence of these ligands on KATP channels. The relatively small conductance changes that have been found to be associated with active and silent phases (closing and opening of the KATP channels as interpreted by the authors, respectively, doi: 10.1152/ajpendo.00046.2013) and should be due to metabolic factors, could be also associated to desensitization of KATP channels to ATP due to the increase in cytosolic Ca2+ changes after intracellular Ca2+ flux (DOI: 10.1210/endo.143.2.8625) as they have been found to operate also at time scales, significantly faster (DOI: 10.2337/db22-0952) than reported before (refs. 21,22). Metabolic changes influence intracellular Ca2+ flux as well.

      The reviewer is absolutely correct that there are amplifying factors and other sources of calcium beyond plasma membrane influx and there are other mechanisms that regulate insulin secretion beyond calcium levels. These alternative mechanisms are introduced in Refs. 1-2, however they are not the focus of this study. 

      (3) There is no explanation for why KL divergence is so different between the pre-test regional consistency of the islets used to test the vehicle compared to those where GKa and PKa have been tested.

      We thank the reviewer for their careful observation. This arises because there are larger differences between preparations than within a preparation. This has been described previously (PMID: 16306370 and 20037650) and could be expected to account for the differences in KL divergence between animals. 
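      For readers unfamiliar with the measure, the Kullback-Leibler (KL) divergence between two discrete distributions P and Q is the sum over bins of P(i) * ln(P(i) / Q(i)). A minimal sketch is given below; how the manuscript constructs the distributions is not described in this response, so the histograms, the function name `kl_divergence`, and the `eps` guard against empty bins are all assumptions for illustration.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL divergence D(P || Q) in nats between two discrete distributions.

    Inputs are normalized to sum to 1. `eps` guards against empty bins
    in Q and is an assumption of this sketch.
    """
    p_total, q_total = sum(p), sum(q)
    divergence = 0.0
    for p_i, q_i in zip(p, q):
        p_i /= p_total
        q_i /= q_total
        if p_i > 0:
            divergence += p_i * math.log(p_i / max(q_i, eps))
    return divergence


# Hypothetical distributions over two regions before and after a treatment.
before = [0.5, 0.5]
after = [0.9, 0.1]
print(f"D(before || after) = {kl_divergence(before, after):.4f} nats")
```

      Identical distributions give a divergence of 0, so a larger baseline value indicates less regional consistency between the compared recordings.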

      (4) Statistical analysis would profit from testing the normality of the data distribution before choosing the statistical test and then learning the difference between parametric and nonparametric tests. For example, in Figures 3CD and 5EF, the data density is lower at the calculated mean than below and above this value and there are other examples in other figures too.

      We thank the reviewer for this very important comment, and we apologize for the oversight on our part. To address this comment, we conducted two normality tests: Anderson-Darling and Kolmogorov-Smirnov on all statistical analyses in the manuscript. If the data were not normally distributed, we changed the analysis to Wilcoxon matched-pairs signed rank test (non-parametric version of t-tests) or the Friedman test (non-parametric version of ANOVA). Three results were changed based on this statistical correction: Figure 4D, also 5F 3D (from P=0.01 to P=0.0526), Figure 5F ¼ z-depth (P = 0.005 to P = 0.012). We have updated the manuscript methods, results, and figures accordingly. Importantly, these results did not change the main points of the paper.
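      The workflow described above (check normality first, then pick a parametric or non-parametric paired test) can be sketched as follows. This is an illustration, not the authors' procedure: it computes the Kolmogorov-Smirnov distance between the paired differences and a normal distribution fitted to them (a Lilliefors-style check), and the default cutoff 0.886 / sqrt(n) is the commonly tabulated large-sample 5% value, used here as an assumption. The function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def ks_distance_to_normal(values):
    """Largest gap between the empirical CDF and a fitted normal CDF."""
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    distance = 0.0
    for i, x in enumerate(xs):
        f = fitted.cdf(x)
        # Compare the fitted CDF with the empirical CDF just after and just before x.
        distance = max(distance, (i + 1) / n - f, f - i / n)
    return distance


def choose_paired_test(before, after, critical=None):
    """Name the paired test suggested by the normality check on the differences."""
    diffs = [a - b for a, b in zip(after, before)]
    if critical is None:
        # Assumed large-sample 5% cutoff; small samples need tabulated values.
        critical = 0.886 / math.sqrt(len(diffs))
    if ks_distance_to_normal(diffs) > critical:
        return "Wilcoxon matched-pairs signed rank test"
    return "paired t-test"
```

      In practice a statistics package would supply both the normality tests and the chosen test itself; the sketch only makes the decision rule explicit.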

    1. eLife Assessment

      This important work presents the development of a novel inhibitor for SARS-CoV-2 Mac1 that has potential utility both as an antiviral therapeutic and as a tool for probing the molecular mechanisms by which infection-induced ADP-ribosylation triggers robust host antiviral responses. The evidence supporting the claims is generally convincing but could be improved if the authors expanded the phenotypic characterization of the compound and its potential effects on both viral and host targets.

    2. Reviewer #1 (Public review):

      SARS-CoV-2 encodes a macrodomain (Mac1) within the nsp3 protein that removes ADP-ribose groups from proteins. However, its role during infection is not well understood. Evidence suggests that Mac1 antagonizes the host interferon response by counteracting the wave of ADP ribosylation that occurs during infection. Indeed, several PARPs are interferon-stimulated genes. While multiple targets have been proposed, the mechanistic links between ADP ribosylation and a robust antiviral response remain unclear.

      Genetic inactivation of Mac1 abrogates viral replication in vivo, suggesting that small-molecule inhibitors of Mac1 could be developed into antivirals to treat COVID-19 and other emerging coronaviruses. The authors report a potent and selective small molecule inhibitor targeting Mac1 (AVI-4206) that demonstrates efficacy in human airway organoids and animal models of SARS-CoV-2 infection. While these results are compelling and provide proof of concept for the therapeutic targeting of Mac1, I am particularly intrigued by the potential of this compound as a probe to elucidate the mechanistic connections between infection-induced ADP ribosylation and the host antiviral response.

      The precise function of Mac1 remains unclear. Given its presence in multiple viruses, it likely acts on a fundamental host immune pathway(s). AVI-4206, while promising as a lead compound for the development of antivirals targeting coronaviruses, could also be a valuable tool for uncovering the function of the Mac1 domain. This may lead to fundamental insights into the host immune response to viral infection.

    3. Reviewer #2 (Public review):

      Summary:

      The authors describe the development of a novel inhibitor (AVI-4206) for the first macrodomains of the nsp3 protein of SARS-CoV-2 (Mac1). This involves both medical chemical synthesis, structural work as well as biochemical characterisation. Subsequently, the authors present their findings of the efficacy of the inhibitor both on cell culture, as well as animal models of SARS-CoV-2 infection. They find that despite high affinity for Mac1 and the known replicatory defects of catalytically inactive Mac1 only moderate beneficial effects can be observed in their chosen models.

      Strengths:

      The authors employ a variety of different assays to study the affinity, selectivity and potency of the novel inhibitor and thus the in vitro data are very compelling.
      Similarly, the authors use several cell culture and in vivo models to strengthen their findings.

      Weaknesses:

      (a) The selection of Targ1 and MacroD2 as off-target human macrodomains is poor as several studies have shown that the first macrodomains of PARP9 and PARP14 are much closer related to coronaviral macrodomains and both macrodomains are implicated in antiviral defence and immunity.

      (b) The authors utilize only replication efficiency and general infection markers as read out for their Mac1 inhibitor. It would be good if they could show impact on the ADP-ribosylation of a known Mac1 target such as PARP14.

    4. Reviewer #3 (Public review):

      Summary:

      The authors were trying to validate SARS-CoV-2 Mac1 as a drug discovery target and by extension other viral macrodomains.

      Strengths:

      The medicinal chemistry and structure-based optimization is exemplary. Macrodomains and ADP-ribosyl hydrolases have a reputation for being undruggable, yet the authors managed to optimize hits from a fragment screen using structure-based approaches and fragment linking to make a 20 nM inhibitor as a tool compound to validate the target.
      In addition, the in vivo work is also a strength. The ability to reduce the viral count at a rate comparable to nirmatrelvir is impressive. Tracking the cytokine expression levels also supports much of the genetic data and mechanism of action for macrodomains.

      Weaknesses:

      The main compound AVI-4206, while very potent and selective, is not appreciably orally bioavailable. The fact that they have to use high doses of the compound IP to see in vivo effects may lead to questions regarding off-target effects.

      The cellular models are not as predictive of antiviral activity as one would expect. However, the authors had enough chutzpah to test the compound in vivo knowing that cellular models might not be an accurate representation of a living system with a fully functional immune system all of which is most likely needed in an antiviral response to test the importance of Mac1 as a target.

    5. Author response:

      Reviewer #1 (Public review):

      We thank Reviewer #1 for their thoughtful assessment. We especially agree that AVI-4206 will be a valuable tool to help understand the host immune response to viral infection.

      Reviewer #2 (Public review):

      We thank Reviewer #2 for their comments and will address PARP9/14 selectivity with in vitro experiments and alignments/modeling. For ADP-ribosylation of PARP14, we will attempt experiments patterned after Kar et al, EMBO Journal, 2024, but note that detection of ADPr by IF and western has been relatively inconsistent and detection-reagent dependent in our hands. Regardless of the outcome, we will expand the discussion of the prior literature on this point.

      Reviewer #3 (Public review):

      We thank Reviewer #3 for their comments, especially noting that we had the “chutzpah” to go for the in vivo experiment. We share the concern about potential off target effects, which is why we prioritized so many selectivity experiments prior to testing. Ongoing chemistry efforts are focused on developing next-generation inhibitors that are orally bioavailable, but this work is in its early stages.

    1. eLife Assessment

      This important work substantially advances our understanding of episodic memory by proposing a biologically plausible mechanism through which hippocampal barcode activity enables efficient memory binding and flexible recall. The evidence supporting the conclusions is convincing, with rigorously validated computational models and alignment with experimental findings. The work will be of broad interest to neuroscientists and computational modelers studying memory and hippocampal function.

    2. Reviewer #1 (Public review):

      Summary:

      In this paper, the authors develop a biologically plausible recurrent neural network model to explain how the hippocampus generates and uses barcode-like activity to support episodic memory. They address key questions raised by recent experimental findings: how barcodes are generated, how they interact with memory content (such as place and seed-related activity), and how the hippocampus balances memory specificity with flexible recall. The authors demonstrate that chaotic dynamics in a recurrent neural network can produce barcodes that reduce memory interference, complement place tuning, and enable context-dependent memory retrieval, while aligning their model with observed hippocampal activity during caching and retrieval in chickadees.

      Strengths:

      (1) The manuscript is well-written and structured.
      (2) The paper provides a detailed and biologically plausible mechanism for generating and utilizing barcode activity through chaotic dynamics in a recurrent neural network. This mechanism effectively explains how barcodes reduce memory interference, complement place tuning, and enable flexible, context-dependent recall.
      (3) The authors successfully reproduce key experimental findings on hippocampal barcode activity from chickadee studies, including the distinct correlations observed during caching, retrieval, and visits.
      (4) Overall, the study addresses a somewhat puzzling question about how memory indices and content signals coexist and interact in the same hippocampal population. By proposing a unified model, it provides significant conceptual clarity.

      Weaknesses:

      The recurrent neural network model incorporates assumptions and mechanisms, such as the modulation of recurrent input strength, whose biological underpinnings remain unclear. The authors acknowledge some of these limitations thoughtfully, offering plausible mechanisms and discussing their implications in depth.

      One thread of questions that authors may want to further explore is related to the chaotic nature of activity that generates barcodes when recurrence is strong. Chaos inherently implies sensitivity to initial conditions and noise, which raises questions about its reliability as a mechanism for producing robust and repeatable barcode signals. How sensitive are the results to noise in both the dynamics and the input signals? Does this sensitivity affect the stability of the generated barcodes and place fields, potentially disrupting their functional roles? Moreover, does the implemented plasticity mitigate some of this chaos, or might it amplify it under certain conditions? Clarifying these aspects could strengthen the argument for the robustness of the proposed mechanism.

      It may also be worth exploring the robustness of the results to certain modeling assumptions. For instance, the choice to run the network for a fixed amount of time and then use the activity at the end for plasticity could be relaxed.

    3. Reviewer #2 (Public review):

      Summary:

      Striking experimental results by Chettih et al 2024 have identified high-dimensional, sparse patterns of activity in the chickadee hippocampus when birds store or retrieve food at a given site. These barcode-like patterns were interpreted as "indexes" allowing the birds to retrieve from memory the locations of stored food.
      The present manuscript proposes a recurrent network model that generates such barcode activity and uses it to form attractor-like memories that bind information about location and food. The manuscript then examines the computational role of barcode activity in the model by simulating two behavioral tasks, and by comparing the model with an alternate model in which barcode activity is ablated.

      Strengths of the study:

      - Proposes a potential neural implementation for the indexing theory of episodic memory
      - Provides a mechanistic model of striking experimental findings: barcode-like, sparse patterns of activity when birds store a grain at a specific location
      - A particularly interesting aspect of the model is that it proposes a mechanism for binding discrete events to a continuous spatial map, and demonstrates the computational advantages of this mechanism

      Weaknesses:

      - The relation between the model and experimentally recorded activity needs some clarification
      - The relation with indexing theory could be made more clear
      - The importance of different modeling ingredients and dynamical mechanisms could be made more clear
      - The paper would be strengthened by focusing on the most essential aspects

    4. Author response:

      We thank the reviewers for the thoughtful comments, and we hope to address these issues in a future revision. We will clarify that chaos only serves to generate barcodes, and show that once they are formed and assigned the memory mechanism is stable to initial conditions.  We will also clarify the model's assumptions and its connections to indexing theory and to experimental results.

    1. eLife Assessment

      This valuable study uses dynamic metabolic models to compare perturbation responses in a bacterial system, analyzing whether they return to their steady state or amplify beyond the initial perturbation. The evidence supporting the emergent properties of perturbed metabolic systems to network topology and sensitivity to specific metabolites is solid, although the authors do not explain the origin of some significant inconsistencies between models.

    2. Reviewer #1 (Public review):

      (1a) Summary:

      The author studied metabolic networks for central metabolism, focusing on how system trajectories returned to their steady state. To quantify the response, systematic perturbation was performed in simulation and the maximal destabilization away from steady state (compared with initial perturbation distance) was characterized. The author analyzed the perturbation response and found that sparse networks and networks with more cofactors are more "stable", in the sense that the perturbed trajectories have smaller deviation along the path back to the steady state.

      (1b) Strengths and major contributions:

      The author compared three metabolic models and performed systematic perturbation analysis in simulation. This is the first work to characterize how perturbed trajectories deviate from equilibrium in large biochemical systems, and it illustrates interesting findings about the difference between sparse biological systems and randomly simulated reaction networks.

      (1c) Discussion and impact for the field:

      Metabolic perturbation is an important topic in cell biology and has important clinical implication in pharmacodynamics. The computational analysis in this study provides an initiative for future quantitative analysis on metabolism and homeostasis.

      Comments on revised version:

      The revised version of this manuscript made some clarifications, but I think the analysis of response coefficients is still numerical and model-specific, and remains unclear from a dynamical-systems point of view.

    3. Reviewer #2 (Public review):

      The authors have conducted a valuable comparative analysis of perturbation responses in three nonlinear kinetic models of E. coli central carbon metabolism found in the literature. They aimed to uncover commonalities and emergent properties in the perturbation responses of bacterial metabolism. They discovered that perturbations in the initial concentrations of specific metabolites, such as adenylate cofactors and pyruvate, significantly affect the maximal deviation of the responses from steady-state values. Furthermore, they explored whether the network connectivity (sparse versus dense connections) influences these perturbation responses. The manuscript is reasonably well written.

      Comments on revised version:

      The authors have addressed my concerns to a large extent. However, a few minor issues remain, as listed below:

      (1) The authors identified key metabolites affecting responses to perturbations in two ways: (i) by fixing a metabolite's value and (ii) by performing a sensitivity analysis. It would be helpful for the modeling community to understand better the differences and similarities in the obtained results. Do both methods identify substrate-level regulators? Is freezing a metabolite's dynamics dramatically changing the metabolic response (and if yes, which ones are so different in the two cases)? Does the scope of the network affect these differences and similarities?

      (2) Regarding the issues the authors encountered when performing the sensitivity analysis, they can be approached in two ways. First, the authors can check the methods for computing conserved moieties nicely explained by Sauro's group (doi:10.1093/bioinformatics/bti800) and compute them for large-scale networks (but beware of metabolites that belong to several conserved pools). Otherwise, the conserved pools of metabolites can be considered as variables in the sensitivity analysis-grouping multiple parameters is a common approach in sensitivity analysis.
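      To make the notion of conserved moieties concrete: a conserved pool corresponds to a vector c with c · N = 0, where N is the stoichiometric matrix (metabolites by reactions), so the weighted sum of those metabolite concentrations cannot change. The sketch below finds such vectors by plain Gaussian elimination with exact fractions. It is not the method of the cited paper, and the toy network, the metabolite order, and the name `left_null_space` are assumptions for illustration.

```python
from fractions import Fraction


def left_null_space(stoich):
    """Basis of vectors c (one entry per metabolite) with c . N = 0."""
    n_met, n_rxn = len(stoich), len(stoich[0])
    # Rows of N^T: one row per reaction, one column per metabolite.
    rows = [[Fraction(stoich[m][r]) for m in range(n_met)] for r in range(n_rxn)]
    pivots, rank = [], 0
    for col in range(n_met):
        if rank == n_rxn:
            break
        pivot = next((i for i in range(rank, n_rxn) if rows[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot here: this metabolite is a free variable
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        lead = rows[rank][col]
        rows[rank] = [v / lead for v in rows[rank]]
        for i in range(n_rxn):
            if i != rank and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[rank])]
        pivots.append(col)
        rank += 1
    basis = []
    for free in (c for c in range(n_met) if c not in pivots):
        vec = [Fraction(0)] * n_met
        vec[free] = Fraction(1)
        for row_idx, p in enumerate(pivots):
            vec[p] = -rows[row_idx][free]
        basis.append(vec)
    return basis


# Toy network (hypothetical). Metabolite order: ATP, ADP, G, G6P.
# R1: ATP + G -> ADP + G6P, R2: ADP -> ATP, R3: G6P -> (out), R4: (in) -> G
N = [
    [-1, 1, 0, 0],   # ATP
    [1, -1, 0, 0],   # ADP
    [-1, 0, 0, 1],   # G
    [1, 0, -1, 0],   # G6P
]
pools = left_null_space(N)
print(pools)
```

      In the toy network the only conserved pool is ATP + ADP, which is the kind of grouped quantity the reviewer suggests treating as a single variable in the sensitivity analysis.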

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Reviews):  

      First, the metabolic network in this study is incomplete. For example, amino acid synthesis and lipid synthesis are important for biomass and growth, but they are not included in the three models used in this study. NADH and NADPH are as important as ATP/ADP/AMP, but they are not included in the models. In the future, a more comprehensive metabolic and biosynthesis model is required.

      Thank you for the critical comment on the weakness of the present study. We actually tried to study a larger model, that of Turnborg et al (2021), which is a model of JCVI-syn3A, but we gave up on including it in the list of models to study in depth. This is because we noticed that the concentration of ATP in the model can become negative (we confirmed this with one of the authors of the paper). Another "big" kinetic model of metabolism that we could list would be Khodayari et al (2017). However, we could not find models with which to compare the dynamics of this big model. Therefore, we decided to use only the models of central carbon metabolism for now. We would like to leave a more extended study for the near future.

      We would like to mention that NADH and NADPH are included in the Khodayari model and the Boecker model, although NADH and NADPH are lumped together as NADH in the latter model.

      Second, this work does not provide a mathematical explanation of the perturbation response χ. Since the perturbation analysis is performed close to the steady state (or at least belongs to the attractor of single-steady-state), local linear analysis would provide useful information. By complementing with other analysis in dynamical systems (described below) we can gain more logical insights about perturbation response.  

      We tried a linear stability analysis. However, with the perturbation strength we used here, the linearization of the model is no longer valid, in the sense that the linearized model leads to negative concentrations of the metabolites (x_st + Δx < 0 for some metabolites). We have added a scatter plot of the response coefficients of trajectories sharing the same initial condition, with the dynamics computed by the original model and by the linearized model, respectively (Fig. S1).

      Since the response coefficient is based on the logarithm of the concentrations, as the metabolite concentrations approach zero, the response coefficient becomes larger. The high response coefficient in the Boecker and Chassagnole model would be explained by this artifact.  The linearized Khodayari model shows either χ~1 or χ = 0 (one or more metabolite concentrations become negative). This could be due to the number of variables in the model. For the response coefficient to have a larger value, the perturbation should be along the eigenvector that leads to oscillatory dynamics with long relaxation time (i.e., the corresponding eigenvalue has a small real part in terms of absolute value and a non-zero imaginary part). However, since the Khodayari model has about 800 variables, if perturbations are along such directions, there is a high probability that one or more metabolite concentrations will become negative.

      We fully agree that if the perturbations on the metabolite concentrations are in the linear regime, the response to the perturbations can be estimated by checking the eigenvalues and eigenvectors. However, we would say that the relationship between the linearized model (and thus the spectrum of eigenvalues) and the original model is unclear in this regime. We remarked on this in Lines 158-160.

      Recommendations for the authors:

      My major suggestion is about understanding the key quantity in this study: the response coefficient χ. When the perturbed state is close to the fixed point, one could adopt local stability analysis and consider the linearized system. For a linear system with one stable fixed point P, we consider the Jacobian matrix M at P. If all eigenvalues of M are real and negative, the perturbed trajectory will return to P with each component varying monotonically. If some eigenvalues have a negative real part and a nonzero imaginary part, then the perturbed trajectory will spiral inward to the fixed point. Depending on the spiral trajectory and the initially perturbed state, some components would (transiently) deviate further from the fixed point along the spiral trajectory. This explains why the response coefficient χ can be greater than 1.
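
      The spiral argument can be illustrated with a toy linear system. This is a sketch under stated assumptions: the 2x2 matrix `M` and the initial deviation `X0` are hypothetical, and the "amplification" computed here is the ratio of peak to initial deviation of one component, not the log-based response coefficient χ defined in the manuscript.

```python
import cmath


def eigenvalues_2x2(m):
    """Eigenvalues of a 2x2 matrix from its trace and determinant."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = cmath.sqrt(tr * tr - 4.0 * det)
    return (tr + disc) / 2.0, (tr - disc) / 2.0


def simulate(m, x0, dt=0.01, t_end=60.0):
    """Integrate dx/dt = m x with classical RK4 and return all states."""
    def f(x):
        return [m[0][0] * x[0] + m[0][1] * x[1],
                m[1][0] * x[0] + m[1][1] * x[1]]

    x = list(x0)
    traj = [tuple(x)]
    for _ in range(int(round(t_end / dt))):
        k1 = f(x)
        k2 = f([x[i] + 0.5 * dt * k1[i] for i in range(2)])
        k3 = f([x[i] + 0.5 * dt * k2[i] for i in range(2)])
        k4 = f([x[i] + dt * k3[i] for i in range(2)])
        x = [x[i] + dt * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]) / 6.0
             for i in range(2)]
        traj.append(tuple(x))
    return traj


# Hypothetical Jacobian with eigenvalues -0.1 +/- 1j (a stable spiral).
M = [[-0.1, -1.0], [1.0, -0.1]]
X0 = (1.0, 0.2)  # initial deviation from the fixed point at the origin

eigs = eigenvalues_2x2(M)
traj = simulate(M, X0)
peak_second = max(abs(x[1]) for x in traj)
amplification = peak_second / abs(X0[1])  # > 1 means a transient overshoot
final_norm = (traj[-1][0] ** 2 + traj[-1][1] ** 2) ** 0.5
print(eigs, amplification, final_norm)
```

      In this toy case the trajectory returns to the fixed point, yet the second component transiently moves several times further away than where it started, because the rotation carries the initially large first component into it.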

      Mathematically, a locally linearized system has similar behavior to the linear system, and the examples in this study can be analyzed in the similar way. Specifically, if a system has many complex eigenvalues, then the perturbed trajectory is more likely to have further deviation. The metabolic network models investigated in this work are not extremely large, and hence the author could analyze its spectrum of the Jacobian matrix at the steady state. Since the steady state is stable, I expect the spectrum located in the left half of the complex plane. If the spectrum spread out away from the real axis, we expect to see more spiral trajectories under perturbation. I think the spectrum analysis will provide a complementary view with respect to analysis on χ.  The authors' major findings, about the network sparsity and cofactors, can also be investigated under the framework of the spectrum analysis.  

      Of course, when the nonlinear system is perturbed far away from the fixed point, there are other geometrical properties of the vector field that can cause the response coefficient χ to be greater than 1. This could also be investigated in the future by testing the behavior of small and large perturbations and observing if the systems have signatures of nonlinearity.  

      Since all perturbed states return to the steady state, the eigenvalues of the Jacobi matrix accompanying the linearized system around the steady state are in the left half complex plane (negative real value). Also, some eigenvalues have non-zero imaginary parts.    

      The reason we emphasize the "nonlinear regime" is that the linearization is no longer valid in this regime, i.e. the metabolite concentrations can be negative when we calculate the linearized system. Certainly, there are complex eigenvalues in the Jacobi matrix of any model. However, we would say that there is no clear relationship between the eigenvalues and the response coefficient.      

      Minor suggestions:  

      Line 127: Regarding the source of perturbation, cell division also generates unequal concentration of proteins and metabolites for two daughter cells, and it is an interesting mechanism to create metabolic perturbation. 

      Thank you for the insightful suggestion. We mentioned the cell division as another source of perturbation (Lines 130-131).

      Line 175: I do not quite understand the statement "fixing each metabolite concentration...", since the metabolite concentration in the ODE simulation would change immediately after this fixing.  

      By this sentence we meant that we fixed the concentration of the selected metabolite at its steady-state value and set dx/dt of that metabolite to zero. We have rewritten the sentences to avoid confusion (Lines 180-181).
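
      A minimal sketch of this clamping procedure, assuming a hypothetical two-variable toy model (`toy_rhs`) and a simple forward-Euler integrator in place of the kinetic models and solver actually used:

```python
def clamp_metabolite(rhs, index):
    """Wrap an ODE right-hand side so that d x[index] / dt is always zero."""
    def clamped(x):
        dxdt = list(rhs(x))
        dxdt[index] = 0.0
        return dxdt
    return clamped


def euler(rhs, x0, dt=0.001, steps=5000):
    """Forward-Euler integration; returns the list of visited states."""
    x = list(x0)
    traj = [tuple(x)]
    for _ in range(steps):
        dxdt = rhs(x)
        x = [xi + dt * di for xi, di in zip(x, dxdt)]
        traj.append(tuple(x))
    return traj


def toy_rhs(x):
    """Hypothetical toy model with steady state (a, b) = (1, 1)."""
    a, b = x
    return [1.0 - a * b, a - b]


# Perturb b only; a starts at its steady-state value.
start = (1.0, 3.0)
free_traj = euler(toy_rhs, start)
clamped_traj = euler(clamp_metabolite(toy_rhs, 0), start)
print(free_traj[-1], clamped_traj[-1])
```

      In the free run, `a` is pulled away from its steady-state value by the perturbation in `b`; in the clamped run it stays fixed while `b` still relaxes, which is the comparison the fixing experiment relies on.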

      Figure 2: There are a lot of inconsistencies between the three models. Could we learn which model is more reasonable, or the conclusion here is that the cellular response under perturbation is model-specific? The latter explanation may not be quite satisfactory since we expect the overall cellular property should not be sensitive to the model details. 

      Ideally, the overall cellular property should be insensitive to model details. However, the reality is that the behavior of the models (e.g., steady-state properties, relaxation dynamics, etc.) depends on the specific parameter choices, including what regulation is implemented. I think this situation is part of the motivation for the ensemble modeling (by J. Liao and colleague) that has been developed.  

      Detailed responsiveness would be model specific. For example, FBP has a fairly strong effect in the Boecker model, but less so in the Khodayari model, and the opposite effect in the Chassagnole model (Fig. 2). Our question was whether there are common tendencies among kinetic models that tend to show model-specific behavior.  

      Reviewer 2 (Public Review):

      (1) In the study on determining key metabolites affecting responses to perturbations (starting from line 171), the authors fix the values of individual concentrations to their steady-state values and observe the responses. Such a procedure adds artificial constraints to the network because, in the natural responses of cells (and models) to perturbations, it is highly unlikely that metabolites will not evolve in time. By fixing the values of specific metabolites, the authors prohibit the metabolic network from evolving in the most optimal way to compensate for the perturbation. Instead of this procedure, have the authors considered for this task applying techniques from variance-based sensitivity analysis (Sobol, global sensitivity analysis), where they can calculate the first-order sensitivity index and total effect index? Using this technique, the authors would be able to determine the key metabolites while allowing for metabolic responses to perturbations without unnatural constraints. 

      Thank you for the useful suggestion for studying the role of each metabolite in responsiveness. We have computed the total sensitivity index (Homma and Saltelli, 1996) for each metabolite of each model (Fig. S5). The total sensitivity indices of ATP rank high in the Khodayari and Chassagnole models, while they rank in the middle in the Boecker model. We believe that the importance of the adenyl cofactors is also highlighted by the Sobol’ sensitivity analysis (the figure is referred to in Lines 193-195).

      We have encountered a minor difficulty for computing the sensitivity index. For the computation of the sensitivity index, we need to carry out the following Monte Carlo integral, 

      where the superscript (m) is the sample number index. The subscript i represents the ith element of the vector x, and ~i represents the vector x except for the ith element. The tilde stands for resampling.  
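
      A minimal sketch of a Monte Carlo estimate of the total sensitivity index in this notation. It assumes a Jansen-type estimator, S_Ti ≈ Σ_m [f(x^(m)) - f(x̃_i^(m), x_~i^(m))]² / (2 N Var f), which may differ from the expression the authors used; the inputs are assumed independent and uniform on [0, 1], and `toy_model` is a hypothetical additive function, not a metabolic model.

```python
import random


def total_sensitivity(model, n_inputs, n_samples=20000, seed=0):
    """Jansen-type Monte Carlo estimate of the total sensitivity indices."""
    rng = random.Random(seed)
    a = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    b = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    y_a = [model(row) for row in a]
    mean = sum(y_a) / n_samples
    variance = sum((y - mean) ** 2 for y in y_a) / (n_samples - 1)
    indices = []
    for i in range(n_inputs):
        acc = 0.0
        for m in range(n_samples):
            resampled = list(a[m])
            resampled[i] = b[m][i]  # resample only x_i, keep x_~i unchanged
            acc += (y_a[m] - model(resampled)) ** 2
        indices.append(acc / (2.0 * n_samples * variance))
    return indices


def toy_model(x):
    """Hypothetical additive test function: analytic indices are 0.2 and 0.8."""
    return x[0] + 2.0 * x[1]


s_total = total_sensitivity(toy_model, n_inputs=2)
print(s_total)
```

      For the additive toy function the total indices have the closed-form values 1/5 and 4/5, which gives a quick sanity check before applying the same estimator to a model output such as a response coefficient.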

      There are several conserved quantities in each model. For independent resampling, we need to deal with the conserved quantities. For the Boecker and Chassagnole models, we picked a single metabolite from each conservation law and solved its concentration algebraically to make the metabolite concentration the dependent variable. Then, we can resample the metabolite concentration of one metabolite without changing the concentrations of other metabolites, which are independent variables.  
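
      The dependent-variable construction can be sketched as follows. The pool name, the total, and the concentrations are hypothetical illustration values; the only point is that the dependent metabolite absorbs the change when one independent metabolite is resampled.

```python
def with_dependent(independent, total, dependent_name):
    """Complete a state by solving one conservation law for the dependent metabolite."""
    dep = total - sum(independent.values())
    if dep < 0:
        raise ValueError("resampled state violates the conservation law")
    state = dict(independent)
    state[dependent_name] = dep
    return state


# Hypothetical adenylate pool: ATP + ADP + AMP = 3.0, with ATP as the dependent variable.
A_TOTAL = 3.0
original = with_dependent({"ADP": 0.5, "AMP": 0.2}, A_TOTAL, "ATP")
# Resample ADP only; AMP is untouched and ATP is recomputed from the conservation law.
resampled = with_dependent({"ADP": 1.0, "AMP": 0.2}, A_TOTAL, "ATP")
print(original, resampled)
```

      Samples for which the dependent concentration would become negative are rejected here; how such samples were handled in the actual analysis is not stated.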

      However, in the Khodayari model, it was difficult to solve the dependent variables because the model has about 800 variables. Therefore, we gave up the computations of the sensitivity indices of the metabolites whose concentration is part of any conserved quantities, namely NAD, NADH, NADP, NADPH, Q8, and Q8H2.

      (2) To follow up on the previous remark, the authors state that the metabolites that augment the response coefficient when their concentration is fixed tend to be allosteric regulators. The authors should report which allosteric regulations are implemented in each of the models so that one can compare against Figure 2. Again, the effect of allosteric regulation by a specific metabolite that is quantified the way the authors did is biased by fixing the concentration value - it is true that negative feedback is broken when the metabolite concentration is fixed, however, in the rate law, there is still the fixed inhibition term with its value corresponding to the inhibition at the steady state. To see the effect of allosteric regulation by a metabolite, one can change the inhibition constants instead of constraining the responses with fixed concentrations.  

      We have listed the substrate-level regulations (Tables S1-3). Also, instead of fixing the concentrations, we re-ran the simulation with a reduced effect of the substrate-level regulations for the reactions suspected of influencing the change in the response coefficient (Fig. S6).

      The impact of substrate-level regulations is discussed in Lines 203-212.   

      We replaced "allosteric regulation" with "substrate-level regulation" because we noticed that some regulations are not necessarily allosteric.

      (3) Given the role of ATP in metabolic processes, the authors' finding of the sensitivity of the three networks' responses to perturbations in the AXP concentrations seems reasonable. However, drawing such firm conclusions from only three models, with each of them built around one steady state and having one kinetic parameter set despite that they were built for different physiologies, raises some questions. It is well-known in studies related to basins of attraction of the steady states that the nonlinear responses also depend on the actual steady states, the values of kinetic parameters, and implemented kinetic rate law, i.e., not only on the topology of the underlying systems. In the population of only three models, we cannot exclude the possibility of overlaps and strong similarities in the values of kinetic parameters, steady states, and enzyme saturations that all affect and might bias the observed responses. Ideally, to eliminate the possibility of such biases, one should simulate responses of a large population of models for multiple physiologies (and the corresponding steady states) and multiple parameter sets per physiology. This can be a difficult task, but having more kinetic models in this work would go a long way toward more convincing results. Recently, E. coli nonlinear kinetic models from several groups appeared that might help in this task, e.g., Haiman et al., PLoS Comput Biol, 17(1): e1008208, (2021), Choudhury et al., Nat Mach Intell, 4, 710-719, (2022); Hu et al., Metab Eng, 82, 123-133 (2024), Narayanan et al., Nat Commun, 15:723, (2024). 

      We have computed the responsiveness of 215 models generated by the MASSpy package (Haiman et al, 2021). Several model realizations showed strong responsiveness, i.e. a broader distribution of the response coefficient (Fig. S8); this is mentioned in Lines 339-341.

      We would like to mention that the three models studied in the present manuscript have limited overlap in terms of kinetic rate law and, accordingly, parameter values. In the Khodayari model, all reactions are bi-uni or uni-uni reactions implemented by mass-action kinetics, while the Boecker and Chassagnole models use the generalized Michaelis-Menten type rate laws. Also, the relationship between the response coefficients of the original model and the linearized model highlights the differences between the models (Fig. S1). If the models were somewhat effectively similar, the scatter plots of the response coefficient of the original- and linearized model should look similar among the three models. However, the three panels show completely different trends. Thus, the three models have less similarity even when they are linearized around the steady states. 

      (4) Can the authors share their insights on what could be the underlying reasons for the bimodal distribution in Figure 1E? Even after adding random reactions, the distribution still has two modes - why is that?  

      We have not yet resolved why only the Khodayari model shows the bimodal distribution of the response coefficient. However, on examining the time courses, the dynamics of the Khodayari model look like those of excitable systems. This feature may contribute to the bimodal distribution of the response coefficient. In the future, we would like to show whether the system is indeed an excitable system and which reactions contribute to such dynamics.

      (5) Considering the effects of the sparsity of the networks on the perturbation responses (from line 223 onwards), when we compare the three analyzed models, it is clear that the Khodayari et al. model is a superset of the other two models. Therefore, this model can be considered as, e.g., Chassagnole model with Nadd reactions (though not randomly added). Based on Figures 1b and S2, one can observe that the responses of the Khodayari models have stronger responses, which is exactly opposite to the authors' conclusion that adding the reactions weakens the responses.

      The authors should comment on this.  

      The sparsity of the network is defined by the ratio of the number of metabolites to the number of reactions. Note that the Khodayari model is a superset of the Boecker and Chassagnole models in terms of the number of reactions, but also in terms of the number of metabolites (Boecker does not have the pentose phosphate pathway, Chassagnole does not have the TCA cycle, and neither has oxidative phosphorylation). Thus, even if we manually add reactions to the Boecker model, for example, we cannot obtain a network that is equivalent to the Khodayari model. We added one sentence to clarify the point (Lines 254-255).

      Recommendations for the authors: 

      (1) Some typos: Line 57, remove ?; Line 134, correct "relaxation". 

      Thank you for pointing out. We fixed the typos.

      (2) Lines 510-515, please rewrite/clarify, it is confusing what are you doing. 

      We rewrote the sentences (Lines 529-532). We are sorry for the confusion.

      (3) Line 522, where are the expressions above Leq and K*? 

      Leq appears in the original paper of the Boecker model, but we decided not to use Leq. We apologize for not removing Leq from the present manuscript. The * in K* is the wildcard for representing the subscripts. We added a description for the role of “*”. 

      (4) Lines 525-530, based on the wording, it seems like you test first for 128 initial concentrations if the models converge back to the steady state and then you generate another set of 128 initial concentrations - is this what you are doing, or you simply use the 128 initial concentrations that have passed the test? 

      We apologize for the confusion. We did the first thing. We have rewritten the sentence to make it clearer. 

      (5) Figure 3, caption, by "broken line," did the authors mean "dashed line"? 

      We meant dashed line. We changed “broken line” to “dashed line”.

    1. eLife Assessment

      This study presents an important application of high-content image-based morphological profiling to quantitatively and systematically characterize induced pluripotent stem cell-derived mixed neural cultures cell type compositions. Exceptional evidence through rigorous experimental and computational validations support new potential applications of this cheap and simple assay.

    2. Joint Public Review:

      Automatically identifying single cell types in heterogeneous mixed cell populations holds great promise to characterize mixed cell populations and to discover new rules of spatial organization and cell-cell communication. Although the current manuscript focuses on the application of quality control of iPSC cultures, the same approach can be extended to a wealth of other applications including in depth study of the spatial context. The simple and high-content assay democratizes use and enables adoption by other labs.

      The authors also propose a new nucleocentric phenotyping pipeline, where a convolutional neural network is trained on the nucleus and some margins around it. This nucleocentric approach improves classification performance at high densities because nuclear segmentation is less prone to errors in dense cultures.
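
      The nucleocentric input can be pictured as a fixed-size crop centred on each nucleus. The sketch below is an assumption-laden illustration, not the authors' pipeline: the function name, the zero-padding at image borders, and the pixel size are all hypothetical, and the image is a plain nested list rather than a multichannel array.

```python
def crop_patch(image, center_rc, patch_um, um_per_px):
    """Square patch of about `patch_um` micrometres centred on `center_rc`.

    Pixels that fall outside the image are filled with zeros.
    """
    size = max(1, round(patch_um / um_per_px))
    half = size // 2
    r0 = center_rc[0] - half
    c0 = center_rc[1] - half
    n_rows, n_cols = len(image), len(image[0])
    patch = []
    for r in range(r0, r0 + size):
        row = []
        for c in range(c0, c0 + size):
            inside = 0 <= r < n_rows and 0 <= c < n_cols
            row.append(image[r][c] if inside else 0)
        patch.append(row)
    return patch


# Hypothetical 6x6 single-channel image whose pixel value encodes its position.
IMAGE = [[10 * r + c for c in range(6)] for r in range(6)]
patch = crop_patch(IMAGE, center_rc=(2, 3), patch_um=1.5, um_per_px=0.5)
corner = crop_patch(IMAGE, center_rc=(0, 0), patch_um=1.5, um_per_px=0.5)
print(patch, corner)
```

      Because only the nuclear centroid and a fixed margin are needed, the crop does not depend on an accurate whole-cell outline, which is the property that makes the approach attractive in dense cultures.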

      The manuscript is supported by comprehensive experimental and computational validations that raise the bar beyond the current state of the art in the field of high-content phenotyping and make this manuscript especially compelling. These include (i) Explicitly assessing replication biases (batch effects); (ii) Direct comparison of feature-based (a la cell profiling) versus deep-learning-based classification (which is not trivial/obvious for the application of cell profiling); (iii) Systematic assessment of the contribution of each fluorescent channel; (iv) Evaluation of cell-density dependency; (v) explicit examination of mistakes in classification; (vi) Evaluating the performance of different spatial contexts around the cell / nucleus; (vii) generalization of models trained on cultures containing a single cell type (mono-cultures) to mixed co-cultures; (viii) application to multiple classification tasks.

    3. Author response:

      The following is the authors’ response to the previous reviews.

      Public Review: 

      Summary: 

      The authors present a new application of the high-content image-based morphological profiling Cell Painting (CP) to single cell type classification in mixed heterogeneous induced pluripotent stem cell-derived mixed neural cultures. Machine learning models were trained to classify single cell types according to either "engineered" features derived from the image or from the raw CP multiplexed image. The authors systematically evaluated experimental (e.g., cell density, cell types, fluorescent channels) and computational (e.g., different models, different cell regions) parameters and convincingly demonstrated that focusing on the nucleus and its surroundings contain sufficient information for robust and accurate cell type classification. Models that were trained on mono-cultures (i.e., containing a single cell type) could generalize for cell type prediction in mixed co-cultures, and to describe intermediate states of the maturation process of iPSC-derived neural progenitors to differentiation neurons.

      Strengths:

      Automatically identifying single cell types in heterogeneous mixed cell populations holds great promise to characterize mixed cell populations and to discover new rules of spatial organization and cell-cell communication. Although the current manuscript focuses on the application of quality control of iPSC cultures, the same approach can be extended to a wealth of other applications including in depth study of the spatial context. The simple and high-content assay democratizes use and enables adoption by other labs.

      The manuscript is supported by comprehensive experimental and computational validations that raise the bar beyond the current state of the art in the field of high-content phenotyping and make this manuscript especially compelling. These include (i) Explicitly assessing replication biases (batch effects); (ii) Direct comparison of feature-based (a la cell profiling) versus deep-learning-based classification (which is not trivial/obvious for the application of cell profiling); (iii) Systematic assessment of the contribution of each fluorescent channel; (iv) Evaluation of cell-density dependency; (v) explicit examination of mistakes in classification; (vi) Evaluating the performance of different spatial contexts around the cell/nucleus; (vii) generalization of models trained on cultures containing a single cell type (mono-cultures) to mixed co-cultures; (viii) application to multiple classification tasks.

      Comments on latest version:

      I have consulted with Reviewer #3 and both of us were impressed by the revised manuscript, especially by the clear and convincing evidence regarding the nucleocentric model's use of the nuclear periphery and its benefit for the case of dense cultures. However, there are two issues that are incompletely addressed (see below). Until these are resolved, the "strength of evidence" was elevated to "compelling".

      First, the analysis of the patch size is not clearly indicating that the 12-18um range is a critical factor (Fig. 4E). On the contrary, the performance seems to be not very sensitive to the patch size, which is actually a desired property for a method. Still, Fig. 4B convincingly shows that the nucleocentric model is not sensitive to the culture density, while the other models are. Thus, the authors can adjust their text saying that the nucleocentric approach is not sensitive to the patch size and that the patch size is selected to capture the nucleus and some margins around it, making it less prone to segmentation errors in dense cultures.

      We agree that there is a significant tolerance to different patch sizes, and have therefore reformulated the conclusion as suggested in the results and the discussion sections (pages 10 and 16). As very large patch sizes (>40µm) do increase the variability of the predictions and the imbalance between recall and precision, we have left this observation in the results section, as it also motivates the use of smaller patch sizes.

      Second, the GitHub does not contain sufficient information to reproduce the analysis. Its current state is sparse with documentation that would make reproducing the work difficult. What versions of the software were used? Where should data be downloaded? The README contains references to many different argparse CLI arguments, but sparse details on what these arguments actually are, and which parameters the authors used to perform their analyses. Links to images are broken. Ideally, all of these details would be present, and the authors would include a step-by-step tutorial on how to reproduce their work. Fixing this will lead to an "exceptional" strength of evidence.

      We have added additional information to the GitHub to increase the reproducibility of the analysis.  

      • The README now contains additional documentation and more extensive explanations. A flowchart has been added, making the dataflow and order of analyses more clear.  

      • The accompanying dataset is 20GB in size and can be downloaded as a .zip-file from https://figshare.com/articles/dataset/Nucleocentric-Profiling/27141441?file=49522557. This file contains 2x480 raw images and a layout file.  

      • The used software versions are included in the manuscript in table 4. To increase the reproducibility, a Conda environment file (.yaml) has been added to the GitHub. This can be installed and contains the correct package versions.

      • The README now contains for each script and its arguments a short description on its meaning, on whether it is required or optional and its default setting.  

      • A step-by-step tutorial on the use of the test dataset has been included. This tutorial includes the arguments used to run the code from the command line terminal.

      Recommendations for the authors:

      There is no reference in the text to Fig. 2D or to Fig. 3C.

      This has been changed. The reference to Fig. 2D has been added to the manuscript on page 6, and the reference to Fig. 3C has been included on page 8.

    1. eLife Assessment

      This important study reveals a new mechanism for gene regulation in neurons by an RNA binding protein called RBM20 previously studied in the heart. The methods used are compelling, including the generation of new mouse knockout strains and leading edge sequencing methods for identification of gene regulatory mechanisms. The study shows that neuronal RBM20 governs long pre-mRNAs encoding synaptic proteins in specific neuronal cell types, but the functional consequences of this regulation remain questions for the future.

    2. Reviewer #1 (Public review):

      Summary:

      The authors of this study set out to find RNA binding proteins in the CNS in cell-type specific sequencing data and discover that the cardiomyopathy-associated protein RBM20 is selectively expressed in olfactory bulb glutamatergic neurons and PV+ GABAergic neurons. They make an HA-tagged RBM20 allele to perform CLIP-seq to identify RBM20 binding sites and find direct targets of RBM20 in olfactory bulb glutamatergic neurons. In these neurons, RBM20 binds intronic regions. RBM20 has previously been implicated in splicing, but when they selectively knock out RBM20 in glutamatergic neurons they do not see changes in splicing, but they do see changes in RNA abundance, especially of long genes with many introns, which are enriched for synapse-associated functions. These data show that RBM20 has important functions in gene regulation in neurons, which was previously unknown, and they suggest it acts through a mechanism distinct from what has been studied before in cardiomyocytes.

      Strengths:

      The study finds expression of the cardiomyopathy-associated RNA binding protein RBM20 in specific neurons in the brain, opening new windows into its potential functions there.

      The study uses CLIP-seq to identify RBM20 binding RNAs in olfactory bulb neurons.

      Conditional knockout of RBM20 in glutamatergic or PV neurons allows the authors to detect mRNA expression that is regulated by RBM20.

      The data include substantial controls and quality control information to support the rigor of the findings.

      Weaknesses:

      The authors do not fully identify the mechanism by which RBM20 acts to regulate RNA expression in neurons, though they do provide data suggesting that neuronal RBM20 does not regulate alternate splicing in neurons, which is an interesting contrast to its proposed mechanism of function in cardiomyocytes. Discovery of the RNA regulatory functions of RBM20 in neurons is left as a question for future studies.

      The study does not identify functional consequences of the RNA changes in the conditional knockout cells, so this is also a question for the future.

    3. Reviewer #2 (Public review):

      Summary:

      The group around Prof. Scheiffele has made seminal discoveries regarding alternative splicing, as reflected by a current ERC advanced grant and landmark papers in eLife (2015), Science (2016), and Nature Neuroscience (2019). Recently, the group investigated proteins that contain an RRM motif in the mouse cortex. One of them, termed RBM20, was originally thought to be muscle-specific and involved in alternative splicing in cardiomyocytes. However, upon close inspection, RBM20 is expressed in a particular set of interneurons (PV positive cells of the somatosensory cortex) in the cortex as well as in mitral cells of the olfactory bulb (OB). Importantly, they used CLIP to identify targets in the OB and heart. Next and quite importantly, they generated a knock-in mouse line with a His-biotin acceptor peptide and a HA epitope to perform specific biochemistry. Not surprisingly, this allowed them to specifically identify transcripts with long introns; however, most of the intronic binding sites were very distant from the splice sites. Closer GO term inspection revealed that RBM20 specifically regulates synapse-related transcripts. In order to get in vivo insight into its function in the brain, the authors generated both global as well as conditional KO mice. Surprisingly, there were no significant differences in RBM20 ΔPV interneurons; however, 409 transcripts were deregulated in OB glutamatergic neurons. Here, CLIP sites were mostly found to be very distant from differentially expressed exons. Furthermore, loss-of-function RBM20 primarily yields loss of transcripts, whereas upregulation appears to be indirect. Together, these results strongly suggest a role of RBM20 in the inclusion of cryptic exons thereby promoting target degradation.

      Strengths:

      The quality of the data and the figures is high, impressive and convincing. The reported results strongly suggest a role of RBM20 in the inclusion of cryptic exons thereby promoting target degradation.

      Weaknesses:

      I would not use the term weakness here.
      The description of the results is sometimes too dense and technical. As eLife does not have a size limit, there is no reason for the results section to be less than three pages. Especially the last paragraph of the results part (p4) does not do justice to the high importance of Fig. 5, which I consider of high importance and originality. Here are a few suggestions from a person that is not working on splicing, to improve the text part of this important manuscript.

      (1) Introduction: too short, include a paragraph on splicing and cryptic exons<br /> (2) Results:<br /> - briefly describe the phenotypes of the mice mentioned<br /> - expand the section on Fig. 5 and cryptic exons especially<br /> (3) Discussion: too short, expand on the possible new role of RBM20 and target degradation, possibly by adding a scheme?

    4. Reviewer #3 (Public review):

      Summary:

      The authors identified RBM20 expression in neural tissues using cell type-specific transcriptomic analysis. This discovery was further validated through in vitro and in vivo approaches, including RNA fluorescent in situ hybridization (FISH), open-source datasets, immunostaining, western blotting, and gene-edited RBM20 knockout (KO) mice. CLIP-seq and RiboTRAP data demonstrated that RBM20 regulates common targets in both neural and cardiac tissues, while also modulating tissue-specific targets. Furthermore, the study revealed that neuronal RBM20 governs long pre-mRNAs encoding synaptic proteins.

      Strengths:

      • Utilization of a large dataset combined with experimental evidence to identify and validate RBM20 expression in neural tissues.<br /> • Global and tissue-specific RBM20 KO mouse models provide robust support for RBM20 localization and expression.<br /> • Employing heart tissue as a control highlights the unique findings in neural tissues.

      Weaknesses:

      • Lack of physiological functional studies to explore RBM20's role in neural tissues.<br /> • Data quality requires improvement for stronger conclusions.<br /> • Western blot sample size should be increased for enhanced reliability.

    5. Author response:

      We thank the reviewers for the constructive suggestions made in the Public Reviews and the Recommendations to Authors. We intend to address these comments in a revised manuscript as follows:

      (1) We will revise the text according to the reviewers' suggestions regarding specific RBM20-dependent mRNAs and provide more detailed explanations in the Results and Discussion.

      (2) We will upload higher resolution images of several figures (resolution had been reduced to achieve lower file sizes) to address the comment regarding “data quality”.

      (3) We will include data on eCLIP control experiments.

      (4) We will add information on replication and new data for the western blot analysis.

    1. eLife Assessment

      This valuable study shows a surprising scale-invariance of the covariance spectrum of large-scale recordings in the zebrafish brain in vivo. A solid analysis demonstrates that a Euclidean random matrix model of the covariance matrix recapitulates these properties. The results provide several new and insightful approaches for probing large-scale neural recordings.

    2. Joint Public Reviews:

      Summary:

      The authors examine the eigenvalue spectrum of the covariance matrix of neural recordings in the whole-brain larval zebrafish during hunting and spontaneous behavior. They find that the spectrum is approximately power law, and, more importantly, exhibits scale-invariance under random subsampling of neurons. This property is not exhibited by conventional models of covariance spectra, motivating the introduction of the Euclidean random matrix model. The authors show that this tractable model captures the scale invariance they observe. They also examine the effects of subsampling based on anatomical location or functional relationships. Finally, they briefly discuss the benefit of neural codes which can be subsampled without significant loss of information.

      Strengths:

      With large-scale neural recordings becoming increasingly common, neuroscientists are faced with the question: how should we analyze them? To address that question, this paper proposes the Euclidean random matrix model, which embeds neurons randomly in an abstract feature space. This model is analytically tractable and matches two nontrivial features of the covariance matrix: approximate power law scaling, and invariance under subsampling. It thus introduces an important conceptual and technical advance for understanding large-scale simultaneously recorded neural activity.
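
The subsampling test described here can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes NumPy, places hypothetical neurons uniformly at random in a two-dimensional feature space, and uses an assumed distance kernel f(d) = (1 + (d/eps)^2)^(-mu/2) with arbitrary parameter values. All function names are invented for this sketch.

```python
import numpy as np


def erm_covariance(n_neurons, dim=2, box=10.0, eps=0.3, mu=0.5, seed=0):
    """Build a Euclidean random matrix: C_ij = f(distance between points i and j).

    All parameter values are illustrative assumptions, not fitted values.
    """
    rng = np.random.default_rng(seed)
    points = rng.uniform(0.0, box, size=(n_neurons, dim))
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Assumed kernel: equals 1 at zero distance, decays as a power law far away.
    return (1.0 + (dist / eps) ** 2) ** (-mu / 2.0)


def rank_spectrum(cov):
    """Return (rank / N, eigenvalues sorted in descending order)."""
    eigvals = np.linalg.eigvalsh(cov)[::-1]
    ranks = np.arange(1, eigvals.size + 1) / eigvals.size
    return ranks, eigvals


def subsample(cov, k, seed=0):
    """Keep a random subset of k neurons (rows and matching columns)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(cov.shape[0], size=k, replace=False)
    return cov[np.ix_(idx, idx)]


cov_full = erm_covariance(512)
ranks_full, eig_full = rank_spectrum(cov_full)
ranks_half, eig_half = rank_spectrum(subsample(cov_full, 256))
```

Plotting `eig_full` against `ranks_full` and `eig_half` against `ranks_half` on log-log axes is one way to inspect whether the two curves overlap, which is the collapse the review refers to. Whether they do depends on the kernel and parameters chosen.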

      Weaknesses:

      The downside of using summary statistics is that they can be hard to interpret. Often the finding of scale invariance, and approximate power law behavior, points to something interesting. But here caution is in order: for instance, most critical phenomena in neural activity have been explained by relatively simple models that have very little to do with computation (Aitchison et al., PLoS CB 12:e1005110, 2016; Morrell et al., eLife 12, RP89337, 2024). Whether the same holds for the properties found here remains an open question.

    3. Author response:

      We are grateful for the thorough and constructive feedback provided on our manuscript.

      Regarding the main concern about power law behavior and scale invariance, we would like to clarify that our study does not aim to establish criticality. Instead, we focus on describing and understanding a specific scale-invariant property: the collapsed eigenspectra in neural activity under random sampling. Indeed, we tested Morrell et al.’s latent-variable model (eLife 12, RP89337, 2024, [1]), where a slowly varying latent factor drives population activity. Although it produces a seemingly power-law-like spectrum, random sampling does not replicate the strict spectral collapse observed in our data (second row in Author response image 1). This highlights that simply adding latent factors does not fully recapitulate the scale invariance we measure, suggesting richer or more intricate processes may be involved in real neural recordings.

      Author response image 1.

      Morrell et al.’s latent variable model [1, 2]. A-D: Functionally sampled (FSap) eigenspectra of the Morrell et al. model. E-H: Randomly sampled (RSap) eigenspectra of the same model. Briefly, in Morrell et al.’s latent variable model [1, 2], neural activity is driven by N<sub>f</sub> latent fields and a place field. The latent fields are modeled as Ornstein-Uhlenbeck processes with a time constant τ. The parameters ϵ and η control the mean and variance of individual neurons’ firing rates, respectively. The following parameter values were used. A,E: Using the same parameters as in [1]: N<sub>f</sub> = 10, ϵ = −2.67, η = 6, τ = 0.1. Half of the cells are also coupled to the place field. B,C,D,F,G,H: Using parameters from [2]: N<sub>f</sub> = 5, ϵ = −3, η = 4. There is no place field. The time constant τ = 0.1, 1, 10 for B,F, C,G, and D,H, respectively.

      We decided to make 5 key revisions.

      • As mentioned, we have evaluated the latent variable model proposed by Morrell et al. and found that it fails to reproduce the scale-invariant eigenspectra observed in our data; these results will be presented in the Discussion section and supported by a new Supplementary Figure.

      • We will include a discussion on the findings of Manley et al. (2024, [2]) regarding the issue of saturating dimensionality in the Discussion section, highlighting the methodological differences and their implications.

      • We will add a new mathematical derivation in the Methods section, elucidating the bounded dimensionality using the spectral properties of our model.

      • We will elaborate in the Discussion section to further emphasize the robustness of our findings by demonstrating their consistency across diverse datasets and experimental techniques.

      • We will incorporate into the Discussion section a brief discussion of the implications for neural coding. In particular, Fisher information can become unbounded when the slope of the power-law rank plot is less than one, as highlighted in the recent work by Moosavi et al. (bioRxiv 2024.08.23.608710, Aug, 2024 [3]).

      We believe these revisions will address the concerns raised by you and collectively strengthen our manuscript to provide a more comprehensive and robust understanding of the geometry and dimensionality of brain-wide activity.

      References

      (1) M. C. Morrell, A. J. Sederberg, I. Nemenman, Latent dynamical variables produce signatures of spatiotemporal criticality in large biological systems. Physical Review Letters 126, 118302 (2021).

      (2) M. C. Morrell, I. Nemenman, A. Sederberg, Neural criticality from effective latent variables. eLife 12, RP89337 (2024).

    1. eLife Assessment

      The authors use a range of techniques to examine the role of Aurora Kinase A (AurA) in trained immunity. The study is hypothesis driven, it uses solid experimental approaches, and the data are presented in a logical manner. The findings are valuable to the trained immunity field because they provide an in-depth look at a common inducer of trained immunity, beta-glucan.

    2. Reviewer #1 (Public review):

      This work regards the role of Aurora Kinase A (AurA) in trained immunity. The authors claim that AurA is essential to the induction of trained immunity. The paper starts with a series of experiments showing the effects of suppressing AurA on beta-glucan-trained immunity. This is followed by an account of how AurA inhibition changes the epigenetic and metabolic reprogramming that are characteristic of trained immunity. The authors then zoom in on specific metabolic and epigenetic processes (regulation of S-adenosylmethionine metabolism & histone methylation). Finally, an inhibitor of AurA is used to reduce beta-glucan's anti-tumour effects in a subcutaneous MC-38 model.

      Strengths:

      With the exception of my confusion around the methods used for relative gene expression measurements, the experimental methods are generally well-described. I appreciate the authors' broad approach to studying different key aspects of trained immunity (from comprehensive transcriptome/chromatin accessibility measurements to detailed mechanistic experiments). Approaching the hypothesis from many different angles inspires confidence in the results (although not completely - see weaknesses section). Furthermore, the large drug-screening panel is a valuable tool as these drugs are readily available for translational drug-repurposing research.

      Weaknesses:

      (1) The manuscript contains factual inaccuracies such as:<br /> (a) Intro: the claim that trained cells display a shift from OXPHOS to glycolysis based on the paper by Cheng et al. in 2014; this was later shown to be dependent on the dose of stimulation and actually both glycolysis and OXPHOS are generally upregulated in trained cells (pmid 32320649)<br /> (b) Discussion: Trained immunity was first described as such in 2011, not decades ago.

      (2) The authors approach their hypothesis from different angles, which inspires a degree of confidence in the results. However, the statistical methods and reporting are underwhelming.<br /> (a) Graphs depict mean +/- SEM, whereas mean +/- SD is almost always more informative.<br /> (b) The use of 1-tailed tests is dubious in this scenario. Furthermore, in many experiments/figures the case could be made that the comparisons should be considered paired (the responses of cells from the same animal are inherently not independent due to their shared genetic background and, up until cell isolation, the same host factors like serum composition/microbiome/systemic inflammation etc).<br /> (c) It could be explained a little more clearly how multiple testing correction was done and why specific tests were chosen in each instance.<br /> (d) Most experiments are done with n = 3, some experiments are done with n = 5. This is not a lot. While I don't think power analyses should be required for simple in vitro experiments, I would be wary of drawing conclusions based on n = 3. It is also not indicated if the data points were acquired in independent experiments. ATAC-seq/RNA-seq was, judging by the figures, done on only 2 mice per group. No power calculations were done for the in vivo tumor model.<br /> (e) Furthermore, the data spread in many experiments (particularly BMDM experiments) is extremely small. I wonder if these are true biological replicates, meaning each point represents BMDMs from a different animal? (disclaimer: I work with human materials where the spread is of course always much larger than in animal experiments, so I might be misjudging this.).

      (3) Maybe the authors are reserving this for a separate paper, but it would be fantastic if the authors would report the outcomes of the entire drug screening instead of only a selected few. The field would benefit from this as it would save needless repeat experiments. The list of drugs contains several known inhibitors of training (e.g. mTOR inhibitors) so there must have been more 'hits' than the reported 8 Aurora inhibitors.

      (4) Relating to the drug screen and subsequent experiments: it is unclear to me in supplementary figure 1B which concentrations belong to secondary screens #1/#2 - the methods mention 5 µM for the primary screen and "0.2 and 1 µM" for secondary screens, is it in this order or in order of descending concentration?<br /> (a) It is unclear if the drug screen was performed with technical replicates or not - the supplementary figure 1B suggests no replicates and quite a large spread (in some cases lower concentration works better?)

      (5) The methods for (presumably) qPCR for measuring gene expression in Figure 1C are missing. Which reference gene was used and is this a suitably stable gene?

      (6) From the complete unedited blot image of Figure 1D it appears that the p-Aurora and total Aurora are not from the same gel (discordant number of lanes and positioning). This could be alright if there are no/only slight technical errors, but I find it misleading as it is presented as if the actin (loading control to account for aforementioned technical errors!) counts for the entire figure.

      (7) Figure 2: This figure highlights results that are by far not the strongest ones - I think the 'top hits' deserve some more glory. A small explanation on why the highlighted results were selected would have been fitting.

      (8) Figure 3 incl supplement: the carbon tracing experiments show more glucose-carbon going into TCA cycle (suggesting upregulated oxidative metabolism), but no mito stress test was performed on the seahorse.

      (9) Inconsistent use of an 'alisertib-alone' control in addition to 'medium', 'b-glucan', 'b-glucan + alisertib'. This control would be of great added value in many cases, in my opinion.

      (10) Figure 4A: looking at the unedited blot images, the blot for H3K36me3 appears in its original orientation, whereas other images appear horizontally mirrored. Please note, I don't think there is any malicious intent but this is quite sloppy and the authors should explain why/how this happened (are they different gels and the loading sequence was reversed?)

      (11) For many figures, for example prominently figure 5, the text describes 'beta-glucan training' whereas the figures actually depict acute stimulation with beta-glucan. While this is partially a semantic issue (technically, the stimulation is 'the training-phase' of the experiment), this could confuse the reader.

      (12) Figure 6: Cytokines, especially IL-6 and IL-1β, can be excreted by tumour cells and have pro-tumoral functions. This is not likely in the context of the other results in this case, but since there is flow cytometry data from the tumour material it would have been nice to see also intracellular cytokine staining to pinpoint the source of these cytokines.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript investigates the inhibition of Aurora A and its impact on β-glucan-induced trained immunity via the FOXO3/GNMT pathway. The study demonstrates that inhibition of Aurora A leads to overconsumption of SAM, which subsequently impairs the epigenetic reprogramming of H3K4me3 and H3K36me3, effectively abolishing the training effect.

      Strengths:

      The authors identify the role of Aurora A through small molecule screening and validation using a variety of molecular and biochemical approaches. Overall, the findings are interesting and shed light on the previously underexplored role of Aurora A in the induction of β-glucan-driven epigenetic change.

      Weaknesses:

      Given the established role of histone methylations, such as H3K4me3, in trained immunity, it is not surprising that depletion of the methyl donor SAM impairs the training response. Nonetheless, this study provides solid evidence supporting the role of Aurora A in β-glucan-induced trained immunity in murine macrophages. The in vivo evidence for the antitumor effect of trained immunity is insufficient to support the final claim, because alisertib could inhibit Aurora A in cell types other than myeloid cells.

    4. Author response:

      Reviewer #1 (Public review):

      This work regards the role of Aurora Kinase A (AurA) in trained immunity. The authors claim that AurA is essential to the induction of trained immunity. The paper starts with a series of experiments showing the effects of suppressing AurA on beta-glucan-trained immunity. This is followed by an account of how AurA inhibition changes the epigenetic and metabolic reprogramming that are characteristic of trained immunity. The authors then zoom in on specific metabolic and epigenetic processes (regulation of S-adenosylmethionine metabolism & histone methylation). Finally, an inhibitor of AurA is used to reduce beta-glucan's anti-tumour effects in a subcutaneous MC-38 model.

      Strengths:

      With the exception of my confusion around the methods used for relative gene expression measurements, the experimental methods are generally well-described. I appreciate the authors' broad approach to studying different key aspects of trained immunity (from comprehensive transcriptome/chromatin accessibility measurements to detailed mechanistic experiments). Approaching the hypothesis from many different angles inspires confidence in the results (although not completely - see weaknesses section). Furthermore, the large drug-screening panel is a valuable tool as these drugs are readily available for translational drug-repurposing research.

      We thank the reviewer for the positive and encouraging comments.

      Weaknesses:

      (1) The manuscript contains factual inaccuracies such as: (a) Intro: the claim that trained cells display a shift from OXPHOS to glycolysis based on the paper by Cheng et al. in 2014; this was later shown to be dependent on the dose of stimulation and actually both glycolysis and OXPHOS are generally upregulated in trained cells (pmid 32320649).

      We thank the reviewer for pointing out this inaccuracy, and we will revise our statement to ensure an accurate and up-to-date description. We are aware that trained immunity involves different metabolic pathways, including both glycolysis and oxidative phosphorylation[1, 2]. We also measured the oxygen consumption rate (OCR, as detailed in comment #8) but observed no increase in oxygen consumption in trained BMDMs, whereas a previous study reported decreased oxidative phosphorylation[3]. We will discuss the potential reasons underlying these differing results.

      (b) Discussion: Trained immunity was first described as such in 2011, not decades ago.

      We are sorry for the inaccurate description, and we will correct the statement in our revised manuscript to read: “Although the concept of “trained immunity” was proposed in 2011, the mechanisms that regulate trained immunity are still not completely understood.”

      (2) The authors approach their hypothesis from different angles, which inspires a degree of confidence in the results. However, the statistical methods and reporting are underwhelming.

      (a) Graphs depict mean +/- SEM, whereas mean +/- SD is almost always more informative. (b) The use of 1-tailed tests is dubious in this scenario. Furthermore, in many experiments/figures the case could be made that the comparisons should be considered paired (the responses of cells from the same animal are inherently not independent due to their shared genetic background and, up until cell isolation, the same host factors like serum composition/microbiome/systemic inflammation etc). (c) It could be explained a little more clearly how multiple testing correction was done and why specific tests were chosen in each instance.

      Thank you for the suggestions. We will change all data presented as mean ± SEM in the manuscript to mean ± SD, describe in detail how multiple comparisons were performed, and explain the rationale behind the different comparison methods used. Previous studies have shown that knockdown of GNMT increases intracellular SAM levels, and GNMT knockdown is commonly used as a method to upregulate SAM[4-6]. We therefore used a one-tailed test in Figure 3J.
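
The statistical points under discussion can be made concrete with a small sketch. It is an illustration only, using invented readouts rather than data from the manuscript: the SEM is the SD divided by the square root of n, so it shrinks as n grows while the SD describes the spread of individual observations, and a paired comparison works on within-animal differences. The function names are hypothetical.

```python
import math
import statistics


def sd_and_sem(values):
    """Return the sample standard deviation and the standard error of the mean."""
    sd = statistics.stdev(values)          # spread of individual observations
    sem = sd / math.sqrt(len(values))      # uncertainty of the estimated mean
    return sd, sem


def paired_t_statistic(control, treated):
    """t statistic for paired samples: mean difference over its standard error."""
    diffs = [t - c for c, t in zip(control, treated)]
    _, sem = sd_and_sem(diffs)
    return statistics.mean(diffs) / sem


# Hypothetical readouts from three animals (arbitrary units), for illustration only.
control = [10.0, 12.0, 14.0]
treated = [13.0, 15.0, 18.0]

sd, sem = sd_and_sem(control)
t_paired = paired_t_statistic(control, treated)
```

With these invented numbers the control SD is 2.0 while the SEM is about 1.15, which shows how SEM error bars can make the same data look tighter.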

      (d) Most experiments are done with n = 3, some experiments are done with n = 5. This is not a lot. While I don't think power analyses should be required for simple in vitro experiments, I would be wary of drawing conclusions based on n = 3. It is also not indicated if the data points were acquired in independent experiments. ATAC-seq/RNA-seq was, judging by the figures, done on only 2 mice per group. No power calculations were done for the in vivo tumor model.

      We are sorry for the confusion caused by our figure legends. For the in vitro studies, we performed at least three independent experiments (with bone marrow isolated from different mice), but the manuscript displays only the technical replicates from one experiment. For the sequencing data, we acknowledge the reviewer's concern regarding the small sample size (n=2) in our RNA-seq/ATAC-seq experiments. We consider the sequencing experiments mainly an exploratory approach, and we performed rigorous quality control and normalization of the sequencing data to ensure the reliability of our findings. While we understand that a larger sample size would be ideal for drawing more definitive conclusions, we believe that the current data offer valuable preliminary insights that can inform future studies with larger cohorts. As a complementary method, we conducted ChIP-PCR to detect enrichment of active histone modifications at the Il6 and Tnf regions, to further verify the increased accessibility of inflammatory genes induced by trained immunity and the reliability of our conclusions despite the small sample size. We hope this clarifies our approach, and we would be happy to further acknowledge and discuss the limitations of the current study.

      For the in vivo experiments, we determined the sample size by referring to the animal numbers used for similar experiments in the literature. According to a reported resource equation approach for calculating sample size in animal studies[7], n=5-7 is suitable for most of our mouse experiments. We will describe the details in the revised Methods section.
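
As a worked illustration of the resource equation approach, the sketch below assumes a simple one-way group comparison, where the error degrees of freedom equal the number of groups times (animals per group minus one) and should fall between 10 and 20. The three-group design is an assumption made for the example, and the function name is hypothetical.

```python
import math


def resource_equation_range(n_groups, df_min=10, df_max=20):
    """Per-group sample sizes that keep the error degrees of freedom in [df_min, df_max].

    For a one-way group comparison, error DF = n_groups * (n_per_group - 1).
    """
    n_min = math.ceil(df_min / n_groups + 1)   # round up so DF >= df_min
    n_max = math.floor(df_max / n_groups + 1)  # round down so DF <= df_max
    return n_min, n_max


# Hypothetical design with three groups; the group count is an assumption.
n_min, n_max = resource_equation_range(3)
```

For three groups this yields 5 to 7 animals per group; with four groups the same calculation gives 4 to 6.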

      (e) Furthermore, the data spread in many experiments (particularly BMDM experiments) is extremely small. I wonder if these are true biological replicates, meaning each point represents BMDMs from a different animal? (disclaimer: I work with human materials where the spread is of course always much larger than in animal experiments, so I might be misjudging this.).

      We are sorry for the confusion caused by our figure legends. In the in vivo experiments, individual mice are the biological replicates; the exact values of n are reported in the figure legends, and each point represents data from a different animal (Figure 1I and Figure 6). The in vitro cell assays were performed in triplicate; each experiment was independently replicated at least three times, and the points represent technical replicates.

      (3) Maybe the authors are reserving this for a separate paper, but it would be fantastic if the authors would report the outcomes of the entire drug screening instead of only a selected few. The field would benefit from this as it would save needless repeat experiments. The list of drugs contains several known inhibitors of training (e.g. mTOR inhibitors) so there must have been more 'hits' than the reported 8 Aurora inhibitors.

      Thank you for your suggestion. We will report the outcomes of the entire drug screen in the revised manuscript.

      (4) Relating to the drug screen and subsequent experiments: it is unclear to me in supplementary figure 1B which concentrations belong to secondary screens #1/#2 - the methods mention 5 µM for the primary screen and "0.2 and 1 µM" for secondary screens, is it in this order or in order of descending concentration?

      Thank you for your comments, and we apologize for the unclear labelling in Supplementary Figure 1B. We performed the secondary drug screen at two concentrations: secondary screens #1 and #2 correspond to 0.2 μM and 1 μM, respectively. That is, the concentrations are listed in this order, not in descending order.

      (a) It is unclear if the drug screen was performed with technical replicates or not - the supplementary figure 1B suggests no replicates and quite a large spread (in some cases lower concentration works better?)

      Thank you for your question. The drug screen was performed without technical replicates. Indeed, we observed that a lower concentration worked better in some cases. This might be because a drug's effect correlates positively with its concentration only within a specific range (as seen in comment #4).

      (5) The methods for (presumably) qPCR for measuring gene expression in Figure 1C are missing. Which reference gene was used and is this a suitably stable gene?

      We are sorry for omitting the qPCR method. The mRNA expression of Il6 and Tnf in trained BMDMs was normalized to that in untrained BMDMs, with β-actin serving as the reference gene. We will describe the method in detail in our revised manuscript.
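
A normalization of this kind is commonly computed with the 2^-ΔΔCt method. The response does not state which calculation was used, so the sketch below is an assumption for illustration: it presumes roughly 100% primer efficiency, uses invented Ct values, and the function name is hypothetical.

```python
def relative_expression(ct_target_sample, ct_ref_sample, ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^-ddCt method (assumes ~100% primer efficiency)."""
    delta_sample = ct_target_sample - ct_ref_sample      # normalise to the reference gene
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** -(delta_sample - delta_control)


# Hypothetical Ct values, for illustration only.
fold = relative_expression(
    ct_target_sample=22.0, ct_ref_sample=16.0,    # trained cells
    ct_target_control=25.0, ct_ref_control=16.0,  # untrained cells
)
```

In this invented example the target gene crosses threshold three cycles earlier in trained cells at equal reference-gene Ct, which corresponds to an eightfold increase. The reviewer's question about reference-gene stability matters because any shift in the reference Ct feeds directly into this ratio.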

      (6) From the complete unedited blot image of Figure 1D it appears that the p-Aurora and total Aurora are not from the same gel (discordant number of lanes and positioning). This could be alright if there are no/only slight technical errors, but I find it misleading as it is presented as if the actin (loading control to account for aforementioned technical errors!) counts for the entire figure.

      Thank you for this comment. In the original data, p-Aurora and total Aurora were from different gels. In this experiment, membrane stripping and reprobing after the p-Aurora antibody did not work well, so we could not obtain all results from one gel and had to run another gel with the same samples to blot with the anti-Aurora antibody. We agree that we should have provided separate actin blots as loading controls for this experiment. We will repeat the experiment and provide original data from three biological replicates to confirm the result.

      (7) Figure 2: This figure highlights results that are by far not the strongest ones - I think the 'top hits' deserve some more glory. A small explanation on why the highlighted results were selected would have been fitting.

      We appreciate the valuable suggestion. We will add a discussion of this point to our revised manuscript.

      (8) Figure 3 incl supplement: the carbon tracing experiments show more glucose-carbon going into TCA cycle (suggesting upregulated oxidative metabolism), but no mito stress test was performed on the seahorse.

      We appreciate this question raised by the reviewer. We previously performed Seahorse XF analysis to measure mitochondrial stress in β-glucan-trained BMDMs in combination with alisertib (data not shown in our submitted manuscript). The results showed no increase in oxidative phosphorylation under β-glucan stimulation.

      Author response image 1.

      (9) Inconsistent use of an 'alisertib-alone' control in addition to 'medium', 'b-glucan', 'b-glucan + alisertib'. This control would be of great added value in many cases, in my opinion.

      Thank you for your comment. We appreciate that including an “alisertib-alone” group in all experiments may add value to the findings. The aim of the current study was to investigate the role of Aurora kinase A in trained immunity. Therefore, in most settings, we did not focus on the role of Aurora kinase A without β-glucan stimulation. Initially, we showed in Figure 1B and 1C that alisertib alone at concentrations up to and including 1 μM does not affect the response to a secondary stimulus. In a previous report, the authors showed that an Aurora A inhibitor alone did not affect trained immunity[8]. Thus, we did not include this control group in all of the experiments.

      (10) Figure 4A: looking at the unedited blot images, the blot for H3K36me3 appears in its original orientation, whereas other images appear horizontally mirrored. Please note, I don't think there is any malicious intent but this is quite sloppy and the authors should explain why/how this happened (are they different gels and the loading sequence was reversed?)

      Thank you for pointing out this error. After checking the original data, we found that we had indeed misassembled the orientation of several blots. We retraced the assembly process and found that some blots had been oriented according to the loading sequence but these changes were not saved, so the orientations in Figure 4A were not consistent with the unedited blot images. We are sorry for this careless mistake, and we will double-check that all blots are correctly assembled in the revised manuscript.

      (11) For many figures, for example prominently figure 5, the text describes 'beta-glucan training' whereas the figures actually depict acute stimulation with beta-glucan. While this is partially a semantic issue (technically, the stimulation is 'the training-phase' of the experiment), this could confuse the reader.

      We thank the reviewer for the suggestion. We will revise our wording to ensure clarity and avoid inconsistencies that might lead to misunderstanding.

      (12) Figure 6: Cytokines, especially IL-6 and IL-1β, can be excreted by tumour cells and have pro-tumoral functions. This is not likely in the context of the other results in this case, but since there is flow cytometry data from the tumour material it would have been nice to see also intracellular cytokine staining to pinpoint the source of these cytokines.

      We thank the reviewer for the suggestion. To address this concern, we will perform intracellular cytokine staining in tumor experiments with mice trained with β-glucan alone or in combination with alisertib, followed by MC38 inoculation.

      Reviewer #2 (Public review):

      Summary:

      This manuscript investigates the inhibition of Aurora A and its impact on β-glucan-induced trained immunity via the FOXO3/GNMT pathway. The study demonstrates that inhibition of Aurora A leads to overconsumption of SAM, which subsequently impairs the epigenetic reprogramming of H3K4me3 and H3K36me3, effectively abolishing the training effect.

      Strengths:

      The authors identify the role of Aurora A through small molecule screening and validation using a variety of molecular and biochemical approaches. Overall, the findings are interesting and shed light on the previously underexplored role of Aurora A in the induction of β-glucan-driven epigenetic change.

      We thank the reviewer for the positive and encouraging comments.

      Weaknesses:

      Given the established role of histone methylations, such as H3K4me3, in trained immunity, it is not surprising that depletion of the methyl donor SAM impairs the training response. Nonetheless, this study provides solid evidence supporting the role of Aurora A in β-glucan-induced trained immunity in murine macrophages. The in vivo evidence for the antitumor effect of trained immunity is insufficient to support the final claim, because alisertib could inhibit Aurora A in cell types other than myeloid cells.

      We appreciate the question raised by the reviewer. Though SAM generally acts as a methyl donor, whether the epigenetic reprogramming in trained immunity is directly linked to SAM metabolism is not known. In our study, we provided evidence suggesting that SAM maintenance is necessary to support trained immunity. As for the in vivo tumor model, tumor cells were subcutaneously inoculated 24 h after oral administration of alisertib. Previous studies showed that orally administered alisertib had a half-life of 10 h and a 90% reduction in serum concentration after 24 h [9, 10]. Therefore, we suppose that tumor cells are more likely to be affected by the long-term effects of the drug on the immune system than directly by alisertib. To further address the reviewer’s concern, we will perform bone marrow transplantation (trained mice as donor and naïve mice as recipient) to clarify the mechanistic contribution of trained immunity versus off-target effects.
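      The pharmacokinetic argument above can be checked with a short calculation. The sketch below assumes simple single-compartment, first-order elimination, which is an illustrative simplification and not a model taken from the cited studies; the function names are hypothetical. Under that assumption alone, a 10 h half-life leaves roughly 19% of the initial concentration at 24 h (about an 81% reduction), which is of the same order as the 90% serum reduction reported from measured data.

```python
import math


def fraction_remaining(t_hours: float, half_life_hours: float) -> float:
    """Fraction of the initial concentration left after t_hours.

    Assumes single-compartment, first-order elimination (an illustrative
    assumption, not a fitted pharmacokinetic model).
    """
    if half_life_hours <= 0:
        raise ValueError("half-life must be positive")
    return 0.5 ** (t_hours / half_life_hours)


def time_to_reduction(reduction: float, half_life_hours: float) -> float:
    """Hours needed to reach a fractional reduction (0 < reduction < 1)."""
    if not 0 < reduction < 1:
        raise ValueError("reduction must lie strictly between 0 and 1")
    return half_life_hours * math.log2(1 / (1 - reduction))


if __name__ == "__main__":
    remaining = fraction_remaining(24, 10)
    print(f"fraction remaining at 24 h: {remaining:.3f}")
    print(f"hours to 90% reduction: {time_to_reduction(0.9, 10):.1f}")
```

      The gap between the idealized figure and the measured one is expected, since real serum profiles rarely follow a single exponential.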

      Cited references

      (1) Ferreira, A.V., et al., Metabolic Regulation in the Induction of Trained Immunity. Semin Immunopathol, 2024. 46(3-4): p. 7.

      (2) Keating, S.T., et al., Rewiring of glucose metabolism defines trained immunity induced by oxidized low-density lipoprotein. J Mol Med (Berl), 2020. 98(6): p. 819-831.

      (3) Li, X., et al., Maladaptive innate immune training of myelopoiesis links inflammatory comorbidities. Cell, 2022. 185(10): p. 1709-1727.e18.

      (4) Luka, Z., S.H. Mudd, and C. Wagner, Glycine N-methyltransferase and regulation of S-adenosylmethionine levels. J Biol Chem, 2009. 284(34): p. 22507-11.

      (5) Hughey, C.C., et al., Glycine N-methyltransferase deletion in mice diverts carbon flux from gluconeogenesis to pathways that utilize excess methionine cycle intermediates. J Biol Chem, 2018. 293(30): p. 11944-11954.

      (6) Simile, M.M., et al., Nuclear localization dictates hepatocarcinogenesis suppression by glycine N-methyltransferase. Transl Oncol, 2022. 15(1): p. 101239.

      (7) Arifin, W.N. and W.M. Zahiruddin, Sample Size Calculation in Animal Studies Using Resource Equation Approach. Malays J Med Sci, 2017. 24(5): p. 101-105.

      (8) Benjaskulluecha, S., et al., Screening of compounds to identify novel epigenetic regulatory factors that affect innate immune memory in macrophages. Sci Rep, 2022. 12(1): p. 1912.

      (9) Yang, J.J., et al., Preclinical drug metabolism and pharmacokinetics, and prediction of human pharmacokinetics and efficacious dose of the investigational Aurora A kinase inhibitor alisertib (MLN8237). Drug Metab Lett, 2014. 7(2): p. 96-104.

      (10) Palani, S., et al., Preclinical pharmacokinetic/pharmacodynamic/efficacy relationships for alisertib, an investigational small-molecule inhibitor of Aurora A kinase. Cancer Chemother Pharmacol, 2013. 72(6): p. 1255-64.

    1. eLife assessment

      This paper makes a valuable contribution to the area of balancing selection at the Major histocompatibility complex (MHC), including trans-species polymorphism between humans and other primates, by incorporating a large evolutionary range of species and genes and by using newer methodological approaches to characterize the depth and extent of the trans-species polymorphism across an expanded range of primate taxa. While the presented results solidly support the authors' conclusions, additional analyses would be needed to firmly exclude modes of evolution that could mimic trans-specific polymorphism.

    2. Reviewer #1 (Public Review):

      HLA genes have long been known to harbor trans-species polymorphism (TSP). This manuscript aimed to use state-of-the-art analyses and updated genotyping data to rigorously test for the presence of TSP in HLA genes, quantify the timescales associated with HLA TSP, and relate HLA disease associations to evolutionary rates. To do this, the authors chose HLA alleles across great apes, old world monkeys, and new world monkeys on which to perform phylogenetic analyses, alongside non-parametric tests that compare patterns of synonymous diversity. Finally, HLA genetic associations with the disease were correlated with evolutionary rate.

      Strengths:

      The manuscript is well written and neatly organized, the figures are clear, and there are many supplementary analyses that will make this paper a great resource for MHC phylogenetics at allelic resolution.

      Deployment of modern methodology such as BEAST2 can also test if the hypothesis of TSP is supported while accounting for uncertainties in tree topology and evolutionary rates, necessary additions to analyses of the MHC.

      Weaknesses:

      Because TSP has already been convincingly demonstrated to occur in the MHC, the primary benefit of the current study is to ensure these previous observations are still supported by the wealth of genetic data that is now available and modern phylogenetic approaches. However, the benefit of using the robust BEAST2 method comes with the weakness of not using all available data. Focusing on single gene trees with only a small subset of alleles may bias results, and inclusion/exclusion criteria should be better defined.

      One major point that is somewhat overlooked is the presence of multiple copy numbers for the MHC genes through classic birth and death evolution. For example, MHC-B in new world monkeys is duplicated many times (up to 10; PMID: 23715823). This duplication is naturally accompanied by gene loss and pseudogene formation. All of these things muddy the waters considerably yet are not addressed here. A good example is MHC-A, where it has been very difficult to apportion orthologs, even amongst closely related species, due to alternative or incomplete duplication/loss across the species, or region configuration polymorphism (e.g. PMID: 26371256). An example is chimpanzee Patr-AL, which shares similarities with the human HLA-A*02 lineage but is a separate locus. Could this show up as TSP under the current analysis?

      Similarly, an alternative hypothesis for TSP is convergent gene conversion mutations: intergenic gene conversion has been repeatedly observed in HLA genes and the possibility of it occurring with the same two genes becomes more realistic over 45 million years. If the same two MHC genes recombined in humans and in an NWM, each on their own lineages, this would appear as TSP and would cause an overlap of pairwise synonymous divergence between human-human and human-NWM allele comparisons. This might be especially possible in MHC-DR and MHC-DQ genes presented in Figure 2 since both humans and NWM have multiple MHC-DRB and DQB genes (unless e.g. were genes besides HLA/MHC-DRB1 such as DRB3,4,5 included in the DRB phylogenies?). While BEAST2 may be a good way of robustly modeling and identifying TSP, and I understand these analyses cannot support many more sequences, the authors should consider adding an analysis that rules out gene conversion as an explanation for their results (especially the often repeated claim of 45 million year TSP). For example, can the authors use BLAST to ensure that the alleles that underlie 45 million years of TSP do not share close similarities to other HLA genes present in their respective human and NWM genomes? This seems like it could be fairly quickly performed for all genes, and even if it argued against TSP, it would be an interesting finding.

      Finally, the authors have limited themselves to a small subset of HLA/MHC alleles and do not provide sufficient information in the methods to understand how these were chosen nor sufficient discussion surrounding how inclusion/exclusion criteria could bias results. For example, the authors say the alleles were chosen at 2-digit (i.e. 1 field) resolution, but in the phylogenies of Fig. 2, I see variable numbers of alleles chosen for each 2-digit allele family - what metric was used to decide on these alleles?

      "We also collected associations between amino acids and TCR phenotypes". It is not clear either what was analyzed, or the results for this part of the analysis. This is a topic of much debate and none of the previous work has been discussed (PMID: 18304006, PMID: 29636542 as primers for this contentious subject).

      MHC class I also interact with NK cell receptors, including polymorphic KIR. Through their interactions during infection control and reproduction, the two complexes co-evolve across primates, contributing to the maintenance of MHC diversity. Interaction with KIR likely has a greater impact on HLA polymorphism than interactions with TCR, yet this is not factored into any of the models, or indeed mentioned in the text.

      One additional reason inclusion of the KIR binding is important relates to the point above about gene conversion, where it is established that gene conversion reproducibly swaps KIR-binding motifs among MHC class I alleles and genes. HLA-A*23, *24, and *25, *32, for example, are characterized by the acquisition of the 'Bw4' motif from HLA-B (PMID: 26284483), likely followed by positive natural selection. For exon 2 (which encodes the motif), these alleles turn up in a clade distinct from other human HLA-A (Fig 2-S1). What is the impact of the Bw4 motif on this phylogeny? Could this shuffling of motifs be interpreted as indirect TSP?

      The analysis that shows the most rapidly evolving sites occur in the peptide binding domain brings little new to the field. This has been established by Hughes and Nei (cited) and Parham, Lawlor, etc. in 1988 (e.g. PMID: 3375250), and replicated multiple times across human populations and many other species.

      The same applies to the disease association part. It is nice to have a summary of the known associations, but there are others out there and this one is far from thorough. Here, 50% of the information about infectious diseases appears to be taken from one reference, leaving out some major bodies of work, for example those identifying specific peptide binding residues or peptides that associate with HIV (PMID: 22896606) or malaria control (PMID: 1280333). It is also missing some major concepts, such as the DRB1 'shared epitope' of peptide binding residues that predisposes to Rheumatoid Arthritis and protects from Parkinson's disease (35 years of work from PMID: 2446635 through PMID: 30910980), and the nasopharyngeal carcinoma and EBV story (e.g. PMID: 23209447). Another huge gap here is the pregnancy syndromes, for example associations of specific HLA-C and NK cell receptor allotypes with preeclampsia. There are thousands of HLA associations not considered in this section, and to do them justice would likely require an enormous amount of work.

      Thus, neither the idea that HLA/MHC polymorphism is focused on peptide binding nor that this binding drives resistance to infection and associations with disease are new concepts. The previous work in these areas is inadequately acknowledged.

      The paper is written in a very approachable language, which is nice to read and friendly to non-experts, but perhaps a little too much so in places. I find that the paper follows a very non-traditional format with respect to for example the results section, which seems a mixture of Introduction/methods/figure legends/discussion with no real solid result description.

    3. Reviewer #2 (Public Review):

      Fortier and Pritchard investigated the breadth and depth of trans-species polymorphism (TSP) within six primate classical (antigen-presenting) major histocompatibility complex (MHC) genes (three MHC class I and three MHC class II). The MHC is of wide interest because of its unique evolutionary patterns within the genomes of jawed vertebrates and for its extensive and consistent associations with disease phenotypes. The findings of the paper are:

      1) Trans-species polymorphism (TSP) within major histocompatibility complex (MHC) genes, whereby some alleles are more similar between rather than within species, occurs between humans and non-human primates despite rapid allelic turnover.

      2) Highly polymorphic/rapidly evolving sites are mostly involved in peptide binding.

      3) The identified, rapidly-evolving sites are associated with disease.

      However, because these general findings have been previously demonstrated to varying extents by numerous other studies, these are not the strength of this paper. The strength and importance of this paper are in its utilization of a large evolutionary range of species and genes and its methodological approach and the extent of analyses undertaken to characterize the depth and extent of the TSP among primates. The major contribution of this paper is showing that TSP in the MHC is widespread among diverse primate taxa, and, depending on the particular MHC gene, TSP can be detected between humans and non-human primates as distantly diverged from the human lineage as new world monkeys of the Americas, ~45 million years ago. The paper, overall, made good methodological choices to account for the fascinating but challenging nature of the MHC, which includes its extensive allelic polymorphism (much of which is only characterized for the peptide-binding domain, encoded by exons 2 and 3), the difficulty in assessing phylogenetic relationships (particularly due to recombination and/or interallelic gene conversion), and differentiating convergence from conservation. There is no single analysis that can perfectly account for all these factors. This paper used two methods to test for TSP, Bayesian evolutionary analysis and synonymous nucleotide distances (dS), each with their respective strengths and limitations articulated. TSP, to varying degrees, is supported by both analyses. The paper further identifies rapidly evolving positions within the MHC molecules (predominantly located in the MHC peptide-binding domain), quantitatively shows that they are more likely to be in proximity to the bound peptide within the peptide binding domain, and shows, via a literature review of HLA fine-mapping studies, that those positions are associated with both infectious and autoimmune disease.
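      The core idea of TSP described above, that some alleles are more similar between species than within species, can be illustrated with a toy calculation. The sketch below is not the paper's method: it uses a plain proportion of differing sites (p-distance) as a stand-in for synonymous divergence (dS), which would require a codon model, and the allele names and sequences are invented for illustration. It reports every allele whose nearest neighbour belongs to a different species.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cross_species_nearest(alleles: dict) -> dict:
    """Map each allele to its nearest neighbour when that neighbour comes
    from another species.

    alleles: name -> (species, aligned sequence).
    Alleles whose nearest neighbour is from the same species are omitted.
    """
    result = {}
    for name, (species, seq) in alleles.items():
        others = [n for n in alleles if n != name]
        if not others:
            continue
        nearest = min(others, key=lambda n: p_distance(seq, alleles[n][1]))
        if alleles[nearest][0] != species:
            result[name] = nearest
    return result


# Hypothetical toy alleles (names and sequences invented for illustration).
ALLELES = {
    "Hs-1": ("human", "ACGTACGTAC"),
    "Hs-2": ("human", "ACGTTCGGAT"),
    "Nwm-1": ("new_world_monkey", "ACGTACGTAT"),
    "Nwm-2": ("new_world_monkey", "TCGTTCGGAT"),
}

if __name__ == "__main__":
    for allele, neighbour in cross_species_nearest(ALLELES).items():
        print(f"{allele} is closest to {neighbour}")
```

      A pattern like this is only suggestive. As the reviews note, gene conversion, paralogy, and convergence can produce the same signal, which is why the paper pairs distance-based comparisons with Bayesian phylogenetic analysis.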

      The conclusions of the paper, therefore, are supported and appropriate with the most important caveats noted, but the paper would benefit from:

      1) Addressing how copy number variation of MHC class I genes among primate species might have affected their analyses and results (only single representative genes of the class II MHC, which also exhibit copy number variation, were used for this study).

      2) Considering the differences between class I and class II MHC roles in immune function and how those might relate to the observed patterns.

    1. eLife Assessment

      The useful manuscript presents interesting findings in the field of neurodegenerative diseases by highlighting the dual role of phosphorylated ubiquitin (pUb) in cellular proteostasis and neurotoxicity. However, some claims for discovery are supported by unconvincing and incomplete evidence that requires further validation. The poor quality of key immunofluorescent images and questionable quantification analysis raise technical concerns.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript discusses the role of phosphorylated ubiquitin (pUb) by PINK1 kinase in neurodegenerative diseases. It reveals that elevated levels of pUb are observed in aged human brains and those affected by Parkinson's disease (PD), as well as in Alzheimer's disease (AD), aging, and ischemic injury. The study shows that increased pUb impairs proteasomal degradation, leading to protein aggregation and neurodegeneration. The authors also demonstrate that PINK1 knockout can mitigate protein aggregation in aging and ischemic mouse brains, as well as in cells treated with a proteasome inhibitor. While this study provided some interesting data, several important points should be addressed before being further considered.

      Strengths:

      (1) Reveals a novel pathological mechanism of neurodegeneration mediated by pUb, providing a new perspective on understanding neurodegenerative diseases.

      (2) The study covers not only a single disease model but also various neurodegenerative diseases such as Alzheimer's disease, aging, and ischemic injury, enhancing the breadth and applicability of the research findings.

      Weaknesses:

      (1) PINK1 has been reported as a kinase capable of phosphorylating Ubiquitin, hence the expected outcome of increased p-Ub levels upon PINK1 overexpression. Figures 5E-F do not demonstrate a significant increase in Ub levels upon overexpression of PINK1 alone, whereas the evident increase in Ub expression upon overexpression of S65A is apparent. Therefore, the notion that increased Ub phosphorylation leads to protein aggregation in mouse hippocampal neurons is not yet convincingly supported.

      (2) The specificity of PINK1 and p-Ub antibodies requires further validation, as a series of literature indicate that the expression of the PINK1 protein is relatively low and difficult to detect under physiological conditions.

      (3) In Figure 6, relying solely on Western blot staining and Golgi staining under high magnification is insufficient to prove the impact of PINK1 overexpression on neuronal integrity and cognitive function. The authors should supplement their findings with immunostaining results for MAP2 or NeuN to demonstrate whether neuronal cells are affected.

      (4) The authors should provide more detailed figure captions to facilitate the understanding of the results depicted in the figures.

      (5) While the study proposes that pUb promotes neurodegeneration by affecting proteasomal function, the specific molecular mechanisms and signaling pathways remain to be elucidated.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript makes the claim that pUb is elevated in a number of degenerative conditions including Alzheimer's Disease and cerebral ischaemia. Some of this is based on antibody staining which is poorly controlled and difficult to accept at this point. They confirm previous results that a cytosolic form of PINK1 accumulates following proteasome inhibition and that this can be active. Accumulation of pUb is proposed to interfere with proteostasis through inhibition of the proteasome. Much of the data relies on over-expression and there is little support for this reflecting physiological mechanisms.

      Weaknesses:

      The manuscript is poorly written. I appreciate this may be difficult in a non-native tongue, but felt that many of the problems are organisational. Less data of higher quality, better controls and incision would be preferable. Overall the referencing of past work is lamentable.

      Methods are also very poor and difficult to follow.

      Until technical issues are addressed I think this would represent an unreliable contribution to the field.

    4. Reviewer #3 (Public review):

      Summary:

      This study aims to explore the role of phosphorylated ubiquitin (pUb) in proteostasis and its impact on neurodegeneration. By employing a combination of molecular, cellular, and in vivo approaches, the authors demonstrate that elevated pUb levels contribute to both protective and neurotoxic effects, depending on the context. The research integrates proteasomal inhibition, mitochondrial dysfunction, and protein aggregation, providing new insights into the pathology of neurodegenerative diseases.

      Strengths:

      - The integration of proteomics, molecular biology, and animal models provides comprehensive insights.

      - The use of phospho-null and phospho-mimetic ubiquitin mutants elegantly demonstrates the dual effects of pUb.

      - Data on behavioral changes and cognitive impairments establish a clear link between cellular mechanisms and functional outcomes.

      Weaknesses:

      - While the study discusses the reciprocal relationship between proteasomal inhibition and pUb elevation, causality remains partially inferred.

      - The role of alternative pathways, such as autophagy, in compensating for proteasomal dysfunction is underexplored.

      - The immunofluorescence images in Figure 1A-D lack clarity and transparency. It is not clear whether the images represent human brain tissue, mouse brain tissue, or cultured cells. Additionally, the DAPI staining is not well-defined, making it difficult to discern cell nuclei or staging. To address these issues, lower-magnification images that clearly show the brain region should be provided, along with improved DAPI staining for better visualization. Furthermore, the Results section and Figure legends should explicitly indicate which brain region is being presented. These concerns raise questions about the reliability of the reported pUb levels in AD, which is a critical aspect of the study's findings.

      - Figure 4B should also indicate which brain region is being presented.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript discusses the role of phosphorylated ubiquitin (pUb) by PINK1 kinase in neurodegenerative diseases. It reveals that elevated levels of pUb are observed in aged human brains and those affected by Parkinson's disease (PD), as well as in Alzheimer's disease (AD), aging, and ischemic injury. The study shows that increased pUb impairs proteasomal degradation, leading to protein aggregation and neurodegeneration. The authors also demonstrate that PINK1 knockout can mitigate protein aggregation in aging and ischemic mouse brains, as well as in cells treated with a proteasome inhibitor. While this study provided some interesting data, several important points should be addressed before being further considered.

      Strengths:

      (1) Reveals a novel pathological mechanism of neurodegeneration mediated by pUb, providing a new perspective on understanding neurodegenerative diseases.

      (2) The study covers not only a single disease model but also various neurodegenerative diseases such as Alzheimer's disease, aging, and ischemic injury, enhancing the breadth and applicability of the research findings.

      Weaknesses:

      (1) PINK1 has been reported as a kinase capable of phosphorylating Ubiquitin, hence the expected outcome of increased p-Ub levels upon PINK1 overexpression. Figures 5E-F do not demonstrate a significant increase in Ub levels upon overexpression of PINK1 alone, whereas the evident increase in Ub expression upon overexpression of S65A is apparent. Therefore, the notion that increased Ub phosphorylation leads to protein aggregation in mouse hippocampal neurons is not yet convincingly supported.

      Indeed, overexpression of sPINK1* alone caused little change in Ub levels in the soluble fraction (Figure 5E), which is expected. Ub in the soluble fraction is in a relatively stable, buffered state. However, overexpression of sPINK1* resulted in an increase in Ub levels in the insoluble fraction, indicating protein aggregation. The molecular weight of Ub in the insoluble fraction was predominantly below 70 kDa, implying that phosphorylation inhibits Ub chain elongation.

      To further examine this, we used the Ub/S65A mutant to antagonize Ub phosphorylation, and found that the aggregation at low molecular weight was significantly reduced, indicating a partial restoration of proteasomal activity. The increase in Ub levels in both the soluble and insoluble fractions likely results from the high rate of ubiquitination driven by the elevated levels of Ub. Notably, the overexpressed Ub/S65A was detected in the Western blot using the wild-type Ub antibody, which accounts for the apparently increased Ub level.

      When overexpressing Ub/S65E, we again saw an increase in Ub levels in the insoluble fraction (but no increase in the soluble fraction), with low molecular weight bands even more prominent than those observed with sPINK1* transfection. These findings collectively support the conclusion that sPINK1* promotes protein aggregation through Ub phosphorylation.

      (2) The specificity of PINK1 and p-Ub antibodies requires further validation, as a series of literature indicate that the expression of the PINK1 protein is relatively low and difficult to detect under physiological conditions.

      We acknowledge the challenges in achieving optimal specificity for commercially available and custom-generated antibodies targeting PINK1 and pUb, particularly given the low endogenous levels of these proteins under physiological conditions. Despite these limitations, we observed robust immunofluorescent staining for PINK1 (Figures 1A, 1C, and 1G) and pUb (Figures 1B, 1D, and 1G) in human brain samples from Alzheimer's disease (AD) patients, as well as in mouse brains from models of AD and cerebral ischemia. The significant elevation of PINK1 and pUb under these pathological conditions likely accounts for the clear visualization. To validate antibody specificity, we have included images from pink1-/- mice as negative controls in the revised manuscript (Figure 1C and 1D, third panel).

      In addition, we detected a significant increase in pUb levels in aged mouse brains compared to young ones (Figures 1E and 1F). Notably, in pink1-/- mice, pUb levels remained unchanged between young and aged groups, despite some background signal, further supporting the conclusion that pUb accumulation during aging is PINK1-dependent.

      In HEK293 cells, pink1-/- cells served as a negative control for PINK1 (Figure 2B and 2C) and for pUb (Figure 2D and 2E). While the Western blot using the pUb antibody displayed some nonspecific background, pUb levels in pink1-/- cells remained unchanged across all MG132 treatment conditions (Figures 2D and 2E), further attesting to the reliability of our findings.

      (3) In Figure 6, relying solely on Western blot staining and Golgi staining under high magnification is insufficient to prove the impact of PINK1 overexpression on neuronal integrity and cognitive function. The authors should supplement their findings with immunostaining results for MAP2 or NeuN to demonstrate whether neuronal cells are affected.

      Thank you for raising this important point. We included NeuN immunofluorescent staining in Figure 5—figure supplement 2 of the original manuscript. The results demonstrate a significant loss of NeuN-positive cells in the hippocampus following Ub/S65E overexpression, while no apparent change in NeuN-positive cells was observed with sPINK1* transfection alone. These findings provide evidence of neuronal loss in response to Ub/S65E, further supporting the impact of pUb elevation on neuronal integrity.

      While we did not perform MAP2 immunostaining, we included complementary analyses to assess neuronal integrity. Specifically, we performed Western blotting to determine MAP2 protein levels and used Golgi staining to study neuronal morphology and synaptic structure in greater detail. These analyses revealed that overexpression of sPINK1* or Ub/S65E decreased MAP2 levels and caused damage to synaptic structures (Figures 6F and 6H). Importantly, the deleterious effects of sPINK1* overexpression could be rescued by co-expression of Ub/S65A, further underscoring the role of pUb in mediating these changes.

      Together, our NeuN immunostaining, MAP2 analysis, and Golgi staining provide strong evidence for the impact of PINK1 overexpression and pUb elevation on neuronal integrity and synaptic health. We believe these complementary approaches sufficiently address the reviewer’s concern and highlight the pathological consequences of elevated pUb levels.

      (4) The authors should provide more detailed figure captions to facilitate the understanding of the results depicted in the figures.

      Figure captions will be updated with more details in the revised manuscript.

      (5) While the study proposes that pUb promotes neurodegeneration by affecting proteasomal function, the specific molecular mechanisms and signaling pathways remain to be elucidated.

      The specific molecular mechanisms and signaling pathways through which pUb promotes neurodegeneration are likely multifaceted and interconnected. Mitochondrial dysfunction appears to be a central contributor to neurodegeneration following sPINK1* overexpression. This is supported by (1) an observed increase in full-length PINK1, indicative of impaired mitochondrial quality control, and (2) proteomic data revealing enhanced mitophagy at 30 days post-transfection and substantial mitochondrial injury by 70 days post-transfection. The progressive damage to mitochondria caused by protein aggregates can cause further neuronal injury and degeneration.

      In addition, reduced proteasomal activity may result in the accumulation of inhibitory proteins that are normally degraded by the ubiquitin-proteasome system. Our proteomics analysis identified a >54-fold increase in CamK2n1 (UniProt ID: Q6QWF9), an endogenous inhibitor of CaMKII activation, following sPINK1* overexpression. This is particularly significant because the accumulation of CamK2n1 could suppress CaMKII activation and, subsequently, inhibit the CREB signaling pathway (illustrated below). As CREB is essential for synaptic plasticity and neuronal survival, its inhibition may further amplify neurodegenerative processes.

      While our study identifies proteasomal dysfunction and mitochondrial damage as key initial triggers, downstream effects—such as disruptions in signaling pathways like CaMKII-CREB—likely contribute to a broader cascade of pathological events. These findings highlight the complexity of pUb-mediated neurodegeneration and suggest that further exploration of downstream mechanisms is necessary to fully elucidate the pathways involved.

      In the revised manuscript, we plan to include the proteomics data from mouse brain tissues at 30 days and 70 days post-transfection, to further highlight this downstream effect of proteasomal dysfunction.

      Author response image 1.

      Reviewer #2 (Public review):

      Summary:

      The manuscript makes the claim that pUb is elevated in a number of degenerative conditions including Alzheimer's Disease and cerebral ischemia. Some of this is based on antibody staining which is poorly controlled and difficult to accept at this point. They confirm previous results that a cytosolic form of PINK1 accumulates following proteasome inhibition and that this can be active. Accumulation of pUb is proposed to interfere with proteostasis through inhibition of the proteasome. Much of the data relies on over-expression and there is little support for this reflecting physiological mechanisms.

      Weaknesses:

      The manuscript is poorly written. I appreciate this may be difficult in a non-native tongue, but felt that many of the problems are organisational. Less data of higher quality, better controls and incision would be preferable. Overall the referencing of past work is lamentable.

      Methods are also very poor and difficult to follow.

      Until technical issues are addressed I think this would represent an unreliable contribution to the field.

      (1) Antibody specificity and detection under pathological conditions

      We acknowledge the limitations of commercially available antibodies for detecting PINK1 and pUb. Despite these challenges, our findings demonstrate a significant increase in PINK1 and pUb levels under pathological conditions, such as Alzheimer's disease (AD) and ischemia. Additionally, we observed an increase in pUb level during brain aging, further highlighting its relevance in this particular physiological process. To ensure reliable quantification of PINK1 and pUb levels, we used pink1-/- mice and HEK293 cells as negative controls. For example, PINK1 levels were extremely low in control cells but increased dramatically after 2 hours of oxygen-glucose deprivation (OGD) and 6 hours of reperfusion (Figure 1H). Together, these controls validate that the observed elevations in PINK1 and pUb are specific and linked to pathological or certain physiological conditions.

      (2)  Overexpression as a model for pathological conditions

      To investigate whether the inhibitory effects of sPINK1* on the ubiquitin-proteasome system (UPS) are dependent on its kinase activity, we utilized a kinase-dead version of sPINK1* as a negative control. Since PINK1 has multiple substrates, we further explored whether its effects on UPS inhibition were mediated specifically by ubiquitin phosphorylation. For this, we used Ub/S65A (a phospho-null mutant) to antagonize Ub phosphorylation by sPINK1*, and Ub/S65E (a phospho-mimetic mutant) to mimic phosphorylated Ub. These well-defined controls ensured the robustness of our conclusions.

      While overexpression does not perfectly replicate physiological conditions, it serves as a valuable model for studying pathological scenarios such as neurodegeneration and brain aging, where pUb levels are known to increase. For example, we observed a 30.4% increase in pUb levels in aged mouse brains compared to young brains (Figure 1F). Similarly, in our sPINK1* overexpression model, pUb levels increased by 43.8% and 59.9% at 30- and 70-days post-transfection, respectively, compared to controls (Figures 5A and 5C). Notably, co-expression of sPINK1* with Ub/S65A almost entirely prevented sPINK1* accumulation (Figure 5B), indicating that an active UPS can efficiently degrade sPINK1*. Collectively, these findings show that sPINK1* accumulation inhibits UPS activity, a defect that can be rescued by the phospho-null Ub mutant. Thus, this overexpression model closely mimics pathological conditions and offers valuable insights into pUb-mediated proteasomal dysfunction.

      (3) Organization of the manuscript

      We believe the structure of the manuscript is justified and systematically addresses the key aspects of the study in a logical flow:

      (a) Evidence for the increase of PINK1 and pUb in multiple pathological and physiological conditions.

      (b) Identification of the sources and consequences of sPINK1 and pUb elevation.

      (c) Mechanistic insights into how pUb inhibits UPS-mediated degradation.

      (d) Validation of these findings using pink1-/- mice and cells.

      (e) Evidence of the reciprocal relationship between proteasomal inhibition and pUb elevation, culminating in neurodegeneration.

      (f) Demonstration of elevated pUb levels and protein aggregation in the hippocampus following sPINK1* overexpression, supported by proteomic analyses, behavioral tests, Western blotting, and Golgi staining.

      Thus, this organization provides a clear and cohesive narrative, culminating in the demonstration that sPINK1* overexpression induces hippocampal neuron degeneration.

      (4) Revisions to writing, referencing, and methodology

      We will improve the clarity and flow of the manuscript, add more references to properly acknowledge prior work, and incorporate additional details into the Methods section to enhance readability and reproducibility. These improvements should address the organizational and technical concerns raised, while strengthening the overall quality of the manuscript.

      Reviewer #3 (Public review):

      Summary:

      This study aims to explore the role of phosphorylated ubiquitin (pUb) in proteostasis and its impact on neurodegeneration. By employing a combination of molecular, cellular, and in vivo approaches, the authors demonstrate that elevated pUb levels contribute to both protective and neurotoxic effects, depending on the context. The research integrates proteasomal inhibition, mitochondrial dysfunction, and protein aggregation, providing new insights into the pathology of neurodegenerative diseases.

      Strengths:

      - The integration of proteomics, molecular biology, and animal models provides comprehensive insights.

      - The use of phospho-null and phospho-mimetic ubiquitin mutants elegantly demonstrates the dual effects of pUb.

      - Data on behavioral changes and cognitive impairments establish a clear link between cellular mechanisms and functional outcomes.

      Weaknesses:

      - While the study discusses the reciprocal relationship between proteasomal inhibition and pUb elevation, causality remains partially inferred.

      The reciprocal cycle between proteasomal inhibition and pUb elevation can be initiated by various factors that impair proteasomal activity. These factors include Aβ accumulation, ATP depletion, reduced expression of proteasome components, and covalent modifications of proteasomal subunits—all well-established contributors to the progressive decline in proteasome function. Once initiated, this cycle would become self-perpetuating, with the accumulation of sPINK1 and pUb driving a feedback loop of deteriorating proteasomal activity.

      In the current study, this reciprocal relationship between sPINK1/pUb elevation and proteasomal dysfunction is depicted in Figure 4A. Our results demonstrate that increased sPINK1 or PINK1 levels, such as through overexpression, can initiate this cycle. Crucially, co-expression of Ub/S65A effectively rescues the cells from this cycle, highlighting the pivotal role of pUb in driving proteasomal inhibition and establishing causality in this relationship. At the animal level, pink1 knockout could prevent protein aggregation upon aging and cerebral ischemia (Figures 1E and 1G).

      Mitochondrial injury is a likely source of elevated PINK1 and pUb levels. A recent study showed that efficient mitophagy is necessary to prevent pUb accumulation (bioRxiv 2023.02.14.528378), suggesting that mitochondrial damage can trigger this cycle. In another study (bioRxiv 2024.07.03.601901), the authors found that mitochondrial damage could enhance PINK1 transcription, further increasing cytoplasmic PINK1 levels and exacerbating the cycle.

      - The role of alternative pathways, such as autophagy, in compensating for proteasomal dysfunction is underexplored.

      Elevated sPINK1 has been reported to enhance autophagy (Autophagy 2016, 12: 632-647), potentially compensating for the impaired UPS. One mechanism involves the phosphorylation of p62 by sPINK1, which enhances autophagy activity. In our study, we did observe increased autophagic activity upon sPINK1* overexpression, as shown in Figure 2I (middle panel, without BALA). This increased autophagy may help degrade ubiquitinated proteins induced by puromycin, partially compensating for the proteasomal dysfunction.

      This compensation might explain why protein aggregation only increased slightly, though statistically significant, at 70 days post sPINK1* transfection (Figure 5F). Additionally, we observed a slight, though statistically insignificant, increase in LC3II levels in the hippocampus of mouse brains at 70 days post sPINK1* transfection (Figure 5—figure supplement 6), further supporting the notion of autophagy activation.

      However, while autophagy may provide some compensation, its effect is likely limited. Autophagy and UPS differ significantly in their roles and mechanisms of degradation. Autophagy is a bulk degradation pathway that is generally non-selective, targeting long-lived proteins, damaged organelles, and intracellular pathogens. In contrast, the UPS is highly selective, primarily degrading short-lived regulatory proteins, misfolded proteins, and proteins tagged for degradation.

      Together, we found that sPINK1* overexpression enhanced autophagy-mediated protein degradation while simultaneously impairing UPS-mediated degradation. This suggests that while autophagy may provide partial compensation for proteasomal dysfunction, it is not sufficient to fully counterbalance the selective degradation functions of the UPS.

      - The immunofluorescence images in Figure 1A-D lack clarity and transparency. It is not clear whether the images represent human brain tissue, mouse brain tissue, or cultured cells. Additionally, the DAPI staining is not well-defined, making it difficult to discern cell nuclei or staging. To address these issues, lower-magnification images that clearly show the brain region should be provided, along with improved DAPI staining for better visualization. Furthermore, the Results section and Figure legends should explicitly indicate which brain region is being presented. These concerns raise questions about the reliability of the reported pUb levels in AD, which is a critical aspect of the study's findings.

      We will include low-magnification images in the supplementary figures of the revised manuscript to provide a broader context for the immunofluorescence data presented in Figure 1. DAPI staining at higher magnifications will also be provided to improve visualization of cell nuclei and overall tissue structure. Additionally, we will indicate the brain regions examined in the corresponding figure legends, and incorporate more details in the Results section to provide clearer descriptions of the samples and brain regions analyzed.

      The human brain samples presented in Figure 1 are from the cingulate gyrus region of Alzheimer's disease (AD) patients. Our analysis revealed that PINK1 is primarily localized within cell bodies, while pUb is more abundant around Aβ plaques, likely in nerve terminals. These additional clarifications and supplementary figures should provide greater transparency and improve the reliability of our findings.

      - Figure 4B should also indicate which brain region is being presented.

      The images were taken from layers III-IV of the mouse neocortex; this information will be incorporated in the figure legend of the revised manuscript.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public review):

      Previous experimental studies demonstrated that membrane association drives avidity for several potent broadly HIV-neutralizing antibodies and its loss dramatically reduces neutralization. In this study, the authors present a tour de force analysis of molecular dynamics (MD) simulations that demonstrate how several HIV-neutralizing membrane-proximal external region (MPER)-targeting antibodies associate with a model lipid bilayer.

      First, the authors compared how three MPER antibodies, 4E10, PGZL1, and 10E8, associated with model membranes, constructed with two lipid compositions similar to native viral membranes. They found that the related antibodies 4E10 and PGZL1 strongly associate with a phospholipid near heavy chain loop 1, consistent with prior crystallographic studies. They also discovered that a previously unappreciated framework region between loops 2-3 in the 4E10/PGZL1 heavy chain contributes to membrane association. Simulations of 10E8, an antibody from a different lineage, revealed several differences from published X-ray structures. Namely, a phosphatidylcholine binding site was offset and includes significant interaction with a nearby framework region. The revised manuscript demonstrates that these lipid interactions are robust to alterations in membrane composition and rigidity. However, it does not address the reverse: that phospholipids known experimentally not to associate with these antibodies (if any such lipids exist) also fail to interact in MD simulations.

      Next, the authors simulate another MPER-targeting antibody, LN01, with a model HIV membrane either containing or missing an MPER antigen fragment within. Of note, LN01 inserts more deeply into the membrane when the MPER antigen is present, supporting an energy balance between the lowest energy conformations of LN01, MPER, and the complex. These simulations recapitulate lipid binding interactions solved in published crystallographic studies but also lead to the discovery of a novel lipid binding site the authors term the "Loading Site", which could guide future experiments with this antibody.

      The authors next established coarse-grained (CG) MD simulations of the various antibodies with model membranes to study membrane embedding. These simulations facilitated greater sampling of different initial antibody geometries relative to membrane. These CG simulations, which cannot resolve atomistic interactions, are nonetheless compelling because negative controls (ab 13h11, BSA) that should not associate with membrane indeed sample significantly less membrane.

      Distinct geometries derived from CG simulations were then used to initialize all-atom MD simulations to study insertion in finer detail (e.g., phospholipid association), which largely recapitulate their earlier results, albeit with more unbiased sampling. The multiscale model of an initial CG study with broad geometric sampling, followed by all-atom MD, provides a generalized framework for such simulations.

      Finally, the authors construct velocity pulling simulations to estimate the energetics of antibody membrane embedding. Using the multiscale modelling workflow to achieve greater geometric sampling, they demonstrate that their model reliably predicts lower association energetics for known mutations in 4E10 that disrupt lipid binding. However, the model does have limitations, namely in its ability to predict more subtle changes along a lineage: intermediate mutations that reduce lipid binding are indistinguishable from mutations that completely ablate lipid association. Thus, while large/binary differences in lipid affinity might be predictable, the use of this method as a generative model is likely more limited.

      The MD simulations conducted throughout are rigorous and the analyses are extensive, creative, and biologically inspired. Overall, these analyses provide an important mechanistic characterization of how broadly neutralizing antibodies associate with lipids proximal to membrane-associated epitopes to drive neutralization.

      Reviewer #2 (Public review):

      In this study, Maillie et al. have carried out a set of multiscale molecular dynamics simulations to investigate the interactions between the viral membrane and four broadly neutralizing antibodies that target the membrane proximal exposed region (MPER) of the HIV-1 envelope trimer. The simulation recapitulated in several cases the binding sites of lipid head groups that were observed experimentally by X-ray crystallography, as well as some new binding sites. These binding sites were further validated using a structural bioinformatics approach. Finally, steered molecular dynamics was used to measure the binding strength between the membrane and variants of the 4E10 and PGZL1 antibodies.

      The use of multiscale MD simulations allows for a detailed exploration of the system at different time and length scales. The combination of MD simulations and structural bioinformatics provides a comprehensive approach to validate the identified binding sites. Finally, the steered MD simulations offer quantitative insights into the binding strength between the membrane and bnAbs.

      While the simulations and analyses provide qualitative insights into the binding interactions, they do not offer a quantitative assessment of energetics. The coarse-grained simulations exhibit artifacts and thus require careful analysis.

      This study contributes to a deeper understanding of the molecular mechanisms underlying bnAb recognition of the HIV-1 envelope. The insights gained from this work could inform the design of more potent and broadly neutralizing antibodies.

      Recommendations for the authors:

      Reviewing Editor:

      We recommend the authors remove the figure and section related to bnAb LN01, perform additional analysis (e.g., further expanding on the differences in antibody binding in the presence or absence of antigen), and present this as a separate manuscript in a follow-up study.

      We consider the analysis of a bnAb with a transmembrane antigen, and of LN01 in particular, essential to the manuscript and a source of novel results. The study of LN01 provides many insights distinct from those for the other MPER bnAbs in this study. We agree that further characterization of LN01 and of bnAbs with transmembrane antigen or full-length Env is intriguing and necessary for a full mechanistic understanding of lipid-associated antibodies. The LN01 section in this paper is novel in the field and provides preliminary evidence motivating further work, which we agree is beyond the scope of this already long and detailed study.

      Reviewer #1 (Recommendations for the authors):

      I appreciate the degree to which the authors responded to my previous points raised in the private review, including edits where I might have missed something in the manuscript or relevant literature. I imagine such a point-by-point response was quite onerous. Thank you also for balancing presentation/clarity with content/rigor considering the large information content of this manuscript; in silico results are inherently hard to present given the delicate balance between rigorous validation and novel information content. I apologize if I repeat points raised and addressed previously and commend the authors on their revised study, which is much improved in clarity; any additional revisions are of course entirely at your discretion.

      "...now having more diversity in lipid headgroup chemistries" references the wrong figure-it should be: Figure 2-figure supplement 2A-C. The incorrect figure is also referenced again several sentences down: "...relevant CDR and framework surface loops..."

      Thank you for pointing out this error. We have corrected figure references.

      "One shared conformational difference observed for these bnAbs the higher cholesterol bilayers was slightly more extensive and broader interaction profiles as well as modestly deeper embedding of the relevant CDR and framework surfaces loops" please rephrase

      Thank you for this suggestion.  We rephrased this for improved clarity and flow. 

      "These results bolster the feasibility for using all-atom MD as an in silico platform to explore differential phospholipid affinity at these sites (i.e., specificity studies) and influence on antibody preferred conformation as membrane composition and lipid chemistry are systematically varied" Please tone down these speculations-you have demonstrated that simulations are robust to different headgroup chemistries but have not provided evidence for the exclusion of lipids that are known not to associate with these antibodies.

      We rephrased this speculation to highlight the potential of this application. We also emphasize future studies that would be required to achieve this application in the following sentence.

      “These results motivate use of all-atom MD as an in silico approach for exploring differential phospholipid affinity at these sites…”

      Figure 2A: Specify which PDB entry corresponds to the displayed crystal structures in the main figure or caption.

      We clarified these PDB entries in the figure caption. 

      Check reference formatting in supplemental figures when generating VOR.

      I am not sure how relevant this might be to the claims of Figure 2-figure supplement 3, but AlphaFold3-based phospholigand docking might provide an additional orthogonal approach if relevant ligand(s) are available for such analysis (particularly for the newly proposed 10E8 POPC complex).

      Thank you for this suggestion. AI/ML-based prediction methods like AF3 and RoseTTAFold All-Atom (RFAA) are interesting new methods that have emerged since our initial submission. We’ve decided these experiments are beyond the scope of this already long and detailed study. We have added a sentence suggesting use of these methods in future work.

      "We next studied bnAb LN01 to interrogate differences" --> this transition still reads a bit unclear. Why shift gears and change antibodies? Also, while you do go into its interactions both +/- antigen, there's no lead into the simulation initialization with and without antigen to guide the reader into the comparisons you will draw in the figure. Also, the order of information presentation is a bit strange, where the rationale for choosing a single monomeric helix is brought up in the middle of the paragraph instead of at the beginning of the section. In the next paragraph, it goes back to the initialization of the membrane composition again, which feels a bit disorganized-I do appreciate the unique challenge of having to weave through so much quality data! In fact, if you were to conduct simulations of membrane + antigen vs. membrane + LN01 vs. membrane + LN01 + antigen, I am tempted to say that this could be removed from this manuscript and flow better as a paper in and of itself.

      We thank the reviewer for the suggestion to improve the writing style. We feel this section adds a lot of value to the manuscript, so we have kept it in the paper and improved the transition as well as the rationale.

      We chose to study the additional antibody LN01 and the monomeric MPER-TM antigen conformation because existing structural evidence was available, so no additional creative model building was required. This rationale has been updated in the new text.

      We changed the order of information as suggested, moving the rationale for the antigen fragment earlier in the paragraph, followed by the background on the lipid sites from the crystal structure that leads into the simulation set-up. We clarified in the opening sentence of the paragraph that the simulation initialization was similar for systems with and without antigen.

      "previously observed snorkeling and hydration of TM Arg686" --> Is this R696 (numbering could be different based on the particular Env)?

      Thank you for noting this typo, we have corrected the numbering.

      Potential font color issue with Figure 3-Figure supplement 1 B and part of A text-could be fixed in typesetting.

      The discussion reads very well. Is it possible to direct antibody maturation, even in an engineered context, towards membrane affinity without increasing immunogenic polyreactivity? This is mentioned very briefly and cited with ref 36, but I would be interested in the author's thoughts on this topic.

      We thank the reviewer for the insightful idea to explore in future work. Our conclusion alludes to the possibility of artificially evolving membrane affinity, studied by MD, as done in vitro by Nieva and co-workers. Because of their hypothetical nature, we’ve chosen not to elaborate on those ideas in this manuscript.

      Reviewer #2 (Recommendations for the authors):

      To ensure reproducibility and facilitate further research, the authors should publicly deposit the code for running the MD simulations and analyses (e.g., on GitHub) along with the underlying data used in the study (e.g., on Zenodo.org).

      We appreciate the consideration for open-source code and analysis. Representative code and simulation trajectories were uploaded to the following repositories:

      https://github.com/cmaillie98/mper_bnAbs.git

      https://zenodo.org/records/13830877

      —-

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Previous experimental studies demonstrated that membrane association drives avidity for several potent broadly HIV-neutralizing antibodies and its loss dramatically reduces neutralization. In this study, the authors present a tour de force analysis of molecular dynamics (MD) simulations that demonstrate how several HIV-neutralizing membrane-proximal external region (MPER)-targeting antibodies associate with a model lipid bilayer.

      First, the authors compared how three MPER antibodies, 4E10, PGZL1, and 10E8, associated with model membranes, constructed with a lipid composition similar to the native virion. They found that the related antibodies 4E10 and PGZL1 strongly associate with a phospholipid near heavy chain loop 1, consistent with prior crystallographic studies. They also discovered that a previously unappreciated framework region between loops 2-3 in the 4E10/PGZL1 heavy chain contributes to membrane association. Simulations of 10E8, an antibody from a different lineage, revealed several differences from published X-ray structures. Namely, a phosphatidylcholine binding site was offset and includes significant interaction with a nearby framework region.

      Next, the authors simulate another MPER-targeting antibody, LN01, with a model HIV membrane either containing or missing an MPER antigen fragment within. Of note, LN01 inserts more deeply into the membrane when the MPER antigen is present, supporting an energy balance between the lowest energy conformations of LN01, MPER, and the complex. Additional contacts and conformational restraints imposed by ectodomain regions of the envelope glycoprotein, however, remain unaddressed; the size of such simulations likely runs into technical limitations including sampling and compute time.

      The authors next established coarse-grained (CG) MD simulations of the various antibodies with model membranes to study membrane embedding. These simulations facilitated greater sampling of different initial antibody geometries relative to membrane. Distinct geometries derived from CG simulations were then used to initialize all-atom MD simulations to study insertion in finer detail (e.g., phospholipid association), which largely recapitulate their earlier results, albeit with more unbiased sampling. The multiscale model of an initial CG study with broad geometric sampling, followed by all-atom MD, provides a generalized framework for such simulations.

      Finally, the authors construct velocity pulling simulations to estimate the energetics of antibody membrane embedding. Using the multiscale modelling workflow to achieve greater geometric sampling, they demonstrate that their model reliably predicts lower association energetics for known mutations in 4E10 that disrupt lipid binding. However, the model does have limitations, namely in its ability to predict more subtle changes along a lineage: intermediate mutations that reduce lipid binding are indistinguishable from mutations that completely ablate lipid association. Thus, while large/binary differences in lipid affinity might be predictable, the use of this method as a generative model is likely more limited.

      The MD simulations conducted throughout are rigorous and the analyses are extensive. However, given the large amount of data presented within the manuscript, the text would benefit from clearer subsections that delineate discrete mechanistic discoveries, particularly for experimentalists interested in antibody discovery and design. One area the paper does not address involves the polyreactivity associated with membrane binding antibodies; MD simulations and/or pulling velocity experiments with model membranes of different compositions, with and without model antigens, would be needed. Finally, given the challenges in initializing these simulations and their limitations, the text regarding their generalized use for discovery, rather than mechanism, could be toned down.

      Overall, these analyses provide an important mechanistic characterization of how broadly neutralizing antibodies associate with lipids proximal to membrane-associated epitopes to drive neutralization.

      Reviewer #2 (Public Review):

      In this study, Maillie et al. have carried out a set of multiscale molecular dynamics simulations to investigate the interactions between the viral membrane and four broadly neutralizing antibodies that target the membrane proximal exposed region (MPER) of the HIV-1 envelope trimer. The simulation recapitulated in several cases the binding sites of lipid head groups that were observed experimentally by X-ray crystallography, as well as some new binding sites. These binding sites were further validated using a structural bioinformatics approach. Finally, steered molecular dynamics was used to measure the binding strength between the membrane and variants of the 4E10 and PGZL1 antibodies.

      The conclusions from the paper are mostly well supported by the simulations, however, they remain very descriptive and the key findings should be better described and validated. In particular:

      It has been shown that the lipid composition of HIV membrane is rich in cholesterol [1], which accounts for almost 50% molar ratio. The authors use a very different composition and should therefore provide a reference. It has been shown for 4E10 that the change in lipid composition affects dynamics of the binding. The robustness of the results to changes of the lipid composition should also be reported.

      The real advantage of the multiscale approach (coarse grained (CG) simulation followed by a back-mapped all atom simulation) remains unclear. In most cases, the binding mode in the CG simulations seems to be an artifact.

      The results reported in this study should be better compared to available experimental data. For example, how does the approach angle compare to cryo-EM structures of the bnAbs engaging with the MPER region, e.g. [2-3]? How do the results of this study compare to previous molecular dynamics studies, e.g. [4-5]?

      References

      (1) Brügger, Britta, et al. "The HIV lipidome: a raft with an unusual composition." Proceedings of the National Academy of Sciences 103.8 (2006): 2641-2646.

      (2) Rantalainen, Kimmo, et al. "HIV-1 envelope and MPER antibody structures in lipid assemblies." Cell Reports 31.4 (2020).

      (3) Yang, Shuang, et al. "Dynamic HIV-1 spike motion creates vulnerability for its membrane-bound tripod to antibody attack." Nature Communications 13.1 (2022): 6393.

      (4) Carravilla, Pablo, et al. "The bilayer collective properties govern the interaction of an HIV-1 antibody with the viral membrane." Biophysical Journal 118.1 (2020): 44-56.

      (5) Pinto, Dora, et al. "Structural basis for broad HIV-1 neutralization by the MPER-specific human broadly neutralizing antibody LN01." Cell Host & Microbe 26.5 (2019): 623-637.

      Considering reviewer suggestions, we slightly reorganized the results section into specific sub-sections with headings and changed the order in which key results were presented to make the subsequent analysis more accessible for readers. Supplemental materials were redistributed into eLife format, with each supplemental item grouped with a corresponding main figure. Many slight modifications of detail were made to figures (mostly supplemental items) without changing their character, such as clearer axis labels or revised annotations within panels.

      The major additions within the results sections based on the reviews were:

      (1) An expanded comparison of our simulation analyses with previous simulations and with existing cryo-EM structural evidence for MPER antibodies’ membrane orientation in the context of full-length antigen, resulting in new supplemental figure panels.

      (2) New atomistic simulations of 10E8, PGZL1, and 4E10 evaluating the phospholipid binding predictions in a different lipid composition more closely modeling HIV membranes.

      Minor edits to the analyses and interpretations include:

      (1) Outlining the geometric components contributing to variance in substates after clustering the atomistic 10E8, 4E10, and PGZL1 simulations.

      (2) Better defining the variance and durability of membrane interactions within and across systems in the coarse grain methods section.

      (3) Removed interpretations in the original results sections regarding polyreactivity and energetics for MPER bnAbs that were not explicitly supported by data.   

      (4) More context on the prevalence of bnAb loop geometries in the structural informatics section.

      (5) Rationale for the choice of the continuous helix MPER-TM conformation in LN01-antigen conformations, and citations to previous gp41 TM simulations.

      (6) Removed language on the novelty of the coarse grain and steered pulling simulations as newly developed approaches; tempering the potential discriminating power and applications of those approaches, in light of their limitations.

      The discussion was revised to provide additional context for the results within the field, including the direct relevance of the simulation methods for evaluating immune tolerance mechanisms and for antibody engineering.   We have shared custom scripts used for molecular dynamics analysis on GitHub (https://github.com/cmaillie98/mper_bnAbs.git) and uploaded trajectories to a public repository hosted on Zenodo (https://zenodo.org/records/13830877).

      Recommendations for the authors:

      Below, I provide an extensive list of minor edits associated with the text and figures for the authors to consider. I provide these with the hope of increasing the accessibility of the manuscript to broader audiences but leave changes to the discretion of the authors.

      Text/clarity

      Figure 1 main text

      The main text discussing Figure 1 is disorganized, making the analysis difficult to follow. I would suggest the following: moving the sentence, "4E10 and PGZL1 are structurally homologous" immediately after the paragraph discussing the simulation initiation. Then, add a sentence that directly compares the experimental affinity, neutralization, and polyreactivity of 4E10 and PGZL1 (later, an unintroduced idea pops up, "These patterns may in part explain 4E10's greater polyreactivity"). Next, lead into the discussion of the MD simulation data with something to the effect of: "Given these similarities, we first compared mechanisms of membrane insertion between 4E10 and PGZL1 to bolster confidence in our predictions". Later, the sentence "Across 4E10 and PGZL1 simulations, the bound lipid phosphates"

      We thank the reviewer for the suggestion and we have restructured the beginning of the results to implement this style: to first introduce then discuss the comparative PGZL1 & 4E10 results, i.e. Figure 1 plus associated supplements.

      In the background and the introduction text leading up to Figure 1, CDR-H3 is discussed at length, however, the first figure focuses almost entirely on how CDR-H1 coordinates a lipid phosphate headgroup. Are there experimental mutations in this loop that do not affect affinity (e.g., to a soluble gp41 peptide), but do affect neutralization (like the WAWA mutation for CDR-H3, discussed later)?

      We have altered the Introduction (para 2) and Results (4E10/PGZL1 sub-section) to give a more balanced discussion of CDRs H1 & H3.  That includes referencing experimental data addressing the reviewer’s question: a PGZL1 clone, H4K3, in which mutations to CDR-H1 were introduced and shown to have minimal impact on affinity to MPER peptide via ELISA and BLI, but those mutant bnAbs had significantly reduced neutralization efficacy (PMC6879610).

      The sentence "These phospholipid binding events were highly stable, typically persisting for hundreds of nanoseconds" should be moved down to immediately precede, "[However], in a PGZL1 simulation, we observed a". This would be a good place for a paragraph break following, "Thus, these bnABs constitutively", since this block of text is very long.

      Similarly, the sentence and parts of the section, "Likewise, the interactions coordinating the lipid phosphate oxygens at CDR-H1" more appropriately belongs immediately before or after the sentence, "Our simulations uncover the CDR-lipid interactions that are the most feasible".

      Thank you for the detailed guidance in reorganizing the Figure 1 results.  We followed the advice to directly compare 4E10 and PGZL1 results separately from 10E8, moving those sections of text appropriately.  New paragraph breaks were added to improve accessibility and flow of concepts throughout the Results.

      In the sentence, "our simulations uncover CDR-lipid interactions that are the most feasible and biologically relevant in the context of a full [HIV] lipid bilayer... validation to which of the many possible ions" → have you confidently determined lipid binding and positioning outside of the site validated in Figure 1? Which site(s) are these referencing? The next two sentences then introduce two new ideas on the loop backbone stability then lead into lipid exchange, which is a bit jarring.

      We have adjusted the language concerning the putative ions/lipids electron density across the many PGZL1 and 4E10 crystal structures, and additionally make the explicit point that we confidently determined the lack of lipid binding outside of the site focused on in Figure 1.

      “… both bnAbs showed strong hotspots for a lipid phosphate bound within the CDR-H1 loops, with minimal phospholipid or cholesterol ordering around the proteins elsewhere.  The simulated lipid phosphates bound within CDR-H1 have exceptional overlap with electron densities and atomic details of modelled headgroups from respective lipid-soaked co-crystal structures…”

      Figure 2 main text

      "We similarly investigated bnAb 10E8" - Please make this a separate subheader, the block text is very long up to this point.

      Thank you for the suggestion. We introduced a sub-header to separate work on 10E8 all-atom simulations.

      "we observed a POPC complexed with... modelled as headgroup phosphoglycerol anions..." - please cite the references within the text.

      Thank you for pointing out this missing reference, we added the appropriate reference.

      "One striking and novel observation" - please remove the phrase "striking" throughout, for following best practices in scientific writing (PMC10212555)-this is generally well-done throughout.

      We removed “striking” from our text per your suggestion.

      "This CDR-L1 site highlights... (>500 fold) across HIV strains" - How much do R29 and Y32 also contribute to antigen binding and the conformation of this loop? These mutants also decreased Kd by approximately 20X, and based on the co-crystal structure with the TM antigen (PDB: 4XCC), seem to play a more direct role in antigen contact. Additionally, these residues should be highlighted on a figure, otherwise it's difficult to understand why they are important for membrane association.

      We thank the reviewer for deep engagement with these supporting experimental details.  The R29A+Y32A 10E8 mutant referenced in the text showed only a 4-fold Kd increase, a modest change for an SPR binding experiment.  By contrast, the R29E+Y32E 10E8 mutant resulted in a 40x Kd increase, the “20x” the reviewer refers to.  Both 10E8 mutants showed similarly drastic reductions in breadth and potency, of over 2 orders of magnitude on average.

      These mutated CDR-L1 residues are not directly involved in antigen contact and adopt the same loop helix conformation when antigen is bound.  A minor impact on antigen binding affinity could be due to altered pre-organization of CDR loops upon losing interactions from the Tyr & Arg sidechains - particularly Tyr31 in contact with CDR-H3.

      As per the suggestion, a more clearly annotated figure panel denoting these sidechains has been added to Figure 2-Figure Supplement 1 for 10E8 analysis.

      "Structural searches querying... identified between 10^5 and 2*10^6..." - why is this value represented as such a large range? Does this depend on the parameters used for analysis? Please clarify.

      Additionally, how prevalent are any random loop conformations compared to the ones you searched? It's otherwise difficult to attribute number of occurrences within the 2 A cutoff to biological significance, as this number is not put in context.

      We appreciate the reviewer's comment to contextualize the range and relative frequency of the bnAb loop conformations.   RMSD and loop length are the key parameters, which can be controlled by searching reference loops of similar length.  The main point of the backbone-level searching is simply to show that the bnAb loops are not particularly rare when compared with loops of similar length.

      We did as was suggested and added comparison to random loops of the same length to the main text, including a new Supplementary Table 4.   

      “…identified between 10^5 and 2∙10^6 geometrically similar sub-segments within natural proteins (<2 Å RMSD)40, reflecting they are relatively prevalent (not rare) in the protein universe, comparing well with frequency of other surface loops of similar length in antibodies (Supplementary Table 3).”

      "We next examined the geometries" could start after its own new subheading. Moreover, while there's an emphasis on tilt for neutralization, there is not a figure clearly modelling the proposed Env tilt compared to the relatively planar bilayer. It would be helpful to have an additional panel somewhere that shows the orientation of the antibody (e.g., a representative pose) in the simulations relative to an appropriately curved membrane, Env, the binding conformation of the antibody to Env, and apo Env, given the tilting observed in PMID: 32348769 and theorized in PMC5338832. What additional conformational changes or tilting need to occur between the antibodies and Env to accomplish binding to their respective epitopes?

      Thank you for outlining an interesting element to consider in our analysis of a multi-step binding mechanism for MPER antibodies. We added additional figure panels in the supplement to outline the similarities and differences between our simulations and Fabs with the inferred membranes in cryo-EM experiments of full-length HIV Env.  The simulated Fabs’ angles are very similar with only minor tilting to match the cryo-EM antibody-membrane geometries. 

      We added Figure 1-figure supplement 1A & Figure 2-figure supplement 2A, and altered the text to reflect this:

      “The primary difference is Env-bound Fabs in cryo-EM adopt slightly more shallow approach angles (~15°) relative to the bilayer normal.  The simulated bnAbs in isolation prefer orientations slightly more upright, but presenting CDRs at approximately the same depth and orientation.  Thus, these bnAbs appear pre-disposed in their membrane surface conformations, needing only a minor tilt to form the membrane-antibody-antigen neutralization complex.”

      Env tilt dynamics and membrane curvature of natural virions may reconcile some of these differences.  Recent in situ tomography of Full-length Env in pseudo-virions corroborates our approximation of flat bilayers over the short length scales around Env.
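
      As an illustration of the angle comparison described above, the following is a minimal sketch of how an approach angle relative to the bilayer normal could be measured. It assumes the bilayer normal lies along the z axis and that a Fab long axis has already been extracted as a vector; the function name `approach_angle_deg` and that axis definition are hypothetical and not necessarily those used in the manuscript.

```python
import math

def approach_angle_deg(fab_axis, normal=(0.0, 0.0, 1.0)):
    """Angle in degrees between a Fab axis vector and the bilayer normal.

    Assumption: the normal defaults to the z axis, as for a flat bilayer
    in the xy plane. The Fab axis definition is left to the caller.
    """
    dot = sum(a * n for a, n in zip(fab_axis, normal))
    norm_a = math.sqrt(sum(a * a for a in fab_axis))
    norm_n = math.sqrt(sum(n * n for n in normal))
    if norm_a == 0 or norm_n == 0:
        raise ValueError("vectors must be non-zero")
    # Clamp to [-1, 1] to guard against floating-point round-off.
    cos_theta = max(-1.0, min(1.0, dot / (norm_a * norm_n)))
    return math.degrees(math.acos(cos_theta))
```

      Applying such a function per frame to the simulated Fabs and to Fabs fitted into cryo-EM maps would give angle distributions that can be compared directly.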

      The sentence "we next examined the geometries" mentions "potential energy cost, if any, for reorienting...". However, there's no further discussions of geometry or energy cost within this section. Please rephrase, or move this figure to main and increase discussion associated with the various conformational ensembles, their geometry, and their phospholipid association.

      As the reviewer highlights, the unbiased simulations and our analysis do not explicitly evaluate energetics.  We removed this phrase, and now only allude to the minimal energy barrier between the similar geometric conformations, relative to the tilting & access requirements for antigen binding mechanism.

      “The apparent barrier for re-orientation is likely much less energetically constraining than shielding glycans and accessibility of MPER”

      ".. describing the spectrum of surface-bound conformations" cites the wrong figure.

      Thank you for noticing this error; we corrected the figure reference to (Figure 2-figure supplement 4).

      Please comment on the significance of how global clustering (Fig. S5A-C) was similar for 4E10 and PGZL1, but different for 10E8 (e.g., blue, orange, and yellow clusters for 4E10 and PGZL1 versus cyan, red, and green clusters for 10E8). As the cyan cluster seems to be much closer in Euclidean space to the 4E10/PGZL1 clusters, it might warrant additional analysis. What do these clusters represent in terms of structure/conformation? How do these clusters differ in membrane insertion as in (A)?

      We are grateful that you identified analysis in the geometric clustering section that may be of interest to other readers. We have added an additional supplementary table (Table 2) detailing the CDR loop membrane insertion and global Fab angles that describe each cluster, to demonstrate their similarities and differences.  We also better describe how global clustering was similar for 4E10 and PGZL1, but different for 10E8, in the relevant results section.<br /> The cyan cluster is not close in structure to the 4E10/PGZL1 clusters.  We note the original figure panel had an error.  The updated Figure 2-supplement 4B shows the correct Euclidean distance hierarchy with an early split between the 4E10/PGZL1 and 10E8 clusters.

      Figure 3 main text

      The start of this section, "We next studied bnAb LN01...", is a good place for a new subheader.

      We have added an additional subheader here: Antigen influence on membrane bound conformations and lipid binding sites for LN01

      There should be a sentence in the main text defining the replicate setup and production MD run time. Is the apo and complex based on a published structure? How do you embed the MPER? Is the apo structure docked to membrane like in 4E10? The MD setup could also be better delineated within the methods.

      The first two paragraphs in this section have been updated to clarify the relevant simulations configuration and Fab membrane docking prediction details. 

      The procedure was the same for predicting an initial membrane insertion, albeit now we use the LN01-TM complex and the calculation accounts for the membrane burial of the TM domain and MPER fragment.  As mentioned, LN01 is predicted to insert its CDR loops similarly with or without the TM-MPER fragment.  The geometry differs from PGZL1/4E10 and 10E8, as noted in the text.

      Please comment on the oligomerization state of the antigen used in the MD simulation: how does the simulation differ from a crossed MPER as observed in an MPER antibody-bound Env cryo-EM structure (PMID: 32348769), a three-helix bundle (PMC7210310), or single transmembrane helix (PMC6121722)? How does the model MPER monomer embed in the membrane compared to simulations with a trimeric MPER (PMC6035291, PMID: 33882664)-namely, key arginine residues such as R696?

      We thank the reviewer for pointing out critical underlying rationale for modeling this TM-MPER-LN01 complex which we have corrected in the revised draft. The range of potential conformations and display of MPER based on TM domain organization could easily be its own paper – we in fact have a manuscript in preparation on the topic.  

      The updated text expands the rationale for choosing the monomeric uninterrupted helix form of the MPER-TM model antigen (para 1 of LN01 section). The alternative conformations we did not explore are called out, with references provided by the reviewer.

      The discussion now qualifies that the MPER presentation is likely oversimplified here, noting that MPER display in the full-length Env trimer will vary in different conformational states or membrane environments. However, the only cryo-EM structures of full-length Env with TM domains resolved have this continuous helix MPER-TM conformation, seen both within crossing TM dimers and in dissociated TM monomers.

      Are there additional analyses that can validate the dynamics of the MPER monomer in the membrane and relative to LN01? Such as key contacts you would expect to maintain over the duration of the MD simulation?

      We also expanded the description of this TM domain’s behavior and dynamics (tilt, orientation, Arg696 snorkeling, and complex with LN01) to provide a clearer picture of the simulation results, which align with past MD of the gp41 TM domain as a monomer (para 2 of LN01 section).  As well, we noted key LN01-MPER contacts that were maintained.

      How does the model MPER modulate membrane properties like lipid density and lipid proximities near LN01?

      We checked and didn’t notice differences for the types of lipids (chol, etc) proximal to the MPER-TM or the CDR loops versus the bulk lipid bilayer distributions.  Due to the already long & detailed nature of this manuscript, we elect not to include discussion on this topic.

      Supplemental figure 1H-I would be better positioned as a figure 3-associated supplemental figure.

      We rearranged to follow the eLife format and have paired supplemental panels with their most relevant main figures.

      Figure 3F/H reference a "loading site" but this site is defined much later in the text, which was confusing.

      Thank you for pointing out this source of confusion, we rearranged our discussion to reflect the order in which we present data in figures.

      What evidence suggests that lipids "quickly exchange from the Loading site into the X-ray site by diffusion"? I do not gather this from Figure S1H/I.

      We have rearranged the loading site and X-ray site RMSD maps in Figure 3-Figure supplement 1 to better illustrate how a lipid exchanges between these sites.

      Figure 4 main text

      The authors assert that in the CG simulations, restraints, "[maintain] Fab tertiary and quaternary structure". However, backbone RMSD does not directly assert this claim-an additional analysis of the key interfacial residues between chains, or geometric analysis between the chains, would better support this claim.

      Thank you for raising this point.  We rephrased to add that the major sidechain contacts between heavy and light chain persist, in addition to backbone RMSD, to describe how these Fabs stably maintain their fold in CG representation.

      In several cases, CG models sample and then dissociate from the membrane. In the text, the authors mention, "course-grained models can distinguishing unfavorable and favorable membrane-bound conformations". Is there a particular orientation that causes/favors membrane association and dissociation? This analysis could look at conformations immediately preceding association and dissociation to give clues as to what orientation(s) favor each state.

      Thank you for suggesting this interesting analysis.  Clustering analysis of associated states are presented in Figure 5, Figure 5-Figure Supplement 1, and Figure 6, which show all CDR and framework loop directed insertion.  This feature is currently described in the main text.  

      We did not find a strong correlation of specific orientations with “pre-dissociation” states or ineffective non-inserting “scanning” events.  We revised the key sentence to reflect the main takeaway: non-CDR alternative conformations did not insert, and most of those having CDRs inserted in a different manner than in the all-atom simulations were also prone to dissociate:

      “Given that non-CDR directed and alternative CDR-embedded orientations readily dissociate, we conclude that coarse-grained models can distinguish unfavorable and favorable membrane-bound conformations to an extent that provides utility for characterizing antibody-bilayer interaction mechanisms.”

      Figure 6 main text

      "For 4E10, trajectories initiated from all three geometries..." only two geometries are shown for each antibody. Please include all three on the plot.

      The plots include markers for all three geometries for 4E10, highlighted in stars or with letters on the density plots of angles sampled (Figure 6B,C)

      "Aligning a full-length IgG... unlikely that two Fabs simultaneously..." Are there theoretical conformations in which two Fabs could simultaneously associate with membrane? If this was physiological or could be designed rationally, could an antibody benefit further from avidity?

      Our modeling suggests the theoretical conformations having two Fabs on the membrane are infeasible.  It’s even less likely multiple Env antigens could be engaged by one IgG.  We have revised the text to express this more clearly.

      Figure 7 main text

      "An intermediate... showed a modest reduction in affinity..." what affinity does PGZL1 have for this antigen?

      The preceding sentence provides this information: “Mature PGZL1 has relatively high affinity to the MPER epitope peptide (Kd = 10 nM) and demonstrates great breadth and potency, neutralizing 84% of a 130 strain panel “

      Figures

      Figure 1

      It would be helpful to have an additional panel at the top of this figure further zoomed out showing the orientation of the antibody (e.g., a representative pose) in the simulations relative to an appropriately curved membrane, Env, the binding conformation of the antibody to Env, and apo Env, given the tilting observed in PMID: 32348769 and theorized in PMC5338832. What additional conformational changes or tilting need to occur between the antibodies and Env to accomplish binding to their respective epitopes?

      Thank you for the suggestion to include this analysis.  We have added text reflecting this information and made new supplemental panels for 4E10 and 10E8 that compare the simulated 4E10 and 10E8 Fab conformations to cryo-EM density maps with Fabs bound to full-length HIV Env (Figure 1-figure supplement 1A & Figure 2-figure supplement 2A).

      In Figure 1, space permitting, it would be helpful to annotate the distances between the phosphates and side chains (similarly, for Figure S1A).

      To avoid overloading the main figure panels with text, the relevant distances are listed in the methods section.  Those distances are used to define the “bound” lipid phosphate state.  Generally, we note the interactions are within hydrogen bonding distance.
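
      The idea of a distance-based “bound” state can be sketched as follows. This is a simplified illustration, not the criteria in the methods: the 3.5 Å cutoff, the requirement that every tracked distance satisfy it, and the function names are all assumptions chosen for the example.

```python
def bound_frames(distances_per_frame, cutoff=3.5):
    """Flag each frame as bound when every tracked distance is within the cutoff.

    distances_per_frame: list of per-frame lists of distances (in angstroms)
    between lipid phosphate oxygens and their CDR partner atoms.
    cutoff: hypothetical hydrogen-bond-like distance threshold.
    """
    return [all(d <= cutoff for d in frame) for frame in distances_per_frame]

def occupancy(bound):
    """Fraction of frames flagged as bound (0.0 for an empty trajectory)."""
    return sum(bound) / len(bound) if bound else 0.0
```

      The per-frame booleans correspond to the kind of binary bound/unbound bar mapped above each trajectory, and `occupancy` gives the aggregate fraction of time the site is filled.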

      Annotating "Replicate 1" and "Replicate 2" on the left side of Figure 1C/D would make this figure immediately intuitive.

      We have added these labels.

      Figure caption 1C: Please clarify the threshold/definition of a contact used to binarize "bound" versus "unbound" (for example, "mean distance cutoff of 2A between the phosphate oxygen and the COM of CDR-H1") [on further reading of the methods section, this criterion is quite involved and might benefit from: a sentence that includes "see methods"]. Additionally, C could use a sentence explaining the bar such as in E, "Phosphate binding is mapped to above each MD trajectory" Please define FR-H3 in the figure caption for E/F.

      We have added these details to the figure caption.

      Because Figure 1 is aggregated simulation time, it would be helpful to also represent the data as individual replicates or incorporate this information to calculate standard deviations/statistics (e.g., 1 microsecond max using the replicates to compute a standard deviation).

      We believe the current quantification & display of data via sharing all trajectories is sufficient to convey the major point of how often each CDR-phospholipid binding site is occupied.  Further tracking and statistics of inter-atomic distances would likely be too tedious & add minimal value. The phosphate oxygens show some dynamics between the polar groups within the CDR site, but our “bound” state definitions sufficiently capture whether the key participating interactions are made.

      Figure 2

      For A, it would be helpful to annotate the yellow and blue mesh on the figure itself.

      We have defined the orange phosphate and blue choline densities.

      Also, where are R29 and Y32 relative to this site? In the X-ray panels, Y38 is not shown, and the box delineating the zoom-in is almost imperceptible.

      Thank you for this suggestion to include those amino acids which are referenced in the text as critical sites where mutation impacts function. To clarify, Y32 is the PDB numbering for residue Y38 in IMGT numbering. We have added a panel to Figure 2-Figure Supplement 1 with a cartoon graphic of the 10E8 loop groove showing sidechains & annotating R29 and Y38, staying consistent with our use of IMGT numbering in the manuscript.

      Figure 3

      It might read clearer to have "LN01+MPER-TM" and "LN01-Apo" in the middle of A/B and C/D, respectively, and a dotted line delineating the left and right side of the figure panels.

      We have added these details to the figure for clarity for readers.

      It would be helpful to show some critical interactions that are discussed in the text, such as the salt bridge with K31, by labeling these on the figure (e.g., in E-H).

      We drafted figure panels with dashed lines to indicate those key interactions.  However, they became almost imperceptible and overloaded with annotations that distracted from the overall details.  For K31, the interaction occurs in LN01 crystal structures readers can refer to.

      Why are axes cut off for J?

      We corrected this.

      Please re-define K/L plots as in Figure 1, and explain abbreviations.

      We updated the figure caption to reflect these changes.

      Figure 4

      The caption for panel A states that the Fab begins in solvent 1-2 nm above the bilayer, but the main text states 0.5-2 nm.

      We have reconciled this difference and listed the correct distances: 0.5-2nm.

      Please label the y-axis as "Replicate" for relevant figure panels so that they are more immediately interpretable.

      This label has been added.

      A legend with "membrane-associated" and "non-associated" within the figure would be helpful. Additionally, the average percent membrane associated, with a standard deviation, should be shown (Similar to 1C, albeit with the statistics).

      This legend has been added.  We also added the additional statistical metrics requested to strengthen our analysis.

      The text references "10, 14, and 12 extended insertion events" for the three antibody-based simulations. How do you define "extended insertion events"? Would breaking this into average insertion time and standard deviation better highlight the association differences between MPER antibodies and controls, in addition to the variability due to difference random initialization?

      We thank the reviewer for the insightful suggestion on how to better organize quantitative analysis to support the method. Supplemental Table 3 includes these numbers.
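
      For readers who want to derive this kind of summary from a binary association trace, a minimal sketch follows. The minimum run length that counts as an “extended” event and the time per frame are hypothetical parameters, not the values behind Supplemental Table 3.

```python
from statistics import mean, stdev

def insertion_events(associated, min_frames=1):
    """Lengths (in frames) of contiguous membrane-associated runs.

    associated: per-frame truthy/falsy flags for one replicate.
    min_frames: hypothetical threshold for an 'extended' insertion event.
    """
    events, run = [], 0
    for flag in associated:
        if flag:
            run += 1
        else:
            if run >= min_frames:
                events.append(run)
            run = 0
    if run >= min_frames:  # trajectory ended while still associated
        events.append(run)
    return events

def summarize(events, ns_per_frame=1.0):
    """Return (event count, mean duration, sample standard deviation) in ns."""
    durations = [n * ns_per_frame for n in events]
    avg = mean(durations) if durations else 0.0
    sd = stdev(durations) if len(durations) > 1 else 0.0
    return len(durations), avg, sd
```

      Pooling the events from all replicates of one antibody before calling `summarize` gives a single mean and spread per system, which makes bnAb and control simulations easier to compare.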

      Figure 5

      The analysis in Fig. S6C could be included here as a main figure.

      The drafted revised figure adding S6C to Figure 5 made for too much information.  Likewise, putting this panel S6C separated it from the parent clustering data of S6B, so we decided to keep these figures separated.  The S6 figure is now Figure 5-figure supplement 1.

      Figure 6

      Please annotate membrane insertion on E as %.

      These are phosphate binding RMSD/occupancy vs time.  The panels are now too small to annotate by %.  The qualitative presentation is sufficient at this stage.  The quantitative % are listed in-line within text when relevant to support assertions made. 

      Please use the figure caption to explain why certain clusters (e.g., 10E8 cluster A, artifact, Fig. S6E) are not included in panel E.

      We have added this information in the figure caption.

      Figure 7

      Please show all points on the box and whisker plots (panels E and F), and perform appropriate statistical tests to see if means are significantly different (these are mentioned in the text, but should be annotated on the graph and mentioned within the figure caption).

      We have changed these plots to show all data points along with relevant statistical comparisons. The figure captions describe unpaired t-test statistical tests used.
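
      As a reference for what an unpaired t-test computes, here is a minimal sketch of the classic pooled-variance (Student's) form. Whether equal variances were assumed in the actual analysis is not stated here, so that choice is an assumption of the example; the function returns only the statistic and degrees of freedom.

```python
import math
from statistics import mean, variance

def unpaired_t(a, b):
    """Student's unpaired t statistic and degrees of freedom.

    Assumption: both groups share a common variance (pooled estimate).
    """
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two values")
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se, df
```

      Converting the statistic to a p-value requires the t distribution, for which a statistics package such as SciPy (`scipy.stats.ttest_ind`) is the usual route.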

      Figure S1

      G, H, and I do not belong here-they should be moved to accompany their relevant text section, which associates with Figure 3. It would be helpful to associate this with Figure 3 in the eLife format, "Figure 3-Supplemental Figure 1" or its equivalent.

      It's very difficult to distinguish the green and blue circles on panel G.

      We darkened the shading and added outline for better visualization

      Subfigure I is missing a caption, could be included with H: "(H,I) Additional replicates for LN01+TM (H) and LN01 (I)".

      We corrected this as suggested.

      Why is H only 3 simulations and not 4? Does it not have a lipid in the x-ray site? Also, the caption states "(top, green)" and "(bottom, cyan)", but the green vs. cyan figures are organized on the left and right. Additional labels within the figure would help make this more intuitive.

      If the point of H and I is to illustrate that POPC exchanges between the X-ray and loading sites, this is unclear from the figure. Consider clarifying these figures.

      Thank you for describing the confusion in this figure, we have added labels to clarify.

      Figure S2 (panels split between revised Figure 4 associated figure supplements)

      The LN01 figures should likely follow later so that they can associate with Figure 3, despite being a similar analysis.

      We corrected supplements to eLife format so supplements are associated with relevant main figures.

      Figure S3 (panels split between revised Figure 1 & 2 associated figure supplements)

      As hydrophobicity is discussed as a driving factor for residue insertion, it would be helpful to have a rolling hydrophobicity chart underneath each plot to make this claim obvious.

      We prefer the current format, out of concern about adding too much information to these already data-rich panels.  As well, some residues are not apolar but are deeply inserted.

      Figure S4 (panels split between revised Figure 1 & 2 associated figure supplements)

      It would be helpful to label the relevant loops on these figures.

      We have labeled loops for clarity.

      Do any of these loops have minor contacts with Env in the structure?

      The 4E10 and PGZL1 CDR-H1 loops do not directly contact the MPER peptides bound in crystal structures.

      FRL-3 and CDR-H1 in 10E8 do not contact the MPER peptide antigen component based on x-ray crystal structures.

      Do motif contacts with lipid involve minor contacts with additional loops other than those displayed in this figure?

      The phosphate-loop interactions in motifs used as query bait here are mediated solely by the backbone and side chain interactions of the loops displayed. We visually inspected most matches and did not see any “consensus” additional peripheral interactions common across each potential instance in the unrelated proteins.  The supplied Supplemental Table 2 contains the information if a reader wanted to conduct a detailed search. 

      Why is there such a difference between the loop conformation adopted in the X-ray structure and that in the MD simulation, and why does this lead to the large observed differences in ligand-binding structure matches?

      We thank the reviewer for carefully noting our error in labeling of CDR loop and framework region input queries. We revised the labeling to clarify the issue.

      There is minimal structural difference between the loops in X-ray and MD.

      Figure S5 (Figure 2-Figure supplement 4)

      This figure is not colorblind friendly-it would be helpful to change to such a palette as the data are interesting, but uninterpretable to some.

      We have left this figure the same.

      "Susbstates" - "Substates"

      Corrected, thank you.

      Panel B is uninterpretable-please break the axis so that the Euclidian distances can be represented accurately but the histograms can be interpreted.

      We have adjusted axis for this plot to better illustrate the cluster thresholds.

      The clusters in D-H should be analyzed in greater depth. What is the structural relevance of these clusters other than differences in phospholipid occupancy in (I)? Snapshots of representative poses for each cluster could help clarify these differences.

      We have adjusted the text to describe the geometric differences in each of those clusters that result in their different, markedly lower propensities for forming the key phospholipid interaction.

      The figure caption should make it clear that 3 μs of aggregate simulation time is being used here instead of 4 μs to start with unique tilt initializations. E.g., "unique starting membrane-bound conformations (0 degrees, -15 degrees, 15 degrees initialization relative to the docked pose)". Further, why was the particular 0-degree replicate chosen while the other was thrown out? Or was this information averaged? Why is the full 4 μs then used for D-I?

      We thank the reviewer for noting these details.  We didn’t want to bias the differential between 10E8 and 4E10/PGZL1 by including the replicate simulations.  The analysis was mainly intended to achieve more coarse resolution distinction between 10E8 and the similar PGZL1/4E10.  

      In the subsequent clustering of individual bnAb simulation groups, the replicate 0-degree simulations had sufficiently different geometric sampling and unique lipid binding behavior that we thought they should be included (4 μs total) to achieve finer conformational resolution for each bnAb.

      Figure S6 (now Figure 5-Figure Supplement 1)

      Please label the CDRs in C and provide a color key like in other figures. Also, please label the y-axes. This figure could move to main below 5B with the clusters "A,B,C" labeled on 5B.

      We have added the axes labels and color key legend. We retained a minimal CDR loop labeling scheme for the higher-throughput interaction profiles here, where colored sections in the residue axes denote CDR loop regions.

      Figure S7 (Figure 7-Figure Supplement 1)

      Panels A and B would likely read better if swapped.

      We have swapped these panels for a better flow.

      For panel C, please display mean and standard deviation, and compare these values with an appropriate statistical test.

      This is already displayed in the main figure; we have removed it from the supplement.

      For E and F, please clarify which trajectory(s) you are extracting this conformation from. Are these the global mean/representative poses? How do they compare to other geometrically distinct clusters?

      The requested information was added to the supplemental figure caption. These are phosphate-bound frames selected at 2 distinct time points from the 0-degree tilt replicates for both 4E10 and 10E8, representing at least 2 distinct macroscopic substates differing in global light chain and heavy chain orientation towards the membrane.

      Table S2 (now Supplementary Table 3)

      Please add details for the 13h11 simulation.

      Additionally, please add average contact time and their standard deviation to the table, rather than just the aggregated total time. This will highlight the variability associated with the random initializations of each simulation.

      We have added the details for 13h11 and the requested analysis (average aggregated time ± standard deviation and average time per association event ± standard deviation) to supplement our summary statistics for this method.
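
      The two summary statistics named in this response can be sketched as follows. This is a minimal illustration, not the authors' analysis script: the function names, the input layout (one list of association-event durations per replicate), and the use of the sample standard deviation are all assumptions.

```python
from statistics import mean, stdev


def summarize(values):
    """Return (mean, sample standard deviation) for a list of numbers."""
    if len(values) < 2:
        raise ValueError("need at least two values to estimate a standard deviation")
    return mean(values), stdev(values)


def contact_summary(events_per_replicate):
    """Summarize membrane contact times across replicate simulations.

    events_per_replicate: one list per replicate (hypothetical layout), each
    holding the durations of that replicate's association events in one
    consistent time unit.
    """
    # Aggregated contact time: sum of all event durations within a replicate.
    totals = [sum(events) for events in events_per_replicate]
    # Per-event durations pooled over every replicate.
    pooled = [d for events in events_per_replicate for d in events]
    return {
        "aggregated_time": summarize(totals),
        "time_per_event": summarize(pooled),
    }
```

      Reporting the spread of the per-replicate totals, rather than only their sum, is what exposes the variability caused by the random initializations.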

      Reviewer #2 (Recommendations For The Authors):

      (1) The structure of the manuscript should be improved. For example, almost half of the introduction (three paragraphs) summarizes the results. I found it hard to navigate all the data and specific interactions described in the results section. Furthermore, the claims at the end of several sections seem unsupported, especially those about the generalization of the approach. These should be moved to the discussion section. The discussion is pretty general and does not provide much context to the results presented in this study.

      We have significantly reorganized the results section to improve the flow of the manuscript and accessibility for readers, especially the first sections of all-atom simulations. We also removed claims not directly supported by data from our results, and expanded on some of these concepts in the discussion to provide more context for the results.

      (2) The author should cite more rigorously previous work and refrain from using the term "develop" to describe the simple use of a well established method. E.g. Several studies have investigated membrane protein interactions e.g. [1], membrane protein-bilayer self-assembly [2], steered molecular dynamics [3], etc.

      Thank you for identifying relevant work for the simulations that set precedent for our novel application to antibody-membrane interactions.  We have removed language about development of simulation methods from the text and now better reference the precedent simulation methods used here.

      (3) Have the authors considered estimating the PMF by combining the steered MD simulation through the application of Jarzynski's equality?

      We performed some preliminary PMF calculations for Fab-membrane binding, but found that they took upward of 40 μs to reach convergence. Steered simulations focused on a key lipid may be easier.

      Although PMFs are beyond the scope of this work, we added a proposal for their use as a next step toward more rigorous quantification of Fab-membrane interactions.
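
      For readers unfamiliar with the reviewer's suggestion, Jarzynski's equality relates the free energy difference ΔF to the work W accumulated over repeated nonequilibrium pulls: exp(-ΔF/kT) = ⟨exp(-W/kT)⟩. The sketch below is a generic estimator under stated assumptions, not something the authors ran: the function name, the kJ/mol units, and the 300 K default are illustrative choices.

```python
import math

# Molar gas constant in kJ/(mol*K), so work values are expected in kJ/mol.
K_B = 0.0083144626


def jarzynski_free_energy(works, temperature=300.0):
    """Estimate a free energy difference (kJ/mol) from nonequilibrium work values.

    works: work accumulated in each independent pulling trajectory (kJ/mol).
    Implements dF = -kT * ln( mean( exp(-W / kT) ) ).
    """
    if not works:
        raise ValueError("at least one work value is required")
    beta = 1.0 / (K_B * temperature)
    exponents = [-beta * w for w in works]
    # Log-sum-exp trick: factor out the largest exponent to avoid overflow
    # or underflow when work values are large relative to kT.
    largest = max(exponents)
    log_mean = largest + math.log(
        sum(math.exp(x - largest) for x in exponents) / len(exponents)
    )
    return -log_mean / beta
```

      Because the exponential average is dominated by rare low-work trajectories, the estimate converges slowly when pulling is fast, which is consistent with the sampling cost the authors describe.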

      Minor

      (4) The term "integrative modeling" is usually used for computational pipelines which incorporate experimental data. Multiscale modeling would be more appropriate for this study.

      We altered descriptions throughout the manuscript to reflect this comment.

      (5) Units to report the force in the steered molecular dynamics are incorrect. They should be 98.

      We changed axes and results to correctly report this unit.

      (6) Labels for axes of several graphs are missing.

      We added labels to all axes of graphs, except for a few where stacked labels can be easily interpreted to save space and reduce complexity in figures.

      (7) Figure 3 K & L is this really < 1% of total? The term "total" should also be clarified.

      Thank you for pointing this out, we changed the % labels to be correct with axes from 0-100%. We clarified total in the figure caption.

      (8) The font size in figures should be uniformized.

      This suggestion has been applied.

      (9) Time needed for steered MD should be reported in CPUh and not hours (page 17).

      We removed comments on explicit time measurements for our simulations.

      (10) Version of Martini force field is missing in methods section

      We used Martini 2.6 and added this to the methods.

      References

      (1) Prunotto, Alessio, et al. "Molecular bases of the membrane association mechanism potentiating antibiotic resistance by New Delhi metallo-β-lactamase 1." ACS infectious diseases 6.10 (2020): 2719-2731.

      (2) Scott, Kathryn A., et al. "Coarse-grained MD simulations of membrane protein-bilayer self-assembly." Structure 16.4 (2008): 621-630.

      (3) Izrailev, S., et al. "Computational molecular dynamics: challenges, methods, ideas. Chapter 1. Steered molecular dynamics." (1997).

    2. eLife Assessment

      This valuable study reports multi-scale molecular dynamics simulations to investigate a class of highly potent antibodies that simultaneously engage with the HIV-1 Envelope trimer and the viral membrane. The work provides insights into how broadly neutralizing antibodies associate with lipids proximal to membrane-associated epitopes to drive neutralization. After extensive revision, the level of evidence is considered solid, although a quantitative assessment of the underlying energetics remains difficult to obtain.

    3. Reviewer #1 (Public review):

      Previous experimental studies demonstrated that membrane association drives avidity for several potent broadly HIV-neutralizing antibodies and its loss dramatically reduces neutralization. In this study, the authors present a tour de force analysis of molecular dynamics (MD) simulations that demonstrate how several HIV-neutralizing membrane-proximal external region (MPER)-targeting antibodies associate with a model lipid bilayer.

      First, the authors compared how three MPER antibodies, 4E10, PGZL1, and 10E8, associated with model membranes, constructed with two lipid compositions similar to native viral membranes. They found that the related antibodies 4E10 and PGZL1 strongly associate with a phospholipid near heavy chain loop 1, consistent with prior crystallographic studies. They also discovered that a previously unappreciated framework region between loops 2-3 in the 4E10/PGZL1 heavy chain contributes to membrane association. Simulations of 10E8, an antibody from a different lineage, revealed several differences from published X-ray structures. Namely, a phosphatidylcholine binding site was offset and includes significant interaction with a nearby framework region. The revised manuscript demonstrates that these lipid interactions are robust to alterations in membrane composition and rigidity. However, it does not address the reverse: that phospholipids known experimentally not to associate with these antibodies (if any such lipids exist) also fail to interact in MD simulations.

      Next, the authors simulate another MPER-targeting antibody, LN01, with a model HIV membrane either containing or missing an MPER antigen fragment within. Of note, LN01 inserts more deeply into the membrane when the MPER antigen is present, supporting an energy balance between the lowest energy conformations of LN01, MPER, and the complex. These simulations recapitulate lipid binding interactions solved in published crystallographic studies but also lead to the discovery of a novel lipid binding site the authors term the "Loading Site", which could guide future experiments with this antibody.

      The authors next established coarse-grained (CG) MD simulations of the various antibodies with model membranes to study membrane embedding. These simulations facilitated greater sampling of different initial antibody geometries relative to membrane. These CG simulations, which cannot resolve atomistic interactions, are nonetheless compelling because negative controls (ab 13h11, BSA) that should not associate with membrane indeed sample significantly less membrane.

      Distinct geometries derived from CG simulations were then used to initialize all-atom MD simulations to study insertion in finer detail (e.g., phospholipid association), which largely recapitulate their earlier results, albeit with more unbiased sampling. The multiscale model of an initial CG study with broad geometric sampling, followed by all-atom MD, provides a generalized framework for such simulations.

      Finally, the authors construct velocity pulling simulations to estimate the energetics of antibody membrane embedding. Using the multiscale modelling workflow to achieve greater geometric sampling, they demonstrate that their model reliably predicts lower association energetics for known mutations in 4E10 that disrupt lipid binding. However, the model does have limitations, namely in its ability to predict more subtle changes along a lineage: intermediate mutations that reduce lipid binding are indistinguishable from mutations that completely ablate lipid association. Thus, while large/binary differences in lipid affinity might be predictable, the use of this method as a generative model is likely more limited.

      The MD simulations conducted throughout are rigorous and the analyses are extensive, creative, and biologically inspired. Overall, these analyses provide an important mechanistic characterization of how broadly neutralizing antibodies associate with lipids proximal to membrane-associated epitopes to drive neutralization.

    4. Reviewer #2 (Public review):

      In this study, Maillie et al. have carried out a set of multiscale molecular dynamics simulations to investigate the interactions between the viral membrane and four broadly neutralizing antibodies that target the membrane-proximal external region (MPER) of the HIV-1 envelope trimer. The simulations recapitulated in several cases the binding sites of lipid head groups that were observed experimentally by X-ray crystallography, as well as some new binding sites. These binding sites were further validated using a structural bioinformatics approach. Finally, steered molecular dynamics was used to measure the binding strength between the membrane and variants of the 4E10 and PGZL1 antibodies.

      The use of multiscale MD simulations allows for a detailed exploration of the system at different time and length scales. The combination of MD simulations and structural bioinformatics provides a comprehensive approach to validate the identified binding sites. Finally, the steered MD simulations offer quantitative insights into the binding strength between the membrane and bnAbs.

      While the simulations and analyses provide qualitative insights into the binding interactions, they do not offer a quantitative assessment of energetics. The coarse-grained simulations exhibit artifacts and thus require careful analysis.

      This study contributes to a deeper understanding of the molecular mechanisms underlying bnAb recognition of the HIV-1 envelope. The insights gained from this work could inform the design of more potent and broadly neutralizing antibodies.

    1. eLife Assessment

      This valuable study presents findings on the mode of action of MOTS-c (mitochondrial open reading frame from the twelve S rRNA type-c), and its impact on monocyte-derived macrophages. The authors present solid evidence for its increased expression in stimulated monocytes/macrophages, its direct bactericidal functions, as well as its role in the modulation of monocyte differentiation into macrophages. Since most of the data were generated from a cell line (THP-1), future work is required to validate observations in primary cells and to further support the claims of this work.

    2. Reviewer #1 (Public review):

      In this work, the authors examine the mechanism of action of MOTS-c and its impact on monocyte-derived macrophages. In the first part of the study, they show that MOTS-c acts as a host defense peptide with direct antibacterial activity. In the second part of the study, the authors aim to demonstrate that MOTS-c influences monocyte differentiation into macrophages via transcriptional regulation.

      Major strengths. Methods used to study the bactericidal activity of MOTS-c are appropriate and the results convincing.

      Major weaknesses. Methods used to study the impact on monocyte differentiation are inappropriate and the conclusions not fully supported by the data shown. A major issue is the use of the THP-1 cell line, a transformed monocytic line which does not mimic physiological monocyte biology. In particular, THP-1 differentiation is induced by PMA, which is a completely artificial system and conclusions from this approach cannot be generalized to monocyte differentiation. The authors would need to perform this series of experiments using freshly isolated monocytes, either from mouse or human. The read-out used for macrophage differentiation (adherence to plastic) is also not very robust, and the authors would need to analyze other parameters such as cell surface markers. It is also not clear whether MOTS-c could act in a cell-intrinsic fashion, as the authors have exposed cells to exogenous MOTS-c in all their experiments. The authors have also analyzed the transcriptomic changes induced by MOTS-c exposure in macrophages derived from young or old mice. While the results are potentially interesting, the differences observed seem independent from MOTS-c and mainly related to age, therefore the conclusions from this figure are not clear. The physiological relevance of this study is also unclear.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this work, the authors examine the mechanism of action of MOTS-c and its impact on monocyte-derived macrophages. In the first part of the study, they show that MOTS-c acts as a host defense peptide with direct antibacterial activity. In the second part of the study, the authors aim to demonstrate that MOTS-c influences monocyte differentiation into macrophages via transcriptional regulation.

      Major strengths.

      Methods used to study the bactericidal activity of MOTS-c are appropriate and the results are convincing.

      Major weaknesses.

      Methods used to study the impact on monocyte differentiation are inappropriate and the conclusions are not supported by the data shown. A major issue is the use of the THP-1 cell line, a transformed monocytic line which does not mimic physiological monocyte biology. In particular, THP-1 differentiation is induced by PMA, which is a completely artificial system and conclusions from this approach cannot be generalized to monocyte differentiation. The authors would need to perform this series of experiments using freshly isolated monocytes, either from mouse or human. The read-out used for macrophage differentiation (adherence to plastic) is also not very robust, and the authors would need to analyze other parameters such as cell surface markers. It is also not clear whether MOTS-c could act in a cell-intrinsic fashion, as the authors have exposed cells to exogenous MOTS-c in all their experiments. The authors did not perform complementary experiments using MOTS-c deficient monocytes. The authors have also analyzed the transcriptomic changes induced by MOTS-c exposure in macrophages derived from young or old mice. While the results are potentially interesting, the differences observed seem independent from MOTS-c and mainly related to age, therefore the conclusions from this figure are not clear. Another concern is the reproducibility of the experiments, as the authors do not indicate the number of biological replicates analyzed nor the number of independent experiments performed.

      In this study, we employed the THP-1 cell line as a proof-of-principle to elucidate the existence of a first-in-class mitochondrial-encoded host defense peptide. This peptide is expressed in monocytes and serves dual functions: i) direct targeting of bacteria, and ii) regulation of monocyte differentiation. It is noteworthy that THP-1 cells differentiated by PMA have been widely utilized as a model for monocyte differentiation by numerous research groups. While we acknowledge the significance of utilizing primary monocytes to fully comprehend the translational implications of our findings, conducting a complete replication of our experiments in primary monocytes falls beyond the scope of this study. However, we have conducted several pivotal experiments in primary monocytes, including:

      i) Demonstration of the induction of endogenous MOTS-c in primary human monocytes during differentiation by M-CSF (Fig 3A).

      ii) Observation of an increased number of adhered monocytes during monocyte differentiation following MOTS-c treatment (Fig 5A).

      iii) Examination of the transcriptional regulation in mouse primary bone marrow-derived macrophages (BMDMs) by MOTS-c, seven days after a single treatment at the onset of differentiation (Fig 6).

      In addition to assessing adherence to plastic, we performed RNA-seq of THP-1 cells during early differentiation with MOTS-c as a measure of accelerated differentiation (Fig 4). The positive correlation between the effects of PMA and PMA+MOTS-c suggests that MOTS-c accelerates the transcriptional changes that occur during differentiation (Fig 4G). We consider this method a more comprehensive evaluation of differentiation as it encompasses the expression of thousands of genes rather than relying on a limited selection of cell surface markers. Future investigations should explore additional indicators of differentiation, including potential epigenetic effects of MOTS-c.

      Our findings indicate that endogenous MOTS-c is induced during monocyte stimulation and translocates into the nucleus (Figs 3-4), implying a cell-intrinsic role for MOTS-c during monocyte differentiation. Although examining MOTS-c deficient monocytes would offer valuable insights, technical limitations currently hinder the production of such monocytes due to the mitochondrial genomic encoding of MOTS-c within the 12S rRNA.

      Furthermore, our study reveals that MOTS-c alters gene expression in macrophages similarly across age and sex groups, as illustrated in Fig 6E, where the fold changes in clusters 5 and 6 in response to MOTS-c were consistent across all groups. Because aging alone increases the expression of these same genes, this suggests that MOTS-c modulates macrophage gene expression in an age-related manner. We postulate this to be an adaptive response to age-related alterations in the monocyte and macrophage microenvironment.

      The number of biological replicates performed for each experiment is indicated.

      The different parts of the manuscript do not appear well connected and it is not clear what the main message from the manuscript would be. The physiological relevance of this study is also unclear.

      The main message of our manuscript is that the mitochondrial genome encodes for a previously unknown host defense peptide that has physiological roles in modulating immune responses during infection and during aging. We have edited the ‘introduction’ to clarify this.

      Reviewer #2 (Public Review):

      The research study presented by Rice et al. set out to further profile the host defense properties of the mitochondrial protein MOTS-c. To do this they studied i. the potential antimicrobial effects of MOTS-c on common bacterial pathogens E. coli and MRSA, ii. the effects of MOTS-c on the stimulation and differentiation of monocytes into macrophages. This is a well performed study that utilizes relevant methods and cell types to base their conclusions on. However, there appear to be a few weaknesses to the current study that hold it back from more broad application.

      Comment 1: From reading the manuscript methods and results, it is unclear exactly what the synthetic MOTS-c source is. Therefore it is hard to determine whether there may be any impurities in the production of this synthetic protein that may interfere with the results presented throughout the manuscript. That said, the data presented in Supplemental Figure 4F, where E. coli expressing intracellular MOTS-c showed inhibited bacterial growth, certainly support MOTS-c specific effects, as do the experiments showing endogenous MOTS-c levels rising in stimulated and differentiated macrophages (Figure 3).

      We have edited our manuscript to include the source and purity of our synthetic MOTS-c peptide. The MOTS-c peptide used was synthesized by New England Peptides (now Biosynth) with a purity >95% by mass spectrometry.

      Comment 2: It is interesting that the mice receiving bacteria coupled with MOTS-c lost about 10% of their body weight. It would have been interesting to demonstrate the cause of this weight loss since the effect appears to be separate from mere PAMPs as shown by using heat-killed MRSA in Supplemental Figure 5. Was inflammation changed? Is this due to changes in systemic metabolism? Would have been interesting to have seen CRP levels or circulating liver enzymes.

      As suggested, we repeated this experiment to include both the heat-killed and MOTS-c-MRSA groups in the same controlled experiment for comparison (Fig 2; see below). Blood was collected from these mice for evaluation of cytokine levels and markers of organ damage. While only 1/6 controls survived, all MOTS-c- and heat-killed MRSA-treated mice survived. However, compared to the heat-killed group, the MOTS-c-MRSA group lost more weight and had a higher inflammatory profile, but still significantly less than in the control group. We hypothesize that this is due to only partial killing of MRSA by MOTS-c, as suggested by the CFU plated after overnight incubation, leading to a non-lethal infection in these mice. Others have shown that in this peritonitis model, α-hemolysin production by live MRSA is a key factor in toxicity, rather than PAMP-induced shock (PMID: 8975909; 22802349), which is consistent with the absence of death following heat-killed MRSA inoculation.

      Despite these concerns, the data are well suited to answering their research question, and they open up the door to studying how mitochondrial peptides like MOTS-c could have roles outside of the mitochondria.

      Reviewer #1 (Recommendations For The Authors):

      Suggestions for improvement

      (1) The authors need to indicate in each legend the number of biological replicates analyzed and the number of independent experiments performed. This is essential.

      We have included the number of biological replicates analyzed.

      (2) The authors need to repeat the key experiments using freshly isolated monocytes, either human or mouse. THP-1 cells are abnormal cells and findings from these cells cannot be generalized to monocytes. For instance, in Figures 3A and B, it is clear that the kinetics of MOTS-c expression are different between THP-1 cells and human blood monocytes.

      The kinetics of THP-1 cells compared to human monocytes are slightly different, as expected by using different cells and different differentiation cues (M-CSF vs PMA). However, our findings collectively demonstrate the same effect, that each stimulus transiently induces the expression of MOTS-c within 24 hours in monocytes.

      In Figure 3A, the authors should show what happens in the absence of M-CSF. Is MOTS-c expression upregulated by culture alone?

      There is some degree of baseline expression of MOTS-c in a resting state, and MOTS-c expression is significantly increased upon stimulation. This expression may be higher in primary monocytes than THP-1 cells, given that these monocytes are inevitably stressed by being removed from the native environment and put through the purification process.

      (3) In Figure 4A, a control for cytoplasmic contamination in the nuclear fraction is missing.

      We now include GAPDH detection in the nuclear fraction.  

      Author response image 1.

      (4) The RNA-seq analysis shown in Figure 4 is not very informative. What genes are differentially expressed? The authors should provide a list of these genes as supplementary information and highlight some key genes in the figure and text.

      The complete list of these genes is provided in Tables S1 and S2. We chose not to highlight specific genes in this paper due to the lack of sufficient evidence identifying any particular genes as key factors at this time.

      (5) In Figure 5A, a control is missing: the authors should treat the monocytes with the same volume of 'vehicle' (presumably it is water).

      In all experiments with MOTS-c treatment, the controls were treated with the same volume of vehicle (water). We have edited legends to state this.

      (6) In Figure 6, the differences observed seem independent of MOTS-c. The conclusions from this figure are overstated and need to be rephrased and clarified.

      MOTS-c shifted gene expression in macrophages in a similar manner regardless of age and sex, as shown in Fig 6E where the fold changes in clusters 5 and 6 in response to MOTS-c were similar in all groups. Independently, aging alone increases the expression of these same genes related to antigen presentation and interferon signaling, suggesting that MOTS-c shifts macrophage gene expression in an age-related manner: the expression of antigen presentation and interferon-related genes has been shown to be highly age-related (PMID: 36040389, 32669714, 36622281, 31754020). We hypothesize this to be an adaptive response to age-related changes in the monocyte and macrophage microenvironment.

      (7) Adherence to plastic is not a robust read-out for monocyte differentiation into macrophages. The authors need to examine other parameters, for instance characteristic cell surface markers for macrophages.

      As a read-out of accelerated differentiation, in addition to adherence to plastic we performed RNA-seq of THP-1 cells during early differentiation with MOTS-c (Fig 4). The positive correlation between the effects of PMA and the effects of PMA+MOTS-c suggests that MOTS-c is accelerating the transcriptional changes that occur during differentiation (Fig 4G). We believe this to be a more robust assessment of differentiation as it relies on the expression of thousands of genes rather than a limited selection of cell surface markers. Further studies are needed to assess other read-outs of differentiation, including possible epigenetic effects of MOTS-c.

      (8) It is not clear whether MOTS-c could have a cell-intrinsic effect in monocytes. The results should be strengthened by examining the differentiation of monocytes deficient for MOTS-c (without addition of exogenous MOTS-c).

      We have shown that endogenous MOTS-c is induced during monocyte stimulation and translocates into the nucleus (Figs 3-4), suggesting that MOTS-c does have a cell-intrinsic role during monocyte differentiation.

      While having MOTS-c deficient monocytes would certainly be insightful, because MOTS-c is encoded within the mitochondrial genome in the 12S rRNA there are currently technical limitations in producing these monocytes.

      Other points

      (1) The paper would benefit from a more extended discussion to understand the physiological relevance of these findings. What cells would release MOTS-c in vivo, and how would that affect monocytes? Is there a cell-intrinsic role of MOTS-c in monocytes, and if so what would be the signals inducing its expression during differentiation? These aspects should be discussed by the authors so that the readers can understand their views.

      We thank the reviewer for their suggestion and have edited the discussion in our revised manuscript.  

      MOTS-c has been detected in various tissue and cell types, including the liver, muscle, T cells, monocytes/macrophages, and epithelial cells. This aligns with MOTS-c being referred to in the literature as a cytokine, as cytokines are typically expressed by a broad range of cell types. Consistent with this, we also propose that MOTS-c would be expressed in cells known to express HDPs.

      We hypothesize that MOTS-c acts in both a cell-intrinsic and extrinsic manner in vivo, consistent with known HDPs, to both target bacteria directly and modulate immune cell responses. In vitro, M-CSF, PMA, LPS, and IFNγ each induced MOTS-c expression. In vivo, monocytes respond to a range of stimuli that influence their differentiation, and these stimuli may induce MOTS-c as well. We have previously published that MOTS-c acts primarily under conditions of cell stress, such as nutrient deprivation and oxidative stress, to help restore homeostasis. While MOTS-c did regulate macrophage gene expression in resting “M0-like” macrophages, we hypothesize that the physiological role of MOTS-c is to regulate cell adaptation to stress, therefore the context under which monocytes differentiate will be an important factor determining the functional effects of MOTS-c. In future studies, we plan to test whether the immuno-modulatory effects of MOTS-c are dependent on the environment during differentiation.

      (2) The scale bar appears to be missing from Figure 1G.

      We apologize for the poor resolution of the scale bar. We have made it easily recognizable in the revised figure.  

      (3) It is not very clear what is shown in Figure S2. The authors should better explain what the images represent.

      Figure S2 is related to Figure 1D and Figure S1. In this experiment, E. coli, S. typhimurium, and P. aeruginosa cultures were treated with MOTS-c (100 μM). We observed that only E. coli aggregated immediately, while S. typhimurium and P. aeruginosa did not show aggregation. This suggests that MOTS-c exhibits specificity in targeting certain types of bacteria, although the underlying basis of this specificity is currently unknown.

      We have revised the legend as follows: 'MOTS-c exhibits specificity in bacterial targeting. MOTS-c (100 μM) treatment causes immediate aggregation of E. coli but not S. typhimurium or P. aeruginosa (n=6). Representative image shown. See Figure 1D'.

      Reviewer #2 (Recommendations For The Authors):

      This is a beautifully executed study and a well written manuscript. I generally don't have much critical feedback to give based on my reading. The only recommendation I have to improve the completeness of the data would be in relation to Figure 5E and F. The metabolic phenotype of LPS stimulated monocytes/macrophages is more typically the Warburg effect where oxidative phosphorylation is reduced (as you show with a lowered OCR), but with a concomitant elevation in lactate production. It would have been nice to see either i. the ECAR levels from your seahorse data, or ii. separate lactate measurements on your supernatants. This would go a long way to further explaining the phenotype described in the figure.

      We greatly appreciate the reviewer's positive feedback. The data provided below are ECAR measurements obtained from the Seahorse assay. However, it's important to note that the assays were originally designed for OCR measurement (e.g. buffered media unsuitable for ECAR measurements, use of mitochondrial complex inhibitors, etc.), thus rendering the ECAR data unreliable for accurately assessing glycolysis. Consequently, while we share this data with the reviewer, we believe it is inappropriate to include it in the manuscript (hence omitted in the original submission).

      Author response image 2.

      Furthermore, we are currently engaged in a separate manuscript focusing on elucidating the immunometabolic mechanisms of MOTS-c in macrophages. We intend for this manuscript to stand alone, providing a comprehensive exploration of metabolic pathways, including a detailed untargeted metabolomics map spanning multiple time-points.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper examines patterns of diversity and divergence in two closely related sub-species of Zea mays. While the data are interesting and the authors have tried to exclude multiple confounding factors, many patterns cannot clearly be ascribed to one cause or another.

      Strengths:

      The paper presents interesting data from sets of sympatric populations of the two sub-species, maize and teosinte. This sampling offers unique insights into the diversity and divergence between the two, as well as the geographic structure of each. Many analyses and simulations to check analyses have been carried out.

      Weaknesses:

      The strength of conclusions that can be drawn from the analyses was low, partly because there are many strange patterns. The authors have done a good job of adding caveats, but clearly, these species do not meet many assumptions of our methods.

      Thank you for the comments. We appreciate the multiple rounds of revision the manuscript has undergone, and the work has improved as a consequence. Overall we disagree that the patterns are strange, and have made considerable efforts to explain in the text and in our responses why the patterns make sense based on what we know about the history of Zea mays from previous research. We agree that currently available methods are not capable of adequately answering all the questions we pose. This reflects both limitations in the available data for these populations (i.e. phenotypes and spatially explicit sampling) and limitations in available methods tailored to the questions at hand (spatially explicit inference of the range over which an allele is adaptive). We have made considerable effort to point out the places where our inferences are likely to have low accuracy or limited resolution. These limitations are in many ways inherent to all inference-based science and should not be considered a weak point specific to this work, nor do they take away from the fundamental conclusions, which have changed quantitatively but not qualitatively over the course of peer review.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      -The manuscript should say something about the fact that range-wide PSMC does not show a decline.

      We did not use PSMC methods but instead mushi as described in the methods. On line 356 we described how the lower sample size and strong regularization are the most likely explanations for the lack of a population size decline in the rangewide samples.

      - The manuscript should explain how rdmc was run and what "overlapping" means.

      We described how sweep intervals were inferred starting on line 823 (Methods subsection “Identifying Selective Sweeps”). Sweep regions were defined as the outermost coordinates from all populations that shared any overlap in their respectively defined sweep intervals. The details of how we ran rdmc, including all of the parameters, are described starting on line 895 (Methods subsection “Inferring modes of convergent adaptation”).
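
      The region definition above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the population labels, and the choice to merge intervals transitively (A overlaps B, B overlaps C, so all three join one region) are assumptions made for illustration only.

```python
from typing import Dict, List, Set, Tuple

Interval = Tuple[int, int]  # (start, end) on a single chromosome, inclusive


def merge_shared_sweeps(
    intervals_by_pop: Dict[str, List[Interval]],
) -> List[Tuple[int, int, Set[str]]]:
    """Merge overlapping per-population sweep intervals into regions.

    Each region spans the outermost coordinates of all intervals that
    overlap it and records which populations contributed an interval.
    """
    tagged = sorted(
        (start, end, pop)
        for pop, intervals in intervals_by_pop.items()
        for start, end in intervals
    )
    regions: List[Tuple[int, int, Set[str]]] = []
    for start, end, pop in tagged:
        if regions and start <= regions[-1][1]:
            # Overlaps the current region: extend it and add the population.
            prev_start, prev_end, pops = regions[-1]
            regions[-1] = (prev_start, max(prev_end, end), pops | {pop})
        else:
            regions.append((start, end, {pop}))
    return regions


def shared_only(regions: List[Tuple[int, int, Set[str]]]):
    """Keep only regions supported by more than one population."""
    return [r for r in regions if len(r[2]) > 1]


# Hypothetical intervals, for illustration only.
example = {
    "maize_A": [(100, 200), (900, 950)],
    "teosinte_B": [(150, 300)],
    "teosinte_C": [(290, 400)],
}
regions = merge_shared_sweeps(example)
shared = shared_only(regions)
```

      In this toy input the three overlapping intervals collapse into one region spanning 100 to 400, while the interval at 900 to 950 remains exclusive to a single population.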

      - Figure 4: "Negative log10" is messed up

      Thank you. This has been fixed for the Version Of Record.

      - Line 318: "accruacy"

      Thank you. We have edited this typo for the Version Of Record.

      - New Table S3: why don't the proportions add to 1?

      These values represent the proportion of fixed differences at 0-fold sites that are unique to each population. The denominator is the total number of fixed differences for each population separately, so each proportion is specific to its population and the proportions should not sum to one across populations. The table caption has been reworded to clarify this for the Version Of Record.
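
      A small sketch may help show why such proportions need not sum to one. The site sets and population labels below are invented for illustration, and the function is a hypothetical helper, not the code used to build Table S3.

```python
from typing import Dict, Set


def unique_fixed_proportions(fixed_by_pop: Dict[str, Set[int]]) -> Dict[str, float]:
    """For each population, the share of its fixed differences found in no other population."""
    props: Dict[str, float] = {}
    for pop, sites in fixed_by_pop.items():
        # Union of fixed differences observed in every other population.
        others = set().union(*(s for p, s in fixed_by_pop.items() if p != pop))
        unique = sites - others
        # The denominator is this population's own total, not a shared total.
        props[pop] = len(unique) / len(sites) if sites else float("nan")
    return props


# Hypothetical site positions, for illustration only.
toy = {
    "pop_A": {1, 2, 3, 4},
    "pop_B": {3, 4, 5},
    "pop_C": {4, 6},
}
props = unique_fixed_proportions(toy)
total = sum(props.values())
```

      Because each population has its own denominator, the three proportions here (1/2, 1/3, and 1/2) add up to more than one.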


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper examines patterns of diversity and divergence in two closely related sub-species of Zea mays. While the patterns are interesting, the strength of evidence in support of the conclusions drawn from these patterns is weak overall. Most of the main conclusions are not supported by convincing analyses.

      Strengths:

      The paper presents interesting data from sets of sympatric populations of the two sub-species, maize and teosinte. This sampling offers unique insights into the diversity and divergence between the two, as well as the geographic structure of each.

      Weaknesses:

      There were issues with many parts of the paper, especially with the strength of conclusions that can be drawn from the analyses. I list the major issues in the order in which they appear in the paper.

      (1) Gene flow and demography.

      The f4 tests of introgression (Figure 1E) are not independent of one another. So how should we interpret these: as gene flow everywhere, or just one event in an ancestral population? More importantly, almost all the significant points involve one population (Crucero Lagunitas), which suggests that the results do not simply represent gene flow between the sub-species. There was also no signal of increased migration between sympatric pairs of populations. Overall, the evidence for gene flow presented here is not convincing. Can some kind of supporting evidence be presented?

      We agree that the standard approach to f4 tests that we employed here is not without limitations, namely, that the tests are conducted independently, while the true evolutionary history is not. While a joint demographic inference across all populations would be useful, it did not seem tractable to perform over all of our populations with currently available methods, given the number of populations being analyzed, nor does it directly address the question of interest. Our purpose for including the f4 tests was to test whether there was more gene flow between sympatric pairs than in other comparisons (we have made that point clearer in the text near line 174). As described in the text, the distribution of Z scores is generated by pairing focal populations with all other non-focal populations across both subspecies, which means the gene flow signal of interest is marginalized over the effects of gene flow in the other non-focal populations. This is not nearly as rich as inferring the full history, but it gives us some sense of the average amount of gene flow experienced between populations and allows us to address one of our primary questions of interest when conceiving this paper - do sympatric pairs show more gene flow than other pairs? We agree with the reviewer that the answer is largely no, and the writing reflects this.

      Overall, we think both points mentioned by the reviewer here (that most but not all tests involved Crucero Lagunitas maize, and that sympatric pairs don’t show higher gene flow) contribute nicely to the overall theme of the paper - the history of both subspecies is idiosyncratic and impacted by humans in ways that do not reflect geographic proximity and that we did not anticipate (see expectations near line 110). We have emphasized the connection between f4 tests and the revised rdmc results near line 653.
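
      For readers less familiar with the statistic discussed above, the following is a minimal sketch of an f4 calculation with a block-jackknife Z score. It assumes per-site allele frequencies are already available for four populations and uses equal-size contiguous blocks, which is a simplification; the function names and toy frequencies are hypothetical and this is not the software used in the manuscript.

```python
import math
from typing import Sequence


def f4(pa: Sequence[float], pb: Sequence[float],
       pc: Sequence[float], pd: Sequence[float]) -> float:
    """Mean over sites of (pA - pB) * (pC - pD)."""
    n = len(pa)
    return sum((a - b) * (c - d) for a, b, c, d in zip(pa, pb, pc, pd)) / n


def f4_z_score(pa, pb, pc, pd, n_blocks: int = 4) -> float:
    """Z score from a delete-one block jackknife over contiguous blocks."""
    block = len(pa) // n_blocks
    if block == 0:
        raise ValueError("need at least one site per block")
    used = block * n_blocks  # trailing sites that do not fill a block are dropped
    full = f4(pa[:used], pb[:used], pc[:used], pd[:used])
    leave_one_out = []
    for i in range(n_blocks):
        lo, hi = i * block, (i + 1) * block
        keep = [j for j in range(used) if not lo <= j < hi]
        leave_one_out.append(
            f4([pa[j] for j in keep], [pb[j] for j in keep],
               [pc[j] for j in keep], [pd[j] for j in keep])
        )
    mean = sum(leave_one_out) / n_blocks
    var = (n_blocks - 1) / n_blocks * sum((x - mean) ** 2 for x in leave_one_out)
    if var == 0:
        raise ValueError("jackknife variance is zero")
    return full / math.sqrt(var)


# Toy allele frequencies at four sites, for illustration only.
pa = [0.1, 0.5, 0.9, 0.3]
pb = [0.2, 0.4, 0.7, 0.3]
pc = [0.6, 0.2, 0.8, 0.5]
pd = [0.5, 0.3, 0.6, 0.5]
stat = f4(pa, pb, pc, pd)
z = f4_z_score(pa, pb, pc, pd, n_blocks=4)
```

      Each such test is computed on its own set of four populations, which is why tests that share populations are not independent of one another.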

      The paper also estimates demographic histories (changes in effective population sizes) for each population, and each sub-species together. The text (lines 191-194) says that "all histories estimated a bottleneck that started approximately 10 thousand generations ago" but I do not see this. Figure 2C (not 2E, as cited in the text) shows that teosinte had declines in all populations 10,000 generations ago, but some of these declines were very minimal. Maize has a similar pattern that started more recently, but the overall species history shows no change in effective size at all. There's not a lot of signal in these figures overall.

      I am also curious: how does the demographic model inferred by mushi address inbreeding and homozygosity by descent (lines 197-202)? In other words, why does a change in Ne necessarily affect inbreeding, especially when all effective population sizes are above 10,000?

      All maize populations show a decline beginning 10,000 generations ago. The smallest decline for maize is from 100,000 to 30,000. All teosinte populations show a reduction in population size. The smallest of these drops more than 70% from around 300,000 to 100,000. Three of the teosinte populations showed a reduction in population size from ~10^5 to ~10^3, which is well below 10,000. Thus all populations show declines.

      These large reductions should lead to inbreeding and increased homozygosity by descent. mushi does not specifically model these features of the data, yet as we show, simulations under the model estimated by mushi matched the true HBD levels fairly well (Figure 2D).

      The rangewide sample does not show declines, likely because there is enough isolation between populations that the reduction in variation at any given locus is not shared, and is maintained in the populations that did not experience the population decline.

      (2) Proportion of adaptive mutations.

      The paper estimates alpha, the proportion of nonsynonymous substitutions fixed by positive selection, using two different sampling schemes for polymorphism. One uses range-wide polymorphism data and one uses each of the single populations. Because the estimates using these two approaches are similar, the authors conclude that there is little local adaptation. However, this conclusion is not justified.

      There is little information as to how the McDonald-Kreitman test is carried out, but it appears that polymorphism within either teosinte or maize (using either sampling scheme) is compared to fixed differences with an outgroup. These species might be Z. luxurians or Z. diploperennis, as both are mentioned as outgroups. Regardless of which is used, this sampling means that almost all the fixed differences in the MK test will be along the ancestral branch leading to the ancestor of maize or teosinte, and on the branch leading to the outgroup. Therefore, it should not be surprising that alpha does not change based on the sampling scheme, as this should barely change the number of fixed differences (no numbers are reported).

      The lack of differences in results has little to do with range-wide vs restricted adaptation, and much more to do with how MK tests are constructed. Should we expect an excess of fixed amino acid differences on very short internal branches of each sub-species tree? It makes sense that there is more variation in alpha in teosinte than maize, as these branches are longer, but they all seem quite short (it is hard to know precisely, as no Fst values or similar are reported).

      The section “Genetic Diversity” in the methods provides details about how luxurians and diploperennis were used as outgroups. The section “Estimating the Rate of Positive Selection, α” in the methods includes the definition of α, the full joint non-linear regression equation, the software used to estimate it (brms), and the relevant citations crediting the authors of the original method. However, some of the relevant information about the SFS construction is provided in the previous section, entitled “Genetic Diversity”. We added a reference to this in the results near line 800.
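
      As background for this exchange, the classic (non-asymptotic) McDonald-Kreitman estimate of α can be written in a few lines. This is only a sketch of the textbook estimator, α = 1 − (Ds·Pn)/(Dn·Ps); it is not the asymptotic regression fitted with brms in the manuscript, and the counts below are invented.

```python
def mk_alpha(dn: int, ds: int, pn: int, ps: int) -> float:
    """Classic MK estimate of the proportion of adaptive nonsynonymous substitutions.

    dn, ds: nonsynonymous and synonymous fixed differences to the outgroup.
    pn, ps: nonsynonymous and synonymous polymorphisms within the ingroup.
    """
    if dn == 0 or ps == 0:
        raise ValueError("dn and ps must be non-zero")
    return 1.0 - (ds * pn) / (dn * ps)


# Hypothetical counts, for illustration only.
alpha = mk_alpha(dn=60, ds=100, pn=30, ps=100)
```

      With these counts the estimate is 0.5. The estimator makes explicit that α depends on both the divergence counts and the polymorphism counts, which is the crux of the disagreement over how much the sampling scheme can change the result.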

      While we appreciate the concern that “almost all the fixed differences in the MK test will be along the ancestral branch leading to the ancestor of maize or teosinte”, this is only a problem if there aren’t enough fixed differences that are unshared between populations. This is more of a concern for maize than teosinte, which we make clear as a caveat in the manuscript in several places already. The fact that there is variation in alpha among teosinte populations is evidence that these counts do differ among populations. As we can see in the population trees in Figure 1, there is a considerable amount of terminal branch length for all the populations. Indeed if we look at the number of fixed differences at 0-fold sites across populations:

      The variation in the number of fixed differences, particularly across teosinte means that a large number cannot be shared between populations. We can estimate the fixed differences unique to each subpopulation (and total count) demonstrating that, in general, there are a large number of substitutions unique to each population. This is good evidence the rangewide estimates do not reflect a lack of variation within populations, at least not for teosinte. This is now included in the supplement (Table S3).

      Finally, we note that the branches leading to outgroups are likely not substantially longer than those among populations. Given our estimates of Ne, the coalescent within maize and teosinte should be relatively deep (with Ne of 30K it should be ~120K years). The divergence time between Zea mays and these outgroup taxa has been estimated at ~150K years (Chen et al. 2022). This is now mentioned in the text on line 407.
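
      One way to read the ~120K figure is through the standard coalescent expectation for the time to the most recent common ancestor, E[T] = 4Ne(1 − 1/n) generations for n sampled lineages in a diploid population. The sketch below assumes that reading and one generation per year; both are assumptions made here, not statements from the manuscript.

```python
def expected_tmrca_generations(ne: float, n_lineages: int) -> float:
    """Expected TMRCA in generations under the standard neutral coalescent (diploid)."""
    if n_lineages < 2:
        raise ValueError("need at least two lineages")
    return 4.0 * ne * (1.0 - 1.0 / n_lineages)


# Ne = 30,000 as quoted in the response; the sample size of 20 is an assumption.
tmrca_20 = expected_tmrca_generations(30_000, 20)
# For large samples the expectation approaches 4 * Ne = 120,000 generations.
tmrca_limit = 4.0 * 30_000
```

      Under an annual life cycle, 120,000 generations corresponds to roughly 120K years, which is of the same order as the ~150K-year divergence estimate cited above.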

      We have added a caveat about the reviewer’s concern regarding the non-independence of fixed differences for maize near line 386.

      (3) Shared and private sweeps.

      In order to make biological inferences from the number of shared and private sweeps, there are a number of issues that must be addressed.

      One issue is false negatives and false positives. If sweeps occur but are missed, then they will appear to be less shared than they really are. Table S3 reports very high false negative rates across much of the parameter space considered, but is not mentioned in the main text. How can we make strong conclusions about the scale of local adaptation given this? Conversely, while there is information about the false positive rate provided, this information doesn't tell us whether it's higher for population-specific events. It certainly seems likely that it would be. In either case, we should be cautious saying that some sweeps are "locally restricted" if they can be missed more than 85% of the time in a second population or falsely identified more than 25% of the time in a single population.

      The reviewer brings up a worthwhile point. The simulation results indeed call into question how many of the sweeps we claim are exclusive to one population actually are. This caveat is already made, but we now make clearer the reviewer’s concern regarding the high false negative rate (near line 299). However, if anything this suggests sweeps are shared even more often than what is reported. One of the major takeaways from the paper is that convergent adaptation is more common than we expected. The most interesting part about the unique sweeps is the comparison between maize and teosinte. While the true proportions may vary, the relatively higher proportion of sweeps exclusive to one population in teosinte compared to maize is unlikely to be affected by false negatives, since the accuracy of sweep identification is fairly similar across subspecies (though perhaps with some exceptions for the populations with stronger bottlenecks). Further, these criticisms are specific to the raisd results. All sweeps shared across multiple populations were analyzed using rdmc. After adjustments made to the number of proposed sites for selection (see response below), there is good agreement between the raisd and rdmc results - the regions we proposed as selective sweeps with raisd all show evidence of convergence using rdmc. Recall too that rdmc uses a quite different approach to inference - all populations are used jointly, labelling those that did and did not experience the sweep. If sweeps were present in populations that were labeled as neutral (or vice versa), this would weaken the power to infer selection at the locus. Much of the parameter space we explored is for quite weak selection, and the simulated analysis shows we are likely to miss those instances, often entirely. For strong sweeps, however, our simulations show we have appreciable accuracy.

      Together, there is reason to be optimistic about our detection of strong shared sweeps and that the main conclusions we make are sound.

      Finally, we note that we are unaware of any other empirical study that has performed similar estimates of the accuracy of the sweep calling in their data (as opposed to using simulations). We thus see these analyses as a significant contribution towards transparency that is completely lacking from most papers.

      A second, opposite, issue is shared ancestral events. Maize populations are much more closely related than teosinte (Figure 2B). Because of this, a single, completed sweep in the ancestor of all populations could much more readily show a signal in multiple descendant populations. This is consistent with the data showing more shared events (and possibly more events overall). There also appear to be some very closely (phylogenetically) related teosinte populations. What if there's selection in their shared ancestor? For instance, Los Guajes and Palmar Chico are the two most closely related populations of teosinte and have the fewest unique sweeps (Figure 4B). How do these kinds of ancestrally shared selective events fit into the framework here?

      The reviewer brings up another interesting point and one that likely impacts some of our results.

      As the reviewer describes, this is an issue that is of more concern to the more closely related populations and is less likely to explain results across the subspecies. We have added this as a caveat (near line 456). As is clear in the writing, sharing across subspecies is our primary interest for the rdmc results.

      These analyses of shared sweeps are followed by an analysis of sweeps shared by sympatric pairs of teosinte and maize. Because there are not more events shared by these pairs than expected, the paper concludes that geography and local environment are not important. But wouldn't it be better to test for shared sweeps according to the geographic proximity of populations of the same sub-species? A comparison of the two sub-species does not directly address the scale of adaptation of one organism to its environment, and therefore it is hard to know what to conclude from this analysis.

      We did not intend to conclude that local adaptation is not important. Especially for teosinte, we report and interpret evidence that many sweeps are happening exclusively in one population, which is consistent with the action of local adaptation and with some of our expectations.

      More directly, this is another instance of us having clear hypotheses going into the paper and constructing specific analyses to test them. As we explain in the paper, we expected the scale of local adaptation to be very small, such that subspecies growing next to each other have more opportunities to exchange alleles that are locally adapted to their shared environment. The analysis we conducted makes sense in light of this expectation. We considered conducting tests regarding geographic proximity, but there is limited power with the number of populations we have within subspecies, and the meaning of the tests is unclear if all populations of both subspecies are naively included together. This analysis shows that, at least for sweeps and fixations, adaptation is larger than a single location. While it may not be a complete description on its own, the work here does provide information about the scale of adaptation and is useful to our overall claims and objectives of the paper. As mentioned in the paper, the story might be very different if we were to study through a lens of polygenic adaptation. We also now include in the discussion in several places mention of where broader sampling could improve inference.

      (4) Convergent adaptation

      My biggest concern involves the apparent main conclusion of the paper about the sources of "convergent adaptations". I believe the authors are misapplying the method of Lee and Coop (2017), and have not seriously considered the confounding factors of this method as applied. I am unconvinced by the conclusions that are made from these analyses.

      The method of Lee and Coop (referred to as rdmc) is intended to be applied to a single locus (or very tightly linked loci) that shows adaptation to the same environmental factor in different populations. From their paper: "Geographically separated populations can convergently adapt to the same selection pressure. Convergent evolution at the level of a gene may arise via three distinct modes." However, in the current paper, we are not considering such a restricted case. Instead, genome-wide scans for sweep regions have been made, without regard to similar selection pressures or to whether events are occurring in the same gene. Instead, the method is applied to large genomic regions not associated with known phenotypes or selective pressures.

      I think the larger worry here is whether we are truly considering the "same gene" in these analyses. The methods applied here attempt to find shared sweep regions, not shared genes (or mutations). Even then, there are no details that I could find as to what constitutes a shared sweep. The only relevant text (lines 802-803) describes how a single region is called: "We merged outlier regions within 50,000 Kb of one another and treated as a single sweep region." (It probably doesn't mean "50,000 kb", which would be 50 million bases.) However, no information is given about how to identify overlap between populations or sub-species, nor how likely it is that the shared target of selection would be included in anything identified as a shared sweep. Is there a way to gauge whether we are truly identifying the same target of selection in two populations?

      The question then is, what does rdmc conclude if we are simply looking at a region that happened to be a sweep in two populations, but was not due to shared selection or similar genes? There is little testing of this application here, especially its accuracy. Testing in Lee and Coop (2017) is all carried out assuming the location of the selected site is known, and even then there is quite a lot of difficulty distinguishing among several of the non-neutral models. This was especially true when standing variation was only polymorphic for a short time, as is estimated here for many cases, and would be confused for migration (see Lee and Coop 2017). Furthermore, the model of Lee and Coop (2017) does not seem to consider a completed ancestral sweep that has signals that persist into current populations (see point 3 above). How would rdmc interpret such a scenario?

      Overall, there simply doesn't seem to be enough testing of this method, nor are many caveats raised in relation to the strange distributions of standing variation times (bimodal) or migration rates (opposite between maize and teosinte). It is not clear what inferences can be made with confidence, and certainly the Discussion (and Abstract) makes conclusions about the spread of beneficial alleles via introgression that seem to outstrip the results.

      We have fixed the “50,000 Kb” typo.

      There are several important points the reviewer makes here worth considering. First and most importantly, the method of Lee and Coop (2017) actually does include sites as part of the composite likelihood calculation. For computational feasibility, the number of positions we initially considered was 20 (20 different positions along the input sequence were proposed as the site of the shared beneficial mutation). In efforts to further address the reviewer’s concern about adaptive mutations at distinct loci, we have increased the number of proposed selected sites to 200. This fact should greatly diminish the reviewer’s concern that we are picking up independent sweeps that happened at different nucleotide positions in the same region - evidence for a beneficial mutation must be shared by the selected populations at a proposed site. As the revisions show, this has modified the results of our paper in a number of ways, including changing all of the previous neutral regions to shared via standing variation or migration. Despite these changes, our previous conclusions are intact, including the pattern that migration rates are high when maize populations share the sweep. Relatedly, we disagree with the reviewer’s characterization of the migration results. The pattern is quite clear and makes sense - when a maize population is involved in the sweep, migration rate is inferred to be high. Sweeps exclusive to teosinte are rarer and are inferred to have a low migration rate. This relates directly to the idea that humans have moved maize relatively rapidly across the landscape.

      We have now included a plot showing how the difference between the maximum composite likelihood (CLE) site and the next highest CLE site varies across our inferences (Figure S8), which strongly suggests that patterns are not muddled across multiple loci, but are centered at a focal region where the beneficial allele is inferred to be located. While there are too many to show in the manuscript across all sweeps, here is a nice example of what inference looks like for one of the proposed sweep regions.

      Author response image 1.

      Furthermore, the situation the reviewer is describing would be selection acting on independent mutations (mutations at different loci), which would not create an increase in the amount of allele frequency covariance above and beyond what would be expected by drift under the migration and standing variation models.

      We also note that we are not alone in applying this approach to shared outlier signals in the absence of known genes; indeed the authors of the DMC method have applied it to regions of shared outlier signal themselves (e.g. https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1008593).

      Reviewer #2 (Public Review):

      Summary:

      The authors sampled multiple populations of maize and teosinte across Mexico, aiming to characterise the geographic scale of local adaptation, patterns of selective sweeps, and modes of convergent evolution between populations and subspecies.

      Strengths & Weaknesses:

      The population genomic methods are standard and appropriate, including Fst, Tajima's D, α, and selective sweep scans. The whole genome sequencing data seems high quality. However, limitations exist regarding limited sampling, potential high false-positive sweep detection rates, and weak evidence for some conclusions, like the role of migration in teosinte adaptation.

      Aims & Conclusions:

      The results are interesting in supporting local adaptation at intermediate geographic scales, widespread convergence between populations, and standing variation/gene flow facilitating adaptation. However, more rigorous assessments of method performance would strengthen confidence. Connecting genetic patterns to phenotypic differences would also help validate associations with local adaptation.

      Impact & Utility:

      This work provides some of the first genomic insights into local adaptation and convergence in maize and teosinte. However, the limited sampling and need for better method validation currently temper the utility and impact. Broader sampling and connecting results to phenotypes would make this a more impactful study and valuable resource. The population genomic data itself provides a helpful resource for the community.

      Additional Context:

      Previous work has found population structure and phenotypic differences consistent with local adaptation in maize and teosinte. However, genomic insights have been lacking. This paper takes initial steps to characterise genomic patterns but is limited by sampling and validation. Additional work building on this foundation could contribute to understanding local adaptation in these agriculturally vital species.

      We appreciate the reviewer’s thoughtful reading of the paper and scrutiny. We hope that the added caveats made in response to reviewer 1 (as well as the previous rounds of peer review) will provide readers with the proper amount of skepticism about the accuracy of some of our initial sweep results, while also demonstrating that many of our conclusions are robust to the concerns raised over the various stages of review.

      We agree with the reviewer that better sampling and the incorporation of inference from phenotypic data would be excellent additions, but this information is not available for the studied populations and is outside the scope of this paper.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      - Sometimes alpha is described as a rate, and sometimes as a proportion. The latter is correct.

      We have updated this. Thanks.

      - Line 79: are they really "discrete" populations?

      The teosinte populations sampled are all clearly separated from each other and are physically discrete. The maize population samples came from individual farmer fields. Traditional maize is grown as open-pollinated (outcrossing) populations, and farmers save seed for subsequent generations. An individual farmer’s field thus behaves as a discrete population for our purposes, impacted of course by gene flow, selection, and other evolutionary processes.

      - Lines 418-420: "Large genomes may lead to more soft sweeps, where no single mutation driving adaptive evolution would fix (Mei et al. 2018)." I'm not sure I understand this statement. Why is this a property of genome size?

      Mei et al. 2018 lay out the logic, but essentially they present data arguing that the total number of functionally relevant base pairs increases with genome size (less than linearly). If true, genomes with a large number of potentially functional bp are more likely to undergo soft sweeps (see theory by Hermisson and Pennings cited in Mei et al. 2018).

      - Lines 500-1: selection does not cause one to underestimate effective population sizes. Selection directly affects Ne. I'm not sure what biases the sentences on lines 502-508 are trying to explain.

      We have simplified this section. Not accounting for linked selection (especially positive selection) results in a biased inference of demographic history. See Marsh and Johri (2024) for another example. https://doi.org/10.1093/molbev/msae118

      - Line 511-3: does Uricchio et al. (2019) show any difference in the estimate of alpha from Messer and Petrov (2013) when taking background selection into account?

      What we initially wrote was incorrect. The aMK method of Messer and Petrov (2013) accounts for weakly deleterious polymorphisms, but it does not account for positively selected ones. We have updated this text and suggested our method may underestimate alpha if positively selected segregating alleles are common (near line 539).
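
      As a reminder of why alpha is a proportion rather than a rate, the following is a minimal sketch of the classic McDonald-Kreitman point estimate. The asymptotic method evaluates a quantity of this form within derived allele frequency classes and extrapolates it; that fitting step is not shown here. The function name `mk_alpha` and the counts are hypothetical illustrations, not values or code from the paper.

```python
def mk_alpha(dn, ds, pn, ps):
    """Estimate the proportion of nonsynonymous substitutions fixed by positive selection.

    dn, ds: counts of nonsynonymous / synonymous fixed differences (divergence)
    pn, ps: counts of nonsynonymous / synonymous polymorphisms
    """
    if dn == 0 or ps == 0:
        raise ValueError("dn and ps must be non-zero")
    # alpha = 1 - (Ds * Pn) / (Dn * Ps); it is a ratio of counts, so it has no units
    return 1.0 - (ds * pn) / (dn * ps)


# Hypothetical counts, for illustration only.
alpha = mk_alpha(dn=100, ds=200, pn=50, ps=200)
print(alpha)  # 0.5
```

      Because weakly deleterious polymorphisms inflate pn at low frequencies, a single pooled estimate like this one tends to be biased downward, which is the motivation for the frequency-dependent version.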

      - Lines 598-599: "which would limit the rate of new and beneficial mutations." I don't understand this - shouldn't a bottleneck only affect standing variation? Why would a bottleneck affect new mutations?

      This is simply to say that during the low Ne period of a bottleneck, fewer total mutations (and therefore fewer beneficial mutations) will be generated, since there are fewer individuals for mutations to occur in. We have changed “rate” to “amount” to clarify that we do not mean the mutation rate itself.
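
      The point can be illustrated with a small sketch, assuming a diploid population in which each of the 2N genome copies mutates independently at a fixed per-site rate. The function name and all numbers below are hypothetical and chosen only to show the scaling.

```python
def expected_new_mutations(n_diploid, mu_per_site, n_sites):
    """Expected number of new mutations entering a diploid population per generation.

    The per-site mutation rate is unchanged by a bottleneck; only the number
    of genome copies (2 * n_diploid) in which mutations can arise changes.
    """
    return 2 * n_diploid * mu_per_site * n_sites


# Hypothetical values: a tenfold reduction in population size.
before = expected_new_mutations(n_diploid=10_000, mu_per_site=1e-8, n_sites=1_000_000)
during = expected_new_mutations(n_diploid=1_000, mu_per_site=1e-8, n_sites=1_000_000)
print(before / during)  # ratio of mutational input before vs. during the bottleneck
```

      Under these assumptions, the input of new mutations, beneficial ones included, falls in proportion to population size even though the mutation rate itself is constant.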

      Reviewer #2 (Recommendations For The Authors):

      Experiments/Analyses:

      (1) Consider simulating polygenic adaptation in addition to hard and soft sweeps to see if this improves the power to detect adaptive signatures shared between populations. This could involve simulating the coordinated change in allele frequencies across many loci to match a specified shift in trait value due to selection. The ability to detect shared polygenic adaptation between population replicates could be assessed using methods tailored to polygenic signals, such as the Polygenic Selection Score approach. Comparing the power to detect shared polygenic adaptation versus shared hard and soft sweeps would provide further insight into what adaptive modes current methods can uncover. If the power to detect shared polygenic adaptation is very low, the extent of shared adaptation between populations may be even more common than currently inferred. Adding simulations of polygenic adaptation would strengthen the study.

      While this would be a worthwhile undertaking in general, it would be a considerable amount of work outside of the scope and aims of this paper.

      (2) Explore using machine learning approaches like S/HIC to improve power over summary statistic methods potentially.

      We in fact put considerable effort into applying diploS/HIC before switching to RAiSD for this project. While predictions on simulations had good power to detect sweeps, we found that applying the method to our actual data classified a dubious number of windows as sweeps (e.g. >90% of the genome), which we believed to be false positives. We speculated that this may have to do with sensitivity to demographic or other types of misspecification in the simulations, such as our choice of window sizes compared to local recombination rates. It would likely be fruitful to put further effort into using machine learning methods for maize and teosinte, but a deeper exploration of the right hyperparameters and simulation choices is likely needed to apply them effectively.

      (3) Increase geographic sampling density, if possible, especially near population pairs showing high differentiation, to better understand the scale of local adaptation.

      We agree this would be valuable research. Hopefully this work inspires further efforts into the question of the spatial and temporal scales of local adaptation, with more ambitious spatial sampling designed at the outset.

      Writing/Presentation:

      (1) Provide more intuition about the biological interpretation of the migration rates inferred under the migration model of convergence. What do the rates imply about the amount or timing of gene flow?

      We have expanded the discussion sections (starting near line 653) to elaborate on the migration results and connect the rdmc and f4 tests more explicitly. The timing of gene flow is more challenging to address directly with the approaches we used, but we agree it would be interesting to explore more in future papers.

      (2a) Expand the discussion of power limitations and the need for simulation tests. Consider adding ROC curves for sweep detection on simulated data. The relatively low proportion of shared selective sweeps between population replicates highlights limitations in the power to detect sweeps, especially incomplete or soft sweeps. I think it would be a good idea to expand the discussion of the power tradeoffs shown in the simulation analyses. In particular, the ROC curves in Figure S4 clearly show how power declines for weaker selection coefficients across the different sweep types. I suggest making these ROC curves part of the main figures to feature the issue of power limitations more prominently.

      (2b) The discussion would benefit from commenting on how power changes across the sweep simulation scenarios. Adding a summary figure to visualise the effects of sweep type, selection strength, and frequency on detectability could further clarify the power constraints. Stating the proportion of sweeps likely missed strengthens the argument that sharing adaptive alleles is likely even more common than inferred. Discussing power will also motivate the need for developing methods with improved abilities to uncover incomplete and soft sweeps.

      While these are useful suggestions (2a and 2b), the aim of this paper at its core is empirical, and was not intended to give an exhaustive analysis of the power to detect sweeps. We report what parts of the analysis may be impacted by low power and what aspects of our inferences have higher uncertainty due to power. We agree that there is more work to be done to improve methods to detect selection given our findings (see below concerning our efforts to use machine learning as well). While we do not highlight this in the paper, we also note that ours is one of extremely few empirical studies that actually perform power analyses on real data (as opposed to simulations). We think this extra transparency by itself is of substantial utility to the community in demonstrating that the results from simulation studies performed in publications describing a method do not necessarily translate well to empirical data.

      (3) Improve clarity in describing f4 test results. Consider visualising results on a map to show spatial patterns.

      We have expanded the discussion concerning f4 tests (see several comments to reviewer 1). We are not clear on how to effectively visualize f4 spatially, but hope the updates have made the results clearer.
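
      For readers unfamiliar with the statistic, the following is a minimal sketch of how an f4 value is computed from population allele frequencies. It assumes four equal-length lists of per-site frequencies and omits the block jackknife normally used to obtain standard errors. The function name and the frequencies are hypothetical and are not taken from the paper's pipeline.

```python
def f4(p_a, p_b, p_c, p_d):
    """Average of (pA - pB) * (pC - pD) across sites.

    Each argument is a list of allele frequencies, one value per site, for
    populations A, B, C and D. With no gene flow between the (A, B) and
    (C, D) pairs, the expected value is zero.
    """
    if not (len(p_a) == len(p_b) == len(p_c) == len(p_d)) or not p_a:
        raise ValueError("all inputs must be non-empty and of equal length")
    products = [(a - b) * (c - d) for a, b, c, d in zip(p_a, p_b, p_c, p_d)]
    return sum(products) / len(products)


# Hypothetical frequencies at two sites, for illustration only.
value = f4([0.1, 0.5], [0.3, 0.5], [0.2, 0.4], [0.6, 0.1])
print(value)
```

      A value that differs significantly from zero indicates correlated allele frequency differences between the two pairs, which is the signal interpreted as gene flow.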

      Minor:

      -  Increase the font size of figure axis labels for improved readability.

      We have looked over the figures and increased font sizes where possible.

      -  Add units to selection coefficient axis labels in Figure 5.

      Selection coefficients are derived in Lee and Coop (2017) from classical population genetics theory. They do not have units, but denote the relative fitness advantage of the heterozygous genotype carrying the beneficial mutation of interest.

      -  Fix the typo 'cophenetic' in Figure S3 caption.

      Fixed. Thank you.

    2. eLife Assessment

      This useful study examines patterns of diversity and divergence in two closely related sub-species of Zea mays, patterns that have bearings on local adaptation in maize and teosinte at intermediate geographic scales. The authors suggest that convergent evolution has been facilitated by both standing variation and gene flow, with independent selective sweeps in the two species. While the data themselves are solid, there are limitations concerning population sampling, false positive rates in sweep detection and integration of phenotypic data, which make it difficult to draw definitive conclusions. The work should in principle be of broad interest to colleagues studying the relationship between domesticated species and their progenitors, as well as those studying instances of parallel evolution.

    3. Reviewer #1 (Public review):

      Summary:

      This paper examines patterns of diversity and divergence in two closely related sub-species of Zea mays. While the data are interesting and the authors have tried to exclude multiple confounding factors, many patterns cannot clearly be ascribed to one cause or another.

      Strengths:

      The paper presents interesting data from sets of sympatric populations of the two sub-species, maize and teosinte. This sampling offers unique insights into the diversity and divergence between the two, as well as the geographic structure of each. Many analyses and simulations to check analyses have been carried out.

      Weaknesses:

      The strength of conclusions that can be drawn from the analyses was low, partly because there are many strange patterns. The authors have done a good job of adding caveats, but clearly, these species do not meet many assumptions of our methods.

    1. eLife Assessment

      This work presents important findings that the human frontal cortex is involved in a flexible, dual role in both maintaining information in short-term memory, and controlling this memory content to guide adaptive behavior and decisions. The evidence supporting the conclusions is compelling, with a well-designed task, best-practice decoding methods, and careful control analyses. The work will be of broad interest to cognitive neuroscience researchers working on working memory and cognitive control.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Shao et al. investigate the contribution of different cortical areas to working memory maintenance and control processes, an important topic involving different ideas about how the human brain represents and uses information when it is no longer available to sensory systems. In two fMRI experiments, they demonstrate that human frontal cortex (area sPCS) represents stimulus (orientation) information during typical maintenance, and even more so when a categorical response demand is present. That is, when participants have to apply an added level of decision control to the WM stimulus, sPCS areas encode stimulus information more than in conditions without this added demand. These effects are then expanded upon using multi-area neural network models, recapitulating the empirical gradient of memory vs control effects from visual to parietal and frontal cortices. Multiple experiments and analysis frameworks provide support for the authors' conclusions, and control experiments and analyses are provided to help interpret and isolate the frontal cortex effect of interest. While some alternative explanations/theories may explain the roles of frontal cortex in this study and experiments, important additional analyses have been added that help ensure a strong level of support for these results and interpretations.

      Strengths:

      - The authors use an interesting and clever task design across two fMRI experiments that is able to parse out contributions of WM maintenance alone along with categorical, rule-based decisions. Importantly, the second experiment only uses one fixed rule, providing both an internal replication of Experiment 1's effects and extending them to a different situation when rule switching effects are not involved across mini-blocks.

      - The reported analyses using both inverted encoding models (IEM) and decoders (SVM) demonstrate the stimulus reconstruction effects across different methods, which may be sensitive to different aspects of the relationship between patterns of brain activity and the experimental stimuli.

      - Linking the multivariate activity patterns to memory behavior is critical in thinking about the potential differential roles of cortical areas in subserving successful working memory. Figure 3 nicely shows a similar interaction to that of Figure 2 in the role of sPCS in the categorization vs. maintenance tasks. This is an important contribution to the field when we consider how a distributed set of interacting cortical areas supports successful working memory behavior.

      - The cross-decoding analysis in Figure 4 is a clever and interesting way to parse out how stimulus and rule/category information may be intertwined, which would have been one of the foremost potential questions or analyses requested by careful readers.

      - Additional ROI analyses in more anterior regions of the PFC help to contextualize the main effects of interest in the sPCS (and no effect in the inferior frontal areas, which are also retinotopic, adds specificity). And, more explanation for how motor areas or preparation are likely not involved strengthens the takeaways of the study (M1 control analysis).

      Weaknesses:

      - An explicit, quantitative link between the RNN and fMRI data is perhaps a last point that would integrate the RNN conclusion and analyses in line with the human imaging data.

      - As Rev 2 mentions, multiple types of information codes may be present, and the response letter Figure 5 using representational similarity (RSA) gets at this question. It would strengthen the work to, at minimum, include this analysis as an extended or supplemental figure.

      To sum up the results, a possible, brief schematic of each cortical area analyzed and its contribution to information coding in WM and successful subsequent behavior may help readers take away important conclusions of the cortical circuitry involved.

    3. Reviewer #2 (Public review):

      Summary:

      The authors provide evidence that helps resolve long-standing questions about the differential involvement of frontal and posterior cortex in working memory. They show that whereas early visual cortex shows stronger decoding of memory content in a memorization task vs a more complex categorization task, frontal cortex shows stronger decoding during categorization tasks than memorization tasks. They find that task-optimized RNNs trained to reproduce the memorized orientations show some similarities in neural decoding to people. Together, this paper presents interesting evidence for differential responsibilities of brain areas in working memory.

      Strengths:

      This paper was overall strong. It had a well-designed task, best-practice decoding methods, and careful control analyses. The neural network modeling adds additional insight into the potential computational roles of different regions.

      Weaknesses:

      Few. While more could perhaps be done to understand the RNN-fMRI correspondence, the paper contributes a compelling set of empirical findings and interpretations that can inform future research.