  1. Jun 2024
    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      I have one major concern regarding this draft of the manuscript:

      (1) In the manuscript (lines 130-31) it is stated that "About 55% (8/15) of mice with unilateral AAV-hM3Dq centered in the PMv showed an increase in LH release above 0.5ng/ml within 10-20 min following the CNO injection" However, data at time zero are not shown for 4 of the 8 "LH peak" animals. The missing data at time zero seems problematic for the analysis of the CNO-stimulated cohort. As mentioned in the manuscript, the area under the curve was calculated between the range of -10 to 20 min post-injection. Because diestrus animals have spontaneous LH pulses, it is highly possible that an LH pulse is initiated in the 10 minutes prior to drug delivery, as seen in the AAV-mCherry group in 1D, and similarly in 2C. Given the current form of analysis, it seems possible that a spontaneous LH pulse initiated anywhere up to 10 minutes prior to drug delivery could conceivably count as an experimentally induced "LH peak". Can you address this concern?

      We understand the reviewer’s concern about spontaneous LH pulses. This is the reason we have been very strict in our analysis and have taken multiple approaches to analyze these data. In our hM3Dq group, 55% of the animals responded to CNO with an increase in LH, while none responded in the negative control group. Moreover, in the clozapine group, where no time 0 points were missing, 100% of the animals with hM3Dq showed an LH increase after the injection, while only 28% (2/7) showed the increase in the negative control group. Strictly speaking, the DREADDs approach doubled the chances of an LH increase. Note that the spontaneous LH peaks observed in negative controls or during baseline show a very sharp increase followed by a decrease at the next time point, whereas the 4 “PMv hits” in the CNO-hM3Dq group that lacked a time 0 sample and showed an increase in LH displayed a sustained rise after 10 min or prolonged high LH levels (above 1 ng/ml) even 30 min after the injection. Ultimately, the cFOS levels in the PMv of the CNO-hM3Dq group with an increase in LH are significantly higher than in any other group, and the number of cFOS neurons is highly correlated with LH levels. Another important aspect that should not be dismissed is that in this experimental design we used unilateral injections in animals that were in a fed state; therefore, the role of leptin in raising LH levels is probably dampened.

      We have added a statement to clarify this issue.
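
      The concern above turns on two computations: the area under the LH curve between -10 and 20 min, and the rule that counts an animal as a responder. The sketch below is only an illustration of how a missing time 0 sample changes both; it is not the authors' analysis code. The function names, the example LH values, and the reading of "increase above 0.5 ng/ml" as a rise of more than 0.5 ng/ml over the last available pre-injection sample are all assumptions made for this example.

```python
def trapezoid_auc(times, values):
    """Area under the curve by the trapezoidal rule.

    Missing samples (None) are skipped, so the neighbouring points
    are joined by a straight line across the gap.
    """
    pts = [(t, v) for t, v in zip(times, values) if v is not None]
    return sum((t1 - t0) * (v0 + v1) / 2.0
               for (t0, v0), (t1, v1) in zip(pts, pts[1:]))


def classify_lh_response(times, lh, rise=0.5):
    """Flag a responder (hypothetical rule, see lead-in).

    Baseline is the last available sample at or before time 0.
    A responder shows a rise of more than `rise` ng/ml at 10-20 min.
    """
    samples = {t: v for t, v in zip(times, lh) if v is not None}
    pre = [t for t in samples if t <= 0]
    post = [v for t, v in samples.items() if 10 <= t <= 20]
    time0_missing = 0 not in samples
    if not pre or not post:
        return {"responder": None, "baseline_time": None,
                "time0_missing": time0_missing}
    baseline_time = max(pre)
    delta = max(post) - samples[baseline_time]
    return {"responder": delta > rise, "baseline_time": baseline_time,
            "time0_missing": time0_missing}


# Made-up LH values (ng/ml) at -10, 0, 10 and 20 min after injection.
times = [-10, 0, 10, 20]
animal_a = [0.3, 0.3, 1.2, 1.4]   # complete series
animal_b = [0.3, None, 1.2, 1.4]  # time 0 sample missing

for name, series in (("a", animal_a), ("b", animal_b)):
    print(name, round(trapezoid_auc(times, series), 2),
          classify_lh_response(times, series))
```

      With a missing time 0 sample, the baseline silently falls back to -10 min, so a pulse that began before the injection cannot be told apart from an induced one. Reporting the `time0_missing` flag alongside the classification makes that limitation explicit.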

      The following are minor concerns:

      a) Figure 4 a-d, it is clear that Vglut2 is absent in the VMH, but it seems more relevant to show this expression pattern in the PMv.

      We chose the VMH because it has a very dense collection of either LeprCre;VGlut2 or Vglut2 only cells and it illustrates very well the conditional Vglut2 deletion at small and high magnifications. In the PMv, however, the distribution of these cells is sparse. The reviewer is correct that for the current study, the PMv is more relevant and therefore, we have included images of the PMv showing a control and a LeprCre-Vglut2floxed animal in higher magnification.

      b) Methods section, targeting PMv: please check the injection coordinate: "dura-mater [dorsoventral -0.54]"

      Thank you for noticing this mistake. All coordinates for the injection have now been corrected (-5.4 mm, ±0.5 and -5.4 mm).

      Reviewer #2 (Recommendations For The Authors):

      This is a very well-written manuscript by Sáenz de Miera and colleagues on a careful study reporting on the key role of glutamate transporter vGlut2 expression in the neurons of the ventral premammillary nucleus (PMv) of the hypothalamus expressing the leptin receptor LepRb in energy homeostasis, puberty, and estrous cyclicity. The authors first show using cre-dependent chemogenetic viral tools that the selective activation of PMv LepRb neurons induces luteinizing hormone (LH) release. Then the authors demonstrate that the selective invalidation of vGlut2 in LepRb-expressing cells throughout the body induces obesity and mild alteration of sexual maturation in both sexes and blunted estrous cyclicity in females. Finally, the authors knock out vGlut2 in PMv neurons in which they reintroduce LepRb expression in an otherwise LepRb-null background using an AAV Cre approach. This latter very elegant experiment shows that while the sole re-expression of LepRb in PMv neurons in LepRb-null mice was shown before to restore puberty onset, deleting vGlut2 in LepRb-expressing PMv neurons blunts this effect.

      My specific comments are as follows. Please note that none of them require additional experiments and that they can be answered by amending the text.

      (1) Please provide information on the serotypes and promoters of the AAVs used in the study to enhance reproducibility.

      Thank you, serotypes and promoters have been added for all AAVs.

      (2) Please reformulate lines 220-221. Indeed, this reviewer does not agree with the fact that balanopreputial separation (BPS) is a sign of puberty completion. BPS is merely a sign of the advancement of sexual maturation, akin to vaginal opening in females. In certain mouse strains, BPS coincides with mini puberty rather than puberty. The definitive sign of puberty completion involves the presence of spermatozoa in the vas deferens (equivalent to the first ovulation/first estrus in females).

      Thank you for this remark, this statement has now been modified.

      (3) The authors convincingly show that the potential contamination of the arcuate nucleus of the hypothalamus (ARH) with the AAV injections targeted to the PMv should not account for the DREADD-mediated activation of LH release. However, do the authors believe that DREADD activation of LepRb-expressing PMv neurons, inducing cFOS expression in these neurons, could also activate ARH kisspeptin neurons (which do not express LepRb) via transsynaptic action? Alternatively, do they posit direct activation of GnRH cell bodies in the preoptic region or GnRH axon/dendrites in the ARH/median eminence region?

      Thank you for this comment. We don’t have enough evidence from this DREADDs experiment to make a strong prediction on the downstream pathways. However, as discussed, from the DREADDs khrGFP females, we observed very few kisspeptin cells expressing cFOS, which weakens the case for a PMv to ARH kisspeptin action in this experiment. With the evidence from our LepR-Cre;Vglut2flox animals that showed no alterations in Kiss1 gene expression but a strong decrease in GnRH release, we hypothesize that this acute activation of LH is mediated by direct inputs from PMv to GnRH neurons, while acknowledging the possible existence of alternative pathways. These arguments have been added to the discussion.

      (4) This reviewer finds it intriguing that glutamatergic signaling is required for LepRb re-expression in the PMv to restore fertility. Given that the authors and others have shown that PMv neurons heavily express NOS1, the activity of which is known to heavily rely on glutamatergic NMDAR activation, the authors may want to contextualize their results in light of the recent study showing that NOS1 is found to be a new causative gene in people with congenital hypogonadotropic hypogonadism.

      Thank you for the advice, we have added a paragraph discussing the possible involvement of nNos from PMv neurons in the discussion.

      (5) Does the absence of vGlut2 have any impact on the obesity phenotype in mice where LepRb is selectively re-expressed in the PMv?

      We have followed the weight of these animals after the AAV injections. However, due to the difficulty of generating dual homozygous mice (LepRnull homozygotes are infertile) and producing adequate stereotaxic injections with minimal contamination of adjacent nuclei, the groups could not be run together and thus we refrained from performing a comparative analysis of energy balance. Analyses of body weight in LepRnull mice with reactivation of LepR in PMv neurons have been published before (Donato et al., 2011 using the Flp/Frt model and Mahany et al., 2018 using the Cre/loxP system). No difference in body weight was observed in either study. Below is the progression of body weight in mice with reactivation of LepR and deletion of Vglut2 in PMv neurons. We added a comment in this regard.

      Author response image 1.

      Reviewer #3 (Recommendations For The Authors):

      The authors examined the effects of glutamate release from PMv LepR neurons in the regulation of puberty and reproduction in female mice. Multiple genetic mouse models were utilized to either manipulate PMv LepR neuron activities, or to delete glutamate vesicle transporters from LepR neurons. The authors have been quite rigorous in validating these models and exploring potential contaminations. Most of the data presented are solid and convincing, and support the conclusion. This reviewer has the following suggestions for the authors to further improve this work and the manuscript.

      (1) The DREADD study had some issues. For example, "2 out of 7 control mice with no AAV showed an increase in LH...", indicating that LH increase may just happen randomly. More importantly, 45% of PMv-hit mice did not show LH response to CNO, making it hard to interpret the positive LH responses from the other 55% PMv-hit mice undergoing the same treatment. Overall, there are just too many variabilities in these DREADD data for anyone to come up with a clean and convincing conclusion. This reviewer suggests repeating these experiments or removing the DREADD data altogether. After all, the rest of the results are much more convincing and stand alone to support the role of glutamate release from these PMv LepR neurons.

      We appreciate the reviewer’s concern. Indeed, LH shows spontaneous pulsatility, which is one of the biggest challenges in our field. We have answered this concern for Reviewer 1 above and modified the text accordingly. We decided to keep the data in the publication because we believe that this is very important evidence supporting our observations, since this is the only experiment that approaches the role of the PMv in a free-moving, ad libitum fed mouse model that is not deficient for leptin signaling or glutamatergic neurotransmission. Altogether, this paper strongly supports a role for glutamate signaling in leptin’s action on reproductive function. Until now, evidence for this role has been dismissed or contentious.

      (2) The mCherry signals in Figure 3 are of low quality and do not look like cell bodies.

      We have now equally increased the contrast and brightness in all higher magnification images of mCherry neurons (Fig 3F, G, I and J) to improve their visibility. The lower magnification images are high quality images of areas with a high density of mCherry-positive neurons. Thick sections (30 µm) at low magnification compromise the focus at different Z-axis levels. We feel that images 3E and 3H are important to define the location of cells in the arcuate nucleus. Colocalization and mCherry expression are clear in the high magnification images.

      (3) The validation of Vglut2 deletion in LepR neurons (Fig. 4A-D) is very nice and convincing, but the images are from the VMH region. Why not show the PMv region?

      As mentioned to Reviewer 1, we chose the VMH because it has a very dense collection of either LeprCre;VGlut2 or Vglut2 only cells and it illustrates very well the Vglut2 deletion at small and high magnifications. In the PMv, however, the distribution of these cells is sparse. The reviewer is correct that for the current study, the PMv is more relevant and therefore, we have included images of the PMv showing a control and a LeprCre-Vglut2floxed animal in higher magnification.

      (4) Figures 4-5 used LepR-Cre as controls, while Figure 6 used Vglut2flox as controls. Why? Also, how did the authors set up the breedings to generate "littermates" in each of these studies?

      We used LepR-Cre mice as controls for our experiments because Cre homozygosity is needed for proper Cre expression and we had the LepR-Cre homozygous colony from the DREADDs experiment. Also, these mice had previously been thoroughly evaluated and no metabolic or reproductive disruption was noticed (please see lines 213-214 of the original submission). However, our LepR-Cre colony had to be drastically reduced during COVID and suffered from unexpected Δ recombination, leading to loss of Vglut2 homozygotes. To overcome these issues, we used VGlut2-floxed controls for the gene expression and GnRH immunoreactivity experiments. These mice had previously been used as controls for metabolic experiments with the LepCre-Vglut2fl genotype (Xu et al., 2013 Mol Metab), showing no deficiencies in the metabolic phenotype.

      As described in the methods section (lines 464-466 of the original preprint), to inactivate glutamate in leptin responsive cells, LepRb-Cre mice were crossed with mice carrying loxP-modified Vglut2 alleles. Our experimental mice were homozygous for the LepRb-Cre allele (LepRb^cre/cre) and homozygous for the Vglut2-loxP allele (Vglut2^fl/fl). Our controls consisted of mice homozygous for the Cre allele (LepRb^cre/cre;Vglut2^+/+, named LepRb-Cre) or homozygous for the Vglut2-loxP allele (LepRb^+/+;Vglut2^fl/fl, named Vglut2^flox). Both experimental (LepRb^cre/cre;Vglut2^fl/fl, named LepRbΔVglut2) and control mice were derived from the same litters with parents homozygous for one of the genes and heterozygous for the other gene (LepRb^cre/cre;Vglut2^fl/+ or LepRb^cre/+;Vglut2^fl/fl). Mice were genotyped at weaning (21 days) and again at the end of the experiments.
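
      To make the "littermates" point concrete, the expected genotype proportions from such crosses can be worked out with a small Punnett-square calculation. This is a minimal sketch, assuming Mendelian segregation, independent assortment of the two loci, and that both parents of a litter carry the same genotype (the response does not state the exact pairings). The allele labels and function names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross_locus(parent1, parent2):
    """Offspring genotype probabilities at one locus.

    Each parent is a pair of alleles; every combination of one allele
    from each parent is equally likely.
    """
    counts = Counter(tuple(sorted(pair)) for pair in product(parent1, parent2))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


def cross(parent1, parent2):
    """Joint offspring genotype probabilities across loci.

    Parents are dicts mapping locus name to an allele pair; loci are
    assumed to assort independently. Keys of the result list the
    per-locus genotypes in the order of parent1's loci.
    """
    per_locus = [cross_locus(parent1[locus], parent2[locus]) for locus in parent1]
    joint = {}
    for combo in product(*(d.items() for d in per_locus)):
        genotype = tuple(g for g, _ in combo)
        prob = Fraction(1)
        for _, p in combo:
            prob *= p
        joint[genotype] = joint.get(genotype, Fraction(0)) + prob
    return joint


# Parents homozygous for Cre and heterozygous for the floxed allele.
het_flox = {"LepRb": ("cre", "cre"), "Vglut2": ("fl", "+")}
offspring = cross(het_flox, het_flox)

# Parents heterozygous for Cre and homozygous for the floxed allele.
het_cre = {"LepRb": ("cre", "+"), "Vglut2": ("fl", "fl")}
offspring_cre = cross(het_cre, het_cre)

for genotype, prob in sorted(offspring.items()):
    print(genotype, prob)
```

      Under these assumptions, a quarter of each litter is expected to be experimental (cre/cre; fl/fl) and a quarter to be the matching control (cre/cre; +/+ from the first cross, +/+; fl/fl from the second), which is how both groups can come from the same litters.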

      (5) The labeling of Figures 5E-F is missing, making it hard to read.

      We have confirmed that Figures 5E and F were mentioned in the figure legends and in the results text. To make the figure easier to read, we have added the Y axis titles to Figure 5C, D, E and F, previously only shown in Fig 5A and B.

      (6) The last experiment was very nice confirming the role of glutamate release from PMv LepR neurons. However, the key phenotypes (puberty development, pregnancy) were not graphed and only stated in the text.

      Thank you for your comment. Since the key result is that none of the LeprLoxTb;Vglut2flox animals showed vaginal opening or pregnancy, we don’t feel the need to graph this. All the details of the reproductive and metabolic phenotyping of the Lepr-loxTB with re-expression of LepR in the PMv were described in Mahany et al., 2018.

    2. Reviewer #1 (Public Review):

      Summary:

      In previous work the Elias group has shown that leptin sensing PMv neurons make connections with the neuroendocrine reproductive axis and are involved in reproductive function/s. Sáenz de Miera et al. build on this body of work to investigate the sufficiency of leptin sensing PMv neurons to evoke the release of luteinizing hormone. The team further investigates how glutamate signaling from leptin-sensing neurons can influence pubertal timing in females, along with mature estrous cycles. Genetic ablation of Slc17a6 (Vglut2) from LepRb-expressing cells resulted in a delay of the first estrus cycle post pubertal transition, along with a significantly lengthened estrous cycle in mature females. However, this deficit did not lengthen the latency to birth of the first litter in experimental dams. Restoration of leptin signaling in LepRb PMv neurons was previously shown to induce puberty and instate reproductive function in LepRb knock-out female mice (Mahany et al., 2018). Here, Sáenz de Miera et al. use a combined genetic and viral strategy to demonstrate that glutamate signaling in LepRb PMv neurons is required for sexual maturation in LepRb knock-out female mice.

      Strengths:

      Most of the experiments performed in this manuscript are well justified and rigorously tested. The genetic method to simultaneously remove glutamate signaling and restore the leptin receptor in LepRb PMv neurons was well executed and showed that glutamate signaling in LepRb PMv neurons is necessary for leptin-dependent fertility.

      Weaknesses:

      Analysis of experimentally induced luteinizing hormone release could be confounded by spontaneous pulses of luteinizing hormone that are independent of LepRb PMv neurons.

    3. Reviewer #2 (Public Review):

      Summary:

      This is a very well-written manuscript by Sáenz de Miera and colleagues on a careful study reporting on the key role of glutamate transporter vGlut2 expression in the neurons of the ventral premammillary nucleus (PMv) of the hypothalamus expressing the leptin receptor LepRb in energy homeostasis, puberty, and estrous cyclicity. The authors first show using cre-dependent chemogenetic viral tools that the selective activation of PMv LepRb neurons induces luteinizing hormone (LH) release. Then the authors demonstrate that the selective invalidation of vGlut2 in LepRb-expressing cells throughout the body induces obesity and mild alteration of sexual maturation in both sexes and blunted estrous cyclicity in females. Finally, the authors knock out vGlut2 in PMv neurons in which they reintroduce LepRb expression in an otherwise LepRb-null background using an AAV Cre approach. This latter very elegant experiment shows that while the sole re-expression of LepRb in PMv neurons in LepRb-null mice was shown before to restore puberty onset, deleting vGlut2 in LepRb-expressing PMv neurons blunts this effect.

      Strengths:

      The authors employ state-of-the-art methods and their conclusions are robustly supported by the results.

      Weaknesses:

      None identified. Only minor comments have been formulated.

    4. Reviewer #3 (Public Review):

      Summary:

      The authors examined the effects of glutamate release from PMv LepR neurons in the regulation of puberty and reproduction in female mice.

      Strengths:

      Multiple genetic mouse models were utilized to either manipulate PMv LepR neuron activities or to delete glutamate vesicle transporters from LepR neurons. The authors have been quite rigorous in validating these models and exploring potential contaminations. Most of the data presented are solid and convincing and support the conclusion.

      Comments on revised version:

      The authors have addressed most of my comments.

    1. eLife assessment

      The findings of this study are valuable as they challenge the dogma regarding the link between lowered bacterial metabolism and tolerance to aminoglycosides. The authors propose that the well-known tolerance to AG of mutants such as those of complexes I and II is not due to a decrease in the proton motive force and thus antibiotic uptake. The results presented here are convincing.

    2. Reviewer #2 (Public Review):

      Summary:

      This interesting study challenges the dogma regarding the link between bacterial metabolism decrease and tolerance to aminoglycosides (AG). The authors demonstrate that mutants well-known for being tolerant to AG, such as those of complexes I and II, are not so due to a decrease in the proton motive force (PMF) and thus antibiotic uptake, as previously reported in the literature.

      Strengths:

      This is a complete study that employs several read-outs.

      In this revised version, the authors have carefully addressed all the reviewers' comments. I appreciate the effort made in this new version to clarify that this study does not refute the PMF-dependent mechanism of aminoglycoside uptake (in the discussion, lines 731-734).

      The addition of the requested experiments using lower concentrations of aminoglycosides is a considerable improvement as it allows for comparison with previously published results.

    1. eLife assessment

      In this useful study, Wang and colleagues investigate the potential probiotic effects of Bacillus velezensis in a murine model. They provide solid evidence that B. velezensis limits the growth of Salmonella typhimurium in lab culture and in mice, together with beneficial effects on the microbiota. The overall presentation of the manuscript and logical flow requires improvement and the work will be of interest to infectious disease researchers.

    2. Reviewer #1 (Public Review):

      Summary:

      Wang and colleagues presented an investigation of pig-origin bacteria Bacillus velezensis HBXN2020, for its released genome sequence, in vivo safety issue, probiotic effects in vitro, and protection against Salmonella infection in a murine model. Various techniques and assays are performed; the main results are all descriptive, without new insight advancing the field or a mechanistic understanding of the observed protection.

      Strengths:

      An extensive study on probiotic property of the Bacillus velezensis strain HBXN2020

      Weaknesses:

      The main results are descriptive without mechanistic insight. Additionally, most of the results and analysis parts are separated without a link or a story-telling way to deliver a concise message.

    3. Reviewer #2 (Public Review):

      Summary:

      In this study, Wang and colleagues study the potential probiotic effects of Bacillus velezensis. Bacillus species have potential benefit to serve as probiotics due to their ability to form endospores and synthesize secondary metabolites. B. velezensis has been shown to have probiotic effects in plants and animals but data for human use are scarce, particularly with respect to salmonella-induced colitis. In this work, the authors identify a strain of B. velezensis and test it for its ability to control colitis in mice.

      Key findings:

      (1) The authors sequence an isolate for B. velezensis - HBXN2020 and describe its genome (roughly 4 Mb, 46% GC-content etc).

      (2) The authors next describe the growth of this strain in broth culture and survival under acid and temperature stress. The susceptibility of HBXN2020 was tested against various antibiotics and against various pathogenic bacteria. In the case of the latter, the authors set out to determine if HBXN2020 could directly inhibit the growth of pathogenic bacteria. Convincing data, indicating that this is indeed the case, are presented.

      (3) To determine the safety profile of HBXN2020 (for possible use as a probiotic), the authors infected the strain in mice and monitored weight, together with cytokine profiles. Infected mice displayed no significant weight loss and expression of inflammatory cytokines remained unchanged. Blood cell profiles of infected mice were consistent with that of uninfected mice. No significant differences in tissues, including the colon were observed.

      (4) Next, the authors tested the ability of HBXN2020 to inhibit growth of Salmonella typhimurium (STm) and demonstrate that HBXN2020 inhibits STm in a dose dependent manner. Following this, the authors infect mice with STm to induce colitis and measure the ability of HBXN2020 to control colitis. The first outcome measure was a reduction in STm in faeces. Consistent with this, HBXN2020 reduced STm loads in the ileum, cecum, and colon. Colon length was also affected by HBXN2020 treatment. In addition, treatment with HBXN2020 reduced the appearance of colon pathological features associated with colitis, together with a reduction in inflammatory cytokines.

      (5) After noting the beneficial (and anti-inflammatory effects) of HBXN2020, the authors set out to investigate effects on microbiota during treatment. Using a variety of algorithms, the authors demonstrate that upon HBXN2020 treatment, microbiota composition is restored to levels akin to that seen in healthy mice.

      (6) Finally, the authors assessed the effect of using HBXN2020 as prophylactic treatment for colitis by first treating mice with the spores and then infecting with STm. Their data indicate that treatment with HBXN2020 reduced colitis. A similar beneficial impact was seen with the gut microbiota.

      Strengths:

      (1) Good use of in vitro and animal models to demonstrate a beneficial probiotic effect.

      (2) Most observations are supported using multiple approaches.

      (3) Mouse experiments are very convincing.

      Weaknesses:

      (1) Whilst a beneficial effect is observed, there is no investigation of the mechanism that underpins this.

      (2) Mouse experiments would have benefited from the use of standard anti-inflammatory therapies to control colitis. That way the authors could compare their approach of using bacillus spores with the current gold standard for treatment.

    4. Reviewer #3 (Public Review):

      Summary:

      The manuscript by Wang et al. investigates the effects of B. velezensis HBXN2020 in alleviating S. Typhimurium-induced mouse colitis. The results showed that B. velezensis HBXN2020 could alleviate bacterial colitis by enhancing intestinal homeostasis (decreasing harmful bacteria and enhancing the abundance of Lactobacillus and Akkermansia) and gut barrier integrity and reducing inflammation.

      Strengths:

      B. velezensis HBXN2020 is a novel strain of Bacillus that can produce a great variety of secondary metabolites and exhibit high antibacterial activity against several pathogens. B. velezensis HBXN2020 is able to form endospores and has strong anti-stress capabilities. B. velezensis HBXN2020 has a synergistic effect with other beneficial microorganisms, which can improve intestinal homeostasis.

      Weaknesses:

      There are few studies on the clinical application of Bacillus velezensis. Thus, more studies are needed to explore its effectiveness before clinical application.

    5. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      In this useful study, Wang and colleagues investigate the potential probiotic effects of Bacillus velezensis to prevent colitis in a mouse model. They provide solid evidence that B. velezensis limits the growth of Salmonella typhimurium in lab culture and in mice, together with beneficial effects on the microbiota. The work will be of interest to infectious disease researchers and those studying the microbiome.

      Response: Thanks for the constructive comments and the positive reception of the manuscript.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Wang and colleagues presented an investigation of pig-origin bacteria Bacillus velezensis HBXN2020, for its released genome sequence, in vivo safety issue, probiotic effects in vitro, and protection against Salmonella infection in a murine model. Various techniques and assays are performed.

      Response: Thanks for the constructive comments and the positive reception of the manuscript.

      Strengths:

      An extensive study on the probiotic properties of the Bacillus velezensis strain HBXN2020.

      Response: Thank you very much for reading and commenting on our manuscript.

      Weaknesses:

      - The main results are all descriptive, without new insight advancing the field or a mechanistic understanding of the observed protection.

      Response: Thank you for your comments and suggestions on our manuscript. In later work, we will focus on exploring the antibacterial substances and bactericidal mechanisms of B. velezensis. We appreciate your review and feedback.   

      - Most of the results and analysis parts are separated without a link or any story-telling to deliver a concise message.

      Response: Thank you for your comments and suggestions on our manuscript. The comments improve the quality and depth of the manuscript. Based on your suggestions, we have revised the entire manuscript.

      The updated contents were presented in the revised manuscript.

      - For the Salmonella Typhimurium-induced mouse model of colitis, it is not clear how an oral infection of C57BL/6 would lead to colitis. Streptomycin is always pretreated (https://link.springer.com/protocol/10.1007/978-1-0716-1971-1_17).

      Response: Thank you very much for reading and commenting on our manuscript. The S. Typhimurium ATCC14028 (STm) used in this study is a highly virulent strain. The findings of the preliminary trial indicated that mice infected with 10^7 CFU STm exhibited notable symptoms in the absence of streptomycin pretreatment. Hence, streptomycin was not utilized as a pretreatment for mice in this study. We appreciate your review and feedback and hope that our response adequately addresses your concerns.

      Reviewer #2 (Public Review):

      Summary:

      In this study, Wang and colleagues study the potential probiotic effects of Bacillus velezensis. Bacillus species have the potential benefit of serving as probiotics due to their ability to form endospores and synthesize secondary metabolites. B. velezensis has been shown to have probiotic effects in plants and animals but data for human use are scarce, particularly with respect to salmonella-induced colitis. In this work, the authors identify a strain of B. velezensis and test it for its ability to control colitis in mice.

      Response: Thanks for the constructive comments and the positive reception of the manuscript.

      Key findings:

      (1) The authors sequence an isolate for B. velezensis - HBXN2020 and describe its genome (roughly 4 Mb, 46% GC-content etc).

      Response: Thanks for the constructive comments and the positive reception of the manuscript.

      (2) The authors next describe the growth of this strain in broth culture and survival under acid and temperature stress. The susceptibility of HBXN2020 was tested against various antibiotics and against various pathogenic bacteria. In the case of the latter, the authors set out to determine if HBXN2020 could directly inhibit the growth of pathogenic bacteria. Convincing data, indicating that this is indeed the case, are presented.

      Response: Thanks for the constructive comments and the positive reception of the manuscript.

      (3) To determine the safety profile of HBXN2020 (for possible use as a probiotic), the authors infected the strain in mice and monitored weight, together with cytokine profiles. Infected mice displayed no significant weight loss and expression of inflammatory cytokines remained unchanged. Blood cell profiles of infected mice were consistent with that of uninfected mice. No significant differences in tissues, including the colon were observed.

      Response: Thanks for the constructive comments and the positive reception of the manuscript.

      (4) Next, the authors tested the ability of HBXN2020 to inhibit the growth of Salmonella typhimurium (STm) and demonstrate that HBXN2020 inhibits STm in a dose-dependent manner. Following this, the authors infect mice with STm to induce colitis and measure the ability of HBXN2020 to control colitis. The first outcome measure was a reduction in STm in faeces. Consistent with this, HBXN2020 reduced STm loads in the ileum, cecum, and colon. Colon length was also affected by HBXN2020 treatment. In addition, treatment with HBXN2020 reduced the appearance of colon pathological features associated with colitis, together with a reduction in inflammatory cytokines.

      Response: Thanks for the constructive comments and the positive reception of the manuscript.

      (5) After noting the beneficial (and anti-inflammatory effects) of HBXN2020, the authors set out to investigate the effects on microbiota during treatment. Using a variety of algorithms, the authors demonstrate that upon HBXN2020 treatment, microbiota composition is restored to levels akin to that seen in healthy mice.

      Response: Thanks for the constructive comments and the positive reception of the manuscript.

      (6) Finally, the authors assessed the effect of using HBXN2020 as prophylactic treatment for colitis by first treating mice with the spores and then infecting them with STm. Their data indicate that treatment with HBXN2020 reduced colitis. A similar beneficial impact was seen with the gut microbiota.

      Response: Thanks for the constructive comments and the positive reception of the manuscript.

      Strengths:

      (1) Good use of in vitro and animal models to demonstrate a beneficial probiotic effect.

      Response: Thank you very much for reading and commenting on our manuscript.

      (2) Most observations are supported using multiple approaches.

      Response: Thanks for the comments and the positive reception of the manuscript.

      (3) The mouse experiments are very convincing.

      Response: Thanks for the comments and the positive reception of the manuscript.

      Weaknesses:

      (1) Whilst a beneficial effect is observed, there is no investigation of the mechanism that underpins this.

      Response: Thank you for pointing this out. We acknowledge that the manuscript does not investigate the underlying mechanism. In later work, we will focus on exploring the antibacterial substances and bactericidal mechanisms of B. velezensis. Thank you for your suggestions, and we hope our response has addressed your concerns.

      (2) The mouse experiments would have benefited from the use of standard anti-inflammatory therapies to control colitis. That way the authors could compare their approach of using bacillus spores with the current gold standard for treatment.

      Response: We greatly appreciate your valuable comments. The objective of this study is to investigate the potential of B. velezensis spores in mitigating bacterial-induced colitis. The animal experimental design followed the methods described in previous studies, with slight modifications (10.1038/s41467-019-13727-9, 10.1126/scitranslmed.abf4692). We appreciate your review and feedback. We hope that our response adequately addresses your concerns.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Wang et al. investigates the effects of B. velezensis HBXN2020 in alleviating S. Typhimurium-induced mouse colitis. The results showed that B. velezensis HBXN2020 could alleviate bacterial colitis by enhancing intestinal homeostasis (decreasing harmful bacteria and enhancing the abundance of Lactobacillus and Akkermansia) and gut barrier integrity and reducing inflammation. Overall, the manuscript is of potential interest to readers.

      Response: Thanks for the comments and the positive reception of the manuscript.

      Strengths:

      B. velezensis HBXN2020 is a novel species of Bacillus that can produce a great variety of secondary metabolites and exhibit high antibacterial activity against several pathogens. B. velezensis HBXN2020 is able to form endospores and has strong anti-stress capabilities. B. velezensis HBXN2020 has a synergistic effect with other beneficial microorganisms, which can improve intestinal homeostasis.

      Response: Thanks for the comments and the positive reception of the manuscript.

      Weaknesses:

      There are few studies about the clinical application of Bacillus velezensis. Thus, more studies are still needed to explore the effectiveness of Bacillus velezensis before clinical application.

      Response: Thanks for your suggestion. This study serves as an exploratory investigation before the application of Bacillus velezensis. The main purpose of this study is to explore the potential of Bacillus velezensis in application. We appreciate your review and feedback and hope that our response adequately addresses your concerns.    

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Abstract:

      It is quite wordy, without a clear emphasis on the major point of the study. It is obvious how the host-probiotic-microbiota behaves and why it works out well, which is the key part.

      Response: Thank you for your valuable suggestion. The comments improve the quality of manuscript. We have modified this in the revised manuscript as suggested.

      The updated contents were presented in line 30-32, 34-39 and 41-46 in abstract section of the revised manuscript.

      Please remove "novel", Many previous works have already documented the probiotic Bacillus velezensis. It is also NOT novel species...

      Response: Thank you for your suggestion. We have corrected it as suggested. Please see line 26 in abstract section of the revised manuscript.

      Lines 44-46. The way this conclusion is delivered is inappropriate; it should be clarified exactly according to the supported results.

      Response: Thank you for your valuable suggestion. The comments improve the quality of manuscript. We have corrected this in the revised manuscript as suggested.

      The updated contents were presented in line 44-46 in abstract section of the revised manuscript.

      Introduction:

      Lines 71-71, Lines 75-77, Line 92 "the homeostasis of", please remove.

      Response: Thank you for pointing this out. We have corrected this in the revised manuscript as suggested.

      The updated contents were presented in line 96 in introduction section of the revised manuscript.

      Are the Salmonella loads the key indicator for this model?

      Response: We greatly appreciate your valuable comments. In this study, we aimed to evaluate whether B. velezensis can alleviate S. Typhimurium-induced colitis in mice. It has been reported that S. Typhimurium enters the intestine, colonizes and proliferates in the intestinal epithelium, and then breaks through the intestinal barrier and spreads throughout the body via the blood circulation, leading to systemic infection. Therefore, the load of Salmonella in the intestine and in tissues and organs is one of the key indicators of Salmonella infection. We appreciate your review and feedback and hope that our response adequately addresses your concerns.

      The introduction should really focus on the knowledge gap in general and in a specific field, which is not available in the current version.

      Response: Thank you for your valuable suggestion. The comments improve the depth of the manuscript. We have corrected it as suggested.

      The updated contents were presented in line 53-57, 61-64, 69-75, 85-88 and 97-100 in introduction section of the revised manuscript.

      Results:

      "Genomic Characteristics" of B. velezensis HBXN2020 are separated. There are no links between this work for safety and probiotic effects.

      Response: Thank you for your suggestion. Based on your suggestion, we have revised the "genomic characteristics" part of the results section. Please see lines 104-110 of the revised manuscript and Supplementary Table 2 in the supplemental material.

      Are the AMR and virulent genes available on the chromosome? Is there any gene cluster that codes useful stuff that is linked to probiotic efficacy in vitro and in vivo?

      Response: Thanks for your suggestion. The comments improve the quality and depth of the manuscript. The HBXN2020 genome contains fragments of AMR and virulence genes. However, the antibiotic sensitivity test and the safety test showed that HBXN2020 exhibited neither resistance nor toxicity. Furthermore, the HBXN2020 genome contains 13 different secondary metabolite biosynthetic gene clusters, such as surfactin (genomic position: 323,509), macrolactin H (genomic position: 1,384,185), bacillaene (genomic position: 1,691,549), fengycin (genomic position: 1,865,856), difficidin (genomic position: 2,270,091), bacillibactin (genomic position: 3,000,977) and bacilysin (genomic position: 3,589,078) (Table S2). These secondary metabolites have been shown to inhibit, to varying degrees, fungi (10.3390/foods11020140), Gram-positive pathogens (10.1371/journal.pone.0251514) and Gram-negative pathogens (10.1007/s00253-017-8095-x). We appreciate your review and feedback and hope that our response adequately addresses your concerns. We have marked the updated contents in the revised manuscript.

      The updated contents were presented in line 108-110 in results section of the revised manuscript and supplementary Table 2 in the revised supplemental material.

      Finally, the raw data (Illumina, Pacbio) should also be provided.

      Response: Thanks for pointing this out. According to your suggestion, we have submitted the raw data of the HBXN2020 genome to the GenBank database, GenBank accession number CP119399.1. We appreciate your review and feedback and hope that our response adequately addresses your concerns.

      The updated contents were presented in line 770-773 in data availability section of the revised manuscript.

      Lines 100-108, please replace this part for a more meaningful investigation that could be possibly supported by the following experimental assays.

      Response: We greatly appreciate your valuable comments. The comments improve the quality and depth of the manuscript. Based on your suggestion, we have removed some minor results and added more meaningful findings. We appreciate your review and feedback, and have marked the updated contents in the revised manuscript. Please see lines 104-110 of the revised manuscript and Supplementary Table 2 in the supplemental material.

      Lines 119-126, which are not important, did you further check what or which parts make the bacteriostasis?

      Response: Thanks for pointing this out. According to your suggestion, we try our best to remove some minor results by removing unnecessary words and sentences. Furthermore, in the following research, we will focus on exploring the antibacterial substances and bactericidal mechanisms of B. velezensis. We appreciate your review and feedback and hope that our response adequately addresses your concerns. We have marked the updated contents in the revised manuscript.   

      The updated contents were presented in line 122-124 in results section of the revised manuscript.

      "Biosafety"? Is there a standard way to conduct this investigation? please clarify.

      Response: Thank you for pointing out this problem in the manuscript. In this experiment, the biosafety assessment of B. velezensis HBXN2020 followed the method described by Zhou et al., with slight modifications (10.1038/s41467-022-31171-0). We appreciate your review and feedback and hope that our response adequately addresses your concerns.

      The updated contents were presented in line 651-652 in results section of the revised manuscript.

      Why are spores used, not whole bacteria? Please clarify.

      Response: Thanks for pointing this out. We apologize for any confusion caused by the use of B. velezensis HBXN2020 spores in the manuscript. In this study, mice were treated with B. velezensis by oral gavage, and gastric acid drastically reduces the viability of vegetative B. velezensis cells. Spores, however, tolerate strongly acidic environments well. Additionally, previous studies have also used spores (10.1126/scitranslmed.abf4692). Thank you for your comments and feedback; we hope that our response adequately addresses your concerns.

      Line 196, line 287, repeated assays were conducted, but the logical link is missing.

      Response: We greatly appreciate your valuable comments. We apologize for any inconvenience caused by the organization and coherence of our results section. Following your suggestion, we have improved the manuscript's layout by removing unnecessary words and revising sentences. We hope that the revised manuscript meets your expectations. We have marked the updated contents in the revised manuscript.

      The updated contents were presented in line 195-198, 246-248, 256-257 and 285-287 in results section of the revised manuscript.

      Discussion:

      Please shorten it; it is wordy but without focus.

      Response: We greatly appreciate your valuable comments. The comments improve the quality and depth of the manuscript. Following your suggestion, we have shortened the discussion by removing unnecessary words and revising sentences. We hope that the revised manuscript meets your expectations. We have marked the updated contents in the revised manuscript.

      The updated contents were presented in line 353-355, 358-360, 366-371, 381-385, 395-401, 417-419, 430-438, 459-466, 478-481 and 484-485 in discussion section of the revised manuscript.

      Conclusion:

      Please clarify and rework it.

      Response: Thanks for your suggestion. The comments improve the quality and depth of manuscript. Based on your suggestion, we have now rewritten the conclusion.

      The updated contents were presented in line 492-496 in conclusion section of the revised manuscript.

      Materials and Methods:

      Much more detailed information should be provided.

      Response: Thank you for your suggestion. The comments improve the quality and depth of manuscript. Based on your suggestion, we have revised detailed modifications to the experimental method. We appreciate your review and feedback, and have marked the updated contents in the revised manuscript. Please see line 513-515, 530-533 and Supplementary Table 5 in revised manuscript and supplemental material.

      All previous bacterial sampling and a list of results should be provided as the supplemental document.

      Response: Thank you for your valuable suggestion. The comments improve the quality and depth of the manuscript. In this study, we conducted preliminary biological activity testing on 362 isolates of Bacillus against pathogenic bacteria, which included S. Typhimurium ATCC14028, E. coli ATCC35150, S. aureus ATCC43300 and ATCC29213. We found that four strains of Bacillus (B. subtilis H1, B. velezensis HBXN2020, B. amyloliquefaciens 6-1 and B. licheniformis BSK14) showed antagonistic activity against these pathogenic bacteria, while the rest had no significant activity. We therefore chose these four strains to further evaluate their antibacterial activity against Gram-negative and Gram-positive pathogens (Supplementary Table 5). Based on the antibacterial test results, we found that the B. velezensis HBXN2020 strain had the best antibacterial activity, so we chose B. velezensis HBXN2020 for subsequent experiments.

      The updated contents were presented in Supplementary Table 5 in supplemental material.

      Minor points:

      All bacterial genera and species should be italicized.

      Response: Thank you for pointing this out. We have corrected this in the revised manuscript as suggested.

      The updated contents were presented in line 26 in abstract section and line 67, 69 in introduction section and line 111 in results section of the revised manuscript.

      Line 39, remove repeated "importantly"

      Response: Thanks for your useful suggestion. We have corrected this in the revised manuscript as suggested.

      The updated contents were presented in line 39 in abstract section of the revised manuscript.

      Lines 55-56, please rewrite.

      Response: Thanks for your suggestion. We have now rephrased the sentence.  

      The updated contents were presented in line 56-57 in introduction section of the revised manuscript.

      The relevant references should be updated, in the right format.

      Response: Thanks for your suggestion. Based on your suggestion, we have updated the references and reformatted them according to the reference style of eLife.

      The updated contents were presented in reference section of the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Major concerns:

      (1) In Figure 2, the authors make the argument that the increased survival of Bacillus spores at high temperatures and low pH renders the strain useful as a probiotic as it would survive in the gut. However, the gut temperature is not significantly higher than the rest of the body (certainly not 95 degrees). One assumes the pH argument applies to surviving in stomach acid so that spores can travel to the gut. These conclusions should be clarified/revised. The survival in bile salts gastric fluid etc makes more sense.

      Response: Thank you for your suggestion. The comments improve the quality and depth of manuscript. Based on your suggestion, we have revised these conclusions. We would like to express our apologies once again and hope that the revised manuscript meets your expectations. We have marked the updated contents in the revised manuscript.

      The updated contents were presented in line 129-132 in results section of the revised manuscript.

      (2) The overall differences in the microbiota on the stacked bar graphs are difficult to determine. In many cases, it looks like the HBXN2020 does not have a significant effect. The subsequent scattergrams are more convincing. Perhaps the authors can think of a better way to compare composite populations. If not, I suggest moving these stacked graphs to the supplementary information.

      Response: We greatly appreciate your valuable comments. The comments improve the quality and depth of the manuscript. Based on your suggestion, we have moved the stacked graphs to the supplemental material. In addition, we replaced the bar graphs with heatmaps, in which differences in microbial community composition among the experimental groups are shown by the depth of color. We appreciate your review and feedback, and have marked the updated figures in the revised manuscript. Please see Figures 7 and 10 in the revised manuscript and the supplemental material.

      Minor editorial:

      (1) Line 55 - "....antibiotic therapy is...".

      Response: Thank you for your suggestion. We have corrected it as suggested.

      The updated contents were presented in line 56-57 in introduction section of the revised manuscript.

      (2) Line 60 - replace "emergent search" - poor syntax.

      Response: Thank you for your suggestion. The comments improve the quality of manuscript. We have corrected this in the revised manuscript as suggested.  

      The updated contents were presented in line 61-62 in introduction section of the revised manuscript.

      (3) Line 63 - "...play an important...".

      Response: Thanks for pointing this out. We have now rephrased the sentence.

      The updated contents were presented in line 63-64 in introduction section of the revised manuscript.

      (4) Figure 1C is not very useful, simply reinforces the data from 1A and 1B - this can be moved to the supplementary information.

      Response: Thank you for your valuable suggestion. The comments improve the quality and depth of manuscript.

      Based on your suggestion, we have moved figure 1C to the supplemental material. We appreciate your review and feedback, and have marked the updated figures in the revised manuscript. Please see figures in revised manuscript and supplemental material.

      (5) Line 126, "...that the growth of B. velezensis HBXN2020 was relatively stable." What do the authors mean by this? "Stable" implies no increase in biomass, but the growth curve does not indicate this, there was an increase in biomass after which, the culture appeared to reach a stationary phase. This should be clarified.

      Response: Thanks for pointing this out. The comments improve the quality of manuscript. We have corrected this in the revised manuscript as suggested.

      The updated contents were presented in line 122-124 in results section of the revised manuscript.

      (6) In Figure 5 - all the graphs in panel A can be amalgamated into one figure using different colours/symbols.

      Response: Thank you for your suggestion. The comments improve the quality and depth of manuscript. Based on your suggestion, we have merged all the graphics in panel A in Figure 5 into one figure.

      The updated contents were presented in Figure 5 in the revised manuscript.

      (7) The overall cohesiveness of the manuscript could be improved.

      Response: Thank you for your valuable comments. The comments improve the quality and depth of manuscript. We have revised the entire manuscript based on your suggestions. The updated contents were presented in the revised manuscript.

      Reviewer #3 (Recommendations For The Authors):

      The following issues require clarification to further improve the quality of the manuscript.

      (1) L.55: Replace "antibiotic therapies" with "antibiotic therapy".

      Response: Thank you for your suggestion. We have corrected it as suggested.

      The updated contents were presented in line 56-57 in introduction section of the revised manuscript.

      (2) "Bacillus" should be modified to italics in the manuscript (see e.g., L. 26, 65, 68, 109).

      Response: Thank you for your suggestion. The comments improve the quality of manuscript. We have corrected this in the revised manuscript as suggested.

      The updated contents were presented in line 26 in abstract section and line 67, 69 in introduction section and line 111 in results section of the revised manuscript.

      (3) The first appearance of bacterial names in the manuscript requires the full English name (see e.g., L. 158, 159, 160).

      Response: Thank you for pointing out this problem in manuscript. We have corrected this in the revised manuscript as suggested.

      The updated contents were presented in line 153-156 in results section of the revised manuscript.

      (4) L.166 and 167: "we evaluated its biological safety in a mouse model" suggest modifying to "we evaluated the biological safety of HBXN2020 in a mouse model".

      Response: Thanks for your suggestion. We have corrected this as suggested.  

      The updated contents were presented in line 163-164 in results section of the revised manuscript.

      (5) L.229: Replace "suggest" with "suggested".

      Response: Thanks for your suggestion. We have corrected this as suggested.  

      The updated contents were presented in line 226 in results section of the revised manuscript.

      (6) L.367: The tense of "can" should be consistent with "demonstrated".

      Response: Thanks for pointing this out. We have corrected this as suggested.

      (7) L.368 and L. 369: Replace "Gram positive and Gram negative" with "Gram-positive and Gram-negative".

      Response: Thanks for your suggestion. We have corrected this as suggested.  

      (8) L.372: Replace "and" with "as well as".

      Response: Thanks for your useful suggestion. We have corrected this in the revised manuscript as suggested.

      The updated contents were presented in line 365 in discussion section of the revised manuscript.

      (9) Please provide the NCBI accession number of the 16S rRNA sequencing raw data.

      Response: Thank you for your suggestion. We have added it in the revised manuscript.

      The updated contents were presented in line 770-773 in data availability section of the revised manuscript.

      (10) L. 1020 and L. 1073: It's recommended to reduce the word count in the annotations of Figures 5 and 8.

      Response: Thank you for your valuable suggestion. We have corrected it as suggested.

      The updated contents were presented in the annotations of Figure 5 and Figure 8 in figure legends section of the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Duan et al. analyzed brain imaging data in the UK Biobank and found patterns in brain structure changes with aging. They identified two patterns and found links that can be differentiated by the categorization.

      Strengths:

      This discovery harbors a substantial impact on aging and brain structure and function.

      Weaknesses:

      (1) Therefore, the study requires more validation efforts. Most importantly, data underlying the stratification of the two groups are not obvious and lack further details. Can they also be stratified by different methods, e.g. PCA?

      Response: Thanks for the comment. In this study, principal component analysis (PCA) was applied to the individualized deviations of the anatomic regions of interest (ROIs) for dimensionality reduction, which yielded the first 15 principal components, explaining approximately 70% of the total variation, for identifying longitudinal brain aging patterns. These two patterns can be stratified by both linear and non-linear dimensionality reduction methods: PCA and locally linear embedding (LLE) [1]. The grey matter volumes (GMV) of 40 ROIs at baseline were linearly adjusted for sex, assessment center, handedness, ethnicity, intracranial volume (ICV), and a second-degree polynomial in age, to be consistent with the whole-brain GMV trajectory model. There was a clear boundary between the two patterns in the projected coordinate space, indicating distinct structural differences in brain aging between the two patterns (Author response image 1).

      Author response image 1.

      Stratification of the identified brain aging patterns using linear and non-linear dimensionality reduction methods. (a) The principal component space of PC1 and PC2, and (b) two-dimensional projected locally linear embedding space derived from brain volumetric measures. Points have been colored and shaped according to grouping labels of the brain aging patterns.
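
      The statement that the first 15 principal components explain about 70% of the total variation corresponds to a cumulative explained-variance rule. A minimal sketch of that rule follows, assuming the PCA eigenvalues are already available; the function name and the eigenvalues in the usage note are hypothetical and are not taken from the study.

```python
def components_for_variance(eigenvalues, threshold=0.70):
    """Return the smallest k such that the k largest eigenvalues
    account for at least `threshold` of the total variance."""
    if not eigenvalues:
        raise ValueError("eigenvalues must not be empty")
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return k
    return len(ordered)


# Hypothetical eigenvalues, for illustration only.
example_eigenvalues = [5.0, 3.0, 1.0, 1.0]
n_components = components_for_variance(example_eigenvalues, threshold=0.70)
```

      With the hypothetical eigenvalues above, the first component explains 50% and the first two explain 80%, so two components are retained at the 70% threshold.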

      (2) Are there any external data that can be used for validation?

      Response: Thanks for the comment. We were given access to the Alzheimer’s Disease Neuroimaging Initiative (ADNI) study, which aimed at determining the relationships between clinical, cognitive, imaging, genetic, and biochemical biomarkers across the entire spectrum of Alzheimer’s disease. ADNI recruits participants aged between 55 and 90 years at 57 sites in the United States and Canada, who undergo a series of initial tests that are repeated at intervals over subsequent years. 

      Unfortunately, there are no appropriate and sufficient data, especially clinical, cognitive, and genetic data, to support unbiased validation of the heterogeneity in structural brain aging patterns. Only 890 (31.83%) of the 2796 subjects included in ADNI were cognitively normal, of which 656 were included in the analyses after quality control of structural MRI and exclusion of subjects with missing covariates, with a mean age at the screening visit of 70.8 years (SD = 6.48 years); 60.21% of these subjects were female. Thus, there are significant differences between ADNI and UK Biobank in terms of population composition, with ADNI including more older subjects due to its focus on defining the progression of Alzheimer’s disease.

      Moreover, among the 656 subjects with structural imaging data, the data needed to validate the clinical, cognitive, and genetic manifestations of the brain aging patterns were missing to varying degrees. For example, blood biochemistry tests and telomere length data were missing at baseline for approximately 58% and 82% of subjects, respectively, and genotype data were not assayed for more than 70 percent of the subjects. As for cognitive function tests, only the results of the Mini-Mental State Examination were complete, while other tests such as the Trail Making Test and Digit Span Backward were available for less than 10 percent of subjects.

      (3) Other previous discoveries or claims supporting the results of the study should be explored to support the conclusion.

      Response: Thanks for the suggestion. As we mentioned in lines 274-277 of the manuscript, participants with brain aging pattern 2 (lower baseline total GMV and more rapid GMV decrease) were characterized by accelerated biological aging and cognitive decline. Previous research on brainAGE [2,3] (the difference between chronological age and the age predicted by a machine learning model from brain imaging data) showed that, as a biomarker of accelerated brain aging, an older brainAGE is associated with accelerated biological aging and early signs of cognitive decline, which is consistent with our findings in this study (lines 302-306).

      Further, genome-wide association studies identified significant genetic loci contributing to accelerated brain aging, some of which can be found in previous GWAS on image-derived phenotypes [4], such as regional and tissue volume, cortical area and white matter tract measurements, and on specific brain aging modes identified using a data-driven decomposition approach [5] (lines 207-213).

      In addition, we demonstrated the “last in, first out” mirroring patterns between structural brain aging and brain development, and found that the mirroring patterns are predominantly localized to the lateral / medial temporal cortex and the cingulate cortex, as noted in lines 231-234 of the manuscript. Large differences in the patterns of change between late adolescent development and aging in the medial temporal cortex were previously found in studies of brain development and aging patterns [6] (lines 315-317).

      (4) Sex was merely used as a covariate. Were there sex differences during brain aging? What was the sex ratio difference in groups 1 and 2?

      Response: Thanks for the comment. Sex differences during brain aging can be observed by investigating sex-stratified whole-brain GMV trajectories. We fitted the growth curve and estimated the rate of change of total grey matter volume (TGMV) separately for males and females using generalized additive mixed effect models (GAMM), which included 40,921 observations from 17,055 males and 19,958 females (Author response image 2). Among healthy participants aged 44-82 years in UK Biobank, males had higher total GMV and a faster rate of GMV decrease over time, while females had lower total GMV and a slower rate of GMV decrease. A similar conclusion can be found in normative brain-volume trajectories across the human lifespan [7]. Supplementary Table 5 shows baseline and demographic characteristics for all participants and for participants stratified by brain aging pattern. There were slightly more females than males among the total participants and in both brain aging pattern 1 (53.4%) and pattern 2 (54.4%), and a χ^2 test showed no significant difference in the sex ratio between the two patterns (P = 0.06).

      Author response image 2.

      Total gray matter volume (TGMV) (a) and the estimated rate of change (b) for females (red) and males (blue). Rates of volumetric change for total gray matter and each ROI were estimated using GAMM, which incorporates both cross-sectional between-subject variation and longitudinal within-subject variation from 22,067 observations for 19,958 females, and 18,854 observations for 17,055 males. Covariates include assessment center, handedness, ethnicity, and ICV. Shaded areas around the fit line denote the 95% CI.
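
      The sex-ratio comparison between the two patterns is a χ^2 test on a 2 × 2 contingency table. The sketch below shows a Pearson χ^2 test without continuity correction, using only the Python standard library. The response does not report the group sizes, so the counts in the usage lines are hypothetical and are not expected to reproduce the reported P value; whether the authors applied a continuity correction is also not stated.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    `table` is ((a, b), (c, d)); returns (statistic, p_value) with 1 degree
    of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    # For 1 degree of freedom, the chi-square survival function is
    # P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: rows are patterns 1 and 2, columns are (female, male).
hypothetical_counts = ((5340, 4660), (5440, 4560))
statistic, p_value = chi_square_2x2(hypothetical_counts)
```

      Because the P value depends on the sample sizes as well as on the percentages, the same 53.4% versus 54.4% split can be significant or not depending on how many participants fall in each pattern.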

      (5) Although statistically significant, Figure 3 shows minimal differences. LTL and phenoAge are displayed in adjusted values but what are the actual values that differ between patterns 1 and 2?

      Response: Thanks for the comment. We have modified the visualization of Figure 3 in the revised manuscript by adjusting the axes for the leucocyte telomere length (LTL) and PhenoAge variables and removing the whiskers from the boxplots. Associations between biological aging biomarkers and brain aging patterns are listed in Supplementary Table 6. Compared to brain aging pattern 1, participants in pattern 2, with a more rapid GMV decrease, had shorter leucocyte telomere length (P = 0.009, Cohen’s d = -0.028) and higher PhenoAge (P = 0.019, Cohen’s d = 0.027) without covariate adjustment. Specifically, participants in brain aging pattern 1 had an average Z-standardized LTL of 0.083 (SD 0.98) and an average PhenoAge of 41.35 years (SD 8.17 years), and those in pattern 2 had an average Z-standardized LTL of 0.055 (SD 0.97) and an average PhenoAge of 41.58 years (SD 8.32 years).
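
      The reported effect sizes can be approximately recovered from the group means and standard deviations given above. The sketch below computes Cohen's d with a pooled standard deviation, assuming equal group sizes (the sizes are not reported in this response, so this is an assumption). Because the inputs are rounded, the results agree with the reported values only to about two decimal places.

```python
import math


def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d for group B relative to group A.

    Assumes equal group sizes, so the pooled SD is the root mean of the
    two variances.
    """
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_b - mean_a) / pooled_sd


# Pattern 1 is group A and pattern 2 is group B; values as quoted above.
ltl_d = cohens_d(0.083, 0.98, 0.055, 0.97)         # about -0.029
phenoage_d = cohens_d(41.35, 8.17, 41.58, 8.32)    # about 0.028
```

      Both values are far below the conventional threshold of 0.2 for a small effect, which is consistent with the reviewer's observation that the differences, although statistically significant, are minimal.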

      (6) It is not intuitive to link gene expression results shown in Figure 8 and brain structure and functional differences between patterns 1 and 2. Any overlap of genes identified from analyses shown in Figure 6 (GWAS) and 8 (gene expression)?

      Response: Thanks for the comment. We apologize for the confusion. As we mentioned in the Results section "Gene expression profiles were associated with delayed brain development and accelerated brain aging", seventeen of the 45 genes mapped to GWAS-significant SNPs were found in the Allen Human Brain Atlas (AHBA) dataset. Gene expression of LGR4 (r_spearman = 0.56, P_permutation = 2.5 × 10^-4) was significantly associated with delayed brain development, and expression of ESR1 (r_spearman = 0.53, P_permutation = 1.5 × 10^-4) and FAM3C (r_spearman = -0.37, P_permutation = 0.004) was significantly associated with accelerated brain aging. BDNF-AS was positively associated with both delayed brain development and accelerated brain aging after the spatial permutation test. Full associations between gene expression profiles of mapped genes and estimated APC during brain development and aging are presented in Supplementary Tables 12 and 13, respectively.

      Furthermore, we screened the genes based on their contributions and effect directions to the first PLS components in brain development and brain aging. Several genes mapped to GWAS-significant SNPs were among the genes screened for inclusion in the functional enrichment analysis (Author response table 1): LGR4 (PLSw1(LGR4) = 3.70, P.FDR = 0.002) was associated with delayed development, and ESR1 (PLSw1(ESR1) = 3.91, P.FDR = 6.12 × 10^-4) and FAM3C (PLSw1(FAM3C) = -3.68, P.FDR = 0.001) were associated with accelerated aging.

      Author response table 1.

      Contributions and effect directions to the first PLS components in brain development and brain aging for genes mapped to GWAS-significant SNPs. Bold P values indicate significance after FDR correction (P < 0.005, the threshold for inclusion in the functional enrichment analysis).

      Reviewer #2 (Public Review):

      Summary:

      The authors aimed to understand the heterogeneity of brain aging by analyzing brain imaging data. Based on the concept of structural brain aging, they divided participants into two groups based on the volume and rate of decrease of gray matter volume (GMV). The group with rapid brain aging showed accelerated biological aging and cognitive decline and was found to be vulnerable to certain neuropsychiatric disorders. Furthermore, the authors claimed the existence of a "last in, first out" mirroring pattern between brain aging and brain development, which they argued is more pronounced in the group with rapid brain aging. Lastly, the authors identified genetic differences between the two groups and speculated that the cause of rapid brain aging may lie in genetic differences.

      Strengths:

      The authors supported their claims by analyzing a large amount of data using various statistical techniques. There seems to be no doubt about the quality and quantity of the data. Additionally, they demonstrated their strength in integrating diverse data through various analysis techniques to conclude.

      Weaknesses:

      There appears to be a lack of connection between the analysis results and their claims. Readers lacking sufficient background knowledge of the brain may find it difficult to understand the paper. It would be beneficial to modify the figures and writing to make the authors' claims clearer to readers. Furthermore, the paper gives an overall impression of being less polished in terms of abbreviations, figure numbering, etc. These aspects should be revised to make the paper easier for readers to understand.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Gray matter volume (GMV) is defined later in the manuscript and may confuse readers.

      Response: Thanks for the comment. We have now defined GMV upon its first appearance in the manuscript.

      Reviewer #2 (Recommendations For The Authors):

      (1) In conducting GWAS, the authors used total GMV at the age of 60 as a phenotype (line 195). It would be beneficial to provide additional explanation as to why only the data from individuals aged 60 were utilized, especially considering the ample availability of GMV data.

      Response: Thanks for the comment, and we apologize for the confusion. As we mentioned in the Methods section "Genome Wide Association Study to identify SNPs associated with brain aging patterns", we performed genome-wide association studies (GWAS) on individual deviations of total GMV relative to the population average at 60 years using PLINK 2.0. Therefore, data from all individuals were used in the GWAS, rather than only data from those aged 60 years. To accomplish this, the deviation of total GMV from the population average at age 60 years was calculated for each participant using a mixed-effect regression model, as described in the Methods section "Identification of longitudinal brain aging patterns".

      (2) Whole-brain gene expression data was linked to GMV (Line 237). Gray matter is known to account for about 40% of the total brain. Thus, interpreting whole-brain data in connection with GMV might introduce significant errors. Could this potential source of error be addressed?

      Response: Thanks for the comment. In our study, the Allen Human Brain Atlas (AHBA) dataset was processed using the abagen toolbox version 0.1.3 (https://doi.org/10.5281/zenodo.5129257) with the Desikan-Killiany atlas (8), resulting in a matrix (83 regions × 15,633 gene expression levels) of transcriptional values covering cortical and subcortical structures in both hemispheres, plus the brainstem. Only data from 34 cerebral cortex regions, not the whole brain, were included in the analysis of the association between the regional rate of change of gray matter volume and gene expression profiles using partial least squares (PLS) regression. We have clarified in the revised manuscript that we utilized AHBA microarray expression data from regions of interest (ROIs) in the cortex.
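To make the PLS step more concrete, the sketch below shows how the weights of the first PLS component can be obtained when there is a single response variable (here, a regional rate of change). It is only an illustration on synthetic data: the gene names, the simulated expression values and the helper `pls1_first_weights` are all hypothetical, and it does not reproduce the authors' pipeline or any significance testing of the weights. It relies on the standard result that, for one response, the first weight vector is proportional to the covariance between each standardized predictor and the response.

```python
import math
import random


def zscore(values):
    """Center a list of numbers and scale it to unit sample SD."""
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    sd = math.sqrt(sum(c * c for c in centered) / (len(centered) - 1))
    return [c / sd for c in centered]


def pls1_first_weights(columns, response):
    """First-component PLS weights for a single response.

    columns: one list per predictor (gene), each of length n_regions.
    response: list of length n_regions.
    Returns a unit-length weight vector, one entry per predictor.
    """
    z_response = zscore(response)
    z_columns = [zscore(col) for col in columns]
    # Covariance direction X^T y, then normalize to unit length.
    cov = [sum(x * y for x, y in zip(col, z_response)) for col in z_columns]
    norm = math.sqrt(sum(c * c for c in cov))
    return [c / norm for c in cov]


# Synthetic example: 34 cortical regions, three hypothetical genes.
rng = random.Random(0)
n_regions = 34
gene_names = ["gene_a", "gene_b", "gene_c"]  # placeholders, not real genes
expression = [[rng.gauss(0, 1) for _ in range(n_regions)] for _ in gene_names]

# Simulated change rate: rises with gene_a, falls with gene_b, ignores gene_c.
change_rate = [
    2.0 * a - 1.5 * b + rng.gauss(0, 0.1)
    for a, b, _ in zip(*expression)
]

weights = pls1_first_weights(expression, change_rate)
for name, w in zip(gene_names, weights):
    print(f"{name}: weight = {w:+.3f}")
```

In this toy setting the sign of each weight recovers the simulated direction of association, which is the sense in which a positive or negative first-component weight is read as an effect direction.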

      (3) The paper lacks biological interpretation of the important genetic factors (SNPs and genes) for brain aging discovered in this study, as well as the results of gene ontology analysis. Many readers would be curious about the biological significance of these genetic differences and what kind of outcomes they may produce.

      Response: Thanks for the suggestion. As we mentioned in our manuscript, six independent single nucleotide polymorphisms (SNPs) were identified at the genome-wide significance level (P < 5 × 10^-8) (Fig. 6). Among them, two SNPs (rs10835187 and rs779233904) were also found to be associated with multiple brain imaging phenotypes in previous studies, such as regional and tissue volume, cortical area and white matter tract measurements. Compared to the GWAS using global gray matter volume as the phenotype, our GWAS revealed an additional signal on chromosome 7 (rs7776725), which mapped to an intron of FAM3C, a gene that encodes a secreted protein involved in pancreatic cancer and Alzheimer's disease. This signal was further validated to be associated with a specific brain aging mode by another study using a data-driven decomposition approach. In addition, another significant locus (rs10835187, P = 1.11 × 10^-13) is an intergenic variant between the genes LGR4-AS1 and LIN7C, and was reported to be associated with bone density, brain volume and total cortical area measurements. LIN7C encodes the Lin-7C protein, which is involved in the localization and stabilization of ion channels in polarized cells, such as neurons and epithelial cells. A previous study revealed the association of both allelic and haplotypic variations in the LIN7C gene with ADHD. In addition, ESR1 was found to be involved in I-kappaB kinase/NF-kappaB signaling in the functional enrichment associated with accelerated brain aging (Figure 8 and Supplementary Figure 5), and activation of this signaling leads to a variety of human pathologies such as neurodegenerative, inflammatory, autoimmune and cancerous diseases (9).

      In summary, the analyses using the GO biological process and KEGG pathway databases indicate that synaptic transmission is an important process in the mechanisms common to brain development and aging, and that cellular processes (autophagy), as well as the progression of neurodegenerative diseases, are important in the mechanisms of brain aging.

      (4) As mentioned in the public review, it would be helpful if figures were revised to more clearly represent the claims.

      (4.1) For Figure 1, it would be beneficial to explain how the authors analyzed the differences between the mentioned cross-section and longitudinal trajectory, which they identified as a strength of the study.

      Response: We have added the strengths of adopting longitudinal data for modeling brain aging trajectories compared to only using cross-sectional data in Figure 1 caption in the revised manuscript:

      “Fig. 1 Overview of the study workflow. a, Population cohorts (UK Biobank and IMAGEN) and data sources (brain imaging, biological aging biomarkers, cognitive functions, genomic data) involved in this study. b, Brain aging patterns were identified using longitudinal trajectories of whole-brain GMV, which captured long-term and individualized variations better than using only cross-sectional data, and associations between brain aging patterns and other measurements (biological aging, cognitive functions and PRS of major neuropsychiatric disorders) were investigated. c, Mirroring patterns between brain aging and brain development were investigated using the z-transformed brain volumetric change map and gene expression analysis.”

      (4.2) In Figure 3, it's challenging to distinguish differences between patterns 1 and 2 in LTL and PhenoAge. (e.g. It's unclear whether Pattern 1 is higher or lower). Clarifying this visually would be useful.

      Response: We have modified the visualization of Figure 3 in the revised manuscript by adjusting the axes for the leucocyte telomere length (LTL) and PhenoAge variables and removing the whiskers from the boxplot.

      Author response image 3.

      Distributions of biological aging biomarkers (leucocyte telomere length (LTL) and PhenoAge) among participants with brain aging patterns 1 and 2.

      (4.3) Figure 7 explains the mirroring pattern, but it's hard to discern significant differences from the figures alone (especially in Figures 7b and 7c). Using an alternative method (graph, etc.) to clearly represent this would be appreciated.

      Response: We have included an arrow pointing to the brain regions with significant differences in each subfigure.

      Author response image 4.

      The “last in, first out” mirroring patterns between brain development and brain aging.

      (5) Abbreviations should be explained when they are first introduced in the paper. For example, GMV continues to be used without explanation, and in line 203, it is written out as 'gray matter volume'. ADHD and ASD first appear at line 172, but the explanation is found in lines 177-178. Additionally, there are terms without explanations in the manuscript. For instance, BMI is not explained in the main manuscript but is defined in the Supplementary Information (Table S6).

      Response: We have corrected the inappropriate formatting regarding misplaced and missing abbreviations in the revised manuscript and Supplementary Information.

      (6) Figure numbers should follow the order of appearance in the paper. The first Supplementary Fig. in the manuscript is Supplementary Figure 3. It should be Supplementary Figure 1.

      Response: We have relabeled the figures with the order of appearance in the paper in the revised manuscript and Supplementary Information.

      References:

      (1) Roweis, S. T. & Saul, L. K. Nonlinear dimensionality reduction by locally linear embedding. Science 290, 2323–2326 (2000).

      (2) Christman, S. et al. Accelerated brain aging predicts impaired cognitive performance and greater disability in geriatric but not midlife adult depression. Translational Psychiatry 10, 317 (2020).

      (3) Elliott, M. L. et al. Brain-age in midlife is associated with accelerated biological aging and cognitive decline in a longitudinal birth cohort. Molecular Psychiatry 26, 3829–3838 (2021).

      (4) Smith, S. M. et al. An expanded set of genome-wide association studies of brain imaging phenotypes in UK Biobank. Nature Neuroscience 24, 737–745 (2021).

      (5) Smith, S. M. et al. Brain aging comprises many modes of structural and functional change with distinct genetic and biophysical associations. eLife 9, e52677 (2020).

      (6) Tamnes, C. K. et al. Brain development and aging: overlapping and unique patterns of change. Neuroimage 68, 63–74 (2013).

      (7) Bethlehem, R. A. et al. Brain charts for the human lifespan. Nature 604, 525–533 (2022).

      (8) Desikan, R. S. et al. An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. Neuroimage 31, 968–980 (2006).

      (9) Singh, S. & Singh, T. G. Role of nuclear factor kappa B (NF-κB) signalling in neurodegenerative diseases: an mechanistic approach. Current Neuropharmacology 18, 918–935 (2020).

    2. eLife assessment

      Duan et al. analyzed brain imaging data from UK Biobank and divided structural brain aging into two groups, revealing that one group is more vulnerable to aging and brain-related diseases than the other. Such subtyping could be valuable for predicting and diagnosing cognitive decline and neurodegenerative brain disorders in the future. This discovery, supported by solid evidence, has a substantial impact on the understanding of aging and of brain structure and function.

    3. Reviewer #1 (Public Review):

      Summary:

      Duan et al. analyzed brain imaging data from UK Biobank and found a pattern in brain structure changes with aging. They identified two patterns and found associations that differ between the two categories.

      Strengths:

      This discovery has a substantial impact on the understanding of aging and of brain structure and function.

      Weaknesses:

      Therefore, the study requires more validation efforts. Most importantly, the data underlying the stratification into two groups are not obvious and lack further details. Can participants also be stratified by a different method, e.g., PCA?

      Can any external data be used for validation?

      Other previous discoveries or claims supporting the results of the study should be explored to support the conclusion.

      Sex was merely used as a covariate. Were there sex differences during brain aging? Does the sex ratio differ between groups 1 and 2?

      Although statistically significant, Fig 3 shows minimal differences. LTL and PhenoAge are displayed as adjusted values, but what are the actual values that differ between patterns 1 and 2?

      It is not intuitive to link the gene expression results shown in Fig 8 to the brain structure and functional differences between patterns 1 and 2. Is there any overlap between the genes identified in the analyses shown in Fig 6 (GWAS) and Fig 8 (gene expression)?

    4. Reviewer #2 (Public Review):

      Summary:

      The authors aimed to understand the heterogeneity of brain aging by analyzing brain imaging data. Based on the concept of structural brain aging, they divided participants into two groups based on the volume and rate of decrease of gray matter volume (GMV). The group with rapid brain aging showed accelerated biological aging and cognitive decline and was found to be vulnerable to certain neuropsychiatric disorders. Furthermore, the authors claimed the existence of a "last in, first out" mirroring pattern between brain aging and brain development, which they argued is more pronounced in the group with rapid brain aging. Lastly, the authors identified genetic differences between the two groups and speculated that the cause of rapid brain aging may lie in genetic differences.

      Strengths:

      The authors supported their claims by analyzing a large amount of data using various statistical techniques. There seems to be no doubt about the quality and quantity of the data. Additionally, they demonstrated their strength in integrating diverse data through various analysis techniques to conclude.

      Weaknesses:

      The authors provided appropriate answers to the reviewers' questions and revised the manuscript accordingly, and as a result, the paper has been edited to be more easily understood.

    1. eLife assessment

      This study presents an important dataset that captures the transition from epiblast to amnion using a novel in vitro model of human amnion formation. The supporting evidence for the authors' claims is convincing. Key strengths of the study include the efficiency and purity of the cell populations produced, a high degree of synchrony in the differentiation process, comprehensive benchmarking with single-cell data and immunocytochemistry from primate embryos, and the identification of critical markers for specific differentiation phases. A notable limitation, however, is the model's exclusion of other embryonic tissues.

    2. Reviewer #2 (Public Review):

      In this study, Sekulovski and colleagues report refinements to an in vitro model of human amnion formation. Working with 3D cultures and BMP4 to induce differentiation, the authors chart the time course of amnion induction in human pluripotent stem cells in their system using immunofluorescence and RNA-seq. They carry out validation through comparison of their data to existing embryo datasets, and through immunostaining of post-implantation marmoset embryos. Functional experiments show that the transcription factor TFAP2C drives the amnion differentiation program once it has been initiated.

      There is currently great interest in the development of in vitro models of human embryonic development. While it is known that the amnion plays an important structural supporting role for the embryo, its other functions, such as morphogen production and differentiation potential, are not fully understood. Since a number of aspects of amnion development are specific to primates, models of amniogenesis will be valuable for the study of human development. Advantages of this model include its efficiency and the purity of the cell populations produced, a significant degree of synchrony in the differentiation process, benchmarking with single-cell data and immunocytochemistry from primate embryos, and identification of key markers of specific phases of differentiation. Weaknesses are the absence of other embryonic tissues in the model, and overinterpretation of certain findings, in particular relating bulk RNA-seq results to scRNA-seq data from published analyses of primate embryos and results from limited (though high quality) embryo immunostainings.

    3. Reviewer #3 (Public Review):

      In this work, the authors tried to profile time-dependent changes in gene and protein expression during BMP-induced amnion differentiation from hPSCs. The authors depicted a GATA3 - TFAP2A - ISL1/HAND1 order of amniotic gene activation, which provides a more detailed temporal trajectory of amnion differentiation compared to previous works. As a primary goal of this study, the above temporal gene/protein activation order is amply supported by experimental data. However, the mechanistic insights into amniotic fate decision, as well as the transcriptomic analysis comparing amnion-like cells from this work and other works, remain limited. While this work allows us to see more details of amnion differentiation and understand how different transcription factors are turned on in sequence, and might be useful for benchmarking the identity of amnion in ex utero cultured human embryos/embryoids, it provides limited insight into how amnion cells might diverge from primitive streak / mesoderm-like cells during early development, despite the transcriptional similarity they share.

      [Editors' note: In the revised manuscript, the authors have added new results and made textual revisions that address the reviewers' concerns. These changes have significantly enhanced the clarity, quality, and impact of the study.]

    4. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their insightful comments, which have helped to improve the manuscript. We provide specific examples and a point-by-point response to all comments below. Based on the Reviewers’ comments, we revised our manuscript, adding a considerable amount of new data (found in Fig. 1A,B, 4E-G, 7C,D, 8C,E, S1B,C, S2C-G, S4C, and Video 1). In the main manuscript text, blue font indicates added or revised text. An additional author (Lauren N. Juga) has been added for the newly generated data in the revised manuscript.

      Reviewer #1: 

      Sekulovski et al present an interesting and timely manuscript describing the temporal transition from epiblast to amnion. The manuscript builds on their previous work describing this process using stem cell models. 

      They suggest a multi-step process initiated by BMP induction of GATA3, followed by expression of TFAP2A, followed by ISL1/HAND1 in parallel with loss of pluripotency markers. This transition was reproduced through IF analysis of CS6/7 NHP embryo. 

      There are significant similarities in the expression of trophectoderm and the amnion. There are also ample manuscripts showing trophoblast induction following BMP stimulation of primed pluripotent stem cells. The authors should ensure that the amnion indeed is only amnion and not trophectoderm (or the amount of contribution to trophectoderm). As an extension, does the amnion character remain after the 48h BMP4 treatment, and is a trophectoderm-like state adopted as suggested by Ohgushi et al 2022?  

      Thank you for this insightful comment. As pointed out, Ohgushi et al. showed that, in their culture method, amnion is first induced, and extended culturing leads to the formation of trophectoderm-like cells (Ohgushi et al., 2022).

      Importantly, we would like to note that our culture system differs substantially from that of Ohgushi et al. in several respects. First, our system uses a 3D culture method while Ohgushi et al. employ 2D hPSC monolayers. Second, the two systems are chemically quite distinct. In our Glass-3D+BMP protocol, cells are cultured in mTeSR media (which contains FGF2 and TGFb1) for two days, by which time they generate 3D pluripotent cysts. BMP4 is then added to the culture medium for 24 hours, followed by another 24 hours without BMP4. In stark contrast, Ohgushi et al. employ A83-01, an Activin/Nodal signaling inhibitor, and PD173074, an FGF signaling inhibitor (a protocol which they call AP). This treatment leads to spontaneous activation of BMP signaling, but it also clearly inhibits the Activin/Nodal and FGF signaling pathways, which remain active in our system. As a result of these distinct chemical as well as geometrical culturing protocols, their system produces amnion and trophectoderm, while our system produces exclusively amnion.

      Further analysis of gene expression data supports our contention that our system produces amnion. Though the gene expression profiles of amnion and trophectoderm are quite similar, specific markers of trophectoderm have been identified, including GCM1, PSG1, PSG4 and CGB (Blakeley et al., 2015; Meistermann et al., 2021; Ohgushi et al., 2022; Okae et al., 2018; Petropoulos et al., 2016; Yabe et al., 2016). Importantly, while all of these markers are abundantly expressed in the Ohgushi et al. system, bulk RNA sequencing analysis of our Glass-3D+BMP hPSC-amnion cells reveals that none of these markers are detectable. Indeed, SDC1, a marker that Ohgushi et al. claim distinguishes trophoblast from amnion, actually decreases (more than 8-fold) as pluripotent cysts transition to amnion in Glass-3D+BMP. Finally, Ohgushi et al. report that ISL1, a key marker of the specified amnion population, is initially increased in their system but is reduced to a basal level over time. In contrast, in Glass-3D+BMP hPSC-amnion, ISL1 expression continuously increases with time, and ISL1 protein expression is seen uniformly throughout the amnion cysts. This uniform expression is also seen in CS6/7 cynomolgus macaque amnion. Together, these results support our conclusion that the Glass-3D+BMP system leads to the formation of amniotic cells, and not trophectoderm cells.

      The functional data does not support a direct function of GATA3 prior to TFAP2A and the authors suggest compensatory mechanisms from other GATAs. If so, which GATAs are expressed in this system, with and without GATA3 targeting? Would it not be equally likely that the other early genes could be the key drivers of amnion initiation, such as ID2? 

      We appreciate this helpful comment. We agree that our data do not provide sufficient evidence for the role of GATA3 in early amniogenesis. We also agree that other early genes could be key drivers, and apologize for including our speculation that focuses only on GATA2. GATA2 was selected because, among the other GATAs, GATA2 and GATA3 are the only abundantly expressed GATA factors. This point suggesting a potentially redundant role of GATA2 is now removed from the manuscript (Line#355 of the original manuscript).

      The targeting of TFAP2A displays a very interesting phenotype which suggests that amnion and streak share an initial trajectory but where TFAP2A is necessary to adopt amnion fate. It would again be important to ensure that this alternative fate is indeed in streak and not misannotated alternative lineages, including trophoblast. 

      Is TBXT induced in this setting as well as in the wt situation during amnion induction? This should be displayed as in Figure 3D and would be nicely complemented by NHP IF analysis.

      We will address these two closely related comments together.

      TFAP2A-KO cysts contain ISL1+ squamous cells as well as SOX2+ pluripotent cells, suggesting that, while the initial focal amniogenesis occurs, the subsequent spreading event does not. Interestingly, our new data show that TFAP2A-KO cysts display cells with high TBXT expression (Fig. 8E, Line#373-374). This result suggests that, in the absence of TFAP2A, once amnion lineage progression is halted, a more primitive streak-like (TBXT^high) lineage emerges. It is important to note that TBXT expression is not seen in the trophectoderm population of the cynomolgus macaque peri-gastrula (Sasaki et al., 2016; Yang et al., 2021).

      As suggested, we now include a TBXT expression time course during hPSC-amnion formation in Fig. S2D of the revised manuscript. These data show weak TBXT expression (transcripts) starting at the 24-hr timepoint. However, a clear TBXT protein signal could not be detected using IF (Fig. S2C), likely because TBXT expression is very low (Line#264-265). While statistically significant compared to the 12-hr timepoint, TBXT expression is 31 FPKM +/- 0.8 (standard deviation) at 24-hr and 48 FPKM +/- 6 at 48-hr. These are low expression values compared to, for example, TFAP2A, which displays 572 FPKM +/- 23 at 12-hr and 1169 FPKM +/- 27 at 24-hr, at which TFAP2A is readily detected using IF. While weak nuclear TFAP2A is seen using IF at 6-hr (187 FPKM +/- 7), no clear TFAP2A is detected at 3-hr (74 FPKM +/- 7). Another example is ISL1, which displays 758 FPKM +/- 55 at 24-hr and 1505 FPKM +/- 26 at 48-hr, when ISL1 can be detected using IF. Importantly, we were not able to detect ISL1 protein expression using IF at 12-hr, at which its expression level is 12 FPKM +/- 18. Lastly, we now show that, in the cynomolgus macaque peri-gastrula, while pSMAD1/5+ primitive streak-derived disseminating cells show abundant TBXT expression, no clear TBXT expression is seen in the amnion territory (Fig. S2G, Line#291-293).
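The comparison above can be summarized by tabulating the quoted mean FPKM values against whether an IF signal was reported. The sketch below is only an illustrative reading of the numbers in this response: the "none / weak / clear" labels are my paraphrase of the text (the TBXT rows are labeled "none" on the basis of the general statement that no clear TBXT protein signal was detected), and the resulting range is not a detection threshold claimed by the authors.

```python
# (gene, timepoint, mean FPKM, IF signal) as quoted in the response above.
# The IF labels are a paraphrase: "none" = no clear signal reported.
observations = [
    ("TBXT", "24-hr", 31, "none"),
    ("TBXT", "48-hr", 48, "none"),
    ("TFAP2A", "3-hr", 74, "none"),
    ("TFAP2A", "6-hr", 187, "weak"),
    ("TFAP2A", "12-hr", 572, "clear"),
    ("TFAP2A", "24-hr", 1169, "clear"),
    ("ISL1", "12-hr", 12, "none"),
    ("ISL1", "24-hr", 758, "clear"),
    ("ISL1", "48-hr", 1505, "clear"),
]

# Split the observations by whether any IF signal was reported.
undetected = [fpkm for _, _, fpkm, signal in observations if signal == "none"]
detected = [fpkm for _, _, fpkm, signal in observations if signal != "none"]

highest_undetected = max(undetected)
lowest_detected = min(detected)

print(f"Highest FPKM with no clear IF signal: {highest_undetected}")
print(f"Lowest FPKM with a reported IF signal: {lowest_detected}")
```

Read this way, every timepoint without a clear IF signal sits at or below 74 FPKM and every timepoint with a signal sits at or above 187 FPKM, which is consistent with the argument that TBXT (31 to 48 FPKM) is too weakly expressed to be seen by IF in this system.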

      Together, these results show that, while a TBXT^low state clearly emerges during hPSC-amnion development, TBXT levels remain low throughout amnion differentiation in wild-type hPSC cultured in Glass-3D+BMP. However, in the absence of TFAP2A, a TBXT^high state is seen, suggesting that TFAP2A is critical for suppressing this TBXT^high state in fate spreading cells, perhaps by preventing BMP-responding cells from acquiring embryonic lineages (e.g., mesodermal and/or primordial germ cells).

      The authors should address why they get different results from Castillo-Venzor et al 2023 DOI: 10.26508/lsa.202201706  

      Thank you very much for this helpful suggestion, and we now include a section detailing this in the Discussion (Line#410-432). In short, we propose several possibilities. First, culturing conditions are highly distinct. Castillo-Venzor et al. (Castillo-Venzor et al., 2023) utilize initial “pre-mesoderm” conditioning by Activin and CHIR, followed by treating floating embryoid bodies with a growth factor cocktail (BMP, SCF, EGF and LIF). In contrast, our system (Glass-3D+BMP) employs BMP stimulation of pluripotent cysts. Thus, we suspect that, in the PGCLC differentiation condition, cells are conditioned to the pre-mesodermal lineage. Moreover, we propose that amnion fate spreading may not be present in the PGCLC system, perhaps due to differences in geometry (aggregates versus cysts), or due to differing lineage commitment programs. That is, while initial amniogenesis is seen in the PGCLC system, most cells may already be committed to the PGC-like or mesodermal lineages by the time amnion fate spreading can occur. Alternatively, because several cell types (PGC-like, mesodermal and amniotic) co-exist in the culture by Castillo-Venzor et al., PGC-like and/or mesodermal cells may compensate for the loss of TFAP2A.

      Reviewer #2: 

      In this study, Sekulovski and colleagues report refinements to an in vitro model of human amnion formation. Working with 3D cultures and BMP4 to induce differentiation, the authors chart the time course of amnion induction in human pluripotent stem cells in their system using immunofluorescence and RNA-seq. They carry out validation through comparison of their data to existing embryo datasets, and through immunostaining of post-implantation marmoset embryos. Functional experiments show that the transcription factor TFAP2C drives the amnion differentiation program once it has been initiated. 

      There is currently great interest in the development of in vitro models of human embryonic development. While it is known that the amnion plays an important structural supporting role for the embryo, its other functions, such as morphogen production and differentiation potential, are not fully understood. Since a number of aspects of amnion development are specific to primates, models of amniogenesis will be valuable for the study of human development. Advantages of this model include its efficiency and the purity of the cell populations produced, a significant degree of synchrony in the differentiation process, benchmarking with single-cell data and immunocytochemistry from primate embryos, and identification of key markers of specific phases of differentiation. Weaknesses are the absence of other embryonic tissues in the model, and overinterpretation of certain findings, in particular relating bulk RNA-seq results to scRNA-seq data from published analyses of primate embryos and results from limited (though high quality) embryo immunostainings.  

      We are happy that Reviewer #2 agrees that our Glass-3D+BMP model is important for investigating additional roles of amniogenesis, as well as roles of amnion as a signaling hub, due to the purity of the amniotic cell population, and a high degree of synchrony of differentiation.

      We respectfully disagree that the absence of other embryonic tissues in the model is a weakness: rather, we believe it is a strength because this single lineage amnion model allows us to directly (and independently) investigate mechanisms underlying amnion lineage progression. For example, as noted above in our response to Reviewer #1, use of our hPSC-amnion model allowed us to see a very specific and interesting phenotype in the absence of TFAP2A (reduced amnion formation and emergence of an alternative lineage), though previous findings by Castillo-Venzor et al. concluded that amniogenesis is not affected by loss of TFAP2A. We noted that the culture method used by Castillo-Venzor et al. contains several cell types (amniotic, mesodermal and PGC-like), and that amniogenesis may be intact in that model due to compensation by the presence of these other cell types. That is, while cell-cell interactions can indeed be gleaned in culture systems with several cell types, the presence of multiple cell types and their additional signaling inputs can also confound some aspects of mechanistic investigations. We now include a paragraph in the Discussion of the revised manuscript (Line#410-432), in which we detail these ideas, and suggest that, because of the cell purity, our Glass-3D+BMP model enables robust mechanistic examinations, specifically during amnion formation.

      We address Reviewer #2’s point about bulk vs. single cell transcriptomic similarity analysis in Reviewer’s specific point #4 below. We do, however, want to note here that we have performed the same analysis using a 14-day old cynomolgus macaque peri-gastrula single cell RNA sequencing dataset generated by Yang et al. (Yang et al., 2021), and obtained a lineage trajectory (Fig. 4F, Line#265-268) similar to that seen when the Tyser et al. dataset (Tyser et al., 2021) was used (Fig. 4C).

      Importantly, while cynomolgus macaque early embryo samples are limited, we now include additional staining (Fig. S2G). 

      Reviewer #2 (Recommendations For The Authors): 

      Provide more confirmation of key findings in more than one stem cell line. 

      We now confirm key findings in the H7 human embryonic stem cell line (Fig. S1C).

      Provide stronger evidence e.g. scRNA-seq to support the existence of intermediate cells or tone down the conclusions.  

      We agree that this is a very important point. In our recent study (Sekulovski et al., 2023), we performed single cell RNA sequencing of Gel-3D, another hPSC-amnion model. In this study, we comprehensively described the transcriptome associated with the “intermediate” cell types, as well as CLDN10 as a marker of these cell types. Moreover, we now include additional data showing the molecular characteristics of the TBXTlow intermediate cells during amniogenesis in hPSC-amnion (Fig. S2C, S2D) and d14 cynomolgus macaque peri-gastrula (Fig 4G, replot of single cell RNAseq by (Yang et al., 2021), Line#264-268).

      Provide more data on the expression of DLX5 in the model. 

      We now provide a DLX5 staining time course in Fig. 7C. We find that, similar to ISL1, prominent DLX5 staining is seen in the focal cells at 24-hr post-BMP. Interestingly, at 48-hr, some cells show high levels of DLX5 while others show low levels; this is of interest for future investigations.

      (1) L159 - the authors should repeat more of the key results in at least one other hPSC line, to ensure reproducibility of the method. Figure S1 contains minimal information (one timepoint, three genes, one biological replicate) on a single different hPSC line. 

      We now include additional validation analysis using the H7 human ESC line (Fig. S1).

      (2) Figure 1- it is a little difficult to appreciate cyst formation from images taken at one level in the stack, can the authors perhaps show a 3D rendering or video to display morphogenesis better? 

      We now provide all optical sections of cysts shown in Movie 1.

      (3) Figure 1-did the authors carry out podocalyxin staining? This is a standard marker for lumenogenesis.  

      We now provide PODXL staining (Fig. 1A,1B).

      (4) L248 onwards and Figure 4-I am a little skeptical concerning conclusions drawn from an overlay of bulk RNA-seq onto scRNA-seq UMAP plots. I think the authors need to provide some strong justification for this approach. I would be particularly careful about concluding that cells depicted in Fig 4D represent an intermediate close to primitive streak and even more careful about claiming any lineage relationship between T-positive "primitive streak like intermediates" and the trajectory of cells in the model. UMAP is a dimension-reduction technique for the visualization of clusters in high-dimensional data. It is not a lineage-tracing methodology. It would have been preferable for the authors to present their own scRNA-seq data from the model.  

      We are sorry that it was not clear that our approach to finding similarity between bulk and single cell RNA-seq data is largely based on a published method named projectLSI (Granja et al., Nature Biotechnology 2019; Granja et al., 2019). Please refer to our Methods section for details of the implementation and how we modified it for better visualization (addressed in Line#667-676 of the original manuscript, now in Line#718-730). The performance of projectLSI was extensively evaluated in the original article. Furthermore, as pointed out, UMAP is indeed a dimension reduction method that has been widely used in single cell RNA-seq research. In addition to visualizing clusters, trajectory analysis, such as RNA-velocity (which is used in this study), is another successful and widely adopted application of UMAP to gauge fate progression. Therefore, we believe that UMAP can be effectively used as a lineage prediction methodology, and that our use of bulk to single cell transcriptomic similarity analysis leveraging projectLSI is well justified at conceptual and technical levels.

      As illustrated in Fig. 5A, we performed RNA-velocity analysis of the Tyser et al. dataset, and our result clearly predicts a differentiation trajectory from Epiblast, a part of the TBXTlow population shown in Fig. 4D, and, then, to Ectoderm/Amnion cells. Consistent with this bioinformatic result, we now show that some cells show weak TBXT expression (at the transcript level) at the 24-hr post-BMP timepoint in control hPSC-amnion (Fig. S2D, Line#264-265). Importantly, our conclusion is drawn from a trajectory based on our time course (0, 0.5, 1, 3, 6, 12, 24, and 48 hours post-BMP treatment) which shows a clear transition from epiblast cells to TBXTlow and then finally to the ectoderm/amnion population. Moreover, using the transcriptomic similarity analysis, we found that the loss of TFAP2A leads to emergence of more primitive streak-like transcriptional characteristics (Fig. 8D). Indeed, using IF, we now show that several fate spreading cells in the TFAP2A-KO cysts are TBXThigh (Fig. 8E, Line#373-374). Thus, the new data provide additional evidence for the successful implementation of this bulk/single cell transcriptomic similarity analysis.

      Together, our bioinformatic and localization analyses show that the Glass-3D+BMP system recapitulates the trajectory found in our Tyser et al. RNA-velocity analysis, further supporting the validity of this differentiation trajectory. To avoid confusion, however, we now omit the “primitive streak-like” phrase when describing the TBXTlow cells because, while they may show some TBXT expression, they are likely intermediate fate transitioning cells. Indeed, a recent study by Ton et al. (Ton et al., 2023) showed that the Tyser et al. Primitive Streak cells consist of a mix of several lineage progressing cells (e.g., Epiblast, Non-neural ectoderm, Anterior or caudal primitive streak, PGC). Therefore, these cells are now specifically described as “TBXTlow” state; TBXThigh cells are described as primitive streak-like state.

      (5) L276 Tyser data do come from a primate model; the authors mean NHP.  

      We now specifically state that the validation is performed in a non-human primate model (Line#280).

      (6) Figure 5-though the immunostaining of the CS6/7 monkey embryos is excellent, the authors should not overinterpret these images. What is shown is not a time course, and one can only infer that a particular pattern of gene expression exists in a spatial sense from these images. In the model (Figure 2), the epiblast markers gradually fade and overlap for a time with emergent amnion markers, but in Figure 5 the transition between epiblast and amnion in the embryo seems pretty sharp, at least in terms of gene expression. There may be a few cells in D that show overlap of SOX2 and TFAP2A, but if the authors want to claim that a transition zone exists, they need to produce stronger evidence. Figure 7 is more convincing but see the next point. 

      Thank you for this insightful comment. We now address the nature of the transitioning boundary cell population extensively in our other recent study (Sekulovski et al., 2023).

      (7) Figure 7 further confuses the issue. A zone at either end of the epiblast is clearly positive for Sox2 and the two amnion markers, clearer than in Figure 5, but why does the marker DLX5 overlap with SOX2 in the embryo (7d) but not the model (7C)? Arguments regarding intermediate cell populations would be greatly strengthened by scRNA-seq data on the model system. 

      In our original manuscript, our DLX5 staining was performed at 48-hr post-BMP, at which SOX2 expression is absent in all cells. Our new analysis at the 24-hr timepoint now shows that DLX5 is expressed in SOX2+ cells (this is now presented in Fig. 7C).

      As stated in the point #6, our recent study comprehensively describes the transcriptomic and spatial characteristics of the transitioning boundary cell population (Sekulovski et al., 2023).

      (8) L357 TFAP2C KO does not resemble intermediate cysts in Figure 2. In Figure 2, both SOX2 and amnion markers are co-expressed in the same cells. In 8C, SOX2 and ISL1 are mutually exclusive.  

      We agree with this comment, and have now removed the statement pointing out the resemblance (Line#359 of the original manuscript).

      (9) Figure 8d-the same caveats noted above regarding the interpretation of superposition of bulk RNA-seq data with scRNA-seq UMAP analysis apply here.  

      Please refer to our explanation in point#4.

      Reviewer #3: 

      In this work, the authors tried to profile time-dependent changes in gene and protein expression during BMP-induced amnion differentiation from hPSCs. The authors depicted a GATA3 - TFAP2A - ISL1/HAND1 order of amniotic gene activation, which provides a more detailed temporal trajectory of amnion differentiation compared to previous works. As a primary goal of this study, the above temporal gene/protein activation order is amply supported by experimental data. However, the mechanistic insights on amniotic fate decision, as well as the transcriptomic analysis comparing amnion-like cells from this work and other works remain limited. While this work allows us to see more details of amnion differentiation and understand how different transcription factors were turned on in a sequence and might be useful for benchmarking the identity of amnion in ex utero cultured human embryos/embryoids, it provides limited insights on how amnion cells might diverge from primitive streak / mesoderm-like cells, despite some transcriptional similarity they shared, during early development.

      We are happy that Reviewer #3 appreciates that our model can be used effectively to identify a previously unrecognized amniotic gene activation cascade, providing a comprehensive time-course transcriptomic resource.

      As detailed below, we address specific concerns raised by Reviewer #3. We now provide additional mechanistic insights into amnion fate progression, and include additional transcriptomic comparisons with a cynomolgus macaque single cell RNA sequencing dataset.

      Reviewer #3 (Recommendations For The Authors): 

      (1) The authors generated KO cell lines lacking GATA3 and TFAP2A, respectively. Their results showed some disrupted amnion differentiation only in TFAP2A-KO. Therefore, these data do not provide sufficient evidence to support whether these transcription factors are crucial for amnion fate specification. Perhaps an experiment could be done with overexpression of these markers and testing if they could force hPSC to adopt amnion-like fate.  

      Thank you for this insightful comment. We generated cell lines that enable us to inducibly express GATA3 or TFAP2A, and the transgene expression was induced at d2 (when BMP treatment is normally initiated) until d4. However, this inducible expression did not lead to amniogenesis, and cysts maintained pluripotency. Because these results are difficult to interpret, they are not included in the revised manuscript.

      As detailed extensively in the manuscript, within each cyst, amniogenesis is initially seen focally, then spreads laterally resulting in fully squamous amnion cysts. This is also seen in our previously published Gel-3D amnion model (extensively described in (Shao et al., 2017)). In the absence of TFAP2A, we showed that the focal amniogenesis is observed, but spreading is not seen, suggesting that TFAP2A controls amnion fate progression. Therefore, while TFAP2A is not critical for the amnion fate specification in the focal cells, our results show that TFAP2A indeed helps to promote amniotic specification of cells neighboring the focal amniotic cells. Moreover, in the revised manuscript, we now show that TFAP2A transgene expression in the TFAP2A-KO background restores formation of fully squamous hPSC-amnion, further establishing the role of TFAP2A in amnion fate progression (Fig. 8C of the revised manuscript, Line#362-364).

      (2) The transcriptomic analysis made by the authors provides some comparison between BMP-induced amnion-like cells in vitro and the amnion-like cells from CS7 human embryo in vivo. However, the data set from the human embryo contains only a limited number of cells, and might not provide a sufficient base for decisive assessment of the true identity of amnion-like cells obtained in vitro. It might help if the authors could integrate their bulk sequencing data with other primate embryo data sets.  

      Thank you for this helpful comment. We have now performed our transcriptional similarity analysis using early (day 14) cynomolgus macaque embryo datasets generated in a study by (Yang et al., 2021), and found that the bulk time-course transcriptome of our hPSC-amnion model overlaps with the cynomolgus macaque amniotic lineage progression (Fig. 4F, Line#265-268). We also now provide the expression of key markers within the Yang et al. dataset (GATA3, TFAP2A, ISL1, TBXT, DLX5, Fig. 4G, S2F).

      (3) Following the point above, the authors used transcriptomic analysis to identify several intermediate states of cells during amnion differentiation and claimed that there is a primitive streak-like intermediate. However, this might be an overstatement. During stem cell culture and differentiation, intermediate states showing a mixture of biomarkers are very common and do not imply that such intermediates have any biological meaning. However, stating that amnion differentiation passes through primitive streak-like intermediates, might imply a certain connection between these two lineages, for which there is a lack of solid support. Instead, a more interesting question might be how amnion and primitive streak differentiation, despite some transcriptomic similarity, diverge from each other during early development. What factors make this difference? The authors might further analyze RNA-seq data to provide some insights.  

      Thank you very much for the insightful comments. 

      We understand Reviewer #3’s concern that the intermediate state that we see may not recapitulate a primitive streak-like state. However, in our original manuscript, we described these cells as “Primitive Streak-like” because those cells were annotated as Primitive Streak in the dataset by Tyser et al. Interestingly, a recent study by Ton et al. showed that the Tyser et al. Primitive Streak cells actually consist of a mixture of different cell lineages (e.g., Epiblast, Non-neural ectoderm, Anterior or caudal primitive streak, PGC (Ton et al., 2023)). Therefore, we agree that it was an overstatement to call them “Primitive Streak-like”, and, to avoid confusion, we now label the TBXTlow sub-population found in the Tyser et al. Primitive Streak population as “TBXTlow state” throughout the manuscript.

      Our data indicate that TFAP2A may play a role in controlling the lineage decision between amnion and primitive streak cells that abundantly express TBXT (TBXThigh). In the original manuscript, we included data showing that 48-hr TFAP2A-KO cysts show transcriptomic characteristics similar to some Primitive Streak cells (Fig. 8D). Intriguingly, our new data show that, in the absence of TFAP2A, some TBXThigh cells are indeed seen (Fig. 8E, Line#373-374). These results provide a body of evidence for the role of TFAP2A in promoting the amniotic lineage, perhaps by suppressing the TBXThigh state. This point is now addressed in the Discussion (Line#401-409).

      Additional new data:

      Using Western blot, we now show that GATA3 is absent in the GATA3-KO lines (Fig. S4C). We noticed that this was lacking in the original manuscript.

      We now show that inducible expression of TFAP2A in the TFAP2A-KO cysts leads to control-like cysts (Fig. 8C, Line#362-364).

      Additional changes:

      Typos were fixed in Fig. 5I – “boundary” and “disseminating” were not spelled correctly.

      Line#350 – we originally noted “GATA3 expression precedes TFAP2A expression by approximately 12 hours”. This was incorrect, and is changed to 9 hours in the revised manuscript. We apologize for this mistake.

      REFERENCES

      Blakeley, P., Fogarty, N.M., del Valle, I., Wamaitha, S.E., Hu, T.X., Elder, K., Snell, P., Christie, L., Robson, P., and Niakan, K.K. (2015). Defining the three cell lineages of the human blastocyst by single-cell RNA-seq. Development 142, 3151-3165.

      Castillo-Venzor, A., Penfold, C.A., Morgan, M.D., Tang, W.W., Kobayashi, T., Wong, F.C., Bergmann, S., Slatery, E., Boroviak, T.E., Marioni, J.C., et al. (2023). Origin and segregation of the human germline. Life Sci Alliance 6.

      Granja, J.M., Klemm, S., McGinnis, L.M., Kathiria, A.S., Mezger, A., Corces, M.R., Parks, B., Gars, E., Liedtke, M., Zheng, G.X.Y., et al. (2019). Single-cell multiomic analysis identifies regulatory programs in mixed-phenotype acute leukemia. Nature biotechnology 37, 1458-1465.

      Meistermann, D., Bruneau, A., Loubersac, S., Reignier, A., Firmin, J., Francois-Campion, V., Kilens, S., Lelievre, Y., Lammers, J., Feyeux, M., et al. (2021). Integrated pseudotime analysis of human pre-implantation embryo single-cell transcriptomes reveals the dynamics of lineage specification. Cell stem cell 28, 1625-1640 e1626.

      Ohgushi, M., Taniyama, N., Vandenbon, A., and Eiraku, M. (2022). Delamination of trophoblast-like syncytia from the amniotic ectodermal analogue in human primed embryonic stem cell-based differentiation model. Cell reports 39, 110973.

      Okae, H., Toh, H., Sato, T., Hiura, H., Takahashi, S., Shirane, K., Kabayama, Y., Suyama, M., Sasaki, H., and Arima, T. (2018). Derivation of Human Trophoblast Stem Cells. Cell stem cell 22, 50-63 e56.

      Petropoulos, S., Edsgard, D., Reinius, B., Deng, Q., Panula, S.P., Codeluppi, S., Plaza Reyes, A., Linnarsson, S., Sandberg, R., and Lanner, F. (2016). Single-Cell RNA-Seq Reveals Lineage and X Chromosome Dynamics in Human Preimplantation Embryos. Cell 165, 1012-1026.

      Sasaki, K., Nakamura, T., Okamoto, I., Yabuta, Y., Iwatani, C., Tsuchiya, H., Seita, Y., Nakamura, S., Shiraki, N., Takakuwa, T., et al. (2016). The Germ Cell Fate of Cynomolgus Monkeys Is Specified in the Nascent Amnion. Developmental cell 39, 169-185.

      Sekulovski, N., Juga, L.L., Cortez, C.L., Czerwinski, M., Whorton, A.E., Spence, J.R., Schmidt, J.K., Golos, T.G., Gumucio, D.L., Lin, C.-W., et al. (2023). Identification of amnion progenitor-like cells at the amnion-epiblast boundary in the primate peri-gastrula. bioRxiv doi: 10.1101/2023.09.07.556553.

      Shao, Y., Taniguchi, K., Townshend, R.F., Miki, T., Gumucio, D.L., and Fu, J. (2017). A pluripotent stem cell-based model for post-implantation human amniotic sac development. Nature communications 8, 208.

      Ton, M.N., Keitley, D., Theeuwes, B., Guibentif, C., Ahnfelt-Ronne, J., Andreassen, T.K., Calero-Nieto, F.J., Imaz-Rosshandler, I., Pijuan-Sala, B., Nichols, J., et al. (2023). An atlas of rabbit development as a model for single-cell comparative genomics. Nature cell biology 25, 1061-1072.

      Tyser, R.C.V., Mahammadov, E., Nakanoh, S., Vallier, L., Scialdone, A., and Srinivas, S. (2021). Single-cell transcriptomic characterization of a gastrulating human embryo. Nature 600, 285-289.

      Yabe, S., Alexenko, A.P., Amita, M., Yang, Y., Schust, D.J., Sadovsky, Y., Ezashi, T., and Roberts, R.M. (2016). Comparison of syncytiotrophoblast generated from human embryonic stem cells and from term placentas. Proceedings of the National Academy of Sciences of the United States of America 113, E2598-2607.

      Yang, R., Goedel, A., Kang, Y., Si, C., Chu, C., Zheng, Y., Chen, Z., Gruber, P.J., Xiao, Y., Zhou, C., et al. (2021). Amnion signals are essential for mesoderm formation in primates. Nature communications 12, 5126.

    1. eLife assessment

      This study investigates plant-microbe interactions for an invasive plant, Ageratina adenophora. The findings are valuable in advancing our understanding of how leaf and soil microbes separately affect its performance, with solid experimental evidence revealing the importance of litter microbes in shaping A. adenophora populations. The work will be of interest to invasion biologists.

    2. Reviewer #1 (Public Review):

      Summary:

      The work by Zeng et al. comprehensively explored the differences in the effects of leaf and soil microbes on the seed germination, seedling survival and seedling growth of an invasive forb, Ageratina adenophora, and found evidence of stronger adverse effects of leaf microbes on Ageratina compared with soil microbes. By further DNA sequencing and fungal strain cultivation, the authors were able to identify some of the key microbial guilds that may facilitate such negative and positive feedbacks.

      Strengths:

      (1) The theoretic framework is well-established;

      (2) Relating the direction of plant-microbe feedback to certain microbial guilds is always hard, but the authors have done a great job in identifying and interpreting such relationships.

      Weaknesses:

      (1) Allelopathic effects can't be directly accounted for;

      (2) The fungal strains accumulated in dead seedlings may also accumulate in live seedlings, thus more evidence is needed to validate the claim by the authors that Allophoma and Alternaria can increase seedling mortality.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary: 

      The work by Zeng et al. comprehensively explored the differences in the effects of leaf and soil microbes on the seed germination, seedling survival, and seedling growth of an invasive forb, Ageratina adenophora, and found evidence of stronger effects of leaf microbes on Ageratina compared with soil microbes, which were negative for seed germination and seedling survival but positive for seedling growth. By further DNA sequencing and fungal strain cultivation, the authors were able to identify some of the key microbial guilds that may facilitate such negative and positive feedback.

      Thank you very much for your assessment.

      Strengths:

      (1) The theoretic framework is well-established.

      (2) Relating the direction of plant-microbe feedback to certain microbial guilds is always hard, but the authors have done a great job of identifying and interpreting such relationships.

      Thank you very much for your assessment.

      Weaknesses:

      (1) In the G0 and G21 inoculation experiments, allelopathic effects from leaf litters had not been accounted for, while these two experiments happened to be the ones where negative feedback was detected.

      We did not directly test allelopathic effects. However, we also recorded seed germination time (GT), germination rate (GR), and seedling mortality rate (MR) for the treatments that were inoculated with soil or leaf litter 28 days after sowing (G28 inoculation). Because nothing was inoculated in these treatments during the first 28 days, they serve as a control against which the sterile inoculations can be compared to detect a possible allelopathic effect. In this version, we added the GT, GR and MR results for this control (nothing inoculated) to Figure 1, and described the results as: “When inoculated at G0 period, the sterile leaf inoculation significantly delayed germination time more than soil and sterile leaves inoculation and control (nothing inoculated) (Fig. 1a, P < 0.05)” (see Line102-104). We have also discussed this point in the resubmitted version as: “Our study did not directly test the allelopathic effects of leaf litter. However, leaf litter possibly produces allelochemicals that adversely impact A. adenophora seed germination time and seedling survival. We observed that sterile leaf litter inoculation caused longer GTs than sterile soil and the control (nothing inoculated) (Fig. 1a). Interestingly, sterile leaf litter inoculation also caused longer GTs than nonsterile leaf litter inoculation, suggesting that some pathways through which leaf microbes alleviate the adverse effects of leaf allelopathy on GTs are unknown. Moreover, sterile leaf inoculation at G0 caused a 19.7% mortality rate for seedlings growing in petri dishes (Fig. 1c), but no dead seedlings were observed when the plants were not inoculated (Fig. 1a, S1).

      Nonetheless, our study highlighted the adverse microbial role of leaf litter in seedling mortality because nonsterile leaves have significantly greater seedling mortality (96.7%) than sterile leaves (19.7%) (Fig. 1c)” in Line 289-301. 

      (2) The authors did not compare the fungal strains accumulated in dead seedlings to those accumulated in live seedlings to prove that the live seedlings indeed accumulated lower abundances of the strains that were identified to increase seedling mortality.

      Thank you for raising this concern. We did not isolate fungi from healthy seedlings for a comparative study. However, our team previously found that the seedling-killing Allophoma strains obtained in this study had the same ITS genes as the leaf endophyte and leaf spot pathogen Allophoma associated with mature A. adenophora individuals; some seedling-killing Alternaria also occur in healthy seedlings inoculated with leaf litter. We thus assumed that these seedling-killing fungi, e.g., Allophoma and Alternaria, likely exist in mature A. adenophora individuals, switch from an endophytic to a pathogenic lifestyle, and can kill seedlings only at a very early life stage of A. adenophora.

      Thus, we discussed this point as: “In particular, the numerically dominant Allophoma strains obtained in this study had the same ITS genes as the leaf endophyte and leaf spot pathogen Allophoma associated with A. adenophora (Chen et al., 2022; Kai Fang et al., 2021; Yang et al., 2023). Interestingly, a previous report revealed that the dominant genera in healthy seedlings inoculated with leaf litter were Didymella and Alternaria (Kai Fang et al., 2019). We did not isolate fungi from healthy seedlings to determine whether the live seedlings indeed lacked or accumulated a lower abundance of the seedling-killing strains than did the dead seedlings in this study. We could assume that these fungal genera likely exist in A. adenophora mature individual experiencing a lifestyle switch from endophytic to pathogenic and play an essential role in limiting the population density of A. adenophora monocultures by killing seedlings only at very early stages. Thus, it is worth exploring the dynamic abundance of these strains and host resistance variation during A. adenophora seedling development.” in Line 432-444.

      (3) The data of seed germination and seedling mortality could have been analyzed in the same manner as that of seedling growth, which makes the whole result section more coherent. I don't understand why the authors had not calculated the response index (RI) for germination/mortality rate and conducted analyses on the correlation between these RIs with microbial compositions.

      Thank you. The response index (RI) was calculated as:

      RI = (variable_nonsterile - variable_sterile) / variable_sterile.

      Because the mortality rates of some sterile groups were zero, their RIs could not be calculated. In addition, only leaf microbes affected seed germination time (GT); neither leaf nor soil microbes affected germination rate (GR) (see Fig. 1a,b). Therefore, we preferred to assess the microbial effect by directly comparing the nonsterile and sterile treatments (also see Figure 1d), and we correlated these values, rather than RIs, with microbial composition (see Fig. 3). We emphasized this point in the Materials and Methods of our resubmitted revision as: “Because the mortality rates of some sterile groups were zero and their RIs were impossible to calculate, we had to directly compare the seedling mortality caused by nonsterile with by sterile samples and perform the analysis of correlation between the mortality rate and microbial composition.” in Line 565-568. 
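      As an illustration only (not the authors' analysis code), the response index described above, and the reason it breaks down when a sterile group has zero mortality, can be sketched in a few lines of Python. The function name and the example values are hypothetical.

```python
from typing import Optional


def response_index(nonsterile: float, sterile: float) -> Optional[float]:
    """Return RI = (nonsterile - sterile) / sterile.

    Returns None when the sterile value is zero, because the ratio is
    undefined in that case (the situation described for mortality rate).
    """
    if sterile == 0:
        return None
    return (nonsterile - sterile) / sterile


# Hypothetical values, for illustration only.
print(response_index(12.0, 10.0))  # growth-type variable: RI is defined
print(response_index(0.5, 0.0))    # zero sterile mortality: RI is undefined (None)
```

      A direct comparison of nonsterile and sterile values avoids this division entirely, which is consistent with the choice described in the response.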

      (4) The language of the manuscript could be improved to increase clarity.

      We have improved the language in the resubmitted version.

      Reviewer #2 (Public Review):

      Summary: 

      The study provides strong evidence that leaf microbes mediate self-limitation at an early life stage. It highlights the importance of leaf microbes in population establishment and community dynamics. 

      Thank you very much for your assessment.

      The authors conducted three experiments to test their hypothesis, elucidating the effects of leaf and soil microbial communities on the seedling growth of A. adenophora at different stages, screening potential microbial sources associated with seed germination and seedling performance, and identifying the fungus related to seedling mortality. The conclusions are justified by their results. Overall, the paper is well-structured, providing clear and comprehensive information.

      Thank you very much for your assessment.

      Reviewing Editor (Recommendations For The Authors):

      In addition to the assessments from the reviewers, we have the following comments on your paper:

      (1) The experimental design is complicated with regard to the multiple interacting treatments. The statistical analyses show that the interaction terms are important and significant. In this case, it could be more informative to show the detailed results at the sub-level than at the main level in the main text. For example, the main effects of inoculation sources and nutrients shown in Figure 2 are difficult to interpret, because the effects of inoculation sources and nutrients have important dependencies with each other and other factors such as inoculation time as shown in Figure S3. Therefore, Figure S3 is more informative than Figure 2. Please also be cautious that it would be necessary to clarify this context dependence when showing and citing results of the main effect to avoid any possible misunderstanding, such as the case of Figure 2 and S3.

      Thanks for your suggestion. We have deleted Figure 2 and placed Figure S3 in the text as Figure 2. The corresponding results have been rewritten as “leaf inoculation caused significantly greater seedling mortality than did soil inoculation (P < 0.001); the nonsterile sample caused greater seedling mortality than did the sterile sample, especially leaf inoculation during the G0 and G21 periods. Moreover, nonsterile leaf inoculation at earlier stages significantly increased seedling mortality compared with that at later stages (Fig. 1d, P < 0.05). However, seedling mortality did not differ between the high- and low-nutrient conditions, regardless of leaf or soil inoculation (Fig. 1d, both P > 0.05).” in Lines 109-115.

      (2) Response index (RI) is already a measure of microbial feedback effect, so that feedback may not be necessary as an explanatory variable in the model with RI as the response variable.

      We are sorry that our wording was misleading. Here the word “feedback” (e.g., foliage or soil feedback) does not refer to the microbial feedback effect; it means leaf or soil inoculation. We have replaced “feedback” with “inoculation source” in the figures and text to avoid this confusion.

      (3) Mortality rate is a ratio. It is unclear whether assuming a Gaussian error distribution is fine in your case. It would be important to check the residual distribution and to see whether data transformation (e.g., log) or using other error assumptions (e.g., binomial) is necessary.

      Thanks for your suggestion. As you say, it is not appropriate to use generalized linear models (GLMs) with Gaussian error distributions (identity link) to evaluate seedling mortality, because the mortality rate is a ratio and does not meet the normality assumption. Thus, we deleted the GLM results for seedling mortality and directly compared seedling mortality between microbial treatments, inoculation times, nutrient levels and inoculation sources using the Mann–Whitney U test and the Kruskal–Wallis test (see Fig. 1d). All corresponding results have also been rewritten as “leaf inoculation caused significantly greater seedling mortality than did soil inoculation (P < 0.001); the nonsterile sample caused greater seedling mortality than did the sterile sample, especially leaf inoculation during the G0 and G21 periods. Moreover, nonsterile leaf inoculation at earlier stages significantly increased seedling mortality compared with that at later stages (Fig. 1d, P < 0.05). However, seedling mortality did not differ between the high- and low-nutrient conditions, regardless of leaf or soil inoculation (Fig. 1d, both P > 0.05).” in Lines 109-115.
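
      As a rough illustration of the rank-based comparison described in this response, the sketch below computes a two-sided Mann–Whitney U test in plain Python. It is not the authors' analysis code: the function names, the normal approximation with tie correction (no continuity correction), and the example mortality rates are all assumptions made for illustration only.

```python
from math import erfc, sqrt


def midranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U_x, U_y, two-sided p) using a tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    n = n1 + n2
    # Tie correction: sum of (t^3 - t) over groups of tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return u1, u2, 1.0  # all observations identical
    z = (u1 - n1 * n2 / 2) / sqrt(variance)
    p_two_sided = erfc(abs(z) / sqrt(2))
    return u1, u2, p_two_sided


# Hypothetical per-replicate mortality rates (not data from the study).
nonsterile_leaf = [0.90, 0.95, 1.00, 0.85]
sterile_leaf = [0.10, 0.20, 0.15, 0.25]
u_nonsterile, u_sterile, p = mann_whitney_u(nonsterile_leaf, sterile_leaf)
print(u_nonsterile, u_sterile, p)
```

      With small samples such as these, an exact test would usually be preferable to the normal approximation; the sketch only shows why no Gaussian error assumption is needed when the comparison is made on ranks.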

      (4) Please be consistent about the wording of different treatment names throughout the texts, tables, and figures. For example, "feedback" should only be used for microbial treatment, but not for inoculation source treatment (e.g., Figure 2). We can say there is an effect of microbial feedback only if we compare sterile vs. non-sterile groups, otherwise, there could be other effects, for example, the allelopathic effect pointed out by Reviewer #1. When writing inoculation, please be specific about whether it is for inoculation time or inoculation source (e.g., within multiple statistical tables in the appendix).

      Thanks for your good suggestion. We have changed “different feedback” into “different inoculation source” to make the text easier to follow.

      (5) Please clarify which inoculation periods they are for Figures 1d-g.

      Thanks for your good suggestion. We have added inoculation periods in Fig.1.

      Reviewer #1 (Recommendations For The Authors):

      Specific comments:

      Lines 12-15: This sentence is too long and complicated, making it unclear what had been done and what had not in previous studies.

      Thanks a lot. We have reorganized this sentence as: “However, how the phyllosphere and rhizosphere soil microbes distinctively affect seedling mortality and the growth of invasive plants across ontogeny under varying soil nutrient levels remains unclear.”.

      Line 19: is it appropriate to use "enrich" here?

      Thanks. We have changed “Microbial inoculation at different growth stages altered the microbial community and functions enriched in seedlings” into “Microbial inoculation at different growth stages altered the microbial community and functions of seedlings”.

      Line 24-25: "litter exhibited phylogenetic signals"? not clear what this means.

      Thanks. A significant phylogenetic signal means that the seedling-killing effects of the fungal strains on A. adenophora were related to the phylogenetic relatedness of those strains. So, we have changed “fungal strains isolated from dead seedlings inoculated with litter exhibited significant phylogenetic signals to seedling mortality” into “the A. adenophora seedling-killing effects of fungal strains isolated from dead seedlings by non-sterile leaf inoculation exhibited significant phylogenetic signals, by which strains of Allophoma and Alternaria generally caused high seedling mortality.”

      Line 29: using "in turn" in the first sentence seems weird.

      We deleted this.

      Lines 32-33: PSFs are usually positive because of?

      We have changed “PSFs have positive effects by escaping soil pathogens and recruiting some beneficial microbes” into “PSFs are usually positive because of escaping soil pathogens and recruiting some beneficial microbes”.

      Line 54: why emphasize "a single soil microbe"?

      Although the research of Geisen et al., (2021) assessed the effect of each strain of 34 isolates on seed germination and plant growth, Jevon et al., (2020) focused on the soil microbial community on seedling and adult plants survival. Thus, we changed “a single soil microbe” into “soil microbes”.

      Lines 85-86: "tested their mortality to seedlings"? not clear what this means.

      We are sorry that our wording was unclear. We have changed “we also isolated the fungi associated with the dead seedlings and tested their mortality to seedlings.” into “we also isolated the fungi associated with the dead seedlings and tested their seedling-killing effects on A. adenophora.”.

      Results: no statistics and no references for the statistical tables that could support the results were presented in this section.

      We have deleted the inappropriate generalized linear models (GLMs) with Gaussian error distributions (identity link) for evaluating seedling mortality, and all corresponding results have now been described with their statistics (see Line 109-115 and Fig. 1d).

      Lines 100-102: this subtitle reads more like a summary of the following results than a title. All subtitles in the Result section have similar issues (i.e. Lines 148-150, 207-209).

      Thanks. We subdivided our Results into four sections and changed the subtitles to: “Effects of leaf litter and rhizosphere soil on the mortality and growth of A. adenophora seedlings”, “Correlations of microbial community composition and potential function with seedling mortality at the early stage”, “Enrichment of microbial community and function by A. adenophora seedlings under different treatments”, and “Correlations of the enriched microbial community and function with A. adenophora seedling growth”.

      Lines 148-206: since there are a lot of results concerning the microbial composition, I suggest focusing on those that could directly explain the positive or negative feedback. The one concerning diversity (e.g. Figure 3 and corresponding texts) does not seem necessary.

      Thanks for your suggestion. We have moved Figure 3 into the supplementary figures as Figure S2. To focus on core microbes that could directly explain the positive or negative feedback, we reordered Figure 3, which now first shows the core soil and leaf bacteria and bacterial functions, as well as the core soil and leaf fungi and fungal functions (Fig. 3a-h), and then shows the correlations of the top 30 bacterial and fungal genera from soil and leaf with seedling mortality rate (Fig. 3i-j).

      Line 180: is it not common sense that ectomycorrhiza can only be found in soil?

      Yeah, it is. We have deleted this sentence.

      Line 199: "the seedling mortality of these strains"? not clear what this means,

      We have changed “The seedling mortality of these strains” into “The seedling-killing of these strains on A. adenophora”.

      Line 291-292: I don't see how the authors can distinguish between allelopathic and pathogenic effects based on their results.

      We did not directly test the allelopathic effects. However, we also recorded seed germination time (GT) and rate (GR), as well as the seedling mortality rate (MR), for the treatments inoculated with soil and leaf 28 days after sowing (G28 inoculation). This allowed us to observe possible allelopathic effects by comparing the sterile samples with the control (nothing inoculated during the first 28 days). In this version, we added the GT, GR and MR results for the uninoculated control to Figure 1 and described the results as: “When inoculated at G0 period, the sterile leaf inoculation significantly delayed germination time more than soil and sterile leaves inoculation and control (nothing inoculated) (Fig. 1a, P < 0.05)” (see Line 102-104). We have also discussed this point in the resubmitted version as: “Our study did not directly test the allelopathic effects of leaf litter. However, leaf litter possibly produces allelochemicals that adversely impact A. adenophora seed germination time and seedling survival. We observed that sterile leaf litter inoculation caused longer GTs than sterile soil and the control (nothing inoculated) (Fig. 1a). Interestingly, sterile leaf litter inoculation also caused longer GTs than nonsterile leaf litter inoculation, suggesting that some pathways through which leaf microbes alleviate the adverse effects of leaf allelopathy on GTs are unknown. Moreover, sterile leaf inoculation at G0 caused a 19.7% mortality rate for seedlings growing in petri dishes (Fig. 1c), but no dead seedlings were observed when the plants were not inoculated (Fig. 1a, S1).

      Nonetheless, our study highlighted the adverse microbial role of leaf litter in seedling mortality because nonsterile leaves have significantly greater seedling mortality (96.7%) than sterile leaves (19.7%) (Fig. 1c)” in Line 289-301.

      Lines 383-414: Correlations are not necessarily causations. Sometimes a strong correlation may result from higher-order interaction. The authors should be more cautious about the discussion of microbial function in this section.

      Thanks. We deleted all descriptions of adverse or beneficial effects on the growth of the host plant A. adenophora and now cautiously use “negative correlation” or “positive correlation” when discussing the functions of the microbes enriched by A. adenophora. Finally, we also added the sentence: “It is necessary to isolate these enriched microbes to test the interactions with the early life stage of A. adenophora.” (see Line 411-413).

      Lines 489-490: I don't really understand why the authors performed a combination treatment. What did they expect from such a combination?

      Thanks. We described our consideration as: “Leaf inoculation at G28 was performed to simulate natural microbial spread from the leaf litter to the above part of the seedlings by suspending the leaf bag over the transplanted seedlings without direct contact all the time (see Zaret et al. (2021)). This method may result in only microbial species with easy air transmission to infect seedlings. Thus, an additional combination inoculation (named G21+28) was performed on both the 21st (with seedling contact) and 28th days (without seedling contact) to ensure that most leaf microbes had the opportunity to reach the seedlings.” see Line 498-505.

      Figure 1: why not use "mortality rate" instead of "death rate"?

      Thanks. We have changed “death rate” into “mortality rate” in all corresponding figures and text.

      Figure 8: This is a very complicated experimental setup. Why did the authors harvest the plants treated with nutrient addition after the 12th day of the experiment and harvest those without nutrient addition after the 16th day? Why the time lag?

      Thanks. We explained this as: “Seedlings were harvested after 8 weeks of growth under high-nutrient conditions because they grew too fast and touched the PTFE cover; however, we harvested those plants grown under low-nutritional conditions after another 4 weeks of growth due to their very small size (see Fig. S6).” (see Method in Line 514-517).

    1. Reviewer #1 (Public Review):

      Summary:

      The authors addressed the influence of DKK2 on colorectal cancer (CRC) metastasis to the liver using an orthotopic model transferring AKP-mutant organoids into the spleens of wild-type animals. They found that DKK2 expression in tumor cells led to enhanced liver metastasis and poor survival in mice. Mechanistically, they associate Dkk2-deficiency in donor AKP tumor organoids with reduced Paneth-like cell properties, particularly Lyz1 and Lyz2, and defects in glycolysis. Quantitative gene expression analysis showed no significant changes in Hnf4a1 expression upon Dkk2 deletion. Ingenuity Pathway Analysis of RNA-Seq data and ATAC-seq data point to a Hnf4a1 motif as a potential target. They also show that HNF4a binds to the promoter region of Sox9, which leads to LYZ expression and upregulation of Paneth-like properties. By analyzing available scRNA data from human CRC data, the authors found higher expression of LYZ in metastatic and primary tumor samples compared to normal colonic tissue; reinforcing their proposed link, HNF4a was highly expressed in LYZ+ cancer cells compared to LYZ- cancer cells.

      Strengths:

      Overall, this study contributes a novel mechanistic pathway that may be related to metastatic progression in CRC.

      Weaknesses:

      The main concerns are related to incremental gains, missing in vivo support for several of their conclusions in murine models, and missing human data analyses. Additionally, methods and statistical analyses require further clarification.

      Main comments:

      (1) Novelty
      The authors previously described the role of DKK2 in primary CRC, correlating increased DKK2 levels to higher Src phosphorylation and HNF4a1 degradation, which in turn enhances LGR5 expression and "stemness" of cancer cells, resulting in tumor progression (PMID: 33997693). A role for DKK2 in metastasis has also been previously described (sarcoma, PMID: 23204234).

      (2) Mouse data
      a) The authors analyzed liver mets, but the main differences between AKP and AKP/Dkk2 KO organoids could arise during the initial tumor cell egress from the intestinal tissue (which cannot be addressed in their splenic injection model), or during pre-liver stages, such as endothelial attachment. While the analysis of liver mets is interesting, given that Paneth cells play a role in the intestinal stem cell niche, it is questionable whether a study that does not involve the intestine can appropriately address this pathway in CRC metastasis.
      b) The overall number of Paneth cells found in the scRNA-seq analysis of liver mets was strikingly low (17 cells, Figure 3), and assuming that these cells are driving the differences seems somewhat far-fetched. Adding to this concern is inappropriate gating in the flow plot shown in Figure 6. This should be addressed experimentally and in the interpretation of data.
      c) Figures 3, 5, and 6 show the individual gene analyses with unclear statistical data. It seems that the p-values were not adjusted, and it is unclear how they reached significance in several graphs. Additionally, it was not stated how many animals per group and cells per animal/group were included in the analyses.
      d) Figure 6 suggests a signaling cascade in which the absence of DKK2 leads to enhanced HNF4A expression, which in turn results in reduced Sox9 expression and hence reduced expression of Paneth cell properties. It is therefore crucial that the authors perform in vivo (splenic organoid injection) loss-of-function experiments, knockdown of Sox9 expression in AKP organoids, and Sox9 overexpression experiments in AKP/Dkk2 KO organoids to demonstrate Sox9 as the central downstream transcription factor regulating liver CRC metastasis.
      e) Given the previous description of the role of DKK2 in primary CRC, it is important to define the step of liver metastasis affected by Dkk2 deficiency in the metastasis model. Does it affect extravasation, liver survival, etc.?
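
      To make the concern in (2c) about unadjusted p-values concrete, the following minimal sketch applies the Benjamini–Hochberg procedure, one common way to control the false discovery rate when many genes are tested at once. The manuscript does not state which correction, if any, was used, so the choice of method, the function name, and the example p-values are assumptions for illustration only.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four genes (not values from the study).
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

      A gene that looks significant at a raw threshold of 0.05 can lose significance after adjustment when many genes are tested, which is why reporting the adjustment method and the number of tests matters for interpreting Figures 3, 5, and 6.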

      (3) Human data
      Can the authors address whether the expression of Dkk2 changes in human CRC and whether mutations in Dkk2 are correlated with metastatic disease or CRC stage?

      (4) Bioinformatic analysis
      The authors did not provide sufficient information on bioinformatic analyses. The authors did not include information about the software, cutoffs, or scripts used to make their analyses or output those figures in the manuscript, which challenges the interpretation and assessment of the results. Terms like "Quantitative gene expression analyses" (line 136) and "visualized in a Uniform Approximation and Projection" (line 178) do not explain what was inputted and which analyses were executed. There are multiple ways to align, preprocess, and visualize bulk, single-cell, ATAC, and ChIP-seq data, and depending on which was used, the results vary greatly. For example, in the single-cell data, the authors did not report how many cells were sequenced, nor how many cells remained after alignment and quality filtering (RNA count, mt count, etc.), so the result on Paneth+ to Goblet+ percent in lines 184 and 185 cannot be evaluated because it depends on this information. The absence of a clustering cutoff for the single-cell data is concerning since this greatly affects the resulting cluster number (https://www.nature.com/articles/s41592-023-01933-9). The authors should provide a comprehensive explanation of all the data analyses and the steps used to obtain those results.

      (5) Clarity of methods and experimental approaches
      The methods were incomplete and they require clarification.

    2. eLife assessment

      This valuable study proposes that protein secreted by colon cancer cells induces cells with Paneth-like properties that favor colon cancer metastasis. The evidence supporting the conclusions is incomplete and would benefit from more direct experiments to test the functional role of Paneth-like cells and to monitor metastasis from colon tumors. The work will be of interest to researchers studying colon cancer metastasis.

    3. Reviewer #2 (Public Review):

      Summary:

      The authors propose that DKK2 is necessary for the metastasis of colon cancer organoids. They then claim that DKK2 mediates this effect by permitting the generation of lysozyme-positive Paneth-like cells within the tumor microenvironmental niche. They argue that these lysozyme-positive cells have Paneth-like properties in both mouse and human contexts. They then implicate HNF4A as the causal factor responsive to DKK2 to generate lysozyme-positive cells through Sox9.

      Strengths:

      The use of a genetically defined organoid line is state-of-the-art. The data in Figure 1 and the dependence of DKK2 for splenic injection and liver engraftment, as well as the long-term effect on animal survival, are interesting and convincing. The rescue using DKK2 administration for some of their phenotype in vitro is good. The inclusion and analysis of human data sets help explore the role of DKK2 in human cancer and help ground the overall work in a clinical context.

      Weaknesses:

      In this work by Shin et al., the authors expand upon prior work regarding the role of Dickkopf-2 in colorectal cancer (CRC) progression and the necessity of a Paneth-like population in driving CRC metastasis. The general topic of metastatic requirements for colon cancer is of general interest. However, much of the work focuses on characterizing cell populations in a mouse model of hepatic outgrowth via splenic transplantation. In particular, the concept of Paneth-like cells is primarily based on transcriptional programs seen in single-cell RNA sequencing data and needs more validation. Although including human samples is important for potential generality, the strength could be improved by doing immunohistochemistry in primary and metastatic lesions for Lyz+ cancer cells. Experiments that further bolster the causal role of Paneth-like CRC cells in metastasis are needed.

    1. eLife assessment

      Through a genome-wide screen for functional alternative transcription start sites (TSS) in Arabidopsis, the authors provide evidence for widespread transcription of potential microproteins from previously annotated protein-coding genes. Functional analysis of AtHB2-miP, derived from the C-terminal region of transcription factor AtHB2 and predicted to form non-productive dimers with ATHB2, suggested that this microprotein could affect AtHB2 functions in shade responses, root growth, and iron homeostasis. The work is valuable as a case study of how new microproteins could act to modulate gene regulation in response to environmental change, but the focus on a single gene, the lack of precision in AtHB2-miP measurement and missing controls, and the relatively minor phenotypic effects mean that data supporting microprotein production as a vital regulatory strategy are incomplete.

    1. eLife assessment

      This valuable study reports a novel function of ATG14 in preventing pyroptosis and inflammation in oviduct cells, thus allowing smooth transport of the early embryo to the uterus and implantation. However, the data supporting the main conclusion remain incomplete. This work will be of interest to reproductive biologists and physicians practicing reproductive medicine.

    2. Reviewer #1 (Public Review):

      This study by Popli et al. evaluated the function of Atg14, an autophagy protein, in reproductive function using a conditional knockout mouse model. The authors showed that female mice lacking Atg14 were infertile partly due to defective embryo transport function of the oviduct and faulty uterine receptivity and decidualization using PgrCre/+;Atg14f/f mice. The findings from this work are exciting and novel. The authors demonstrated that a loss of Atg14 led to an excessive pyroptosis in the oviductal epithelial cells that compromises cellular integrity and structure, impeding the transport function of the oviduct. In addition, the authors use both genetic and pharmacological approaches to test the hypothesis. Therefore, the findings from this study are high-impact and likely reproducible. However, there are multiple major concerns that need to be addressed to improve the quality of the work.

    3. Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Popli et al investigated the roles of the autophagy-related gene, Atg14, in the female reproductive tract (FRT) using conditional knockout mouse models. By ablation of Atg14 in both oviduct and uterus with PR-Cre (Atg14 cKO), the authors discovered that such females are completely infertile. They went on to show that Atg14 cKO females have impaired embryo implantation and uterus receptivity due to impaired response to P4 stimulation and stromal decidualization. In addition to the uterus defect, the authors also discovered that early embryos are trapped inside the oviduct and cannot be efficiently transported to the uterus in these females. They went on to show that oviduct epithelium in Atg14 cKO females showed increased pyroptosis, which disrupts oviduct epithelial integrity and leads to obstructive oviduct lumen and impaired embryo transport. Therefore, the authors concluded that autophagy is critical for maintaining the oviduct homeostasis and keeping the inflammation under check to enable proper embryo transport.

      Strengths:

      This study revealed an important and unexpected role of the autophagy-related gene Atg14 in preventing pyroptosis and maintaining oviduct epithelial integrity, which is poorly studied in the field of reproductive biology. The study is well designed to test the roles of ATG14 in mouse oviduct and uterus. The experimental data in general support the conclusion and the interpretations are mostly accurate. This work should be of interest to reproductive biologists and scientists in the field of autophagy and pyroptosis.

      Weaknesses:

      Despite the strengths, there are several major weaknesses raising concerns. In addition, the mismatched figure panels, the undefined acronyms, and the poor description/presentation of some of the data significantly hinder the readability of the manuscript.

      (1) In the abstract, the authors stated that "autophagy is critical for maintaining the oviduct homeostasis and keeping the inflammation under check to enable embryo transport". This statement is not substantiated. Although Atg14 is an autophagy-related gene and plays a critical role in oviduct homeostasis, the authors did not show a direct link between autophagy and pyroptosis/oviduct integrity. In addition, the authors pointed out in the last paragraph of the introduction that none of the other autophagy-related genes (ATG16L, FIP200, BECN1) exhibited any discernable impact on oviduct function. Therefore, the oviduct defect is caused by Atg14 specifically, not necessarily by autophagy.

      (2) In lines 412-414, the authors stated that "Atg14 ablation in the oviduct causes activation of pyroptosis", which is also not supported by the experimental data. The authors did not show that Atg14 is expressed in oviduct cells. PR-Cre is also not specific in oviduct cells. It is possible that Atg14 knockout in other PR-expressing tissues (such as the uterus) indirectly activates pyroptosis in the oviduct. More experiments will be required to support this claim. In line with the no defect when Atg14 is knocked out in oviduct ciliary cells, it will be good to use the secretory cells Cre, such as Pax8-Cre, to demonstrate that Atg14 functions in the secretory cells of the oviduct thus supporting this conclusion.

      (3) With FOXJ1-Cre, the authors attempted to specifically knockout Atg14 in ciliary cells, but there are no clear fertility and embryo implantation defects in Foxj1/Atg14 cKO mice. The author should provide the verification data to show that Atg14 had been effectively depleted in ciliary cells if Atg14 is normally expressed.

      (4) In lines 307-313, the authors tested whether ATG14 is required for the decidualization of HESCs. The authors stated that "Control siRNA transfected cells when treated with EPC seemed to change their morphological transformation from fibroblastic to epithelioid (Fig. 2E) and had increased expression of the decidualization markers IGFBP1 and PRL by day three only (Fig. 2F)". First, the labels in Figure 2 do not correspond to the description in the text. Second, the morphology of the HESCs in the control and Atg14 siRNA groups showed no obvious difference even at day 3 and day 6. The authors should point out the difference in each panel and explain it in the text or figure legend.

      (5) In lines 332-336, the authors pointed out that the cKO mice oviduct lining shows marked eosinophilic cytoplasmic change, but there's no data to support the claim. In addition, the authors further described that "some of the cells showed degenerative changes with cytoplasmic vacuolization and nuclear pyknosis, loss of nuclear polarity, and loss of distinct cell borders giving an appearance of fusion of cells (Fig. 3D)". First, Figure 3D did not show all these phenotypes and it is likely a mismatch to Figure 3E. Even in Figure 3E, it is not obvious to notice all the phenotypes described here. The figure legend is overly simple, and there's no explanation of the arrowheads in the panel. More data/images are required to support the claim here and provide a clear indication and explanation in the figure legend.

      (6) In lines 317-325, it is rather confusing about the description of the portion of embryos from the oviduct and uterus. In addition, the total number of embryos was not provided. I would recommend presenting the numerical data to show the average embryos from the oviduct and uterus instead of using the percentage data in Figures 3A and 5G.

      (7) In lines 389-391, authors tested whether Polyphyllin VI treatment led to activated pyroptosis and blocked embryo transport. Although Figures 5F-G showed the expected embryo transport defect, the authors did not show the pyroptosis and oviduct morphology. It will be important to show that the Polyphyllin VI treatment indeed led to oviduct pyroptosis and lumen disruption.

      (8) In line 378, it would be better to include a description of pyroptosis and its molecular mechanisms to help readers to better understand your experiments. Alternatively, you can add it in the introduction.

      (9) Please make sure to provide definitions for the acronyms such as FRT, HESCs, GSDMD, etc.

      (10) It is rather confusing to use oviducal cell plasticity in this manuscript. The work illustrated the oviducal epithelial integrity, not the plasticity.

    4. Reviewer #3 (Public Review):

      Summary:

      The manuscript by Pooja Popli and co-authors tested the importance of Atg14 in the female reproductive tract by conditionally deleting Atg14 using PrCre and also Foxj1cre. The authors showed that loss of Atg14 leads to infertility due to the retention of embryos within the oviduct. The authors further concluded that the retention of embryos within the oviduct is due to pyroptosis in oviduct cells leading to defective cellular integrity. The manuscript has some interesting findings, however there are also areas that could be improved.

      Strengths:

      The importance of Atg14 and autophagy in the female reproductive tract is incompletely understood. The manuscript also provides partial evidence about a new mechanism linking Atg14 to pyroptosis.

      Weaknesses:

      (1) It is not clear why the loss of Atg14 selectively induces pyroptosis within oviduct cells but not in other cellular compartments. The authors should demonstrate that these events are not happening in uterine cells.

      (2) The manuscript never showed any effect on the autophagy upon loss of Atg14. Is there any effect on autophagy upon Atg14 loss? If so does that contribute to the observation?

      (3) It is not clear what the authors meant by cellular plasticity and integrity. There is no evidence provided in that aspect that the plasticity of oviduct cells is lost. Similarly, more experimental evidence is necessary for the conclusion about cellular integrity.

      (4) The mitochondrial phenotype shown in Figure 3 didn't appear as severe as it is described in the results section. The analyses should be more thorough. They should include multiple frames (in supplemental information) showing mitochondrial morphology in multiple cells. The authors should also test that aspect in uterine cells. For a definitive conclusion, the authors should measure Feret's diameter, differences in membrane potential, etc.

      (5) The comment that the loss of Atg14 and pyroptosis leads to the narrowing of the lumen in the oviduct should be experimentally shown.

      (6) The manuscript never showed the proper mechanism through which Atg14 loss induces pyroptosis. The authors should link the mechanism.

    5. Author response:

      Reviewer #1 (Public Review):

      This study by Popli et al. evaluated the function of Atg14, an autophagy protein, in reproductive function using a conditional knockout mouse model. The authors showed that female mice lacking Atg14 were infertile partly due to defective embryo transport function of the oviduct and faulty uterine receptivity and decidualization using PgrCre/+; Atg14f/f mice. The findings from this work are exciting and novel. The authors demonstrated that a loss of Atg14 led to an excessive pyroptosis in the oviductal epithelial cells that compromises cellular integrity and structure, impeding the transport function of the oviduct. In addition, the authors use both genetic and pharmacological approaches to test the hypothesis. Therefore, the findings from this study are high-impact and likely reproducible. However, there are multiple major concerns that need to be addressed to improve the quality of the work.

      We thank the reviewer for the insightful comments and helpful suggestions. We will address the majority of the concerns. Specifically, we will evaluate whether loss of Atg14 leads to pyroptosis in other reproductive tract tissues, namely the uterus and ovary. To determine the spatiotemporal expression of ATG14, we will assess ATG14 expression in the oviducts of WT and cKO mouse models. Further, to understand the impact of Atg14 loss on different regions of the oviduct, we will provide additional images from cKO mice and will quantify FOXJ1-positive cells. To address the concerns about cyclicity and steroid hormone levels, we will measure E2 and P4 levels and assess E2-target genes in the uterus from control and cKO mice. We will also include images of ampullary sections from the oviducts of Atg14 cKO and control females.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Popli et al investigated the roles of the autophagy-related gene, Atg14, in the female reproductive tract (FRT) using conditional knockout mouse models. By ablation of Atg14 in both oviduct and uterus with PR-Cre (Atg14 cKO), the authors discovered that such females are completely infertile. They went on to show that Atg14 cKO females have impaired embryo implantation and uterus receptivity due to impaired response to P4 stimulation and stromal decidualization. In addition to the uterus defect, the authors also discovered that early embryos are trapped inside the oviduct and cannot be efficiently transported to the uterus in these females. They went on to show that oviduct epithelium in Atg14 cKO females showed increased pyroptosis, which disrupts oviduct epithelial integrity and leads to obstructive oviduct lumen and impaired embryo transport. Therefore, the authors concluded that autophagy is critical for maintaining the oviduct homeostasis and keeping the inflammation under check to enable proper embryo transport.

      Strengths:

      This study revealed an important and unexpected role of the autophagy-related gene Atg14 in preventing pyroptosis and maintaining oviduct epithelial integrity, which is poorly studied in the field of reproductive biology. The study is well designed to test the roles of ATG14 in mouse oviduct and uterus. The experimental data in general support the conclusion and the interpretations are mostly accurate. This work should be of interest to reproductive biologists and scientists in the field of autophagy and pyroptosis.

      Weaknesses:

      Despite the strengths, there are several major weaknesses raising concerns. In addition, the mismatched figure panels, the undefined acronyms, and the poor description/presentation of some of the data significantly hinder the readability of the manuscript.

      (1) In the abstract, the authors stated that "autophagy is critical for maintaining the oviduct homeostasis and keeping the inflammation under check to enable embryo transport". This statement is not substantiated. Although Atg14 is an autophagy-related gene and plays a critical role in oviduct homeostasis, the authors did not show a direct link between autophagy and pyroptosis/oviduct integrity. In addition, the authors pointed out in the last paragraph of the introduction that none of the other autophagy-related genes (ATG16L, FIP200, BECN1) exhibited any discernable impact on oviduct function. Therefore, the oviduct defect is caused by Atg14 specifically, not necessarily by autophagy.

      We agree with the reviewer on this point. We will take a cautious approach and modify the statements to say that ATG14-dependent autophagy might be critical for maintaining oviduct homeostasis and keeping inflammation in check to enable embryo transport.

      (2) In lines 412-414, the authors stated that "Atg14 ablation in the oviduct causes activation of pyroptosis", which is also not supported by the experimental data. The authors did not show that Atg14 is expressed in oviduct cells. PR-Cre is also not specific to oviduct cells. It is possible that Atg14 knockout in other PR-expressing tissues (such as the uterus) indirectly activates pyroptosis in the oviduct. More experiments will be required to support this claim. Given that no defect was seen when Atg14 was knocked out in oviduct ciliary cells, it would be good to use a secretory-cell Cre, such as Pax8-Cre, to demonstrate that Atg14 functions in the secretory cells of the oviduct, thus supporting this conclusion.

      To address Atg14 action in the oviduct, we will perform ATG14 IHC staining in the oviduct and also evaluate GSDMD expression in the uterus and ovary, where PR-Cre expression is active. Further, we will provide literature-based evidence for PR-Cre expression in the oviduct, which is well established. However, generating a secretory-cell Pax8-Cre mouse model will require a substantial amount of time and effort, and we respectfully argue that this is currently outside the scope of this manuscript.

      (3) With FOXJ1-Cre, the authors attempted to specifically knockout Atg14 in ciliary cells, but there are no clear fertility and embryo implantation defects in Foxj1/Atg14 cKO mice. The author should provide the verification data to show that Atg14 had been effectively depleted in ciliary cells if Atg14 is normally expressed.

      We will perform expression analysis for ATG14 in Foxj1/Atg14 cKO mice to determine whether ablation in ciliated cells was effective.

      (4) In lines 307-313, the authors tested whether ATG14 is required for the decidualization of HESCs. The authors stated that "Control siRNA transfected cells when treated with EPC seemed to change their morphological transformation from fibroblastic to epithelioid (Fig. 2E) and had increased expression of the decidualization markers IGFBP1 and PRL by day three only (Fig. 2F)". First, the labels in Figure 2 do not correspond to the description in the text. Second, the morphology of the HESCs in the control and Atg14 siRNA groups showed no obvious difference even at day 3 and day 6. The authors should point out the difference in each panel and explain it in the text or figure legend.

      We will correct the labels and include high-magnification images to explain the morphological differences in HESCs.

      (5) In lines 332-336, the authors pointed out that the cKO mice oviduct lining shows marked eosinophilic cytoplasmic change, but there's no data to support the claim. In addition, the authors further described that "some of the cells showed degenerative changes with cytoplasmic vacuolization and nuclear pyknosis, loss of nuclear polarity, and loss of distinct cell borders giving an appearance of fusion of cells (Fig. 3D)". First, Figure 3D did not show all these phenotypes and it is likely a mismatch to Figure 3E. Even in Figure 3E, it is not obvious to notice all the phenotypes described here. The figure legend is overly simple, and there's no explanation of the arrowheads in the panel. More data/images are required to support the claim here and provide a clear indication and explanation in the figure legend.

      Dr. Ramya Masand, Chief Pathologist in our department and a contributing author, critically evaluated the stained sections from Figure 3 and provided the pathological assessment as outlined in lines 332-336. We will consult Dr. Masand and will modify the statements accordingly.

      (6) In lines 317-325, the description of the proportion of embryos recovered from the oviduct and uterus is rather confusing. In addition, the total number of embryos was not provided. I would recommend presenting numerical data showing the average number of embryos from the oviduct and uterus instead of the percentage data in Figures 3A and 5G.

      We will calculate the average number of embryos from the oviduct and uterus and provide numerical data.

      (7) In lines 389-391, authors tested whether Polyphyllin VI treatment led to activated pyroptosis and blocked embryo transport. Although Figures 5F-G showed the expected embryo transport defect, the authors did not show the pyroptosis and oviduct morphology. It will be important to show that the Polyphyllin VI treatment indeed led to oviduct pyroptosis and lumen disruption.

      We will perform the GSDMD staining to determine whether Polyphyllin VI treatment resulted in oviductal pyroptosis activation and lumen disruption.

      (8) In line 378, it would be better to include a description of pyroptosis and its molecular mechanisms to help readers better understand your experiments. Alternatively, you can add it in the introduction.

      We will include more literature-based discussion on pyroptosis and its mechanism.

      (9) Please make sure to provide definitions for the acronyms such as FRT, HESCs, GSDMD, etc.

      We will provide definitions for the acronyms such as FRT, HESCs, and GSDMD.

      (10) It is rather confusing to use oviducal cell plasticity in this manuscript. The work illustrated the oviducal epithelial integrity, not the plasticity.

      We will correct the statement.

      A few of the additional comments for authors to consider improving the manuscript are listed below.

      (1) Some of the figures are missing scale bars, while others have inconsistent scale bars. It would be better to be consistent.

      (2) On a couple of occasions, the DAPI signal cannot be seen, such as in Figure 2B and Figure 3D.

      (3) Overall, the figure legends can be improved to provide more detailed information to help the reader to interpret the data.

      As suggested, we will include the scale bars with high quality images and will elaborate the figure legends text.

      (4) In Figure 2D, the Y-axis showed the stimulated/unstimulated uterine weight ratio, why did the author put "Atg14" at the top of the graph? At the same time, the X-axis title is missing in Figure 2D.

      (5) In the left panel of Figure 2G, "ATG14" at the top should be "Atg14" to be consistent.

      (6) In line 559, "(A)" is missing in front of "Immunofluorescence analysis of GSDMD".

      We will make these necessary changes.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Pooja Popli and co-authors tested the importance of Atg14 in the female reproductive tract by conditionally deleting Atg14 using PR-Cre and Foxj1-Cre. The authors showed that loss of Atg14 leads to infertility due to the retention of embryos within the oviduct. The authors further concluded that the retention of embryos within the oviduct is due to pyroptosis in oviduct cells leading to defective cellular integrity. The manuscript has some interesting findings; however, there are also areas that could be improved.

      Strengths:

      The importance of Atg14 and autophagy in the female reproductive tract is incompletely understood. The manuscript also provides spatial evidence about a new mechanism linking Atg14 to pyroptosis.

      Weaknesses:

      (1) It is not clear why the loss of Atg14 selectively induces pyroptosis within oviduct cells but not in other cellular compartments. The authors should demonstrate that these events are not happening in uterine cells.

      We will carry out GSDMD staining in uterine tissues and discuss the findings.

      (2) The manuscript never showed any effect on autophagy upon loss of Atg14. Is there any effect on autophagy upon Atg14 loss? If so, does that contribute to the observation?

      We will assess the expression of autophagy-related markers in response to Atg14 loss and will discuss the findings.

      (3) It is not clear what the authors meant by cellular plasticity and integrity. There is no evidence provided in that aspect that the plasticity of oviduct cells is lost. Similarly, more experimental evidence is necessary for the conclusion about cellular integrity.

      We agree with the reviewer regarding cellular plasticity. We will remove the word "plasticity" and refer only to integrity.

      (4) The mitochondrial phenotype shown in Figure 3 didn't appear as severe as it is described in the results section. The analyses should be more thorough. They should include multiple frames (in supplemental information) showing mitochondrial morphology in multiple cells. The authors should also test that aspect in uterine cells. The authors should measure Feret's diameter, differences in membrane potential, etc. for a definitive conclusion.

      We will perform additional mitochondrial staining to determine the mitochondrial morphology in both the oviduct and uterus. Based on the results, we will consider measuring Feret's diameters. However, we respectfully argue that complex membrane potential studies will take time and are beyond the scope of the current study.

      (5) The comment that the loss of Atg14 and pyroptosis leads to the narrowing of the lumen in the oviduct should be experimentally shown.

      As shown in Figure 3E, staining the oviduct epithelia with KRT8 clearly showed a disorganized oviduct with abnormally fused cells leaving no lumen space. We could provide higher-magnification images in supplementary figures to highlight this observation.

      (6) The manuscript never showed the proper mechanism through which Atg14 loss induces pyroptosis. The authors should link the mechanism.

      Autophagy has been shown to inhibit pyroptosis either by inhibiting the cleavage of GSDMD or by suppressing various pyroptosis-related factors, including NFLRs and STING proteins. We found that the loss of Atg14 results in elevated GSDMD levels, a potential mechanism through which Atg14 suppresses pyroptosis in the oviduct. Importantly, Atg14 may regulate GSDMD through several intermediary factors, and resolving this intricate nexus will require complex biochemical, cellular, and molecular screens, which are one focus of our future investigations.

    1. eLife assessment

      This study is a computational analysis using publicly available deep sequencing datasets and the findings support the models that propose widespread gene transfer amongst DNA viruses. The evidence supporting the claims of the authors is solid, but reproducing the analysis based only on the information as presented in the Materials and Methods would be difficult as the data are currently presented. A flow chart that details the process would help. This is an almost entirely computational study without experimental evidence but one that has the potential to become a fundamental resource for virus hunters - an activity of increasing importance.

    2. Reviewer #1 (Public Review):

      This paper discusses the identification of viral genes in publicly available DNA and RNA sequencing datasets. In many cases, these datasets have been assembled into contigs. Many viral genes were identified and contigs containing genes from more than one type of virus were more common than expected. The analysis appears to be sound and the results presented will be of great interest to the community.

      The strengths of the paper are in the analysis itself, which is detailed, complex, and on a very large scale. To my knowledge, the identification of DNA viral proteins in sequencing datasets not deliberately infected with viruses has not previously been performed on this scale. Many proteins were identified which are at the limit of our current capacity to detect divergent proteins. I think the use of multiple methodologies strengthens the study, as it increases the depth of the results. The authors are also clear about the limitations of their study and give many caveats about their results, which is excellent.

      I have two major concerns about the study. The first is the presentation, which in places makes it difficult to tell exactly how and why the analysis has been performed. I do not think it would be possible to reproduce this analysis based only on the information presented in the Materials and Methods section. This makes it difficult to assess the exact details of the method and whether they are appropriate. I would appreciate something like a flow chart to show, for each SRA dataset and each assembled contig, the exact steps taken for classification and the hierarchy of tools, plus the threshold values, applied to the results. An overview of the results at the beginning of the results section would also be helpful - how many proteins were identified, what were their host species, how many contigs were assembled and how many of these were chimeric, etc.

      My second concern is that it is not clear how each protein was determined to be either viral or non-viral or how contigs were assigned as chimeric or non-chimeric. Positive and negative controls are not mentioned and false positive or negative rates are not calculated. Given that many of the identified proteins are highly divergent from known viral proteins, it would be good to see how likely it is that a random protein would be assigned as viral, or a viral protein as non-viral. Chimeric contigs could occur due to misassembly or endogenous viral elements. It seems that sequences in these categories may have been filtered using Cenote Taker, but no checks are described to confirm that the filtering was successful.

      Overall, I think that the study is useful and of interest, but I think more clarity in the presentation of the results would increase the value of the paper for many readers.

    3. Reviewer #2 (Public Review):

      Summary:

      A large-scale computational analysis of published sequences of various animal species provides evidence for extensive gene transfer amongst DNA viruses.

      Strengths:

      The study provides evidence for a large number of previously uncharacterized DNA viruses and supports a model whereby DNA viruses have evolved by combining distinct shared replication modules and some of these evolutionary oddities likely remain in the biosphere. The work provides a useful repository and potential framework for additional virus discovery efforts.

      Weaknesses:

      This is an entirely computational story, with very limited experimental validation. A large number of often confusing new acronyms are introduced that may be "cute" (such as the reference to the delicious half-smoke sausage) but are not particularly useful. This is not helped by the somewhat "telegraphic" presentation of the data that is sometimes difficult to digest. Not all paragraphs deliver what they promise. For example under the title "Polyomaviruses and papillomaviruses" there is no discussion of papillomaviruses. Overall, however, these weaknesses do not diminish my enthusiasm for this paper, which will be an important resource for computational and non-computational virus hunters.

    4. Reviewer #3 (Public Review):

      Summary:

      Buck et al. set out to characterize small DNA tumor viruses through the generation and analysis of ~100,000 public sequencing datasets from the SRA and other databases. Using a variety of powerful bioinformatic methods including alignment-based searches, statistical modelling, and structure-aware detection, the authors successfully classify novel protein sequences which support the occurrence of evolutionary gene transfer between DNA virus families. The authors propose a naming scheme to better capture viral diversity and uncover novel chimeric viruses, those containing genes from multiple established virus families. Additional analysis using the generated dataset was performed to search for DNA and RNA viruses of interest, demonstrating the utility of generated datasets for exploratory screens. The assembled sequencing datasets are publicly available, providing invaluable resources for current and future investigations within this subfield.

      Strengths:

      The scope of data analysis (100,000+ SRA records and additional libraries) is substantial, and the authors have contributed to further insight into the modularity of previously uncharacterized viral genomes, through computationally demanding advanced bioinformatics analyses in addition to extensive manual inspection.

      The publicly available resources generated as a result of these analyses provide useful data for further experiments to inspect viral diversity and modularity. Other scanning experiments and further investigation of biologically relevant viruses using these contigs may uncover, for example, animal reservoirs or novel recombinant viruses of significance.

      Novel instances of genomic modularity provide excellent starting points for understanding virus evolutionary pathways and gene transfer events.

      Weaknesses:

      Overall, the methods section of this paper requires more detail.

      The inclusion criteria for which "SRA" datasets were or were not utilized within this study are poorly defined. This means the comprehensiveness of the study for a given search space of the SRA is not defined, and the results are ultimately not reproducible, or expandable. For example, are all vertebrate RNA-seq samples processed? Or just aquatic vertebrate RNA-seq? Were samples randomly sampled from a more comprehensive data set? What is the make-up of the search space and how much was DNA-seq or RNA-seq? This section should be expanded and explicit accounting provided for how dataset selection was performed. This would provide additional confidence in the results and conclusions, as well as allow for future analysis to be conducted.

      Hallmark virus genes require further clarification, as it is unclear what genes are utilized as bait, or in the initial search process. The reported "Hallmark gene sets" are not described in a systematic way. What is the sensitivity and specificity of these gene sets? Was there a validation of the performance characteristics (ROC) for this gene set with different tools? How is this expected to be utilized? Which kinds of viruses are excluded/missed? Are viroids included?

      For the Tailtomavirus, additional information is needed for sufficient confidence. Was this "chimeric" genomic arrangement detected in a single library? This raises a greater issue of how technical artifacts, which may appear as chimeric assemblies, are ruled out in the workflow. If two viral genomes share a k-mer of length greater than the assembly k, the graph may become merged. Are there read pairs that span all regions of the genome? Is there evidence for multiple homologous viruses with synteny between them that supports the combination of these genes as an evolving genome, or is this an anomalous observation? For all cases of chimeric assemblies, read alignments and Bandage graph visualizations should be included, along with active steps to disprove the baseline hypothesis that these are technical artifacts of genome assembly.
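
      The k-mer concern raised above can be illustrated with a minimal sketch. It is not part of the reviewed workflow: the two sequences are made-up toy strings, `shared_kmers` is a hypothetical helper, and only the forward strand is considered. It simply lists the k-mers that two sequences have in common, which is the condition under which a de Bruijn graph assembler could join otherwise unrelated genomes.

```python
def shared_kmers(seq_a: str, seq_b: str, k: int) -> set[str]:
    """Return the k-mers present in both sequences (forward strand only)."""
    def kmers(seq: str) -> set[str]:
        seq = seq.upper()
        return {seq[i:i + k] for i in range(len(seq) - k + 1)}
    return kmers(seq_a) & kmers(seq_b)


# Hypothetical toy sequences that share an 11-nt stretch (CGTACGTTAGC).
virus_a = "ATGCGTACGTTAGCCGATAGGCTA"
virus_b = "TTTTCGTACGTTAGCAAAACCCGG"

# With k shorter than the shared stretch, shared k-mers exist and the two
# genomes could be linked in the assembly graph; with a larger k they do not.
print(sorted(shared_kmers(virus_a, virus_b, k=9)))
print(sorted(shared_kmers(virus_a, virus_b, k=12)))
```

      A real check would also consider reverse complements and would be run on the candidate parental genomes at the k values used by the assembler.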

      Justification for exclusion of endogenized sequences is not included and must be described, as small DNA tumor viruses may endogenize into the host genome as part of their life cycle. How is such an integration resolved from an evolutionary "endogenization"? What's the biological justification for this step?

      Additional supporting information, clear presentation, and context are needed to strengthen results and conclusions.

      Basic reporting of global statistics, such as the total number of viruses found per family, should be included in the main text to better support the scope of the results. How many viruses (per family) were previously known, and therefore what is the magnitude of the expansion performed here?

      Additional parameters and information should be included in bioinformatic tool outputs to provide greater clarity and interpretation of results. For example, reporting the "BLASTp E-val", as for the PolB homology (BLASTp 6E-12), is not informative, and does not tell the reader this is (we assume) an expectancy value. For each such case, please report the top database hit accession, percent identity, query coverage, and E-value. Otherwise, a judgment cannot be adequately made regarding the quality of evidence for homology. Similarly, for HHpred, what does the number represent: confidence, identity, or coverage?
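
      As a hedged illustration of the reporting requested above, the sketch below parses one line of BLAST+ tabular output, assuming the search was run with `-outfmt "6 qseqid sacc pident qcovs evalue"`. The query name, accession, percent identity, and coverage in the example line are placeholders invented for illustration; only the E-value of 6e-12 echoes the figure quoted in the review. `BlastHit` and `parse_hit` are hypothetical names.

```python
from typing import NamedTuple


class BlastHit(NamedTuple):
    query: str
    accession: str
    percent_identity: float
    query_coverage: float
    evalue: float


def parse_hit(line: str) -> BlastHit:
    """Parse one tab-separated line: qseqid, sacc, pident, qcovs, evalue."""
    query, accession, pident, qcovs, evalue = line.rstrip("\n").split("\t")
    return BlastHit(query, accession, float(pident), float(qcovs), float(evalue))


# Placeholder line; every value except the E-value is invented.
example = "contig_1_orf3\tHYPOTHETICAL_ACC.1\t31.2\t78\t6e-12"
hit = parse_hit(example)
print(
    f"{hit.query}: top hit {hit.accession}, "
    f"{hit.percent_identity}% identity, "
    f"{hit.query_coverage}% query coverage, E-value {hit.evalue:g}"
)
```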

      Some findings described in the Results section may require revision. Several of the Nidoviruses (Nidovirus takifugu, Nidovirus hypomesus, Nidovirus ambystoma, etc...) have been previously described by three groups, first by Edgar et al., (https://www.nature.com/articles/s41586-021-04332-2), then Miller et al., (https://academic.oup.com/ve/article/7/2/veab050/6290018) and then Lauber et al., (https://journals.plos.org/plospathogens/article?id=10.1371/journal.ppat.1012163). This is now the 4th description of the same set of viruses. These sequences are in GenBank (https://www.ncbi.nlm.nih.gov/nuccore/OV442424.1), although it is unclear why they're not returned as BLAST hits. Miller also described the Togavirus co-segment previously.

      It is also uncertain what is being described with HelPol/maldviruses which was not previously described in distantly similar relatives. How many were described in the previous literature and how many are described by this work?

      Co-phylogenies should be used to convey gene transfer and flow clearly to support the conclusions made in the text.

      Statements such as, "The group encompasses a surprising degree of genomic diversity...", should be supported by additional information to strengthen conclusions (e.g., what the expected diversity is). What is the measurement for genomic diversity here, and why is this surprising? There is overall a lack of quantification to support the conclusions made throughout the paper.

    1. eLife assessment

      This study investigates the role of queuosine (Q) tRNA modification in aminoglycoside tolerance in Vibrio cholerae and presents convincing evidence to conclude that Q is essential for the efficient translation of TAT codons, although this depends on the context. The absence of Q reduces aminoglycoside tolerance potentially by reprogramming the translation of an oxidative stress response gene, rxsA. Overall, the findings point to an important mechanism whereby changes in Q modification levels control the decoding of mRNAs enriched in TAT codons under antibiotic stress.

    2. Reviewer #1 (Public Review):

      Summary of the work: In this work, Fruchard et al. study the enzyme Tgt and how it modifies guanine in tRNAs to queuosine (Q), essential for Vibrio cholerae's growth under aminoglycoside stress. Q's role in codon decoding efficiency and its proteomic effects during antibiotic exposure are examined, revealing that Q modification impacts tyrosine codon decoding and influences RsxA translation, affecting the SoxR oxidative stress response. The research proposes that regulation of Q modification under environmental cues reprograms the translation of genes with tyrosine codon bias, including DNA repair factors, which is crucial for the bacterial antibiotic response.

      The experiments are well-designed and conducted and the conclusions, for the most part, are well supported by the data. However, a few clarifications will significantly strengthen the manuscript.

      Major:

      Figure S4 A-D. These growth curves are important data and should be presented in the main figures. Moreover, given that it is not possible to make an rsxA mutant, I wonder if it would be possible to connect rsx and tgt using the following experiment: expression of tgt results in resistance to TOB (in B), while expression of only rsx lowers resistance to TOB (in D). Then simultaneous overexpression of both tgt and rsx in the WT strain should have either no effect on TOB resistance or increased resistance, relative to the WT. Perhaps the authors have done this, and if so, the data should be included as it will significantly strengthen their model.

      Figure S4 - Is there a rationale for why it is possible to make rsx mutants in E. coli, but not in V. cholerae? For example, does E. coli have a second gene/protein that is redundant in function to rsxA, while V. cholerae does not? I think your data hint at this, since in the right panel growth data, your double mutant does not fully rescue back to rsx single mutant levels, suggesting another factor in tgt mutant also acts to lower resistance to TOB. If so, perhaps a line or two in text will be helpful for readers.

      -For growth curves in Figure 2 and relative comparisons like in Figure 5D and Figure S4 (and others in the paper), statistics and error bars, along with replicate information should be provided.

      -Figure 6A - Is the transcript fold change linear or log? If linear, then tgt expression should not be classified as being upregulated in TOB. It is barely up by ~2-fold with TOB-0.6, which is a mild phenotype at best.

      -Lines 779-780: "This indicates that sub-MIC TOB possibly induces tgt expression through the stringent response activation." To me, the data presented in this figure do not support this statement. The experiment is indirect.

      -Figure 3B and D. - These samples only have tobramycin, correct? The legend says both carbenicillin and tobramycin.

      -Figure 5. The color schemes in bars do not match up with the color scheme in cartoons below panels B and C. That makes it confusing to read. Please fix.

      -A lot of abbreviations have been used. This makes reading a bit cumbersome. Ideally, fewer abbreviations would be used.

    3. Reviewer #2 (Public Review):

      Fruchard et al. investigate the role of the queuosine (Q) modification of the tRNA (Q-tRNA) in the human pathogen Vibrio cholerae. First, the authors state that the absence of Q-modified tRNAs (tgt mutant) increases the translation of TAT codons and proteins with a high TAT codon bias. Second, the absence of Q increases rsxA translation, because rsxA gene has a high TAT codon bias. Third, increased RsxA in the absence of Q inhibits SoxR response, reducing resistance towards the antibiotic tobramycin (TOB). Authors also predict in silico which genes harbor a higher TAT bias and found that among them are some involved in DNA repair, experimentally observing that a tgt mutant is more resistant to UV than the wt strain. It is worth noting that authors employ a wide variety of techniques, both experimental and bioinformatic. However, some aspects of the work need to be clarified or reevaluated.

      (1) The statement that the absence of Q increases the translation of TAT codons and proteins encoded by TAT-enriched genes presents the following problems that should be addressed:

      (1.1) The increase in TAT codon translation in the absence of Q is not supported by proteomics, since there was no detected statistical difference for TAT codon usage in proteins differentially expressed. Furthermore, there are some problems regarding the statistics of proteomics. Some proteins shown in Table S1 have adjusted p-values higher than their p-values, which makes no sense. Maybe there is a mistake in the adjusted p-value calculation. In addition, it is not common to assume that proteins that are quantitatively present in one condition and absent in another are differentially abundant proteins. Proteomics data software typically addresses this issue and applies some corrections. It would be advisable to review that.

      (1.2) Problems with the interpretation of Ribo-seq data (Figure 4D). On the one hand, the Ribo-seq data should be corrected (normalized) with the RNA-seq data in each of the conditions to obtain ribosome profiling data, since some genes could have more transcription in some of the conditions studied. In other articles in which this technique is used (such as in Tuorto et al., EMBO J. 2018; doi: 10.15252/embj.201899777), it is interpreted that those positions in which the ribosome moves most slowly and therefore less efficiently translated), are the most abundant. Assuming this interpretation, according to the hypothesis proposed in this work, the fragments enriched in TAT codons should have been less abundant in the absence of Q-tRNA (tgt mutant) in the Rib-seq experiment. However, what is observed is that TAT-enriched fragments are more abundant in the tgt mutant, and yet the Ribo-seq results are interpreted as RNA-seq, stating that this is because the genes corresponding to those sequences have greater expression in the absence of Q. On the other hand, it would be interesting to calculate the mean of the protein levels encoded by the transcripts with high and low ribosome profiling data.

      (1.3) This statement is contrary to most previously reported studies on this topic in eukaryotes and bacteria, in which ribosome profiling experiments, among others, indicate that translation of TAT codons is slower (or unaffected) than translation of the TAC codons, and the same phenomenon is observed for the rest of the NAC/T codons. This is completely opposed to the results shown in Figure 4. However, the results of these studies are either not mentioned or not discussed in this work. Some examples of articles that should be discussed in this work:<br /> - "Queuosine-modified tRNAs confer nutritional control of protein translation" (Tuorto et al., 2018; 10.15252/embj.201899777)<br /> - "Preferential import of queuosine-modified tRNAs into Trypanosoma brucei mitochondrion is critical for organellar protein synthesis" (Kulkarni et al., 2021; doi: 10.1093/nar/gkab567)<br /> - "Queuosine-tRNA promotes sex-dependent learning and memory formation by maintaining codon-biased translation elongation speed" (Cirzi et al., 2023; 10.15252/embj.2022112507)<br /> - "Glycosylated queuosines in tRNAs optimize translational rate and post-embryonic growth" (Zhao et al., 2023; 10.1016/j.cell.2023.10.026)<br /> - "tRNA queuosine modification is involved in biofilm formation and virulence in bacteria" (Diaz-Rullo and Gonzalez-Pastor, 2023; doi: 10.1093/nar/gkad667). In this work, the authors indicate that Q-tRNA increases NAT codon translation in most bacterial species. Could the regulation of TAT codon-enriched proteins by Q-tRNAs in V. cholerae be an exception? In addition, the authors use a bioinformatic method, similar to the one used in this work, to identify genes enriched in NAT codons and to find the biological processes involving the genes whose expression is affected by Q-tRNAs (as discussed for the phenotype of UV resistance). It will be worth discussing all of this.

      (1.4) It is proposed that the stress produced by the TOB antibiotic causes greater translation of genes enriched in TAT codons. On the one hand, it is shown that the GFP-TAT version (gene enriched in TAT codons) and the RsxA-TAT-GFP protein (native gene naturally enriched in TAT) are expressed more, compared to their versions enriched in TAC, in a tgt mutant than in a wt, in the presence of TOB (Fig. 5C). However, in the absence of TOB, and in a wt context, although the two versions of GFP have a similar expression level (Fig. S3D), the same does not occur with RsxA, whose RsxA-TAT form (the native one) is expressed significantly more than the RsxA-TAC version (Fig. S3A). How can it be explained that in a wt context, in which there is also tRNA Q-modification, a gene naturally enriched in TAT is translated better than the same gene enriched in TAC? It would be expected that in the presence of Q-tRNAs the two versions would be translated equally (as happens with GFP) or even that the TAT version would be less translated. On the other hand, in the presence of TOB the fluorescence of WT GFP(TAT) is higher than the fluorescence of WT GFP(TAC) (Figure S3E) (mean fluorescence data for the RsxA-GFP version in the presence of TOB is not shown). These results may indicate that the apparent better translation of TAT versions could be due to indirect effects rather than to TAT codon translation.

      (2) Another problem is related to the already known role of Q in the prevention of stop codon readthrough, which is not discussed at all in the work. In the absence of Q, stop codon readthrough is increased. In addition, it is known that aminoglycosides (such as tobramycin) also increase stop codon readthrough ("Stop codon context influences genome-wide stimulation of termination codon readthrough by aminoglycosides"; Wangen and Green, 2020; 10.7554/eLife.52611). Absence of Q and presence of aminoglycosides can be synergistic, producing devastating increases in stop codon readthrough and a large alteration of global gene expression. All of this needs to be discussed in the work. Moreover, it is known that stop codon readthrough can alter gene expression and that the mRNA sequence context influences the likelihood of stop codon readthrough. Thus, this process could also affect the expression of the recoded GFP and RsxA versions.

      (3) The statement that TOB resistance depends on RsxA translation, which is related to the presence of Q, also presents some problems:

      (3.1) It is observed that the absence of tgt produces a growth defect in V. cholerae when exposed to TOB (Figure 1A), and it is stated that this is mediated by an increase in the translation of RsxA, because its gene is TAT enriched. However, in Figure S4F, it is shown that the same phenotype is observed in E. coli, but its rsxA gene is not enriched in TAT codons. Therefore, the growth defect observed in the tgt mutant in the presence of TOB may not be due to the increase in the translation of TAT codons of the rsxA gene in the absence of Q. This phenotype is very interesting, but it may be related to another molecular process regulated by Q. Maybe the role of Q in preventing stop codon readthrough is important in this process, reducing cellular stress in the presence of TOB and allowing better growth.

      (3.2) All experiments related to the effect of Q on the translation of TAT codons have been performed with the tgt mutant strain. Considering that the authors have a pSEVA-tgt plasmid to overexpress this gene, they would have to show whether tgt overexpression in a wt strain produces a decrease in the translation of proteins encoded by TAT-enriched genes such as RsxA. This experiment would allow them to conclude that Q reduces RsxA levels, increasing resistance to TOB.

      (3.3) On the other hand, Fig. 1B shows that when the wt and tgt strains compete, both overexpressing tgt, the tgt mutant strain grows better in the presence of TOB. This result is not very well understood, since according to the hypothesis proposed, the absence of modification by Q of the tRNA would increase the translation of genes enriched in TAT, therefore, a strain with a higher proportion of Q-modified tRNAs as in the case of the wt strain overexpressing tgt would express the rsxA gene less than the tgt strain overexpressing tgt and would therefore grow better in the presence of TOB. For all these reasons, it would be necessary to evaluate the effect of tgt overexpression on the translation of RsxA.

      (3.4) According to Figure 1I, the overexpression of tRNA-Tyr(GUA) caused better growth of the tgt mutant in comparison to WT. If the growth defect observed in the tgt mutant in the presence of TOB is due to a better translation of the TAT codons of the rsxA gene, the overexpression of tRNA-Tyr(GUA) in the tgt mutant should have resulted in even better RsxA translation and worse growth, not the opposite result.

      (4) It cannot be stated that DNA repair is more efficient in the tgt mutant of V. cholerae, as indicated in the text of the article and in Fig 7. The authors only observe that the tgt mutant is more resistant to UV radiation and it is suggested that the reason may be TAT bias of DNA repair genes. To validate the hypothesis that UV resistance is increased because DNA repair genes are TAT biased, it would be necessary to check if DNA repair is affected by Q. UV not only produces DNA damage, but also oxidative stress. Therefore, maybe this phenotype is due to the increase in proteins related to oxidative stress controlled by RsxA, such as the superoxide dismutase encoded by sodA. It is also stated that these repair genes were found up for the tgt mutant in the Ribo-seq data, with unchanged transcription levels. Again, it is necessary to clarify this interpretation of the Ribo-seq data, since the fact that they are more represented in a tgt mutant perhaps means that translation is slower in those transcripts. Has it been observed in proteomics (wt vs tgt in the absence of TOB) whether these proteins involved in repair are more expressed in a tgt mutant?

      (5) The authors demonstrate that in E. coli the tgt mutant does not show greater resistance to UV radiation (Fig. 7D), unlike what happens in V. cholerae. It should be discussed that in previous works it has been observed that overexpression in E. coli of the tgt gene or the queF gene (Q biosynthesis) is involved in greater resistance to UV radiation (Morgante et al., Environ Microbiol, 2015 doi: 10.1111/1462-2920.12505; and Díaz-Rullo et al., Front Microbiol. 2021 doi: 10.3389/fmicb.2021.723874). As an explanation, it was proposed (Diaz-Rullo and Gonzalez-Pastor, NAR 2023 doi: 10.1093/nar/gkad667) that the observed increase in the capacity to form biofilms in strains that overexpress genes related to Q modification of tRNA would be related to this greater resistance to UV radiation.
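
      The normalization requested in point (1.2) can be illustrated with a minimal sketch, assuming per-gene read counts are available for both Ribo-seq and RNA-seq in each strain. The function names, the pseudocount, and the toy counts below are hypothetical and are not taken from the manuscript; a real analysis would use replicate-aware statistics rather than this simple ratio.

```python
import math


def translation_efficiency(ribo_counts, rna_counts, pseudocount=1.0):
    """Return log2(Ribo-seq / RNA-seq) per gene after library-size scaling.

    Both arguments map gene name -> raw read count (hypothetical inputs).
    """
    ribo_total = sum(ribo_counts.values())
    rna_total = sum(rna_counts.values())
    te = {}
    for gene, ribo in ribo_counts.items():
        if gene not in rna_counts:
            continue  # efficiency is undefined without an mRNA measurement
        ribo_cpm = (ribo + pseudocount) / ribo_total * 1e6
        rna_cpm = (rna_counts[gene] + pseudocount) / rna_total * 1e6
        te[gene] = math.log2(ribo_cpm / rna_cpm)
    return te


def delta_te(te_mutant, te_wt):
    """Difference in log2 translation efficiency (mutant minus wild type)."""
    return {g: te_mutant[g] - te_wt[g] for g in te_mutant if g in te_wt}


# Toy counts, invented for illustration only.
wt_ribo = {"geneA": 100, "geneB": 100, "geneC": 800}
wt_rna = {"geneA": 100, "geneB": 100, "geneC": 800}
mut_ribo = {"geneA": 200, "geneB": 200, "geneC": 600}
mut_rna = {"geneA": 200, "geneB": 100, "geneC": 700}

change = delta_te(
    translation_efficiency(mut_ribo, mut_rna),
    translation_efficiency(wt_ribo, wt_rna),
)
```

      In this toy case both genes double their footprint counts in the mutant, but only the hypothetical geneB does so without a matching rise in mRNA, so only geneB shows a change once footprints are divided by transcript abundance.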

    4. Reviewer #3 (Public Review):

      Summary:

      In this manuscript the authors begin with the interesting phenotype of sub-inhibitory concentrations of the aminoglycoside tobramycin proving toxic to a knockout of the tRNA-guanine transglycosylase (Tgt) of the important human pathogen, Vibrio cholerae. Tgt is important for incorporating queuosine (Q) in place of guanosine at the wobble position of GUN anticodons. The authors go on to define a mechanism of action where environmental stressors control expression of tgt to control translational decoding, particularly of tyrosine codons, skewing the balance from TAC towards TAT decoding in the absence of the enzyme. The authors use advanced proteomics and ribosome profiling to reveal that the loss of tgt results in increased translation of proteins like RsxA and a cohort of DNA repair factors, whose genes harbor an excess of TAT codons in many cases. These findings are bolstered by a series of molecular reporters, mass spectrometry, and tRNA overexpression strains to provide support for a model where Tgt serves as a molecular pivot point to reprogram translational output in response to stress.

      Strengths:

      The manuscript has many strengths. The authors use a variety of strains, assays, and advanced techniques to discover a mechanism of action for Tgt in mediating tolerance to sub-inhibitory concentrations of tobramycin. They observe a clear phenotype for a tRNA modification in facilitating reprogramming of the translational response, and the manuscript certainly has value in defining how microbes tolerate antibiotics.

      Weaknesses:

      The conclusions of the manuscript are mostly very well-supported by the data, but in some places control experiments or peripheral findings cloud precise conclusions. Some additional clarification, discussion, or even experimental extension could be useful in strengthening these areas.

      (1) The authors have created and used a variety of relevant molecular tools. In some cases, using these tools in additional assays as controls would be helpful. For example, testing for compensation of the observed phenotypes by overexpression of the Tyrosine tRNA(GUA) in Figure 2A with the 6xTAT strain, Figure 5C with the rsxA-GFP fusion, and/or Figure 7B with UV stress would provide additional information on the ability of tRNA overexpression to compensate for the defect in these situations.<br /> (2) The authors present a clear story with a reprogramming towards TAT codons in the knockout strain, particularly regarding tobramycin treatment. The control experiments often hint at other codons also contributing to the observed phenotypes (e.g., His or Asp), yet these effects are mostly ignored in the discussion. It would be helpful to discuss these findings at a minimum in the discussion section, or possibly experimentally address the role of His or Asp by overexpression of these tRNAs together with Tyrosine tRNA(GUA) in an experiment like that of Figure 1I to see if a more "wild type" phenotype would present. In fact, the synergy of Tyr, His, and/or Asp codons likely helps to explain the effects observed with the DNA repair genes in later experiments.<br /> (3) Regarding Figure 6D, the APB northern blot feels like an afterthought. It was loaded with different amounts of RNA as input and some samples are repeated three times, but Δcrp only once. Collectively, it makes this experiment very difficult to assess.

      Minor Points:<br /> (4) Fig S2B, do the authors have a hypothesis why the Asp and Phe tRNAs lead to a growth decrease in the untreated samples? It appears like Phe(GAA) partially compensates for the defect.<br /> (5) Lines 655 to 660 seem more appropriate as speculation in the discussion rather than as a conclusion in the results, where no direct experiments are performed. The authors might take advantage of the "Ideas and Speculation" section that eLife allows.

    1. eLife assessment

      This study provides valuable new insights into insect cognition and problem-solving in bumblebees. The authors present convincing evidence that bumblebees lack causal understanding in a string-pulling task, although evidence that bumblebees instead use image-matching for this task, which would benefit from further experiments, is currently incomplete.

    2. Reviewer #1 (Public Review):

      Summary:

      In this paper, the researchers aimed to address whether bees causally understand string-pulling through a series of experiments. I first briefly summarize what they did:

      - In experiment 1, the researchers trained bees without string and then presented them with flowers in the test phase that either had connected or disconnected strings, to determine what their preference was without any training. Bees did not show any preference.

      - In experiment 2, bees were trained to have experience with string and then tested on their choice between connected vs. disconnected string.

      - Experiment 3 was similar except that instead of having one option which was an attached string broken in the middle, the string was completely disconnected from the flower.

      - In experiment 4, bees were trained on green strings and tested on white strings to determine if they generalize across color.

      - In experiment 5, bees were trained on blue strings and tested on white strings.

      - In experiment 6, bees were trained where black tape covered the area between the string and the flower (i.e. so they would not be able to see/ learn whether it was connected or disconnected).

      - In experiments 2-6, bees chose the connected string in the test phase.

      - In experiment 7, bees were trained as in experiment 3 and then tested where the string was either disconnected or coiled i.e. still being 'functional' but appearing different.

      - In experiment 8, bees were trained as before and then tested on a string that was in a different coiled orientation, either connected or disconnected.

      - In experiments 7 and 8 the bees showed no preference.

      Strengths:

      I appreciate the amount of work that has gone into this study and think it contains a nice, thorough set of experiments. I enjoyed reading the paper and felt that overall it was well-written and clear. I think experiment 1 shows that bees do not have an untrained understanding of the function of the string in this context. The rest of the experiments indicate that with training, bees have a preference for unbroken over broken string and likely use visual cues learned during training to make this choice. They also show that as in other contexts, bees readily generalize across different colors.

      Weaknesses:

      (1) I think there are 2 key pieces of information that can be taken from the test phase - the bees' first choice and then their behavior across the whole test. I think the first choice is critical in terms of what the bee has learned from the training phase - then their behavior from this point is informed by the feedback they obtain during the test phase. I think both pieces of information are worth considering, but their behavior across the entire test phase is giving different information than their first choice, and this distinction could be made more explicit.

      In addition, while the bees' first choice is reported, no statistics are presented for their preferences.

      (2) It seemed to me that the bees might not only be using visual feedback but also motor feedback. This would not explain their behavior in the first test choice, but could explain some of their subsequent behavior. For example, bees might learn during training that there is some friction/weight associated with pulling the string, but in cases where the string is separated from the flower, this would presumably feel different to the bee in terms of the physical feedback it is receiving. I'd be interested to see some of these test videos (perhaps these could be shared as supplementary material, in addition to the training videos already uploaded), to see what the bees' behavior looks like after they attempt to pull a disconnected string.

      (3) I think the statistics section needs to be made clearer (more in private comments).

      (4) I think the paper would be made stronger by considering the natural context in which the bee performs this behavior. Bees manipulate flowers in all kinds of contexts and scrabble with their legs to achieve nectar rewards. Rather than thinking that it is pulling a string, my guess would be that the bee learns that a particular motor pattern within their usual foraging repertoire (scrabbling with legs), leads to a reward. I don't think this makes the behavior any less interesting - in fact, I think considering the behavior through an ecological lens can help make better sense of it.
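
      To illustrate the kind of statistic that could accompany the first-choice data mentioned in point (1), here is a minimal sketch of an exact two-sided binomial test against chance (0.5). The function name and the example counts (15 of 20 bees choosing the connected string) are hypothetical and are not values from the manuscript.

```python
from math import comb


def binomial_two_sided_p(successes, trials, p_null=0.5):
    """Exact two-sided binomial test.

    Sums the probabilities of every outcome that is no more likely
    than the observed one under the null hypothesis.
    """
    pmf = [
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(trials + 1)
    ]
    observed = pmf[successes]
    # Small relative tolerance guards against floating-point ties.
    p_value = sum(p for p in pmf if p <= observed * (1 + 1e-9))
    return min(1.0, p_value)


# Hypothetical example: 15 of 20 bees choose the connected string first.
p_first_choice = binomial_two_sided_p(15, 20)
```

      Because each bee contributes exactly one first choice, the observations are independent, which is the main assumption this test needs.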

    3. Reviewer #2 (Public Review):

      Summary:

      The authors wanted to see if bumblebees could succeed in the string-pulling paradigm with broken strings. They found that bumblebees can learn to pull strings and that they have a preference to pull on intact strings vs broken ones. The authors conclude that bumblebees use image matching to complete the string-pulling task.

      Strengths:

      The study has an excellent experimental design and contributes to our understanding of what information bumblebees use to solve a string-pulling task.

      Weaknesses:

      Overall, I think the manuscript is good, but it is missing some context. Why do bumblebees rely on image matching rather than causal reasoning? Could it have something to do with their ecology? And how is the task relevant for bumblebees in the wild? Does the test translate to any real-life situations? Is pulling a natural behaviour that bees do? Does image matching have adaptive significance?

    4. Reviewer #3 (Public Review):

      Summary:

      This paper presents bees with varying levels of experience with a choice task where bees have to choose to pull either a connected or unconnected string, each attached to a yellow flower containing sugar water. Bees without experience of string pulling did not choose the connected string above chance (experiment 1), but with experience of horizontal string pulling (as in the right-hand panel of Figure 4) bees did choose the connected string above chance (experiments 2-3), even when the string colour changed between training and test (experiments 4-5). Bees that were not provided with perceptual-motor feedback (i.e they could not observe that each pull of the string moved the flower) during training still learned to string pull and then chose the connected string option above chance (experiment 6). Bees with normal experience of string pulling then failed to discriminate between connected and unconnected strings when the strings were coiled or looped, rather than presented straight (experiments 7-8).

      Weaknesses:

      The authors have only provided video of some of the conditions where the bees succeeded. In general, I think a video explaining each condition and then showing a clip of a typical performance would make it much easier to follow the study designs for scholars. Videos of the conditions bees failed at would be highly useful in order to compare different hypotheses for how the bees are solving this problem. I also think it is highly important to code the videos for switching behaviours. When solving the connected vs unconnected string tasks, when bees were observed pulling the unconnected string, did they quickly switch to the other string? Or did they continue to pull the wrong string? This would help discriminate the use of perceptual-motor feedback from other hypotheses.

      The experiments are also not described well. For my comments below, I have assumed that different groups of bees were tested for experiments 1-8, and that experiment 6 was run as described in line 331, where bees were given string-pulling training without perceptual feedback, rather than as described in Figure 4B, which describes bees as receiving string-pulling training with feedback.

      The authors suggest the bees' performance is best explained by what they term 'image matching'. However, experiment 6 does not seem to support this without assuming retroactive image matching after the problem is solved. The logic of experiment 6 is described as "This was to ensure that the bees could not see the familiar "lollipop shape" while pulling strings....If the bees prefer to pull the connected strings, this would indicate that bees memorize the arrangement of strings-connected flowers in this task." I disagree with this second sentence: removing perceptual feedback during training would prevent bees from memorising the lollipop shape because, while solving the task, they don't actually see a string connected to a yellow flower, due to the black barrier. At the end of the task, the string is now behind the bee, so unless the bee is turning around and encoding this object retrospectively as the image to match, it seems hard to imagine how the bee learns the lollipop shape.

      Despite this, the authors go on to describe image matching as one of their main findings. For this claim, I would suggest the authors run another experiment, identical to experiment 6 but with a black panel behind the bee, such that the string the bee pulls behind itself disappears from view. There is now no image to match at any point from the bee's perspective so it should now fail the connectivity task.

      Strengths:

      Despite these issues, this is a fascinating dataset. Experiments 1 and 2 show that the bees are not learning to discriminate between connected and unconnected stimuli rapidly in the first trials of the test. Instead, it is clear that experience in string pulling is needed to discriminate between connected and unconnected strings. What aspect of this experience is important? Experiment 6 suggests it is not image matching (when no image is provided during problem-solving, but only afterward, bees still attend to string connectivity) and casts doubt on perceptual-motor feedback (unless from the bee's perspective, they do actually get feedback that pulling the string moves the flower, video is needed here). Experiments 7 and 8 rule out means-end understanding because if the bees are capable of imagining the effect of their actions on the string and then planning out their actions (as hypotheses such as insight, means-end understanding and string connectivity suggest), they should solve these tasks.

      If the authors can compare the bees' performance in a more detailed way to other species, and run the experiment suggested, this will be a highly exciting paper.

    1. eLife assessment

      This study provides a single-cell atlas for syngnathid fishes (seahorses, pipefishes, and seadragons), a valuable new resource to investigate the molecular basis of the many unique characters that define the pipefish embryo. The findings are generally supported by solid arguments, but whereas the single-cell RNA-sequencing analysis appears to be of good quality, the spatiotemporal expression data only incompletely support the authors' arguments. Additional computational analyses on cell identity and developmental trajectories would allow a deeper examination of the current data from these unconventional model organisms, to provide new insights into understanding the extraordinary adaptations of the Syngnathidae family. If appropriately improved, the work could be of broad interest for evolutionary developmental biology, particularly for fishes.

    2. Reviewer #1 (Public Review):

      Syngnathid fishes (seahorses, pipefishes, and seadragons) present very particular and elaborated features among teleosts and a major challenge is to understand the cellular and molecular mechanisms that permitted such innovations and adaptations. The study provides a valuable new resource to investigate the morphogenetic basis of four main traits characterizing syngnathids, including the elongated snout, toothlessness, dermal armor, and male pregnancy. More particularly, the authors have focused on a late stage of pipefish organogenesis to perform single-cell RNA-sequencing (scRNA-seq) completed by in situ hybridization analyses to identify molecular pathways implicated in the formation of the different specific traits.

      The first set of data explores the scRNA-seq atlas composed of 35,785 cells from two samples of gulf pipefish embryos that the authors have been able to classify into major cell types characterizing vertebrate organogenesis, including epithelial, connective, neural, and muscle progenitors. To affirm identities and discover potential properties of clusters, the authors primarily use KEGG analysis, which reveals enriched genetic pathways in each cell type. While the analysis is informative and could be useful for the community, some interpretations appear superficial and data must be completed to confirm identities and properties. Notably, supplementary information should be provided to show quality control data corresponding to the final cell atlas, including the UMAP showing the sample source of the cells, violin plots of gene count, UMI count, and mitochondrial fraction for the overall dataset and by cluster, and expression profiles on UMAP of selected markers characterizing cluster identities.
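
      The per-cell quality control metrics named above (gene count, UMI count, and mitochondrial fraction) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name, the dense list-of-lists input, and the assumption that mitochondrial genes carry an `mt-` prefix (as in zebrafish annotations) are all hypothetical and would need to be adapted to the pipefish annotation.

```python
def qc_metrics(counts, gene_names, mito_prefix="mt-"):
    """Compute per-cell QC metrics from a small dense count matrix.

    counts: list of per-cell lists of UMI counts, aligned with gene_names.
    mito_prefix: assumed naming convention for mitochondrial genes.
    """
    mito_idx = [
        i for i, name in enumerate(gene_names)
        if name.lower().startswith(mito_prefix)
    ]
    metrics = []
    for cell in counts:
        n_umis = sum(cell)
        n_genes = sum(1 for c in cell if c > 0)  # genes with at least one UMI
        mito_umis = sum(cell[i] for i in mito_idx)
        metrics.append({
            "n_genes": n_genes,
            "n_umis": n_umis,
            # Empty cells get 0.0 rather than a division error.
            "mito_fraction": mito_umis / n_umis if n_umis else 0.0,
        })
    return metrics


# Toy matrix, invented for illustration: two cells, three genes.
toy_genes = ["sox9a", "mt-co1", "fgf22"]
toy_counts = [[3, 1, 0], [0, 0, 0]]
toy_qc = qc_metrics(toy_counts, toy_genes)
```

      The resulting per-cell values are what the requested violin plots would display, either for the whole dataset or grouped by cluster or sample of origin.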

      The second set of data aims to correlate the scRNA-seq analysis with in situ hybridizations (ISH) in two different pipefish (gulf and bay) species to identify and characterize markers spatially, and validate cell types and signaling pathways active in them. While the approach is rational, the authors must complete the data and optimize labeling protocols to support their statements. One major concern is the quality of ISH stainings and images; embryos show a high degree of pigmentation that could hide part of the expression profile, and only subparts and hardly detectable tissues/stainings are presented. The authors should provide clear and good-quality images of ISH labeling on whole-mount specimens, highlighting the magnification regions and all other organs/structures (positive controls) expressing the marker of interest along the axis. Moreover, ISH probes have been designed and produced on gulf pipefish genome and cDNA respectively, while ISH labeling has been performed indifferently on bay or gulf pipefish embryos and larvae. The authors should specify stages and species on figure panels and should ensure sequence alignment of the probe-targeted sequences in the two species to validate ISH stainings in the bay pipefish. Moreover, spatiotemporal gene expression being a very dynamic process during embryogenesis, interpretations based on undefined embryonic and larval stages of pipefish development and compared to 3dpf zebrafish are insufficient to hypothesize on developmental specificities of pipefish features, such as on the absence of tooth primordia that could represent a very discrete and transient cell population. The ISH analyses would require a clean and precise spatiotemporal expression comparison of markers at the level of the entire pipefish and zebrafish specimens at well-defined stages, otherwise, the arguments proposed on teleost innovations and adaptations turn out to be very speculative.

      To conclude, whereas the scRNA-seq dataset in this unconventional model organism will be useful for the community, the spatiotemporal and comparative expression analyses have to be thoroughly pushed forward to support the claims. Addressing these points is absolutely necessary to validate the data and to give new insights to understand the extraordinary evolution of the Syngnathidae family.

    3. Reviewer #2 (Public Review):

      Summary:

      The authors present the first single-cell atlas for syngnathid fishes, providing a resource for future evolution & development studies in this group.

      Strengths:

      The concept here is simple and I find the manuscript to be well written. I like the in situ hybridization of marker genes - this is really nice. I also appreciate the gene co-expression analysis to identify modules of expression. There are no explicit hypotheses tested in the manuscript, but the discovery of these cell types should have value in this organism and in the determination of morphological novelties in seahorses and their relatives.

      Weaknesses:

      I think there are a few computational analyses that might improve the generality of the results.

      (1) The cell types: The authors use marker gene analysis and KEGG pathways to identify cell types. I'd suggest a tool like SAMap (https://elifesciences.org/articles/66747) which compares single-cell data sets from distinct organisms to identify 'homologous' cell types -- I imagine the zebrafish developmental atlases could serve as a reasonable comparative reference.

      (2) Trajectory analyses: The authors suggest that their analyses might identify progenitor cell states and perhaps related differentiated states. They might explore cytoTRACE and/or pseudotime-based trajectory analyses to more fully delineate these ideas.

      (3) Cell-cell communication: I think it's very difficult to identify 'tooth primordium' cell types, because cell types won't be defined by an organ in this way. For instance, dental glia will cluster with other glia, and dental mesenchyme will likely cluster with other mesenchymal cell types. So the histology and ISH is most convincing in this regard. Having said this, given the known signaling interactions in the developing tooth (and in development generally) the authors might explore cell-cell communication analysis (e.g., CellChat) to identify cell types that may be interacting.

    4. Reviewer #3 (Public Review):

      Summary:

      This study established a single-cell RNA sequencing atlas of pipefish embryos. The results obtained identified unique gene expression patterns for pipefish-specific characteristics, such as fgf22 in the tip of the palatoquadrate and Meckel's cartilage, broadly informing the genetic mechanisms underlying morphological novelty in teleost fishes. The data obtained are unique and novel, potentially important in understanding fish diversity. Thus, I would enthusiastically support this manuscript if the authors improve it to generate stronger and more convincing conclusions than in its current form.

      Weaknesses:

      Regarding the expression of sfrp1a and bmp4 dorsal to the elongating ethmoid plate and surrounding the ceratohyal: are their expression patterns spatially extended or broader compared to the pipefish ancestor? Is there a much closer species available to compare gene expression patterns with pipefish? Did the authors consider using other species closely related to pipefish for ISH? Sfrp1a and bmp4 may be expressed in the same regions of much more closely related species without face elongation. I understand that embryos of such species are not always accessible, but it is also hard to argue which genes are responsible for a specific phenotype by only comparing gene expression patterns between distantly related species (e.g., pipefish vs. zebrafish). For the same reason, I would not directly compare gene expression patterns between pipefish and mice, although I should admit that mouse gene expression patterns are sometimes helpful for forming hypotheses about fish evolution. Alternatively, can the authors conduct ISH in other species of pipefish? If the expression patterns of sfrp1a and bmp4 are common among fishes with face elongation, the conclusion would become more solid. If these embryos are not available, is it possible to reduce the amount of Wnt and BMP signal using CRISPR/Cas, MO, or a chemical inhibitor? I do think that there are several ways to test the Wnt and/or BMP hypothesis in face elongation.

    1. eLife assessment

      This study makes a connection between cellular metabolism and proteostasis through MAGIC, a previously proposed protein quality control pathway of clearance of cytosolic misfolded and aggregated proteins by importing into mitochondria. The authors reveal the role of Snf1, a yeast AMPK, in preventing the import of misfolded proteins to mitochondria for MAGIC controlled by the transcription factor Hap4, depending on the cellular metabolic status. The key message is important, although the evidence for physiological relevance of MAGIC for overall cellular proteostasis and its molecular regulation by Snf1 remains incomplete.

    1. eLife assessment

      This useful paper addresses a novel exercise mimetic agent on muscle exercise and performance. While the data provided are interesting, the evidence is incomplete, as much of it is correlative.

    1. eLife assessment

      The paper presents valuable insights into the success of the parasitoid Trichopria drosophilae on Drosophila suzukii, elucidating the importance of both molecular adaptations, such as specialized venom proteins and unique cell types, and ecological strategies, including tolerance of intraspecific competition and avoidance of interspecific competition. Through convincing methodological approaches, the authors demonstrate how these adaptations optimize nutrient uptake and enhance parasitic success, highlighting the intricate coordination between molecular and ecological factors in driving parasitization success.

    1. eLife assessment

      The authors discuss an effect, "diffusive lensing", by which particles would accumulate in high-viscosity regions – for instance in the intracellular medium. To obtain these results, the authors rely on agent-based simulations using custom rules performed with the Ito stochastic calculus convention. The "lensing effect" discussed is a direct consequence of the choice of the Ito convention without spurious drift, which has been discussed before, and its adequacy for the intracellular medium is insufficiently discussed and relatively doubtful. Consequently, the relevance of the presented results for biology remains unclear and based on incomplete evidence.

  2. May 2024
    1. eLife assessment

      This important study provides deep insight into a ubiquitous, but poorly understood, phenomenon: synaptic noise (primarily due to failures). Through a combination of theoretical analysis, simulations, and comparison to existing experimental data, this paper makes a compelling case that synapses are noisy because reducing noise is expensive. It touches on probably the most significant feature of living organisms -- their ability to learn -- and will be of broad interest to the neuroscience community.

    2. Reviewer #1 (Public Review):

      Summary:

      Given the cost of producing action potentials and transmitting them along axons, it has always seemed a bit strange that there are synaptic failures: when a spike arrives at a synapse, about half the time nothing happens. This paper proposes a perfectly reasonable explanation: reducing failures (or, more generally, reducing noise) is costly. Four possible mechanisms are proposed, each associated with a different cost, with costs of the form 1/sigma_i^rho where sigma_i is the failure-induced variability at synapse i and rho is an exponent. The four different mechanisms produce four different values of rho.

      What is interesting about the study is that the model makes experimental predictions about the relationship between learning rate, variability and presynaptic firing rate. Those predictions are consistent with experimental data, making it a strong candidate model. The fact that the predictions come from reasonable biological mechanisms makes it a very strong candidate model and suggests several experiments to test it further.

      Interestingly, the predictions made by this model are nearly indistinguishable from the predictions made by a normative model (Synaptic plasticity as Bayesian inference. Aitchison et al., Nature Neurosci. 24:565-571 (2021)). As pointed out by the authors, working out whether the brain is using Bayesian inference to tune learning rules, or it just looks like it's Bayesian inference but the root cause is cost minimization, will be an interesting avenue for future research.

      Finally, the authors relate their cost of reliability to the cost used in variational Bayesian inference. Intriguingly, the biophysical cost provides an upper bound on the variational cost. This is intellectually satisfying, as it answers a "why" question: why would evolution evolve to produce the kind of costs seen in the brain?

      Strengths:

      This paper provides a strong mix of theoretical analysis, simulations and comparison to experiments. And the extended appendices, which are very easy to read, provide additional mathematical insight.

      Weaknesses:

      None.

    3. Reviewer #2 (Public Review):

      Summary

      This manuscript argues about the similarity between two frameworks describing synaptic plasticity. In the Bayesian inference perspective, due to the noise and the limited available pre- and postsynaptic information, synapses can only have an estimate of what should be their weight. The belief about those weights is described by their mean and variance. In the energy efficient perspective, synaptic parameters (individual means and variances) are adapted such that the neural network achieves some task while penalizing large mean weights as well as small weight variances. Interestingly, the authors show both numerically and analytically the strong link between those two frameworks. In particular, both frameworks predict that (a) synaptic variances should decrease when the input firing rate increases and (b) that the learning rate should increase when the weight variances increase. Both predictions have some experimental support.
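
      The trade-off in the energy efficient perspective can be illustrated with a one-synapse sketch. The quadratic performance term and the symbols H (curvature of the task objective), c (cost scale) and ρ (cost exponent) are assumptions made for illustration, not necessarily the authors' exact objective: noise of standard deviation σ is taken to cost roughly Hσ²/2 in performance, plus a reliability cost cσ^(−ρ) with ρ > 0.

```latex
\begin{align}
  \mathcal{L}(\sigma) &= \tfrac{1}{2} H \sigma^{2} + c\,\sigma^{-\rho}, \\
  \frac{d\mathcal{L}}{d\sigma} &= H\sigma - c\rho\,\sigma^{-\rho-1} = 0
  \quad\Longrightarrow\quad
  \sigma^{*} = \left(\frac{c\rho}{H}\right)^{\frac{1}{\rho+2}} .
\end{align}
```

      Under these assumptions σ* falls as H grows, so if H increases with the input firing rate, the optimal variability decreases with rate, in line with prediction (a).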

      Strengths

      (1) Overall, the paper is very well written and the arguments are clearly presented.

      (2) The tight link between the Bayesian inference perspective and the energy efficiency perspective is elegant and well supported, both with numerical simulations as well as with analytical arguments.

      (3) I also particularly appreciate the derivation of the reliability cost terms as a function of the different biophysical mechanisms (calcium efflux, vesicle membrane, actin and trafficking). Independently of the proposed mapping between the Bayesian inference perspective and the energy efficiency perspective, those reliability costs (expressed as power-law relationships) will be important for further studies on synaptic energetics.

      Weaknesses

      (1) As recognised by the authors, the correspondence between the entropy term in the variational inference description and the reliability cost in the energetic description is strong, but not perfect. Indeed, the entropy term scales as -log(sigma) while reliability cost scales as sigma^(-rho).

      (2) Even though this is not the main point of the paper, I appreciate the effort made by the authors to look for experimental data that could in principle validate the Bayesian/energetic frameworks. A stronger validation will be an interesting avenue for future research.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Weaknesses

      (1) The authors face a technical challenge (which they acknowledge): they use two numbers (mean and variance) to characterize synaptic variability, whereas in the brain there are three numbers (number of vesicles, release probability, and quantal size). Turning biological constraints into constraints on the variance, as is done in the paper, seems somewhat arbitrary. This by no means invalidates the results, but it means that future experimental tests of their model will be somewhat nuanced.

      Agreed. There are two points to make here.

      First, the mean and variance are far more experimentally accessible than n, p and q. The EPSP mean and variance are measured directly in paired-patch experiments, whereas getting n, p and q either requires far more extensive experimentation, or making strong assumptions. For instance, the data from Ko et al. (2013) give the EPSP mean and variance, but not (directly) n, p and q. Thus, in some ways, predictions about means and variances are easier to test than predictions about n, p and q.

      That said, we agree that in the absence of an extensive empirical accounting of the energetic costs at the synapse, there is inevitably some arbitrariness as we derive our energetic costs. That was why we considered four potential functional forms for the connection between the variance and energetic cost, which covered a wide range of sensible forms for this energetic cost. Our results were robust to this wide range of functional forms, indicating that the patterns we describe are not specifically due to the particular functional form, but arise in many settings where there is an energetic cost for reliable synaptic transmission.
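
      The relationship between the two descriptions can be sketched with the standard binomial release model, which is assumed here for illustration (the function name is hypothetical): with n release sites, release probability p and quantal size q, the PSP mean is npq and the variance is np(1 − p)q². Different (n, p, q) triples can share a mean yet differ in variance, which is why constraints on the biophysical parameters can be recast as constraints on the variance.

```python
def quantal_stats(n, p, q):
    """Mean and variance of the PSP under a binomial release model.

    n: number of release sites, p: release probability, q: quantal size.
    The binomial model is an assumption made for this illustration.
    """
    mean = n * p * q
    var = n * p * (1.0 - p) * q ** 2
    return mean, var


# Two parameter settings with the same mean but different reliability.
noisy = quantal_stats(10, 0.5, 1.0)    # (5.0, 2.5)
reliable = quantal_stats(5, 1.0, 1.0)  # (5.0, 0.0)
print(noisy, reliable)
```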

      (2) The prediction that the learning rate should increase with variability relies on an optimization scheme in which the learning rate is scaled by the inverse of the magnitude of the gradients (Eq. 7). This seems like an extra assumption; the energy efficiency framework by itself does not predict that the learning rate should increase with variability. Further work will be needed to disentangle the assumption about the optimization scheme from the energy efficiency framework.

      Agreed. The assumption that learning rates scale with synapse importance is separate. However, it is highly plausible as almost all modern state-of-the-art deep learning training runs use such an optimization scheme, as in practice it learns far faster than other older schemes. We have added a sentence to the main text (line 221), indicating that this is ultimately an assumption.
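
      The kind of optimization scheme referred to here can be sketched as follows. This is a generic RMSprop-style rule written for illustration, not the paper's Eq. 7; the function names and the default values of lr, beta and eps are assumptions. Each step divides the gradient by a running estimate of its root-mean-square magnitude, so the update size becomes roughly independent of the gradient scale and parameters with small typical gradients receive a larger effective learning rate per unit gradient.

```python
import math


def normalized_step(w, grad, v, lr=0.01, beta=0.9, eps=1e-8):
    """One RMSprop-style update.

    v is a running average of squared gradients; the step is the gradient
    divided by the square root of that average.
    """
    v = beta * v + (1.0 - beta) * grad ** 2
    w_new = w - lr * grad / (math.sqrt(v) + eps)
    return w_new, v


def steady_state_step(grad, steps=200, lr=0.01):
    """Size of the final update when the same gradient is applied repeatedly."""
    w, v = 0.0, 0.0
    w_prev = w
    for _ in range(steps):
        w_prev = w
        w, v = normalized_step(w, grad, v, lr=lr)
    return abs(w - w_prev)


# Gradients that differ by four orders of magnitude give nearly equal steps.
print(steady_state_step(100.0), steady_state_step(0.01))
```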

      Major

      (1) The correspondence between the entropy term in the variational inference description and the reliability cost in the energetic description is a bit loose. Indeed, the entropy term scales as −log(σ) while reliability cost scales as σ^(−ρ). While the authors do make the point that σ^(−ρ) upper bounds −log(σ) (up to some constant), those two cost terms are different. This raises two important questions:

      a. Is this difference important, i.e. are there scenarios for which the two frameworks would have different predictions due to their different cost functions?

      b. Alternatively, is there a way to make the two frameworks identical (e.g. by choosing a proposal distribution Q(w) different from a Gaussian distribution (and tuneable by a free parameter that could be related to ρ) and therefore giving rise to an entropy term consistent with the reliability cost of the energy efficiency framework)?

      To answer b first, there is no natural way to make the two frameworks identical (unless we assume the reliability cost is proportional to log σsyn, and we don’t think there’s a biophysical mechanism that would give rise to such a cost). Now, to answer a, in Fig. 7 we extensively assessed the differences between the energy efficient σsyn and the Bayesian σpost. In Fig. 7bc, we find that σsyn and σpost are positively correlated in all models. This positive correlation indicates that the qualitative predictions made by the two frameworks (Bayesian inference and energy efficiency) are likely to be very similar. Importantly though, there are systematic differences highlighted by Fig. 7ab. Specifically, the energy efficient σsyn tends to vary less than the Bayesian σpost. This appears in Fig. 7b which shows the relationship between σsyn (on the y-axis) and σpost (on the x-axis). Specifically, this plot has a slope that is smaller than one for all our models of the biophysical cost. Further, the pattern also appears in the covariance ellipses in Fig. 7a, in that the Bayesian covariance ellipses tend to be long and thin, while the energy efficient covariance ellipses are rounder. Critically, though, both covariance ellipses show the same pattern in that there is more noise along less important directions (as measured by the Hessian).

      We have added a sentence (line 273) noting that the search for a theoretical link is motivated by our observations in Fig. 7 of a strong, but not perfect link between the pattern of variability predicted by Bayesian and energy-efficient synapses.
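
      The sense in which a power-law cost upper bounds the entropy term can be seen from the elementary inequality log x ≤ x − 1, applied with ρ > 0. This is a sketch of one standard argument and may differ in constants and scaling from the derivation in the authors' appendix.

```latex
\begin{align}
  \log x \le x - 1, \quad x = \sigma^{-\rho}
  \;\Longrightarrow\;
  -\log\sigma \le \frac{\sigma^{-\rho} - 1}{\rho},
  \qquad
  \lim_{\rho \to 0} \frac{\sigma^{-\rho} - 1}{\rho} = -\log\sigma .
\end{align}
```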

      (2) Even though I appreciate the effort of the authors to look for experimental evidence, I still find that the experimental support (displayed in Fig. 6) is moderate for three reasons.

      a. First, the experimental and simulation results are not displayed in a consistent way. Indeed, Fig 6a displays the relative weight change |Δw|/w as a function of the normalised variability σ²/|µ| in experiments whereas the simulation results in Fig 5c display the variance σ² as a function of the learning rate. Also, Fig 6b displays the normalised variability σ²/|µ| as a function of the input rate whereas Fig 5b displays the variance σ² as a function of the input rate. As a consequence the comparison between experimental and simulation results is difficult.

      b. Secondly, the actual power-law exponents in the experiments (see Fig 6a resp. 6b) should be compared to the power-law exponents obtained in simulation (see Fig 5c resp. Fig 5b). The difficulty relies here on the fact that the power-law exponents obtained in the simulations directly depend on the (free) parameter ρ. So far the authors precisely avoided committing to a specific ρ, but rather argued that different biophysical mechanisms lead to different reliability exponents ρ. Therefore, since there are many possible exponents ρ (and consequently many possible power-law exponents in simulation results in Fig 5), it is likely that one of them will match the experimental data. For the argument to be stronger, one would need to argue which synaptic mechanism is dominating and therefore come up with a single prediction that can be falsified experimentally (see also point 4 below).

      c. Finally, the experimental data presented in Fig 6 are still “clouds of points". A coefficient of r = 0.52 (in Fig 6a) is moderate evidence while the coefficient of r = −0.26 (in Fig 6b) is weak evidence.

      The key thing to remember is that our paper is not about whether synapses are “really" Bayesian or energy efficient (or both/neither). Instead, the key point of our paper, as expressed in the title, is to show that the experimental predictions of Bayesian synapses are very similar to the predictions from energy efficient synapses. And therefore energy efficient synapses are very difficult to distinguish experimentally from Bayesian synapses. In that context, the two plots in Fig. 6 are not really intended to present evidence in favour of the energy efficiency / Bayesian synapses. In fact, Fig. 6 isn’t meant to constitute a contribution of the paper at all, instead, Fig. 6 serves merely as illustrations of the kinds of experimental result that have (Aitchison et al. 2021) or might (Schug et al. 2021) be used to support Bayesian synapses. As such, Fig. 6 serves merely as a jumping-off point for discussing how very similar results might equally arise out of Bayesian and energy-efficiency viewpoints.

      We have modified our description of Fig. 6 to further re-emphasise that the panels in Fig. 6 are not our contribution, but are taken directly from Schug et al. 2021 and Aitchison et al. 2021 (we have also modified Fig 6 to be precisely what was plotted in Schug et al. 2021, again to re-emphasise this point). Further, we have modified the presentation to emphasise that these plots serve merely as jumping off points to discuss the kinds of predictions that we might consider for Bayesian and energy efficient synapses.

      This is important, because we would argue that the “strength of support" should be assessed for our key claim, made in the title, that “Signatures of Bayesian inference emerge from energy efficient synapses".

      a) To emphasise that these are previously published results, we have chosen axes to match those used in the original work (Aitchison et al. 2021) and (Schug et al. 2021).

      b) We agree that a close match between power-law exponents would constitute strong evidence for energy-efficiency / Bayesian inference, and might even allow us to distinguish them. We did consider such a comparison, but found it was difficult for two reasons. First, while the confidence intervals on the slopes exclude zero, they are pretty broad. Secondly, while the slopes in a one-layer network are consistent and match theory (Appendix 5) the slopes in deeper networks are far more inconsistent. This is likely to be due to a number of factors such as details of the optimization algorithm and initialization. Critically, if details of the optimization algorithm matter in simulation, they may also matter in the brain. Therefore, it is not clear to us that a comparison of the actual slopes can be relied upon.

      To reiterate, the point of our article is not to make judgements about the strength of evidence in previously published work, but to argue that Bayesian and energy efficient synapses are difficult to distinguish experimentally as they produce similar predictions. That said, it is very difficult to make blanket statements about the strength of evidence for an effect based merely on a correlation coefficient. It is perfectly possible to have moderate correlation coefficients along with very strong evidence of an effect (and e.g. very strong p-values), e.g. if there is a lot of data. Likewise, it is possible to have a very large correlation coefficient along with weak evidence of an effect (e.g. if we only have three or four datapoints, which happen to lie in a straight line). A small correlation coefficient is much more closely related to the effect-size, specifically the effect-size relative to the “noise", which usually arises from unmeasured factors of variation. Here, we know there are many, many unmeasured factors of variation, so even in the case that synapses are really Bayesian / energy-efficient, the best we can hope for is low correlation coefficients.
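
      The distinction between correlation size and strength of evidence can be made concrete with the usual t statistic for a Pearson correlation, t = r·sqrt((n − 2)/(1 − r²)). The sample sizes and r values below are hypothetical and chosen only to illustrate the argument; they are not taken from the datasets discussed. For reference, the two-sided 5% critical value is about 4.30 with 2 degrees of freedom and about 1.96 for large samples.

```python
import math


def t_statistic(r, n):
    """t statistic for testing a Pearson correlation r from n data points."""
    return r * math.sqrt((n - 2) / (1.0 - r ** 2))


# Modest correlation, many points: t is far above the large-sample threshold.
t_many = t_statistic(0.3, 1000)

# Large correlation, four points: t stays below the threshold for 2 d.o.f.
t_few = t_statistic(0.9, 4)

print(round(t_many, 2), round(t_few, 2))
```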

      As mentioned in the public review, a weakness in the paper is the derivation of the constraints on σi given the biophysical costs, for two reasons.

      a. First, it seemed a bit arbitrary whether you hold n fixed or p fixed.

      b. Second, at central synapses, n is usually small – possibly even usually 1: REF(Synaptic vesicles transiently dock to refill release sites, Nature Neuroscience 23:1329-1338, 2020); REF(The ubiquitous nature of multivesicular release Trends Neurosci. 38:428-438, 2015). Fixing n would radically change your cost function. Possibly you can get around this because when two neurons are connected there are multiple contacts (and so, effectively, reasonably large n). It seems like this is worth discussing.

      a) Ultimately, we believe that the “real” biological cost function is very complex, and most likely cannot be written down in a simple functional form. Further, we certainly do not have the experimental evidence now, and are unlikely to have experimental evidence for a considerable period into the future to pin down this cost function precisely. In that context, we are forced to resort to two strategies. First, using simplifying assumptions to derive a functional form for the cost (such as holding n or p fixed). Second, considering a wide range of functional forms for the cost, and ensuring our argument works for all of them.

      b) We appreciate the suggestion that the number of connections could be used as a surrogate where synapses have only a single release site. As you suggest we can propose an alternative model for this case where n represents the number of connections between neurons. We have added this alternative interpretation to our introduction of the quantal model under title “Biophysical costs". For a fixed PSP mean we could either have many connections with small vesicles or fewer connections with larger vesicles. Similarly for the actin cost we would certainly require more actin if the number of connections were increased.

      Minor

      (1) A few additional references could further strengthen some claims of the paper:

      Davis, Graeme W., and Martin Muller. “Homeostatic Control of Presynaptic Neurotransmitter Release." Annual Review of Physiology 77, no. 1 (February 10, 2015): 251-70. https://doi.org/10.1146/annurev-physiol-021014-071740. This paper provides elegant experimental support for the claim (in line 538 now 583) that µ is kept constant and q acts as a compensatory variable.

      Jegminat, Jannes, Simone Carlo Surace, and Jean-Pascal Pfister. “Learning as Filtering: Implications for Spike-Based Plasticity." Edited by Blake A Richards. PLOS Computational Biology 18, no. 2 (February 23, 2022): e1009721. https://doi.org/10.1371/journal.pcbi.1009721.

      This paper also showed that a lower uncertainty implies a lower learning rate (see e.g. in line 232), but in the context of spiking neurons.

      Figure 1 of the first suggested paper indeed shows that quantal size is a candidate for homeostatic scaling (fixing µ). This review also references lots of further evidence of quantal scaling and evidence for both presynaptic and postsynaptic scaling of q leaving space for speculation on whether vesicle radius or postsynaptic receptor number is the source of a compensatory q. On line 583 we have added a few lines pointing to the suggested review paper.

      The second reference demonstrates Bayesian plasticity in the context of STDP, proposing learning rates tuned to the covariance in spike timing. We have added this as extra support for assuming an optimisation scheme that tunes learning rates to synapse importance and synapse variability (line 232).

      In the numerical simulations, the reliability cost is implemented with a single power-law expression (reliability cost ). However, in principle, all the reliability costs will play in conjunction, i.e. reliability cost . While I do recognise that it may be difficult to estimate the biophysical values of the various ci, it might be still relevant to comment on this.

      Agreed. Limitations in the literature meant that we could only form a cursory review of the relative scale of each cost using estimates by Attwell (2001) and Engl (2015). On line 135 we have added a paragraph explaining the rationale for considering each cost independently.

      (3) In Eq. 8: σ² doesn’t depend on variability in q, which would add another term; barring algebra mistakes, it’s . It seems worth mentioning why you didn’t include it. Can you argue that it’s a small effect?

      Agreed. Ultimately, we dropped this term because we expected it to be small relative to variability in vesicle release, and because it would be difficult to quantify. In practice, the variability is believed to be contributed mostly by variability in vesicle release. The primary evidence for this is histograms of EPSP amplitudes which show classic multi-peak structure, corresponding to one, two, three, etc. EPSPs. Examples of these plots include:

      - “The end-plate potential in mammalian muscle”, Boyd and Martin (1956); Fig. 8.

      - “Structure and function of a neocortical synapse”, Holler-Rickauer et al. (2019); Extended Figure 5.

      (3) On pg. 7 now pg. 8, when the Hessian is introduced, why not say what it is? Or at least the diagonal elements, for which you just sum up the squared activity. That will make it much less mysterious. Or are we relying too much on the linear model given in App 2? If so, you should tell us how the Hessian was calculated in general. Probably in an appendix.

      With the intention of maintaining the interest of a wide audience we made the decision to avoid a mathematical definition of the Hessian, opting instead for a written definition i.e. line 192 - “Hii; the second derivatives of the objective with respect to wi.” and later on a schematic (Fig. 4) for how the second derivative can be understood as a measure of curvature and synapse importance. Nonetheless, this review point has made us aware that the estimated Hessian values plotted in Fig. 5a have been insufficiently explained so we have added a reference on line 197 to the appendix section where we show how we estimated the diagonal values of the Hessian.
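
      For the linear case the reviewer describes, the diagonal of the Hessian can be written down directly. The sketch below assumes a linear model with squared loss, L(w) = ½ Σ_t (y_t − w·x_t)², for which H_ii = Σ_t x_ti² (the summed squared activity of input i); the function names and toy data are hypothetical, and the estimate used for Fig. 5a may be computed differently. A finite-difference check confirms the analytic diagonal.

```python
def loss(w, X, y):
    """Squared loss of a linear model, summed over data points."""
    total = 0.0
    for x_t, y_t in zip(X, y):
        pred = sum(w_i * x_i for w_i, x_i in zip(w, x_t))
        total += 0.5 * (y_t - pred) ** 2
    return total


def hessian_diag_analytic(X):
    """H_ii = sum over data points of the squared i-th input."""
    n_inputs = len(X[0])
    return [sum(x_t[i] ** 2 for x_t in X) for i in range(n_inputs)]


def hessian_diag_numeric(w, X, y, h=1e-3):
    """Central second differences of the loss along each weight."""
    diag = []
    base = loss(w, X, y)
    for i in range(len(w)):
        w_plus = list(w)
        w_plus[i] += h
        w_minus = list(w)
        w_minus[i] -= h
        diag.append((loss(w_plus, X, y) - 2.0 * base + loss(w_minus, X, y)) / h ** 2)
    return diag


X = [[1.0, 0.0], [1.0, 2.0], [0.0, 2.0]]  # toy inputs, one row per data point
y = [1.0, 0.0, -1.0]
w = [0.3, -0.2]
print(hessian_diag_analytic(X), hessian_diag_numeric(w, X, y))
```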

      (4) Fig. 5: assuming we understand things correctly, Hessian ∝ |x|². Why also plot σ² versus |x|? Or are we getting the Hessian wrong?

      The Hessian is proportional to . If you assume that time steps are small and neurons spike, then , and . It is difficult to say what timestep is relevant in practice.

      (5) To get Fig. 6a, did you start with Fig. Appendix 1-figure 4 from Schug et al, and then use , drop the q, and put 1 − p on the x-axis? Either way, you should provide details about where this came from. It could be in Methods.

      We have modified Fig. 6 to use the same axes as in the original papers.

      (6) Lines 190-3: “The relationship between input firing rate and synaptic variability was first observed by Aitchison et al. (2021) using data from Ko et al. (2013) (Fig. 6a). The relationship between learning rate and synaptic variability was first observed by Schug et al. (2021), using data from Sjostrom et al. (2003) as processed by Costa et al. (2017) (Fig. 6b)." We believe 6a and 6b should be interchanged in that sentence.

      Thank you. We have switched the text appropriately.

      (7) What is posterior variance? This seems kind of important.

      This refers to the “posterior variance" obtained using a Bayesian interpretation of the problem of obtaining good synaptic weights (Aitchison et al. 2021). In our particular setting, we estimate posterior variances by setting up the problem as variational inference: see Appendix 4 and 5, which is now referred to in line 390.

      (8) Lines 244-5: “we derived the relationships between the optimized noise, σi and the posterior variable, σpost as a function of ρ (Fig. 7b;) and as a function of c (Fig. 7c)." You should tell the reader where you derived this. Which is Eq. 68c now 54c. Except you didn’t actually derive it; you just wrote it down. And since we don’t know what posterior variance is, we couldn’t figure it out.

      If H is the Hessian of the log-likelihood, and if the prior is negligible relative to the likelihood, then we get Eq. 69c. We have added a note on this point to the text.
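
      A minimal sketch of how the two quantities could be compared, under illustrative assumptions that are not taken from the paper: with a negligible prior, a Laplace-style posterior has σpost = h^(−1/2), where h is the curvature magnitude, and an energy efficient synapse minimising hσ²/2 + cσ^(−ρ) has σsyn = (cρ/h)^(1/(ρ+2)). On log-log axes the slope of σsyn against σpost is then 2/(ρ+2), which is below one for any ρ > 0. The function names and the value of c are hypothetical.

```python
import math


def sigma_post(h):
    """Posterior standard deviation for curvature h, negligible prior assumed."""
    return h ** -0.5


def sigma_syn(h, rho, c=1.0):
    """Minimiser of h*sigma**2/2 + c*sigma**(-rho), an assumed cost."""
    return (c * rho / h) ** (1.0 / (rho + 2.0))


def loglog_slope(rho, h1=1.0, h2=100.0):
    """Slope of log(sigma_syn) against log(sigma_post) between two curvatures."""
    dy = math.log(sigma_syn(h2, rho)) - math.log(sigma_syn(h1, rho))
    dx = math.log(sigma_post(h2)) - math.log(sigma_post(h1))
    return dy / dx


for rho in (0.5, 1.0, 2.0, 4.0):
    print(rho, loglog_slope(rho))
```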

      (9) We believe Fig. 7a shows an example pair of synapses. Is this typical? And what about Figs. 7b and c. Also an example pair? Or averages? It would be helpful to make all this clear to the reader.

      Fig. 7a shows an illustrative pair of synapses, chosen to best display the relative patterns of variability under energy efficient and Bayesian synapses. We have noted this point in the legend for Fig. 7. Fig. 7bc show analytic relationships between energy efficient and Bayesian synapses, so each line shows a whole continuum of synapses (we have deleted the misleading points at the ends of the lines in Fig. 7bc).

      (10)  The y-axis of Fig 6a refers to the synaptic weight as w while the x-axis refers to the mean synaptic weight as mu. Shouldn’t it be harmonised? It would be particularly nice if both were divided by µ, because then the link to Fig. 5c would be more clear.

      We have changed the y-axis label of Fig. 6a from w to µ. Regarding the normalised variance, we did try this but our Gaussian posteriors allowed the mean to become small in our simulations, giving a very high normalised variance. To remedy this we would likely need to assume a log- posterior, but this was out of scope for the present work.

      (11) Line 250 (now line 281): “Finally, in the Appendix". Please tell us which Appendix. Also, why not point out here that the bound is tightest at small ρ?

      We have added the reference to the section of the appendix with the derivation of the biological cost as a bound on the ELBO. We have also referenced the equation that gives the limit of the biological cost as ρ tends to zero.
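
      The tightness of such a bound at small ρ can be checked numerically. The sketch assumes the bound takes the form −log σ ≤ (σ^(−ρ) − 1)/ρ for ρ > 0, which follows from log x ≤ x − 1; the exact constants in the appendix may differ, and the function names are hypothetical.

```python
import math


def entropy_term(sigma):
    """The -log(sigma) term from the variational objective."""
    return -math.log(sigma)


def power_law_bound(sigma, rho):
    """Assumed power-law upper bound, (sigma**(-rho) - 1) / rho."""
    return (sigma ** (-rho) - 1.0) / rho


def gap(sigma, rho):
    """How far the bound sits above the entropy term."""
    return power_law_bound(sigma, rho) - entropy_term(sigma)


# The gap is non-negative and shrinks as rho decreases.
for rho in (2.0, 1.0, 0.1, 0.001):
    print(rho, gap(0.1, rho), gap(2.0, rho))
```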

      (12) When symbols appear that previously appeared more than about two paragraphs ago, please tell us where they came from. For instance, we spent a lot of time hunting for ηi. And below we’ll complain about undefined symbols. Which might mean we just missed them; if you told us where they were, that problem would be eliminated.

      We have added extra references for the symbols in the text following Eq. 69.

      (13) Line 564, typo (we think): should be σ^(−2).

      Good spot. This has been fixed.

      (14)  A bit out of order, but we don’t think you ever say explicitly that r is the radius of a vesicle. You do indicate it in Fig. 1, but you should say it in the main text as well.

      We have added a note on this to the legend in Fig. 1.

      (15) Eq. 14: presumably there’s a cost only if the vesicle is outside the synapse? Probably worth saying, since it’s not clear from the mechanism.

      Looking at Pulido and Ryan (2021) carefully, it is clear that they are referring to a cost for vesicles inside the presynaptic side of the synapse. (Importantly, vesicles don’t really exist outside the synapse; during the release process, the vesicle membrane becomes part of the cell membrane, and the contents of the vesicle are ejected into the synaptic cleft).

      (16) App. 2: why solve for mu, and why compute the trace of the Hessian? Not that it hurts, but things are sort of complicated, and the fewer side points the better.

      Agreed, we have removed the solution for μ, and the trace, and generally rewritten Appendix 2 to clarify definitions, the Hessian etc.

      (17) Eq. 35: we believe you need a minus sign on one side of the equation. And we don’t believe you defined p(d|w). Also, are you assuming g = partial log p(d|w)/partial w? This should be stated, along with its implications. And presumably, it’s not really true; people just postulate that p(d|w) ∝ exp(−log loss)?

      We have replaced p(d|w) with p(y, x|w), and we replaced “overall cost” with log P(y|w, x). Yes, we are also postulating that p(y|w, x) ∝ exp(−log loss), though in our case that does make sense as it corresponds to a squared loss.

      As regards the minus sign, in the original manuscript, we had the second derivative of the cost. There is no minus sign for the cost, as the Hessian of the cost at the mode is positive semi-definite. However, once we write the expression in terms of a log-likelihood, we do need a minus sign (as the Hessian of the log-likelihood at a mode is negative semi-definite).

      (18) Eq. 47 now Eq. 44: first mention of CBi;i?

      We have added a note describing CB around these equations.

      (19) The “where" doesn’t make sense for Eqs. 49 and 50; those are new definitions.

      We have modified the introduction of these equations to avoid the problematic “where”.

      (20) Eq. 57 and 58 are really one equation. More importantly: where does Eq. 58 come from? Is this the H that was defined previously? Either way, you should make that clear.

      We have removed the problematic additional equation line number, and added a reference to where H comes from.

      (21) In Eq. 59 now Eq. 60 aren’t you taking the trace of a scalar? Seems like you could skip this.

      We have deleted this derivation, as it repeats material from the new Appendix 2.

      (22) Eq. 66 is exactly the same as Eq. 32. Which is a bit disconcerting. Are they different derivations of the same quantity? You should comment on this.

      We have deleted lots of the stuff in Appendix 5 as, we agree, it repeats material from Appendix 2 (which has been rewritten and considerably clarified).

      (23) Eq. 68 now 54, left column: please derive. We got:

      gai = gradient for weight i on trial

      where the second equality came from Eq. 20. Thus

      Is that correct? If so, it’s a lot to expect of the reader. Either way, a derivation would be helpful.

      We agree it was unnecessary and overly complex, so we have deleted it.

      (24) App 5–Figure 2: presumably the data for panel b came from Fig. 6a, with the learning rate set to Δw/w? And the data for panel c from Fig. 6b? This (or the correct statement, if this is wrong) should be mentioned.

      Yes, the data for panel c came from Fig. 6b. We have deleted the data in panel b, as there are some subtleties in interpretation of the learning rates in these settings.

      (25) line 952 now 946: typo, “and the from”.

      Corrected to “and from”.

    1. eLife assessment

      This important study reveals the use of an allocentric spatial reference frame in updating the perceived location of a dimly lit target during locomotion. The evidence supporting this claim is compelling, based on a series of cleverly and carefully designed behavioral experiments. The results will be of interest not only to scientists who study perception, action and cognition but also to engineers who work on developing visually guided robots and self-driving vehicles.

    2. Reviewer #1 (Public Review):

      This study conducted a series of experiments to comprehensively support the allocentric rather than egocentric visual spatial reference updating for the path-integration mechanism in the control of target-oriented locomotion. The authors first manipulated the waiting time before walking to tease apart the influence of spatial working memory in guiding locomotion. They demonstrated that the intrinsic bias in perceiving distance remained constant during walking and that the establishment of a new spatial layout in the brain took a relatively longer time beyond the visual-spatial working memory. In the following experiments, the authors then uncovered that the strength of the intrinsic bias in distance perception along the horizontal direction is reduced when participants' attention is distracted, implying that world-centered path integration requires attentional effort. This study also revealed horizontal-vertical asymmetry in a spatial coding scheme that bears a resemblance to the locomotion control in other animal species such as desert ants.

      The revised version of the study effectively situates the research within the broader context of terrestrial navigation, focusing on the movement of land-based creatures and offers a clearer explanation for the potential neurological basis of the human brain's allocentric odometer. Previous feedback has been thoroughly considered, and additional details have been incorporated into the presentation of the results.

    3. Reviewer #3 (Public Review):

      This study investigated what kind of reference frame (allocentric or egocentric) we use for perception in darkness. This question is essential and was not addressed much before. The authors compared the perception in the walking condition with that in the stationary condition, which successfully separated the contribution of self-movement to the spatial representation. In addition, the authors also carefully manipulated the contribution of the waiting period, attentional load, vestibular input, testing task, and walking direction (forward or backward) to examine the nature of the reference frame in darkness systematically.

      I am a bit confused by Figure 2b. Allocentric coordinate refers to the representation of the distance and direction of an object relative to other objects but not relative to the observer. In Figure 2, however, the authors assumed that the perceived target was located on the intersection between the intrinsic bias curve and the viewing line from the NEW eye position to the target. This suggests that the perceived object depends on the observer's new location, which seems at odds with the allocentric coordinate hypothesis.

      According to Fig 2b, the perceived size should be left-shifted and lifted up in the walking condition compared to that in the stationary condition. However, in Figure 3C and Fig 4, the perceived size was the same height as that in the baseline condition.

      Is the left-shifted perceived distance possibly reflecting a kind of compensation mechanism? Participants could not see the target's location but knew they had moved forward. Therefore, their brain automatically compensates for this self-movement when judging the location of a target. This would perfectly predict the left-shifted but not upward-shifted data in Fig 3C. A similar compensation mechanism exists for size constancy in which we tend to compensate for distance in computing object size.

      According to Fig 2a, the target, perceived target, and eye should be aligned in one straight line. This means that connecting the physical targets and the corresponding perceived target results in straight lines that converge at the eye position. This seems, however, unlikely in Figure 3c.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer 1:

      (1) Authors need to acknowledge the physical effort in addition to visual information for the spatial coding and may consider the manipulation of physical efforts in the future to support the robustness of constant intrinsic bias in ground-based spatial coding during walking.

      Whether one’s physical effort can affect spatial coding for visual perception is not a settled issue. Several empirical studies have not been able to obtain evidence to support the claim. For example, empirical studies by Hutchison & Loomis (2009) and Durgin et al. (2009) did not find that wearing a heavy backpack significantly influenced distance perception, in contrast to the findings by Proffitt et al. (2003). We respectfully request not to discuss this issue in our revision since it is not closely related to the focus of the current study.

      (2) Furthermore, it would be more comprehensive and a better fit for the Neuroscience Section if the authors could add the current understanding of spatial reference frames in neuroscience to the introduction and discussion, and explain how the findings of this study supplement the physiological evidence that supports our spatial perception. For instance, world-centered representations of the environment, or cognitive maps, are associated with hippocampal formation while self-centered spatial relationships, or image spaces, are associated with the parietal cortex (see Bottini, R., & Doeller, C. F. (2020). Knowledge Across Reference Frames: Cognitive Maps and Image Spaces. Trends in Cognitive Sciences, 24(8), 606-619. https://doi.org/10.1016/j.tics.2020.05.008 for details).

      We have now added this important discussion in the revision on pages 12-13.

      We thank the reviewer for the helpful comments.

      Reviewer 2:

      (1) ….As a result, it is unclear to what extent this "allocentric" intrinsic bias is involved in our everyday spatial perception. To provide more context for the general audience, it would be beneficial for the authors to address this issue in their discussion.

      We have clarified this on pages 3-4. In brief, our hypothesis is that during self-motion, the visual system constructs an allocentric ground surface representation (reference frame) by integrating the allocentric intrinsic bias with the external depth cues on the natural ground surface. Supporting this hypothesis, we recently found that when there is a texture cue on the ground, the representation of the ground surface is influenced by the allocentric intrinsic bias (Zhou et al., unpublished results).

      (2) The current findings on the "allocentric" coding scheme raise some intriguing questions as to why such a mechanism would be developed and how it could be beneficial. The finding that the "allocentric" coding scheme results in less accurate object localization and requires attentional resources seems counterintuitive and raises questions about its usefulness. However, this observation presents an opportunity for the manuscript to discuss the potential evolutionary advantages or trade-offs associated with this coding mechanism.

      The revision has discussed these important issues on page 12.

      (3) The manuscript lacks a thorough description of the data analysis process, particularly regarding the fitting of the intrinsic bias curve (e.g., the blue and gray dashed curve in Figure 3c) and the calculation of the horizontal separation between the curves. It would be beneficial for the authors to provide more detailed information on the specific function and parameters used in the fitting process and the formula used for the separation calculation to ensure the transparency and reproducibility of the study's results.

      The results of the statistical analysis were presented in the supplementary materials. We had stated in the original manuscript that we fitted the intrinsic bias curve by eye (obtained by drawing the curve to transcribe the data points as closely as possible) (page 26). This is because we do not yet have a formula for the intrinsic bias. A challenge is that the intrinsic bias measured in the dark can be affected by multiple factors. One factor is related to individual differences, as the intrinsic bias is shaped by the observer’s past experiences and their eye height relative to the ground surface. However, it is certainly our goal to develop a quantitative model of the intrinsic bias in the future.

      We thank the reviewer for the helpful comments.

      Reviewer 3:

      (1) I am a bit confused by Figure 2b. Allocentric coordinate refers to the representation of the distance and direction of an object relative to other objects but not relative to the observer. In Figure 2, however, the authors assumed that the perceived target was located on the intersection between the intrinsic bias curve and the viewing line from the NEW eye position to the target. This suggests that the perceived object depends on the observer's new location, which seems at odds with the allocentric coordinate hypothesis.

      We respectfully disagree with the Reviewer’s statement that “Allocentric coordinate refers to the representation of the distance and direction of an object relative to other objects but not relative to the observer.” The statement conflates the definitions of allocentric representation with exocentric representation. We respectfully maintain that the observer’s body location, as well as observer-object distance, can be represented with the allocentric coordinate system.

      (2) According to Fig 2b, the perceived size should be left-shifted and lifted up in the walking condition compared to that in the stationary condition. However, in Figure 3C and Fig 4, the perceived size was the same height as that in the baseline condition.

      We assume that by “target size”, the Reviewer actually meant “target location”. It is correct that figure 3c and figure 4 showed that judged distance changed as predicted, while the change in judged height was not significant. One explanation for this is that the magnitude of the height change was much smaller than the distance change and could not be revealed by our blind walking-gesturing method. Please also note that our figures used different scales for the vertical height and horizontal distance.

      (3) Is the left-shifted perceived distance possibly reflecting a kind of compensation mechanism?  Participants could not see the target's location but knew they had moved forward.  Therefore, their brain automatically compensates for this self-movement when judging the location of a target.  This would perfectly predict the left-shifted but not upward-shifted data in Fig 3C.  A similar compensation mechanism exists for size constancy in which we tend to compensate for distance in computing object size.

      We assume the Reviewer suggested that the path-integration mechanism first estimates the traveled distance in the dark, and then the brain subtracts the estimated distance from the perceived target distance. We respectfully maintain that this explanation is unlikely because it does not account for our empirical findings. We found that walking in the dark did not uniformly affect perceived target distance, as the Reviewer’s explanation would predict. As shown in figures 3 and 4, walking affected the near targets less than the far targets (i.e., the horizontal distance difference between the walking and baseline-stationary conditions was smaller for the near target than for the far target).

      (4) According to Fig 2a, the target, perceived target, and eye should be aligned in one straight line. This means that connecting the physical targets and the corresponding perceived target results in straight lines that converge at the eye position. This seems, however, unlikely in Figure 3c.

      In the revision, we have added the averaged eye positions on the y-axes of figures 3 and 4. To reveal the impact of the judged angular declination, we also added graphs that plot the estimated angular declination as a function of the physical declination of the target. In general, the slopes are close to unity.
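
      The relation between target distance and angular declination mentioned in this response follows from basic trigonometry. The sketch below assumes a flat ground and a target at a given height, viewed from a given eye height. The function name, parameter names, and numerical values are hypothetical and are not taken from the manuscript.

```python
import math

def angular_declination_deg(eye_height_m, target_distance_m, target_height_m=0.0):
    """Angle (in degrees) of the target below the observer's eye level.

    Assumes a flat ground; all arguments are in meters.
    """
    vertical_drop = eye_height_m - target_height_m
    return math.degrees(math.atan2(vertical_drop, target_distance_m))

# Hypothetical values: 1.6 m eye height, ground-level targets at 2 m and 6 m.
near = angular_declination_deg(1.6, 2.0)
far = angular_declination_deg(1.6, 6.0)
```

      With these assumptions, a nearer target has a larger angular declination than a farther one, so plotting judged against physical declination gives a direct check on whether the slope is close to unity.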

      We thank the reviewer for the helpful comments.

      Recommendations for the authors:

      Reviewer 1 (Recommendations For The Authors):

      (1) This study is very well-designed and written. One minor comment is that anisotropy usually refers to the perceptual differences along cardinal (horizontal + vertical) and oblique directions. It might be clearer if the authors changed the "horizontal-vertical anisotropy" to "horizontal/vertical asymmetry".

      The Reviewer is correct, and we have changed it to horizontal/vertical asymmetry (pages 8 and 11).

      Reviewer 2 (Recommendations For The Authors):

      (1) Providing more details about the "path integration mechanism" when it is first introduced in line 44 would be helpful for readers to better understand the concept.

      The revision has expanded on the path integration mechanism (page 4).

      Adding references for the statement starting with "In fact, previous findings" in lines 218 and would be helpful to provide readers with a basis for comparison between the current study and previous studies that reported an egocentric coding system.

      We have added the references and elaborated on this important issue (pages 10-11).

      (2) There appears to be a discrepancy between the Materials and Methods section, which states that 14 observers participated in Experiments 1-4, and the legends of Figures 3 and 4, which indicates a sample size of "n=8." It would be helpful if the authors could clarify this discrepancy and provide an explanation for the difference in the sample size reported.

      We have clarified the number of observers on page 14.

      (3) While reporting statistical significance is essential in the Results section, there are several instances where the manuscript only mentions a "statistically significant separation" with its p-value without providing the mean and standard deviation of the separation values (e.g., line 100 and 120). This can make it difficult for readers to fully grasp the quantitative nature of the results.

      The statistical analysis and outcomes were presented in the supplementary information document in our original submission.

      Reviewer 3 (Recommendations For The Authors):

      (1) Figure 1 is not significantly related to the current manuscript.

      We feel that retaining figure 1 in the manuscript would help readers to quickly grasp the background literature without having to refer extensively to our previous publications.

      (2) Add eye position to the results figures.

      We have added eye positions in the figures.

      (3) Fig 4c requires a more detailed explanation. The authors stated that Figures 4a and 4c showed consistent results. However, because 4a and 4c used different horizontal axes, it is difficult to compare them directly.

      We have modified the sentence in the revision (page 8).

    1. Reviewer #2 (Public Review):

      Summary:

      The goal of this study is to clarify how the brain simultaneously represents item-specific temporal information and item-independent boundary information. The authors report spectral EEG data from intracranial patients performing a delayed free recall task. They perform cosine similarity analyses on principal components derived from gamma band power across stimulus duration. The authors find that similarity between items in serial position 1 (SP1) and all other within-list items decreases as a function of serial position, consistent with temporal context models. The authors find that across-list item similarity to SP1 is greatest for SP1 items relative to items from other serial positions, an effect that is greater in medial parietal lobe compared to lateral temporal cortex and hippocampus. The authors conclude that their findings suggest that perceptual boundary information is represented in medial parietal lobe. Despite a robust dataset, the methodological limitations of the study design prevent strong interpretations from being made from these data. The same-serial position across-list similarity may be driven by attentional mechanisms that are distinct from boundary information.
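
      The similarity analysis summarized above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: it assumes each item is described by a vector of gamma-band power features, reduces those vectors with a plain SVD-based PCA, and then takes cosine similarity between items. The array shapes, the number of components, and the random stand-in data are all hypothetical.

```python
import numpy as np

def pca_reduce(features, n_components):
    """Project the rows of `features` onto their top principal components."""
    centered = features - features.mean(axis=0, keepdims=True)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity between the rows of `vectors`."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.where(norms == 0, 1.0, norms)
    return unit @ unit.T

# Hypothetical data: 12 items in one list, 40 gamma-power features per item.
# Random numbers stand in for real recordings.
rng = np.random.default_rng(0)
gamma_power = rng.normal(size=(12, 40))

components = pca_reduce(gamma_power, n_components=5)
similarity = cosine_similarity_matrix(components)

# Similarity of the first serial position (SP1) to every item in the list.
sp1_similarity = similarity[0]
```

      In an analysis of this kind, the row `sp1_similarity` is the quantity whose change with serial position (within a list) or with list membership (across lists) would be modeled.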

      Strengths:

      (1) The motivation of the study is strong as how both temporal contextual drift and event boundaries contribute to memory mechanisms is an important open question.

      (2) The dataset of spectral EEG data from 99 intracranial patients provides the opportunity for precise spatiotemporal investigation of neural memory mechanisms.

      Weaknesses:

      The goal of reconciling temporal context and event boundary mechanisms is timely and would be of interest; however, an attentional account can still be used to explain the findings. This alternative account is not considered in the manuscript.

      (1) The issue related to interpreting the SP1 similarity effects as reflecting boundary-specific representations remains in the revised manuscript. The authors suggest that because cross-list SP1 similarity is found in recalled items, this supports the boundary interpretation. However, the effects could still be explained by variability in attention that is not specific to an event boundary per se. As both subsequently recalled items and primacy items tend to recruit more gamma power than non-recalled and non-primacy items, recalled items will tend to have greater similarity with one another. It does not necessarily follow, though, that this similarity is due to a "boundary representation."

      (2) The authors partly addressed my concern regarding the comparison of recalled pairs. How did the authors account for the fact that the same participants do not contribute equally to all ROIs? If only participants who have electrodes in all ROIs are included, are the effects consistent?

    2. eLife assessment

      This valuable study presents a novel analysis of a large human intracranial electrophysiological recording dataset. The study challenges the traditional view that neural responses to word lists exhibit smoothly drifting contexts over time, showing that items just after a boundary have a characteristic response that occurs repeatedly. The evidence is incomplete, however, leaving open the possibility for alternative explanations.

    3. Reviewer #1 (Public Review):

      Summary:

      This study applied pattern similarity analyses to intracranial EEG recordings to determine how neural drift is related to memory performance in a free recall task. The authors compared neural similarity within and across lists, in order to contrast signals related to contextual drift vs. the onset of event boundaries. They find that within-list neural differentiation in the lateral temporal cortex correlates with probability of word recall; in contrast, across-list pattern similarity in the medial parietal lobe correlates with recall for items near event boundaries (early-list serial positions). This primacy effect persists for the first three items of a list. Medial parietal similarity is also enhanced across lists for end-of-list items; however, this effect then predicts forgetting. The authors do not find that within- or across-list pattern similarity in the hippocampus is related to recall probability.

      Strengths:

      The authors use a large dataset of human intracranial electrophysiological recordings, which gives them high statistical power to compare neural activity and memory across three important memory encoding regions. In so doing, the authors seek to address a timely and important question about the neural mechanisms that underlie the formation of memories for events.

      The use of both within and across event pattern similarity analyses, combined with linear mixed effects modeling, is a marriage of techniques that is novel and translatable in principle to other types of data.

      Weaknesses:

      In several instances the paper does not address apparent inconsistencies between the prior literature and the findings. For example, the first main finding is that recalled items have more differentiated lateral temporal cortex representations within lists than not recalled items. This seems to be the opposite of the prediction from temporal context models that are used to motivate the paper: context models would predict that greater contextual similarity within a list should lead to greater memory through enhanced temporal clustering in recall. This is what El-Kalliny et al (2019) found, using a highly similar design (free recall, intracranial recordings from the lateral temporal lobe). The authors never address this contradiction in any depth in order to reconcile it with the previous literature and with the motivating theoretical model.

      The way that the authors conduct the analysis of medial parietal neural similarity at boundaries leads to results that cannot be conclusively interpreted. The authors report enhanced similarity across lists for the first item in each list, which they interpret as reflecting a qualitatively distinct boundary signal. However, this finding can readily be explained by contextual drift if one assumes that whatever happens at the start of each list is similar or identical across lists (for example, a get ready prompt or reminder of instructions). In other words, this is analogous to presenting the same item at the start of every single list, in which case it is not surprising that the parietal (or any neural) representation would be similar to itself at the start of every list. So, a qualitatively unique boundary representation would not be necessary to explain this result. The authors do not include analyses to rule this out, which makes it difficult to interpret a key finding.

      There is a similar absence of interpretation with respect to the previous literature for the data showing enhanced boundary-related similarity in the medial parietal cortex. The authors' interpretation seems to be that they have identified a boundary-specific signal that reflects a large and abrupt change in context; however, another plausible interpretation is that enhanced similarity in the medial parietal cortex is related to a representation of a schema for the task structure that has been acquired across repeated instances.

      The authors do not directly compare their model to other models that could explain how variability in neural activity predicts memory. One example is the neural fatigue hypothesis, which the authors mention; however, there are no analyses or data to suggest that their data are better fit by a boundary/contextual drift mechanism as opposed to neural fatigue.

    4. Reviewer #3 (Public Review):

      Summary:

      In this study, the authors analyzed data from 99 individuals with implanted electrodes who were performing a word-list recall task. Because the task involves successively encoding and then recalling 25 lists in a row, they were able to measure the similarity in neural responses for items within the same list as well as items across different lists, allowing them to test hypotheses about the impact of between-list boundaries on neural responses. They find that, in addition to slow drift in responses across items within a list and changes across lists, there is boundary-related structure in the medial parietal lobe such that early items in each list show similarity (for recalled items) and late items in each list show similarity (for not recalled items).

      Strengths:

      The dataset used in this paper is substantially larger than most iEEG datasets, allowing for the detection of nuanced differences between item positions and for analyses of individual differences in boundary-related responses. There are excellent visualizations of the similarity structure between items for each region, and this work connects to a growing literature on the role of event boundaries in structuring neural responses.

      Weaknesses:

      (1) The visualization in Fig 1B claims that the prediction of the temporal context model is that nearby items in the presented sequence should have similar representations; that is, nearby items within a list should be similar, and the end of a list should look similar to the beginning of the next list. First, it's unclear to me if this is exactly what TCM would predict for this dataset, since lists are separated by ~60 seconds of distractor and retrieval tasks, rather than simply by a brief event boundary. Second, the authors do not actually test this model of continuous similarity across lists. After examining smooth drift in the within-list analysis (Fig 2), the across-list analyses (Figs 3-5) use a model with a "list distance" regressor that predicts discrete changes between lists. The authors state that it is not possible to replace this list distance regressor with an item distance regressor (which would be a straight line in Fig 3D rather than stair-steps) because this would be too collinear with the boundary proximity regressor, but I do not understand why these regressors would be collinear at all (since the boundary proximity regressor does not systematically increase or decrease across items).

      (2) There is no theoretical or quantitative justification for the specific forms of the boundary proximity models. For initial items, a model of e^(1-d) is used (with d being serial position), but it is not stated how the falloff scale of this model was selected (as opposed to e.g. e^((1-d)/2)). For final items, a different linear model of d/#items is used, which seems to have a somewhat different interpretation, since it changes at a constant rate across all items rather than only modeling items near the final boundary. Confusingly, the schematic in Fig 1B shows symmetric effects at initial and final boundaries, despite two different models being used and the authors' assertion in their response that they do not believe these processes are symmetric.

      (3) It is unclear to me whether the authors believe that the observed similarity after boundaries is due to an active process in which "the medial parietal lobe uses drift-resets" to reinstate a boundary-related context, or that this similarity is simply because "the context for the first item may be the boundary itself", and therefore this effect would emerge naturally from a temporal context model that incorporates the full task structure as the "items."
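
      The boundary proximity forms discussed in point (2) can be written out numerically to make the comparison concrete. The sketch below evaluates the two exponential falloffs and the linear ramp named in that point. The list length `n_items` is a hypothetical value chosen for illustration, and the variable names are not taken from the manuscript.

```python
import numpy as np

n_items = 12                    # hypothetical list length, for illustration only
d = np.arange(1, n_items + 1)   # serial position, starting at 1

# e^(1-d): the falloff described for initial items.
initial_boundary = np.exp(1 - d)

# e^((1-d)/2): the alternative, slower falloff raised by the reviewer.
initial_boundary_slow = np.exp((1 - d) / 2)

# d/#items: the linear model described for final items.
final_boundary = d / n_items
```

      Both exponential variants equal 1 at the first serial position and differ only in how quickly they decay, whereas the linear model changes by the same amount, 1/n_items, at every serial position.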

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      (1) In several instances the paper does not address apparent inconsistencies between the prior literature and the findings. For example, the first main finding is that recalled items have more differentiated lateral temporal cortex representations within lists than not recalled items. This seems to be the opposite of the prediction from temporal context models that are used to motivate the paper: context models would predict that greater contextual similarity within a list should lead to greater memory through enhanced temporal clustering in recall. This is what El-Kalliny et al (2019) found, using a highly similar design (free recall, intracranial recordings from the lateral temporal lobe). The authors never address this contradiction in any depth to reconcile it with the previous literature and with the motivating theoretical model.

      Figure 2 supports the findings from El-Kalliny and colleagues because it shows the relationship of each list item relative to the first item (El-Kalliny et al. 2019). Items encoded adjacent to SP1 show the highest spectral similarity supporting the idea of overlapping context predicted by the Temporal Context Model. However, our figure characterizes how increasing inter-item distance affects spectral similarity. It shows that two items successfully recalled from temporally distant serial positions show reduced spectral similarity. These findings align with the predictions of the temporal context model because two temporally distant items would lack significant contextual overlap and therefore would have more distinct spectral representations.

      El-Kalliny and colleagues do use a similar experimental set-up; however, they define drift differently. They identified patients with a tendency to temporally cluster and observed that those patients tend to drift less between temporally clustered items; however, they do not specify drift relative to a constant serial position as we do in our analysis. They define drift as spectral change between two adjacent items, which is a relative measure between any two items rather than a measure in relation to a fixed point like SP1. Finally, our analysis focuses only on gamma activity, while El-Kalliny and colleagues identified drift across a much broader set of frequency bands.

      (2) The way that the authors conduct the analysis of medial parietal neural similarity at boundaries leads to results that cannot be conclusively interpreted. The authors report enhanced similarity across lists for the first item in each list, which they interpret as reflecting a qualitatively distinct boundary signal. However, this finding can readily be explained by contextual drift if one assumes that whatever happens at the start of each list is similar or identical across lists (for example, a get ready prompt or reminder of instructions). The authors do not include analyses to rule this out, which undermines one of the main findings. 

      Extensions of the temporal context model (Lohnas et al. 2015) predict context at the beginning of a list will be most similar to the end of the prior list. The theory assumes a single-context state, consisting of a recency-weighted average of prior items, that is updated, even across different encoding periods.

      However, our results show that a boundary item representation is most similar to the prior list's first item rather than its last item. Our results conflict with the extension of TCM because the shared similarity of boundary items suggests the context state for the first item in the list is not a recency-weighted average of the items presented immediately prior. The same boundary-sensitive signal is not present in other regions, namely the hippocampus and lateral temporal cortex. Those regions do not show similarity between items at the beginning of each list.

      Our main conclusion from these data was that the medial parietal lobe activity seems to be specifically sensitive to task boundaries, defined by the first event or the get ready prompt, while other regions are not.

      (3) Although several previous studies have linked hippocampal fMRI and electrophysiological activity at event boundaries with memory performance, the authors do not find similar relationships between hippocampal activity, event boundaries, and memory. There are potential explanations for why this might be the case, including the distinction between item vs. associative memory, which has been a prominent feature of previous work examining this question. However, the authors do not address these potential explanations (or others) to explain their findings' divergence from prior work; this makes it difficult to interpret and to draw conclusions from the data about the hippocampus' mechanistic role in forming event memories.

      The following text was added and revised in the discussion to discuss hippocampal activity shown in our results and its lack of sensitivity to boundaries.  

      “Spectral activity in the medial parietal lobe aligned closely with boundaries. Drift between item pairs seemed to reset at each boundary, leading to renewed similarity after each boundary. This observation aligns with previous work suggesting boundaries reset temporal context. In the temporal cortex, our findings extend prior studies which suggest the temporal lobe may play a role in associating adjacently presented items (Yaffe et al. 2014, El-Kalliny et al. 2019). We found items encoded in distant serial positions, but within the same list, drifted significantly more than items from adjacent serial positions (Figure 2C). Consistent with the predictions of the temporal context model, the reduced similarity between distant items may reflect reduced contextual overlap proportional to the time elapsed between them. However, across task boundaries, our study did not detect a robust change in drift rate in the medial or lateral temporal cortex. This finding contrasts with significant work (Ben-Yakov et al. 2018, Ezzyat et al. 2014; Griffiths et al. 2020) which shows hippocampal sensitivity to event-boundaries. One interpretation would be that boundary representations in the hippocampus are quite sparse and represented by populations of time-sensitive cells whose activity is indexed to task-related boundaries (Umbach et al. 2020). While the sparse representations may not be detectable in gamma activity, perhaps it suggests drift in these regions represents a more abstract set of contextual features accumulated from multiple brain regions.”

      (4) There is a similar absence of interpretation with respect to the previous literature for the data showing enhanced boundary-related similarity in the medial parietal cortex. The authors’ interpretation seems to be that they have identified a boundary-specific signal that reflects a large and abrupt change in context, however, another plausible interpretation is that enhanced similarity in the medial parietal cortex is related to a representation of a schema for the task structure that has been acquired across repeated instances. 

      We agree our results could suggest the MPL creates a generalized situational model or schematic of the task. Unfortunately, our behavioral task does not allow us to differentiate between these ideas and pure boundary representation. However, given boundaries are a component in defining situational models, we chose to interpret our results conservatively as a form of boundary representation.  

      (5) The authors do not directly compare their model to other models that could explain how variability in neural activity predicts memory. One example is the neural fatigue hypothesis, which the authors mention, however there are no analyses or data to suggest that their data is better fit by a boundary/contextual drift mechanism as opposed to neural fatigue. 

      The study by Lohnas and colleagues does find that HFA was greater for recalled items but does not describe a serial position-specific trend (Lohnas et al. 2020). For our study, we stringently controlled for recall success in each of our analyses. Our main finding of boundary similarity compares recalled boundary items to recalled items in each of the other serial positions. We also show the similarity of non-recalled items in all serial positions to demonstrate the lack of boundary representation in the first list items, when neural fatigue is presumably least present.

      In addition, their study demonstrated neural fatigue in the hippocampus. They did not find evidence of fatigue in the DLPFC, suggesting region-specific mechanisms of neural fatigue. Our results are focused on the medial parietal lobe, and we were not able to find a fatigue model of the region for further comparison. While our results do not rule out the possibility of neural fatigue driving a drifting or boundary signal, we focus on the relevance of the signal to memory performance.

      (6) P2. Line 65 cites Polyn et al (2009b) as an example where ‘random’ boundary insertions improve subsequent memory. However, the boundaries in that study always occurred at the same serial position and were therefore completely predictable and not random.

      The citation was removed from the corresponding sentence.

      (7) P2. Line 74 cites Pu et al. (2022) as an example of medial temporal lobe ‘regional activity’ showing sensitivity to event boundaries; however, this paper reported behavioral and computational modeling results and did not include measurement of neural activity. 

      The citation was removed from the corresponding sentence.

      (8) P.3 Line 117, Hsieh et al (2014) and Hsieh and Ranganath (2015) are cited as evidence that ‘spectral’ relatedness decreases as a function of distance, but neither of these studies examined ‘spectral’ activity (fMRI univariate and multivariate). The manuscript would benefit from a careful review and updating of how the prior literature is cited, which will increase the impact of the findings for readers.

      The text has been updated to reflect this distinction by modifying the statement to:  “Previous work consistent with temporal context models suggests neural pattern similarity reduces as a function of distance between related memories.”

      (9) Several previous studies have found hippocampal activity at event boundaries correlates with memory performance (Ben-Yakov et al 2011, 2018; Baldassano et al 2017), yet here the authors do not find evidence for hippocampal activity at event boundaries related to memory. Does this difference reflect something important about how the hippocampus vs. medial parietal cortex vs. lateral temporal cortex contribute to memory formation? Currently, there is not much discussion about how to interpret the differences between brain regions. Previous work has suggested that hippocampal pattern similarity at event boundaries specifically supports associative memory across events (Ezzyat & Davachi, 2014; Griffiths & Fuentemilla, 2020; Heusser et al., 2016), which may help explain their findings. In any case the authors could increase the impact of their paper by further situating their findings within the previous literature. 

      We would not suggest there is no boundary-related activity in the hippocampus. Similar to an earlier point made by the reviewer, to clarify our interpretation of regional differences, the following text has been added to the discussion.  

      “However, across task boundaries, our study did not detect a robust change in drift rate in the medial or lateral temporal cortex. This finding contrasts with significant work (Ezzyat and Davachi, 2014; Griffiths and Fuentemilla, 2020) which shows hippocampal sensitivity to event-boundaries. One interpretation would be that boundary representations in the hippocampus are quite sparse and represented by populations of time-sensitive cells whose activity is indexed to task-related boundaries (Umbach et al 2020). While the sparse representations may not be detectable in gamma activity, perhaps it suggests drift in these regions represents a more abstract set of contextual features accumulated from multiple brain regions (Baldassano et al. 2017).”

      (10) The authors mention neural fatigue as an alternative theory to explain the primacy effect (Serruya et al., 2014), however there are no analyses or data to suggest that their data is better fit by a boundary mechanism as opposed to neural fatigue. Previous studies have shown that gamma activity in the hippocampus changes with serial position and with encoding history (Serruya et al 2014; Lohnas et al 2020). Here, the authors could compare the reported pattern similarity results to control analyses that replicate this prior work, which would strengthen their argument that there is unique information at boundaries that is distinct from a neural fatigue signal. 

      Serruya and colleagues describe decreasing HFA with increasing serial position in the MTL, lateral temporal cortex and prefrontal cortex (Serruya et al. 2014). Despite their findings, we do not observe a strong boundary effect in those regions (see Supp Fig 3 a,b). The lack of a boundary effect in regions where HFA is selectively increased for primacy items suggests the global neural fatigue model does not account for our results.

      Notably, the authors do not characterize HFA trends in the MPL. Nevertheless, their findings do not rule out the possibility of a boundary effect driving the HFA. We demonstrate boundary-relevant HFA only in the MPL but not in other regions. In addition, we show a correlation between SP1 recalls and boundary representation strength, as well as a conserved similarity of multiple boundary-adjacent items.  

      Next, the neural fatigue study by Lohnas and colleagues does find that HFA was greater for recalled items but does not describe a serial position-specific trend (Lohnas et al. 2020). For our study, we stringently controlled for recall success in each of our analyses. Our main finding of boundary similarity compares recalled boundary items to recalled items in each of the other serial positions. We also show the similarity of non-recalled items in all serial positions to demonstrate the lack of boundary representation in the first list items, when neural fatigue is presumably least present.

      In addition, their study demonstrated neural fatigue in the hippocampus. They did not find evidence of fatigue in the DLPFC, suggesting region-specific mechanisms of neural fatigue. Our results are focused on the medial parietal lobe, and we were not able to find a fatigue model of the region for further comparison. While our results do not rule out the possibility of neural fatigue driving a drifting or boundary signal, we focus on the relevance of the signal to memory performance.

      (11) For the analyses that examine cross-list similarity (e.g. the medial parietal analysis in Figure 3), how did the authors choose the number of lists over which similarity was calculated? Was the selection of this free parameter cross-validated to ensure that it is not overfitting the data? Given that there were 25 lists per session, using the three succeeding lists seems arbitrary. Why not use every list across the whole session? 

      Given the volume of data, number of patients, and computational time available at our facility, we extended the analysis as far as we could to characterize the observed trend.

      (12) P4. Line 155 says that Figure 3C shows example subject data, but it looks like it is actually Figure 3D. 

      The text was updated to reference the correct figure.

      (13) The t-tests on P.4 Line 159 have two sets of degrees of freedom but should only have one. 

      The t-tests described by Figure 3B represent the mean parameter estimate of the predictor for boundary proximity contrasted by region for all item pairs. The statistical test in this case was an unpaired t-test between parameter estimates for patients with electrodes in each of the regions. The numbers within parentheses represent the sample size, or number of subjects, contributing electrodes to each region.

      Reviewer 2:

      (1) Because this is not a traditional event boundary study, the data are not ideally positioned to demonstrate boundary specific effects. In a typical study investigating event boundary effects, a series of stimuli are presented and within that series occurs an event boundary – for instance, a change in background color. The power of this design is that all aspects between stimuli are strictly controlled – in particular, the timing – meaning that the only difference between boundary-bridging items is the boundary itself. The current study was not designed in this manner, thus it is not possible to fully control for effects of time or that multiple boundaries occur between study lists (study to distractor, distractor to recall, recall to study). Each list in a free recall study can be considered its own “mini” experiment such that the same mechanisms should theoretically be recruited across any/all lists. There are multiple possible processes engaged at the start of a free recall study list which may not be specific to event boundaries per se. For example, and as cited by the authors, neural fatigue/attentional decline (and concurrent gamma power decline) may account for serial position effects. Thus, SP1 on all lists will be similar by virtue of the fact that attention/gamma decrease across serial position, which may or may not be a boundary-specific effect. In an extreme example, the analyses currently reported could be performed on an independent dataset with the same design (e.g. 12 word delayed free recall) and such analyses could potentially reveal high similarity between SP1-list1 in the current study and SP1-list1 in the second dataset, effects which could not be specifically attributed to boundaries.

      The neural fatigue study by Lohnas and colleagues does find that HFA was greater for recalled items but does not describe a serial position-specific trend (Lohnas et al. 2020). For our study, we stringently controlled for recall success in each of our analyses. Our main finding of boundary similarity compares recalled boundary items to recalled items in each of the other serial positions. We also show the similarity of non-recalled items in all serial positions to demonstrate the lack of boundary representation in the first list items, when neural fatigue is presumably least present.

      In addition, their study demonstrated neural fatigue in the hippocampus. They did not find evidence of fatigue in the DLPFC, suggesting region-specific mechanisms of neural fatigue. Our results are focused on the medial parietal lobe, and we were not able to find a fatigue model of the region for further comparison. While our results do not rule out the possibility of neural fatigue driving a drifting or boundary signal, we focus on the relevance of the signal to memory performance.

      (2) Comparisons of recalled "pairs" do not account for the lag between those items during study or recall, which based on retrieved context theory and prior findings (e.g. Manning et al., 2011), should modulate similarity between item representations. Although the GLM will capture a linear trend, it will not reveal serial position specific effects. It appears that the betas reported for the SP12 analyses are driven by the fact that similarity with SP12 generally increases across serial position, rather than a specific effect of "high similarity to SP12 in adjacent lists" (Page 5, excluding perhaps the comparison with list x+1). It is also unclear how the SP12 similarity analyses support the statement that "end-list items are represented more distinctly, or less similarly, to all succeeding items" (Page 5). It is not clear how the authors account for the fact that the same participants do not contribute equally to all ROIs or if the effects are consistent if only participants who have electrodes in all ROIs are included.

      In our study, all pairs are defined by the lag between a reference and target item. The results in Figure 3 show the similarity between each serial position in relation to SP1; Figure 4 shows lag between each serial position relative to SP2 and 3; and Figure 5 shows lag relative to SP12. Each statistical model accounts for the lag by ordering the data by increased inter-item distance. Further, our definition of lag is significantly more rigorous than that used by Manning and colleagues. Our similarity results for Figures 3-5 characterize the change in similarity relative to a constant reference point, such as SP1, rather than a relative reference point, such as +1 lag, which aggregates similarity between pairs such as SP1 to SP2 with SP4 to SP5, which may be recalled via different memory mechanisms.

      In Figure 5, we agree that your characterization, ‘similarity with SP12 generally increases across serial position’, is a more accurate description of the trend. The text has been updated to reflect this by changing the interpretation to “later serial positions in adjacent lists shared a gradually increasing similarity to SP12.”

      Next, we clarify the statement "end-list items are represented more distinctly, or less similarly, to all succeeding items". When recalling SP12, the subsequent items recalled exhibit significantly lower similarity to SP12 (see Figure 5D, pink). Consequently, the spectral representation of successfully recalled end-list items appears more distinct from later items in similar serial positions. This stands in contrast to our observations illustrated in Figures 3 and 4, where successfully recalled start-list items demonstrate greater similarity to later items in similar serial positions.

      (3) The authors use the term "perceptual" boundary which is confusing. First, "perceptual boundary" seems to be a specific subset of the broader term "event boundary," and it is unclear why/how the current study is investigating "perceptual" boundaries specifically. Second and relatedly, the current study does not have a sole "perceptual" boundary (as discussed in point 1 above), it is really a combination of perceptual and conceptual since the task is changing (from recalling the words in the previous list to studying the words in the current list OR studying the words in the current list to solving math problems in the current list) in addition to changes in stimulus presentation. 

      We agree with the statement that ‘perceptual’ as a modifier to the boundaries described here does not add significant information. Therefore, we have removed all reference to perceptual boundaries.

      (4) Although the results show that item-item similarity in the gamma band decreases across serial position, it is unclear how the present findings further describe "how gamma activity facilitates contextual associations" (Page 5). As mentioned in point 1 above, such effects could be driven by attentional declines across serial position -- and a concurrent decline in gamma power -- which may be unrelated to, and actually potentially impair, the formation of contextual associations, given evidence from the literature that increased gamma power facilitates binding processes.

      We agree that our study does not elucidate a mechanistic relationship between gamma power and contextual associations. The referenced sentence has been changed to: “how gamma activity is associated with context”.

      Please see our response to point 1 above. In addition, studies demonstrating decreasing gamma power with increasing serial position focus primarily on the MTL, lateral temporal cortex and prefrontal cortex (Serruya et al. 2014). Despite their findings, we do not observe a strong boundary effect in those regions (see Supp Fig 3 a,b). The lack of a boundary effect in regions where HFA is selectively increased for primacy items suggests the global attentional decline or neural fatigue model does not account for our results.

      Notably, HFA trends in the MPL are poorly described. Further, gamma power decline does not rule out the possibility of a boundary effect driving the HFA. We demonstrate boundary-relevant HFA only in the MPL but not in other regions. In addition, we show a correlation between SP1 recalls and boundary representation strength, as well as a conserved similarity of multiple boundary-adjacent items.

      (5) Some of the logic and interpretations are inconsistent with the literature. For example, the authors state that "The temporal context model (TCM) suggests that gradual drift in item similarity provides context information to support recovery of individual items" however, this does not seem like an accurate characterization of TCM. According to TCM, context is a recency-weighted average of previous experience. Context "drifts" insofar as information is added to/removed from context. Context drift thus influences item similarity -- it is not that item similarity itself drifts, but that any change in item-item similarity is due to context drift. 

      The current findings do not appear at odds with the conceptualization of drift and context in the current version of the context maintenance and retrieval model. Furthermore, the context representation is posited to include information beyond basic item representations. Two items, regardless of their temporal distance, can be associated with similar contexts if related information is included in both context representations, as predicted and shown for multiple forms of relatedness including semantic relatedness (Manning & Kahana, 2012) and task relatedness (Polyn et al., 2012).

      We revised the sentence and encompassing paragraph to describe the temporal context model more accurately and emphasize how our findings align with the stated version of CMR. The revised text is below:  

      “Next, we asked how gamma spectral activity reflects contextual association between items. In the medial parietal lobe, we observed recurring similarity between items distant in time but adjacent to boundaries. This pattern suggests spectral activity may carry information about an item's relationship to a boundary. These observations align with the Context Maintenance and Retrieval model which extends the predictions of TCM to encompass broader relationships among items. Our results demonstrate boundaries as an important aspect of context and specify the spectral and regional properties of these boundary-related contextual features.”

      (6) Lohnas et al. (2020) Neural fatigue influences memory encoding in the human hippocampus, Neuropsychologia, should be cited when discussing neural fatigue

      Thank you for your suggestion. The citation has been added to the text.

      (7) A within-list, not an across list, similarity analysis should be used to test the interpretation that end-of-list items are more distinct than other list items.

      We believe this recommendation refers to the following line in our text: “These findings suggest end-list items are represented more distinctly, or less similarly, to all succeeding items.” Our statement compares list x, SP12 to all succeeding items (in list x+1, x+2, etc.). Therefore, this statement refers to items in the next lists which is why we performed an across list analysis rather than within-list one.

      (8) It is unclear why it is necessary to use PCA to estimate similarity between items.

      PCA was used to reduce the dimensionality of the time-frequency matrix for the gamma band. This technique allowed us to compare predominant trends in gamma between items. In addition, we added 3 example subjects in Figure 3 – supplementary figure 2D to show that unique time-frequency components contribute to the signal reconstructed from the PCs for each subject. Therefore, the boundary representation may be represented differently for each patient.
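
      As a rough illustration of the kind of pipeline described in this reply, the sketch below projects per-item gamma features onto a few principal components and then computes cosine similarity between items. It is not the authors' code: the function name, the use of a flattened items-by-features matrix, the choice of three components, and the random input are all assumptions made for the example.

```python
import numpy as np


def pca_cosine_similarity(features, n_components=3):
    """Cosine similarity between items after a PCA projection.

    features: array of shape (n_items, n_features). Each row is assumed to be
    one item's flattened gamma time-frequency matrix (hypothetical layout).
    n_components: number of principal components kept (assumed value).
    """
    # Center each feature across items before extracting components.
    centered = features - features.mean(axis=0, keepdims=True)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:n_components].T
    # Normalize each item's score vector; guard against zero-length vectors.
    norms = np.linalg.norm(scores, axis=1, keepdims=True)
    unit = scores / np.where(norms == 0, 1.0, norms)
    return unit @ unit.T


# Toy input: 12 items (one list) with 40 made-up time-frequency features.
rng = np.random.default_rng(0)
toy_features = rng.normal(size=(12, 40))
similarity = pca_cosine_similarity(toy_features, n_components=3)
```

      Entry (i, j) of the returned matrix is the cosine similarity between items i and j in the reduced space; comparing a fixed reference row (for example the first item) against all other rows mirrors the fixed-reference comparisons described elsewhere in this response.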

      (9) Lags are listed as -4, 4 (Page 8), however with a list length of 12, possible lags should be -11, 11.

      The listed parenthetical statement ‘(-4 to 4)’ referred to Figure 1 where Lag CRP is shown for transitions from -4 to 4. However, we did calculate lag CRP for all possible transitions. Therefore, the referenced phrase was changed to: “Lagged CRP was calculated for all possible transitions (-11 to 11).”
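
      For readers unfamiliar with the measure, a minimal sketch of a lag-CRP computation over all transitions (-11 to 11 for a 12-item list) follows. It uses the standard definition (actual transitions at each lag divided by the transitions that were still possible), but the function name, the input format, and the handling of repeats and intrusions are assumptions for illustration, not a description of the authors' code.

```python
import numpy as np


def lag_crp(recalls, list_length=12):
    """Lag-conditional response probability for lags -(L-1)..(L-1).

    recalls: list of trials; each trial is a list of recalled serial
    positions (1-based) in output order (hypothetical input format).
    """
    max_lag = list_length - 1
    actual = np.zeros(2 * max_lag + 1)
    possible = np.zeros(2 * max_lag + 1)
    for trial in recalls:
        seen = set()
        prev = None
        for sp in trial:
            # Assumption: repeats and intrusions break the transition chain.
            if sp < 1 or sp > list_length or sp in seen:
                prev = None
                continue
            if prev is not None:
                # Every not-yet-recalled item was an available transition.
                for cand in range(1, list_length + 1):
                    if cand not in seen:
                        possible[cand - prev + max_lag] += 1
                actual[sp - prev + max_lag] += 1
            seen.add(sp)
            prev = sp
    # Lags that were never available are reported as NaN.
    crp = np.divide(actual, possible,
                    out=np.full_like(actual, np.nan), where=possible > 0)
    lags = np.arange(-max_lag, max_lag + 1)
    return lags, crp


lags, crp = lag_crp([[1, 2, 3]], list_length=12)
```

      In the toy call, both transitions have lag +1 and lag +1 was available both times, so the probability at that lag is 1; backward lags were never available and come back as NaN.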

      (10) Hsieh et al. 2014 and Hsieh & Ranganath (2015) are fMRI studies and as such, do not support the statement "Previous work consistent with temporal context models suggests spectral relatedness reduces as a function of distance between words" (Page 3). 

      The statement has been revised to: “Previous work consistent with temporal context models suggests neural pattern similarity reduces as a function of distance between related memories.”

      (11) Although statistically one can measure "How item-item similarity is affected by recollection" (Page 3), this is logically backwards, given that similarity during study necessarily precedes performance during free recall. Additionally, it is erroneous to assume that recalled words are "recollected" without additional measurements (e.g. Mickes et al. (2013) Rethinking familiarity: Remember/Know judgments in free recall, JML).

      The statement was changed to “item-item similarity is affected based on successful recall” given recollection cannot be determined in our paradigm.

      Reviewer 3:

      (1) My primary confusion in the current version of this paper is that the analyses don't seem to directly compare the two proposed models illustrated in Fig 1B, i.e. the temporal context model (with smooth drifts between items, including across lists) versus the boundary model (with similarities across all lists for items near boundaries). After examining smooth drift in the within-list analysis (Fig 2), the across-list analyses (Figs 3-5) use a model with two predictors (boundary proximity and list distance), neither of which is a smoothly drifting context. Therefore there does not appear to be a quantitative analysis supporting the conclusion that in lateral temporal cortex "drift exhibits a relationship with elapsed time regardless of the presences of intervening boundaries" (lines 272-3).

      We could not use a smoothly drifting regressor due to its collinearity with any model of boundary similarity. Therefore, we chose our two regressors: boundary proximity, which models intra-list changes in similarity and list distance, which models a stepwise decrease in similarity from adjacent lists.

      However, we agree with the comment that the presented data do not directly support the claim that the lateral temporal cortex drifts independently of intervening boundaries. Therefore, we amended the statement to: “We found successfully recalled items encoded in distant serial positions drifted significantly more than items from adjacent serial positions (Figure 2C). Consistent with the predictions of the temporal context model, the reduced similarity between distant items may reflect reduced contextual overlap proportional to the time elapsed between them.”

      (2) The feature representation used for the neural response to each item is a gamma power time-frequency matrix. This makes it unclear what characteristics of the neural response are driving the observed similarity effects. It appears that a simple overall scaling of the response after boundaries (stronger responses to initial items during the beginning portion of the 1.6s time window) would lead to the increased cosine similarity between initial items, but wouldn't necessarily reflect meaningful differences in the neural representation or context of these items.

      Our study aims to draw the connection between the neural response after boundaries with neural representation and context of these items. Prior studies (Manning et al. 2011, El Kalliny et al. 2017) have interpreted similarity in neural spectra as a memory relevant phenomenon. We use very similar methods to perform our analysis.  

      In addition, we compare the fit of our boundary similarity model to behavioral performance to show increased boundary representation correlates with improved boundary item recall.

      While our study does not specify which time-frequency components underlie the increased similarity, we do limit our analysis to the gamma band. Traditional analyses include log-scaled, broadband time-frequency data (e.g., 3-100 Hz), from which we specify the relevance of a much narrower spectral band.

      Finally, we tried to study which time–frequency components contributed to the increased similarity, but it varied greatly between patients (see Figure 3 – supplementary figure 2D). Hence, we opted to use principal component analyses to compare the features showing the most variation for each given participant. This added analytical step allows us to detect boundary effects across patients despite individual variability in boundary representation.

      (3) The specific form of the boundary proximity models is not well justified. For initial items, a model of e^(1-d) is used (with d being serial position), but it is not stated how the falloff scale of this model was selected (as opposed to e.g. e^((1-d)/2)). For final items, a different model of d/#items is used, which seems to have a somewhat different interpretation (about drift between boundaries, rather than an effect specific to items near a final boundary). The schematic in Fig 1B appears to show a hypothesis which is not tested, with symmetric effects at initial and final boundaries.

      The boundary proximity models were chosen empirically. Our model was intended to quantify a decreasing relationship across many patients. We acknowledge the constants and variables may not definitively describe underlying neural processes.  

      For start- and end-list boundaries, we used different models because primacy and recency effects are unique phenomena. Primacy memory is classically thought to arise from rehearsal during the encoding time (Polyn et al. 2009, Lohnas et al. 2015). Alternatively, recency memory is thought to arise from strong contextual cues of recency items during recall due to their temporal proximity. Therefore, we have a limited basis on which to assume their spectral representation in relation to task boundaries would be symmetric.
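
      To make the two functional forms discussed in this point concrete, the sketch below evaluates the start-list model e^(1-d) and the end-list model d/#items for a 12-item list, with d the serial position. The function name and the list length default are illustrative assumptions; only the two formulas come from the review exchange above.

```python
import numpy as np


def boundary_proximity_regressors(n_items=12):
    """Return serial positions and the two boundary proximity regressors.

    start_proximity = exp(1 - d): equals 1 at SP1 and decays with position.
    end_proximity   = d / n_items: rises linearly and equals 1 at the last SP.
    """
    d = np.arange(1, n_items + 1, dtype=float)
    start_proximity = np.exp(1.0 - d)
    end_proximity = d / n_items
    return d, start_proximity, end_proximity


d, start_proximity, end_proximity = boundary_proximity_regressors(12)
```

      The start-list regressor drops to roughly 0.37 at SP2 and 0.14 at SP3, so it is dominated by the first few items, whereas the end-list regressor changes by the same amount at every serial position. This asymmetry is the difference in interpretation that the reviewer raises.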

      (4) The main text description of Fig 2 only describes drift effects in lateral temporal cortex, but Fig 2 - supplement 1 shows that there is also drift and a significant subsequent memory effect in the other two ROIs as well. There is not a significant memory x drift slope interaction in these regions; are the authors arguing that the lack of this interaction (different drift rates for remembered versus forgotten items) is critical for interpreting the roles of lateral temporal cortex versus medial parietal and hippocampal regions?

      Yes. Fig 2 - supplement 1 shows that drift occurs in both the HC and MPL. However, the interaction term is not significant, which suggests that the rate of drift between recalled and non-recalled items is not significantly different.

      In contrast, Fig 2C shows that recalled pairs drift at a higher rate than non-recalled pairs. For the LTC, the interaction term is negative in magnitude and statistically significant. This suggests successfully encoded item pairs encoded far apart share more distinct spectral representations, specifically in the LTC. These findings lead to our interpretation in the discussion that “elevated drift rate might allow the representations of recalled items to remain distinct but ordered in memory.”

      (5) The parameter fits for the "list distance" regressor are not shown or analyzed, though they do appear to be important for the observed similarity structure (e.g. Fig 3E). I would interpret this regressor as also being "boundary-related" in the sense that it assumes discrete changes in similarity at boundaries.

      Parameter fits for the ‘list distance’ regressor are now shown in the supplementary portions of Figures 3 and 5. The difference between regions is non-significant.

      (6) To make strong claims about temporal context versus boundary models as implied by Fig 1B, these two regressors should be fit within the same model to explain across-list similarity. The temporal context model could be based on the number of intervening items (as in Fig 1B) or actual time elapsed between items. The relationship between the smoothly drifting temporal context model and the discretely-jumping list distance models should also be clarified.

      We could not use a smoothly drifting regressor because of its collinearity with any model of boundary similarity. A model that included a ‘temporal context regressor’ would not be able to account for the presence of a boundary effect and would not allow us to demonstrate a boundary representation in the presence of drift. Therefore, we chose our two regressors: boundary proximity, which models intra-list changes in similarity, and list distance, which models a stepwise decrease in similarity from adjacent lists. These regressors allow the model to differentiate between intra-list changes (the boundary regressor) and inter-list changes (the list distance regressor).
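
      The collinearity concern can be illustrated with a minimal sketch. This is not the authors' analysis: the task dimensions (5 lists of 12 items) and the function names are hypothetical, chosen only to show how strongly a smooth item-lag regressor tracks a stepwise list-distance regressor across item pairs.

```python
import numpy as np

def pairwise_regressors(n_lists=5, items_per_list=12):
    """Build two pairwise regressors for every unique item pair.

    Assumed, illustrative design: items are presented back to back,
    list after list, so absolute serial position stands in for time.
    """
    list_idx = np.repeat(np.arange(n_lists), items_per_list)
    serial_pos = np.arange(n_lists * items_per_list)
    i, j = np.triu_indices(serial_pos.size, k=1)  # unique pairs, i < j
    # Smooth "temporal context" regressor: number of item steps apart.
    item_lag = np.abs(serial_pos[i] - serial_pos[j])
    # Stepwise regressor: number of list boundaries between the items.
    list_distance = np.abs(list_idx[i] - list_idx[j])
    return item_lag, list_distance

def regressor_correlation(n_lists=5, items_per_list=12):
    """Pearson correlation between the smooth and stepwise regressors."""
    item_lag, list_distance = pairwise_regressors(n_lists, items_per_list)
    return float(np.corrcoef(item_lag, list_distance)[0, 1])

if __name__ == "__main__":
    print(f"correlation = {regressor_correlation():.3f}")
```

      Under these assumptions the two regressors are strongly correlated, so their coefficients would be hard to estimate separately if both were entered in one model.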

      (7) The features of the time-frequency matrix that are driving similarity between events could be visualized to provide a better understanding of the boundary-related signals. The analysis could also be re-run with reduced versions of the feature space in order to determine the critical components of this signal; for example, responses could be averaged across time to examine only differences across frequencies, or across frequencies to examine purely temporal changes across the 1.6 second window.

      Figure 3 – supplementary figure 2 A-C has been added to show varying the number of principal components (PCs) does not change the trend of boundary sensitivity in the MPL. In addition, we included 3 example subjects in Figure 3 – supplementary figure 2D to show unique time-frequency components contribute to signal reconstructed from the PCs for each subject. Therefore, the boundary representation may be represented differently for each patient.

      (8) If the authors are considering a space of multiple models as "boundary proximity models" (e.g. linear models and exponential models with different scale factors), this should be part of the model-fitting process rather than a single model being selected posthoc.

      We agree with the reviewer’s suggestion that the most ideal way to fit a model to the trend would be using a model-fitting process. However, due to a limitation on the amount of computational resources available, we were not able to perform it given the size of our dataset.

      (9) The interpretation of region differences in the results in Fig 2 and Fig 2 - supplement 1 should be clarified. 

      In discussion, we have added the following text to clarify our interpretation of the regional differences shown in the mentioned figures.  

      “However, across task boundaries, our study did not detect a robust change in drift rate in the medial or lateral temporal cortex. This finding contrasts with significant work (Ezzyat and Davachi, 2014; Griffiths and Fuentemilla, 2020) which shows hippocampal sensitivity to event-boundaries. One interpretation would be that boundary representations in the hippocampus are quite sparse and represented by populations of time-sensitive cells whose activity is indexed to task-related boundaries (Umbach et al 2018). While the sparse representations may not be detectable in gamma activity, perhaps it suggests drift in these regions represents a more abstract set of contextual features accumulated from multiple brain regions (Baldassano et al. 2017).”

      (10) Whether there are significant fits for the list distance regressor, and whether these fits vary across regions, could be stated. The list distance regressor could also be directly compared (in the same model) to a temporal-context regressor, which predicts graded changes in similarity between items rather than the discrete changes between lists.

      We have added parameter fits for the ‘list distance’ regressor in the supplementary portion of Figures 3 and Figure 5. The difference between regions is non-significant. Therefore, our results show a very similar stepwise decrease in similarity across lists between regions (list distance regressor; Figure 3 – supplementary figure 1B).

      We could not compare these parameters to a separate model that includes a smoothly drifting ‘temporal-context’ regressor because of that regressor's collinearity with any representation of boundary. See our response to Reviewer 3, comment 6.

      (11) The authors should clarify their interpretation of the results, and whether they are proposing a tweak to the temporal context model or a substantially different organizational system. 

      In the discussion, we include the following statements to clarify what we suggest regarding the temporal context model.

      “Our findings suggest a broader scope of contextual association than just prior items, where temporal proximity as well as task structure in the form of boundaries, play intertwined roles in contextual construction. Our data therefore have implications for updated iterations of the temporal context model incorporating (perhaps) specific terms for boundary information. This may in turn provide a more systematic prediction of primacy effects in behavioral data.”  

      (12) Minor typos and corrections: 

      52: using -> use 

      108: patients -> patients'

      156: list -> lists

      The list distance plot is described as "pink" in Fig 3 and Fig 5 - supplement 1, but appears gray in the figures.

      Each of these corrections has been made in the text.

    1. Reviewer #3 (Public Review):

      Summary:

      The authors food-deprived male and female mice and observed a much stronger reduction of leptin levels, energy consumption in the visual cortex, and visual coding performance in males than females. This indicates a sex-specific strategy for the regulation of the energy budget in the face of low food availability.

      Strengths:

      This study extends a previous study demonstrating the effect of food deprivation on visual processing in males, by providing a set of clear experimental results, demonstrating the sex-specific difference. It also provides hypotheses about the strategy used by females to reduce energy budget based on the literature.

      Weaknesses:

      The authors do not provide evidence that females are unimpaired in visually guided behaviors, contrary to what was shown in males in the previous study.

    2. Reviewer #1 (Public Review):

      Padamsey et al. followed up on their previous study in which they found that male mice sacrifice visual cortex computation precision to save energy in periods of food restriction (Padamsey et al. 2021, Neuron). In the present study, the authors find that female mice show much lower levels of adaptation in response to food restriction on the level of metabolic signaling and visual cortex computation. This is an important finding for understanding sex differences in adaptation to food scarcity and also impacts the interpretation of studies employing food restriction in behavioral analyses and learning paradigms.

      Strengths:

      The manuscript is, in general, very clear and the conclusions are straightforward. The experiments are performed in the same conditions for males and females and the authors did not find differences in the behavioral states of male and female mice that could explain differences in energy consumption. Moreover, they show that visual cortex in both males and females does not change its baseline energy consumption in the dark, therefore the adjustment of energy budget in males only targets visual processing.

      Weaknesses:

      The number of experiments is insufficient to compare the effects of food restriction in males and females directly, which is discussed by the authors: to address this point they use Bayes factor analysis to provide an estimate of the likelihood that females and males indeed differ in terms of energy metabolism and sensory processing adaptations during food restriction.

    3. Reviewer #2 (Public Review):

      Summary:

      Padamsey et al. build on previous significant work from the same group which demonstrated robust changes in the visual cortex in male mice from long-term (2-3 weeks) food restriction. Here, the authors extend this finding and reveal striking sex-specific differences in the way the brain responds to food restriction. The measures included the whole-body measure of serum leptin levels, and V1-specific measures of activity of key molecular players (AMPK and PPARα), gene expression patterns, ATP usage in V1, and the sharpness of visual stimulus encoding (orientation tuning). All measures supported the conclusion that the female mouse brain (unlike in males) does not change its energy usage and cortical functional properties on comparable food restriction.

      While the effect of food restriction on more peripheral tissue such as muscle and bones has been well studied, this result contributes to our understanding of how the brain responds to food restriction. This result is particularly significant given that the brain consumes a large fraction of the body's energy consumption (20%), with the cortex accounting for half of that amount. The sex-specific differences found here are also relevant for studies using food restriction to investigate cortical function.

      Strengths:

      The study uses a wide range of approaches mentioned above which converge on the same conclusion, strengthening the core claim of the study.

      Weaknesses:

      Since the absence of a significant effect does not prove the absence of any changes, the study cannot claim that the female mouse brain does not change in response to food restriction. However, the authors do not make this claim. Instead, they make the well-supported claim that there is a sex-specific difference in the response of V1 to food restriction.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      (1) For a number of experiments the authors use their new data set on females and compare that with the data set previously published on males. In how far are these data sets comparable? Have they been performed originally in parallel for example using siblings of different sexes or have the experiments been conducted several years apart from each other? What is the expected variability, if one repeated these experiments with the same sex considering the differences/similarities between experimental setups, housing conditions, interindividual differences, etc.? 

      This is an important point. We did our best to collect the data in similar conditions (same set-ups; same animal housing conditions) and in experimental cohorts including both males and females. While some data from males were published first, the acquisition of male and female data was done in the same time period.

      Specifically, all results shown in Figure 1 and Figure 2 (Serum leptin, PPARalpha, AMPK, RNAseq) come from samples (from both males and females) that were processed at the same time and in similar conditions, by the same authors (Z.P. and P. M.).

      For the in vivo data (Figure 3, Supplementary figure 1), the male and female data were collected within a 1–2-year timeframe, in the same setups, by the same two authors (Z.P., D.K.). The males and females were housed under similar conditions (same room, same cage type, in groups of 25). We did not use siblings of different sexes. Independent cohorts (1-12 months apart), including both males and females, went into each data set. The within-cohort variability does not obviously differ from the between-cohort variability; however, the number of animals is too small to confirm this with sufficient statistical power.

      Altogether, the differences observed between male and female data cannot be explained by the timing and conditions of data acquisition from both sexes.

      (2) Energy consumption and visual processing may differ between periods in which animals are in different behavioral states. Is there a possibility that male and female mice differed in behavioral state during measurements? Were animals running or resting during visual stimulation and during ATP measurements? 

      We thank the reviewer for this suggestion. We have now edited the text and included a new supplementary figure. All in vivo experiments were done in stationary animals that were resting in a cardboard tube both during 2-photon imaging and ATP measurements. Animals were also well habituated to the setup. In addition, we have imaged pupil diameters during in vivo imaging sessions. We have quantified pupil diameter during visual stimulation and do not find a sex difference (Supplemental Figure 2). Thus, we did not find a significant difference in behavioural or attentional state between sexes, in our experimental conditions.

      We have edited the text to include this information (lines 183-185).

      (3) Related to the previous point: the authors show that ATP consumption was reduced in male mice during visual stimulation. What about visual cortex ATP consumption in the absence of visual stimulation? Do food-deprived males and/or females show lower ATP consumption in the visual cortex e.g. during sleep? 

      We have repeated V1 ATP imaging experiments in the dark, in the absence of visual stimulation, in both males and females (Supplementary figure 1). ATP consumption rates are slower in the dark vs. during visual stimulation. Moreover, we find that in the dark, there is no difference in ATP consumption rate between control and food restricted animals of either sex. Thus, the reduced ATP consumption we found with food restriction in males is related specifically to the active processing of visual information.

      We have edited the text to include this information (lines 158-159).

      Reviewer 2:

      (1) It appears that the authors have the data for doing decoding analysis, similar to Fig 6D in their previous paper. However, this analysis has not been done for this study. This would be good to include.  If the authors have attempted the behavioural discrimination tests on female mice as in the previous study, this would also be useful to include. 

      The first point of the reviewer is about datasets acquired in males that are included in our previous publication (Padamsey et al., 2022) but not compared to female data in the present manuscript.

      Whilst we fully agree that these results would be very useful, we did not have the resources (in terms of skilled researcher and funding) to perform these experiments in female mice. That is why these results are not included in this manuscript.

      (2) There appears to be an inconsistency in the methods of reporting OSI. It states that the OSI of grating-responsive neurons was calculated as 1 - circular variance. But then OSI is defined as simply abs(). Also, it would be good to be consistent about reporting medians as the median without confounding with the average (which is the mean). Sentences such as the following do not make sense: The average OSI for an animal was taken as the median OSI value calculated across neurons. This should be corrected throughout the manuscript, where the average is mentioned but the median is measured. 

      We thank the reviewer for noting this issue and we apologize for the confusion. We have now clarified the above in the manuscript (lines 587-603) and inserted the following reference for the detailed explanation of OSI and DSI calculation: Mazurek M, Kager M, Van Hooser SD. Robust quantification of orientation selectivity and direction selectivity. Front Neural Circuits. 2014. https://doi.org/10.3389/fncir.2014.00092
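
      To make the relationship between "1 - circular variance" and the abs() expression concrete, here is a minimal sketch of the standard vector-sum formulation of OSI and DSI. It is not the authors' code: the function names are hypothetical, and it assumes non-negative mean responses sampled at evenly spaced stimulus directions. In this formulation, 1 - circular variance is the magnitude of the response-weighted vector sum divided by the summed response, which is why the two descriptions can coincide.

```python
import numpy as np

def osi_from_circular_variance(responses, directions_deg):
    """OSI = 1 - circular variance in orientation space.

    Angles are doubled so that opposite directions (e.g. 0 and 180 deg)
    map onto the same orientation. Assumes non-negative responses.
    """
    r = np.asarray(responses, dtype=float)
    theta = np.deg2rad(np.asarray(directions_deg, dtype=float))
    total = r.sum()
    if total <= 0:
        return float("nan")  # undefined for a silent cell
    return float(np.abs(np.sum(r * np.exp(2j * theta))) / total)

def dsi_from_circular_variance(responses, directions_deg):
    """DSI = 1 - circular variance in direction space (angles not doubled)."""
    r = np.asarray(responses, dtype=float)
    theta = np.deg2rad(np.asarray(directions_deg, dtype=float))
    total = r.sum()
    if total <= 0:
        return float("nan")
    return float(np.abs(np.sum(r * np.exp(1j * theta))) / total)

if __name__ == "__main__":
    directions = np.arange(0, 360, 45)           # 8 drift directions
    tuned = np.array([1, 0, 0, 0, 1, 0, 0, 0])   # responds at 0 and 180 deg
    print(osi_from_circular_variance(tuned, directions))
    print(dsi_from_circular_variance(tuned, directions))
```

      A cell that responds equally to the two opposite directions of one orientation is fully orientation selective but not direction selective under these definitions, which is one reason to report tuning curves over all directions as well as the collapsed orientation curves.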

      In the figure showing the orientation tuning, the authors have collapsed the two directions of each orientation together. However, if I understand correctly, the calculation of OSI does not do this step of collapsing. In this case, and in the interest of revealing more useful features of the data instead of averaging them out, it would be good to show the average tuning curves with and without FR for all directions, not collapsed. 

      As with orientation tuning, we found that direction tuning is reduced with food restriction, and that this is significant in males, but not in females. These results are now included in the text, with statistics (lines 179-180) and in Supplemental Figure 3.

      Reviewer 3:

      l. 183-187 The discussion based on the idea that "The Bayes factor analysis helps to differentiate the absence of evidence from the evidence of absence." does not seem very helpful. Using a statistical criterion makes less sense than providing the reader with an estimate of the largest effect size (if there is any) that is compatible with the observation. If there were a significant effect but of a very small size, would it change the authors' conclusion? That seems unlikely. I recommend removing the sentence on line 184, which is in fact not used afterwards.

      We agree with the reviewer. We have now removed the sentence and rephrased (lines 202-208).  

      Editor's note: 

      Should you choose to revise your manuscript, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05.

      We now provide exact p-values alongside the summary statistics (test statistic and df) and 95% confidence intervals for all key results.

    1. Reviewer #2 (Public Review):

      Summary:

      The study investigates whether speech and music processing involve specific or shared brain networks. Using intracranial EEG recordings from 18 epilepsy patients, it examines neural responses to speech and music. The authors found that most neural activity is shared between speech and music processing, without specific regional brain selectivity. Furthermore, domain-selective responses to speech or music are limited to frequency-specific coherent oscillations. The findings challenge the notion of anatomically distinct regions for different cognitive functions in the auditory process.

      Strengths:

      (1) This study uses a relatively large corpus of intracranial EEG data, which provides high spatiotemporal resolution neural recordings, allowing for more precise and dynamic analysis of brain responses. The use of continuous speech and music enhances ecological validity compared to artificial or segmented stimuli.

      (2) This study uses multiple frequency bands in addition to just high-frequency activity (HFA), which has been the focus of many existing studies in the literature. This allows for a more comprehensive analysis of neural processing across the entire spectrum. The heterogeneity across different frequency bands also indicates that different frequency components of the neural activity may reflect different underlying neural computations.

      (3) This study also adds empirical evidence towards distributed representation versus domain-specificity. It challenges the traditional view of highly specialized, anatomically distinct regions for different cognitive functions. Instead, the study suggests a more integrated and overlapping neural network for processing complex stimuli like speech and music.

      Weaknesses:

      While this study is overall convincing, there are still some weaknesses in the methods and analyses that limit the implication of the work.

      The study's main approach, focusing primarily on the grand comparison of response amplitudes between speech and music, may overlook intricate details in neural coding. Speech and music are not entirely orthogonal with each other at different levels of analysis: at the high-level abstraction, these are two different categories of cognitive processes; at the low-level acoustics, they overlap a lot; at intermediate levels, they may also share similar features. For example, the study doesn't adequately address whether purely melodic elements in music correlate with intonations in speech at the neural level. A more granular analysis, dissecting stimuli into distinct features like pitch, phonetics, timbre, and linguistic elements, could unveil more nuanced shared, and unique neural processes between speech and music. Prior research indicates potential overlap in neural coding for certain intermediate features in speech and music (Sankaran et al. 2023), suggesting that a simple averaged response comparison might not fully capture the complexity of neural encoding. Further delineation of phonetic, melodic, linguistic, and other coding, along with an analysis of how different informational aspects (phonetic, linguistic, melodic, etc) are represented in shared neural activities, could enhance our understanding of these processes and strengthen the study's conclusions.

      While classifying electrodes into 3 categories provides valuable insights, it may not fully capture the complexity of the neural response distribution to speech and music. A more nuanced and continuous approach could reveal subtler gradations in neural response, rather than imposing categorical boundaries. This could be done by computing continuous metrics, like unique variances explained by each category or by each acoustic feature, etc. Incorporating such a continuum could enhance our understanding of the neural representation of speech and music, providing a more detailed and comprehensive picture of cortical processing. This goes back to my first comment that the selected set of stimuli may not fully exploit the entire space of speech and music, and there are possible exemplars that violate the preference map here. For example, this study only considered a specific set of multi-instrumental music, it is not clear to me if other types of music would result in different response profiles in individual channels. It is also not clear if a foreign language that the listeners cannot comprehend would evoke similar response profiles. On the contrary, breaking down into the neural coding of more fundamental feature representations that constitute speech and music, and analyzing the unique contribution of each feature would give a more comprehensive understanding.

      The paper's emphasis on shared and overlapping neural activity, as observed through sEEG electrodes, provides valuable insights. It is probably true that domain-specificity for speech and music does not exist at such a macro scale. However, it's important to consider that each electrode records from a large neuronal population, encompassing thousands of neurons. This broad recording scope might mask more granular, non-overlapping feature representations at the single neuron level. Thus, while the study suggests shared neural underpinnings for speech and music perception at a macroscopic level, it cannot definitively rule out the possibility of distinct, non-overlapping neural representations at the microscale of local neuronal circuits for features that are distinctly associated with speech and music. This distinction is crucial for fully understanding the neural mechanisms underlying speech and music perception that merit future endeavors with more advanced large-scale neuronal recordings.

    2. eLife assessment

      This study presents valuable intracranial findings on how two types of natural auditory stimuli - speech and music - are processed in the human brain, and demonstrates that speech and music largely share network-level brain activities, thus challenging the domain-specific processing view. The evidence supporting the claims of the authors is solid. The work will be of broad interest to speech and music researchers as well as cognitive scientists in general.

    3. Reviewer #1 (Public Review):

      Summary:

      In this study, the authors examined the extent to which processing of speech and music depends on neural networks that are either specific to a domain or general in nature. They conducted comprehensive intracranial EEG recordings on 18 epilepsy patients as they listened to natural, continuous forms of speech and music. This enabled an exploration of brain activity at both the frequency-specific and network levels across a broad spectrum. Utilizing statistical methods, the researchers classified neural responses to auditory stimuli into categories of shared, preferred, and domain-selective types. It was observed that a significant portion of both focal and network-level brain activity is commonly shared between the processing of speech and music. However, neural responses that are selectively responsive to speech or music are confined to distributed, frequency-specific areas. The authors highlight the crucial role of using natural auditory stimuli in research and the need to explore the extensive spectral characteristics inherent in the processing of speech and music.

      Strengths:

      The study's strengths include its high-quality sEEG data from a substantial number of patients, covering a majority of brain regions. This extensive cortical coverage grants the authors the ability to address their research questions with high spatial resolution, marking an advantage over previous studies. They performed thorough analyses across the entire cortical coverage and a wide frequency range of neural signals. The primary analyses, including spectral analysis, temporal response function calculation, and connectivity analysis, are presented straightforwardly. These analyses, as well as figures, innovatively display how neural responses, in each frequency band and region/electrode, are 'selective' (according to the authors' definition) to speech or music stimuli. The findings are summarized in a manner that efficiently communicates information to readers. This research offers valuable insights into the cortical selectivity of speech and music processing, making it a noteworthy reference for those interested in this field. Overall, this research offers a valuable dataset and carries out extensive yet clear analyses, amounting to an impressive empirical investigation into the cortical selectivity of speech and music. It is recommended for readers who are keen on understanding the nuances of selectivity and generality in the processing of speech and music to refer to this study's data and its summarized findings.

      Weaknesses:

      (1) The study employed longer speech and music stimuli, thereby promising improved ecological validity as compared to prior research, a point emphasized by the authors. However, it failed to differentiate between neural responses to the diverse content or local structures within speech and music. The authors considered the potential limitation of treating these extensive speech and music stimuli as stationary signals, neglecting their complex musical or linguistic structural details and temporal variations across local structures such as sentences and phrases. This balanced perspective offered by the authors aids readers in better understanding the context of the study and highlights potential areas for expansion and further considerations.

      (2) In contrast to previous studies that employed short stimulus segments along with various control stimuli to ensure that observed selectivity for speech or music was not merely due to low-level acoustic properties, this study used longer, ecological stimuli. However, the control stimuli used in this study, such as tone or syllable sequences, do not align with the low-level acoustic properties of the speech and music stimuli. This mismatch raises concerns that the differences or selectivity between speech and music observed in this study might be attributable to these basic acoustic characteristics rather than to more complex processing factors specific to speech or music. However, this should not deter readers from recognizing the study's strengths, namely, the use of iEEG recordings that offer high spatial resolution and extensive cortical coverage.

      (3) The concept of selectivity - shared, preferred, and domain-selective - may not present sufficient theoretical accuracy. It is appreciated that the authors put effort into clearly defining their operational measurement on 'selectivity'. Later, the authors further mentioned the specific indication of their analyses. However, the authors' categorization of neural sites/regions as shared, preferred, or domain-selective regarding speech and music processing essentially resembles a traditional ANOVA test with posthoc analysis. While this categorization gives meaningful context to the results, the mere presence of significant differences among control stimuli, a segment of speech, and a piece of music does not present a strong case that a region is specifically selective to a type of stimulus like speech. The narrative of the manuscript could potentially lead to an overgeneralized interpretation of their findings as being broadly applicable to speech or music, if a reader does not delve into the details.

      (4) The authors' approach, akin to mapping a 'receptive field' by correlating stimulus properties with neural responses to ascertain functional selectivity for speech and music, presents potential issues. If cortical regions exhibit heightened responses to one type of stimulus over another, it doesn't automatically imply selectivity or preference for that stimulus. The explanation could lie in functional aspects, such as a region's sensitivity to temporal units of a specific duration, be it music, speech, or even movie segments, and its role in chunking such units (e.g., around 500 ms), which might be more prevalent in music than in speech, or vice versa in the current study. This study does not delve into the functional mechanisms of how speech and music are processed across different musical or linguistic hierarchical levels but merely demonstrates differences in neural responses to various stimuli over a 10-minute span.

    4. Reviewer #3 (Public Review):

      Summary:

      Te Rietmolen et al., investigated the selectivity of cortical responses to speech and music stimuli using neurosurgical stereo EEG in humans. The authors address two basic questions: 1. Are speech and music responses localized in the brain or distributed; 2. Are these responses selective and domain specific or rather domain general and shared. To investigate this, the study proposes a nomenclature of shared responses (speech and music responses are not significantly different), domain selective (one domain is significant from baseline and the other is not), domain preferred (both are significant from baseline but one is larger than the other and significantly different from each other). The authors employ this framework using neural responses across the spectrum (rather than focusing on high gamma), providing evidence for a low level of selectivity across spectral signatures. To investigate the nature of the underlying representations they use encoding models to predict neural responses (low and high frequency) given a feature space of the stimulus envelope or peak rate (by time delay) and find stronger encoding for both in the low frequency neural responses. The top encoding electrodes are used as seeds for a pair-wise connectivity (coherence) in order to repeat the shared/selective/preferred analysis across the spectra, suggesting low selectivity. Spectral power and connectivity are also analyzed on the level of regional patient population to rule out (and depict) any effects driven by a select few patients. Across analyses the authors consistently show a paucity of domain selective responses and when evident these selective responses were not represented across the entire cortical region. The authors argue that speech and music mostly rely on shared neural resources.

      Strengths:

      I found this manuscript to be rigorous providing compelling and clear evidence towards shared neural signatures for speech and music. The use of intracranial recordings provides an important spatial and temporal resolution that lends itself to the power, connectivity and encoding analyses. The statistics and methods employed are rigorous and reliable, estimated based on permutation approaches and cross-validation/regularization was employed and reported properly. The analysis of measures across the entire spectra in both power, coherence and encoding models provides a comprehensive view of responses that no doubt will benefit the community as an invaluable resource. Analysis on the level of patient population (feasible with their high N) per region also supports the generalizability of the conclusions across a relatively large cohort of patients. Last but not least, I believe the framework of selective, preferred, and shared is a welcome lens through which to investigate cortical function.

      Weaknesses:

      I did not find methodological weaknesses in the current version of the manuscript. I do believe that it is important to highlight that the data is limited to passively listening to naturalistic speech and music. The speech and music stimuli are not completely controlled with varying key acoustic features (inherent to the different domains). Overall, I found the differences in stimulus and lack of attentional controls (passive listening) to be minor weaknesses that would not dramatically change the results or conclusions.

    5. Author response:

      The following is the authors’ response to the original reviews.

      We have specifically addressed the points of uncertainty highlighted in eLife's editorial assessment, which concerned the lack of low-level acoustics control, limitations of experimental design, and in-depth analysis. Regarding “the lack of low-level acoustics control, limitations of experimental design”, in response to Reviewer #1, we clarify that our study aimed to provide a broad perspective (which includes both auditory and higher-level processes) on the similarities and distinctions in processing natural speech and music within an ecological context. Regarding “the lack of in-depth analysis”, in response to Reviewers #1 and #2, we have clarified that while model-based analyses are valuable, they pose fundamental challenges when comparing speech and music. Non-acoustic features inherently differ between speech and music (such as phonemes and pitch), making direct comparisons reliant on somewhat arbitrary choices. Our approach mitigates this challenge by analyzing the entire neural signal, thereby avoiding potential pitfalls associated with encoding models of non-comparable features. Finally, we provide some additional analyses suggested by the Reviewers.

      We sincerely appreciate your thoughtful and thorough consideration throughout the review process.

      eLife assessment

      This study presents valuable intracranial findings on how two important types of natural auditory stimuli - speech and music - are processed in the human brain, and demonstrates that speech and music largely share network-level brain activities, thus challenging the domain-specific processing view. The evidence supporting the claims of the authors is solid but somewhat incomplete since although the data analysis is thorough, the results are robust and the stimuli have ecological validity, important considerations such as low-level acoustics control, limitations of experimental design, and in-depth analysis, are lacking. The work will be of broad interest to speech and music researchers as well as cognitive scientists in general.

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors examined the extent to which the processing of speech and music depends on neural networks that are either specific to a domain or general in nature. They conducted comprehensive intracranial EEG recordings on 18 epilepsy patients as they listened to natural, continuous forms of speech and music. This enabled an exploration of brain activity at both the frequency-specific and network levels across a broad spectrum. Utilizing statistical methods, the researchers classified neural responses to auditory stimuli into categories of shared, preferred, and domain-selective types. It was observed that a significant portion of both focal and network-level brain activity is commonly shared between the processing of speech and music. However, neural responses that are selectively responsive to speech or music are confined to distributed, frequency-specific areas. The authors highlight the crucial role of using natural auditory stimuli in research and the need to explore the extensive spectral characteristics inherent in the processing of speech and music.

      Strengths:

      The study's strengths include its high-quality sEEG data from a substantial number of patients, covering a majority of brain regions. This extensive cortical coverage grants the authors the ability to address their research questions with high spatial resolution, marking an advantage over previous studies. They performed thorough analyses across the entire cortical coverage and a wide frequency range of neural signals. The primary analyses, including spectral analysis, temporal response function calculation, and connectivity analysis, are presented straightforwardly. These analyses, as well as figures, innovatively display how neural responses, in each frequency band and region/electrode, are 'selective' (according to the authors' definition) to speech or music stimuli. The findings are summarized in a manner that efficiently communicates information to readers. This research offers valuable insights into the cortical selectivity of speech and music processing, making it a noteworthy reference for those interested in this field. Overall, this research offers a valuable dataset and carries out extensive yet clear analyses, amounting to an impressive empirical investigation into the cortical selectivity of speech and music. It is recommended for readers who are keen on understanding the nuances of selectivity and generality in the processing of speech and music to refer to this study's data and its summarized findings.

      Weaknesses:

      The weakness of this study, in my view, lies in its experimental design and reasoning:

      (1) Despite using longer stimuli, the study does not significantly enhance ecological validity compared to previous research. The analyses treat these long speech and music stimuli as stationary signals, overlooking their intricate musical or linguistic structural details and temporal variation across local structures like sentences and phrases. In previous studies, short, less ecological segments of music were used, maintaining consistency in content and structure. However, this study, despite employing longer stimuli, does not distinguish between neural responses to the varied contents or structures within speech and music. Understanding the implications of long-term analyses, such as spectral and connectivity analyses over extended periods of around 10 minutes, becomes challenging when they do not account for the variable, sometimes quasi-periodical or even non-periodical, elements present in natural speech and music. When contrasting this study with prior research and highlighting its advantages, a more balanced perspective would have been beneficial in the manuscript.

      Regarding ecological validity, we respectfully hold a differing perspective from the reviewer. In our view, a one-second music stimulus lacks ecological validity, as real-world music always extends well beyond such a brief duration. While we acknowledge that selecting longer stimuli involves a trade-off, namely limiting the diversity of musical styles, we maintain that only long stimuli afford participants an authentic musical listening experience. Conversely, shorter stimuli may lead participants to merely "skip through" musical excerpts rather than engage in genuine listening.

      Regarding the critique that we "did not distinguish between neural responses to the varied contents or structures within speech and music," we partly concur. Our TRF (temporal response function) analyses incorporate acoustic content, particularly the acoustic envelope, thereby addressing this concern to some extent. However, it is accurate to note that we did not model non-acoustic features. In acknowledging this limitation, we would like to share an additional thought with the reviewer regarding model comparison for speech and music. Specifically, comparing results from a phonetic (or syntactic) model of speech to a pitch-melodic (or harmonic) model for music is not straightforward, as these models operate on fundamentally different dimensions. In other words, while assuming equivalence between phonemes and pitches may be reasonable, it relies in essence on a somewhat arbitrary choice. Consequently, comparing and interpreting neuronal population coding for one or the other model remains problematic. In summary, because the models for speech and music are different (except for acoustic models), direct comparison is challenging, although still commendable and of interest.

      Finally, we did take into account the reviewer’s remark and did our best to give a more balanced perspective of our approach and previous studies in the discussion.

      “While listening to natural speech and music rests on cognitively relevant neural processes, our analytical approach, extending over a rather long period of time, does not allow us to directly isolate specific brain operations. Computational models, which can be as diverse as acoustic (Chi et al., 2005), cognitive (Giordano et al., 2021), information-theoretic (Di Liberto et al., 2020), or self-supervised neural network (Donhauser & Baillet, 2019; Millet et al., 2022) models, are hence necessary to further our understanding of the type of computations performed by our reported frequency-specific distributed networks. Moreover, incorporating models accounting for musical and linguistic structure can help us avoid misattributing differences between speech and music driven by unmatched sensitivity factors (e.g., arousal, emotion, or attention) as inherent speech or music selectivity (Mas-Herrero et al., 2013; Nantais & Schellenberg, 1999).”

      (2) In contrast to previous studies that employed short stimulus segments along with various control stimuli to ensure that observed selectivity for speech or music was not merely due to low-level acoustic properties, this study used longer, ecological stimuli. However, the control stimuli used in this study, such as tone or syllable sequences, do not align with the low-level acoustic properties of the speech and music stimuli. This mismatch raises concerns that the differences or selectivity between speech and music observed in this study might be attributable to these basic acoustic characteristics rather than to more complex processing factors specific to speech or music.

      We acknowledge the reviewer's concern. Indeed, speech and music differ on various levels, including acoustic and cognitive aspects, and our analyses do not explicitly distinguish them. The aim of this study was to provide an overview of the similarities and differences between natural speech and music processing, in an ecological context. Future work is needed to explore further the different hierarchical levels or networks composing such listening experiences. Of note, however, we report whole-brain results with high spatial resolution (thanks to iEEG recordings), enabling the distinction between auditory, superior temporal gyrus (STG), and higher-level responses. Our findings clearly highlight that both auditory and higher-level regions predominantly exhibit shared responses, challenging the interpretation that our results can be attributed solely to differences in 'basic acoustic characteristics'.

      We have now more clearly pointed out this reasoning in the results section:

      “The spatial distribution of the spectrally-resolved responses corresponds to the network typically involved in speech and music perception. This network encompasses both ventral and dorsal auditory pathways, extending well beyond the auditory cortex and, hence, beyond auditory processing that may result from differences in the acoustic properties of our baseline and experimental stimuli.”

      (3) The concept of selectivity - shared, preferred, and domain-selective - increases the risks of potentially overgeneralized interpretations and theoretical inaccuracies. The authors' categorization of neural sites/regions as shared, preferred, or domain-selective regarding speech and music processing essentially resembles a traditional ANOVA test with post hoc analysis. While this categorization gives meaningful context to the results, the mere presence of significant differences among control stimuli, a segment of speech, and a piece of music does not necessarily imply that a region is specifically selective to a type of stimulus like speech. The manuscript's narrative might lead to an overgeneralized interpretation that their findings apply broadly to speech or music. However, identifying differences in neural responses to a few sets of specific stimuli in one brain region does not robustly support such a generalization. This is because speech and music are inherently diverse, and specificity often relates more to the underlying functions than to observed neural responses to a limited number of examples of a stimulus type. See the next point.

      Exactly! Here, we present a precise operational definition of these terms, implemented with clear and rigorous statistical methods. It is important to note that in many cognitive neuroscience studies, the term "selective" is often used without a clear definition. By establishing operational definitions, we identified three distinct categories based on statistical testing of differences from baseline and between conditions. This approach provides a framework for more accurate interpretation of experimental findings, as now better outlined in the introduction:

      “Finally, we suggest that terms should be operationally defined based on statistical tests, which results in a clear distinction between shared, selective, and preferred activity. That is, be A and B two investigated cognitive functions, “shared” would be a neural population that (compared to a baseline) significantly and equally contributes to the processing of both A and B; “selective” would be a neural population that exclusively contributes to the processing of A or B (e.g. significant for A but not B); and “preferred” would be a neural population that significantly contributes to the processing of both A and B, but more prominently for A or B (Figure 1A).”
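
      The operational definitions quoted above can be read as a small decision rule. The sketch below is illustrative only: the function name `classify_response`, its boolean inputs, and the extra `unresponsive` label are assumptions made for this example, not the authors' code, and the manuscript's actual pipeline (e.g., permutation-based testing) may impose further criteria.

```python
def classify_response(sig_a: bool, sig_b: bool, sig_diff: bool) -> str:
    """Label one channel/frequency band from three test outcomes.

    sig_a    -- response to A differs significantly from baseline
    sig_b    -- response to B differs significantly from baseline
    sig_diff -- responses to A and B differ significantly from each other
    (All three names are hypothetical, chosen for this illustration.)
    """
    if sig_a and sig_b:
        # Both domains engage the site: equal -> shared, unequal -> preferred.
        return "preferred" if sig_diff else "shared"
    if sig_a or sig_b:
        # Exactly one domain differs from baseline.
        return "selective"
    # Neither domain differs from baseline (label added for completeness).
    return "unresponsive"
```

      Under this reading, a site that responds to both speech and music but more strongly to one of them is labeled `preferred`, not `selective`; `selective` requires that the other domain show no significant response at all.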

      Regarding the risk of over-generalization, we want to clarify that our manuscript does not claim that a specific region or frequency band is selective to speech or music. As indeed we focus on testing excerpts of speech and music, we employ the reverse logical reasoning: "if 10 minutes of instrumental music activates a region traditionally associated with speech selectivity, we can conclude that this region is NOT speech-selective." Our conclusions revolve around the absence of selectivity rather than the presence of selective areas or frequency bands. In essence, "one counterexample is enough to disprove a theory." We now further elaborated on this point in the discussion section:

      “In this context, in the current study we did not observe a single anatomical region for which speech-selectivity was present, in any of our analyses. In other words, 10 minutes of instrumental music was enough to activate cortical regions classically labeled as speech (or language) -selective. On the contrary, we report spatially distributed and frequency-specific patterns of shared, preferred, or selective neural responses and connectivity fingerprints. This indicates that domain-selective brain regions should be considered as a set of functionally homogeneous but spatially distributed voxels, instead of anatomical landmarks.”

      (4) The authors' approach, akin to mapping a 'receptive field' by correlating stimulus properties with neural responses to ascertain functional selectivity for speech and music, presents issues. For instance, in the cochlea, different stimuli activate different parts of the basilar membrane due to the distinct spectral contents of speech and music, with each part being selective to certain frequencies. However, this phenomenon reflects the frequency selectivity of the basilar membrane - an important function, not an inherent selectivity for speech or music. Similarly, if cortical regions exhibit heightened responses to one type of stimulus over another, it doesn't automatically imply selectivity or preference for that stimulus. The explanation could lie in functional aspects, such as a region's sensitivity to temporal units of a specific duration, be it music, speech, or even movie segments, and its role in chunking such units (e.g., around 500 ms), which might be more prevalent in music than in speech, or vice versa in the current study. This study does not delve into the functional mechanisms of how speech and music are processed across different musical or linguistic hierarchical levels but merely demonstrates differences in neural responses to various stimuli over a 10-minute span.

      We completely agree with the last statement, as our primary goal was not to investigate the functional mechanisms underlying speech and music processing. However, the finding of a substantial portion of the cortical network as being shared between the two domains constrains our understanding of the underlying common operations. Regarding the initial part of the comment, we would like to clarify that in the framework we propose, if cortical regions show heightened responses to one type of stimulus over another, this falls into the ‘preferred’ category. The ‘selective’ (exclusive) category, on the other hand, would require that the region be unresponsive to one of the two stimuli.

      Reviewer #2 (Public Review):

      Summary:

      The study investigates whether speech and music processing involve specific or shared brain networks. Using intracranial EEG recordings from 18 epilepsy patients, it examines neural responses to speech and music. The authors found that most neural activity is shared between speech and music processing, without specific regional brain selectivity. Furthermore, domain-selective responses to speech or music are limited to frequency-specific coherent oscillations. The findings challenge the notion of anatomically distinct regions for different cognitive functions in the auditory process.

      Strengths:

      (1) This study uses a relatively large corpus of intracranial EEG data, which provides high spatiotemporal resolution neural recordings, allowing for more precise and dynamic analysis of brain responses. The use of continuous speech and music enhances ecological validity compared to artificial or segmented stimuli.

      (2) This study uses multiple frequency bands in addition to just high-frequency activity (HFA), which has been the focus of many existing studies in the literature. This allows for a more comprehensive analysis of neural processing across the entire spectrum. The heterogeneity across different frequency bands also indicates that different frequency components of the neural activity may reflect different underlying neural computations.

      (3) This study also adds empirical evidence towards distributed representation versus domain-specificity. It challenges the traditional view of highly specialized, anatomically distinct regions for different cognitive functions. Instead, the study suggests a more integrated and overlapping neural network for processing complex stimuli like speech and music.

      Weaknesses:

      While this study is overall convincing, there are still some weaknesses in the methods and analyses that limit the implication of the work.

      The study's main approach, focusing primarily on the grand comparison of response amplitudes between speech and music, may overlook intricate details in neural coding. Speech and music are not entirely orthogonal with each other at different levels of analysis: at the high-level abstraction, these are two different categories of cognitive processes; at the low-level acoustics, they overlap a lot; at intermediate levels, they may also share similar features. The selected musical stimuli, incorporating both vocals and multiple instrumental sounds, raise questions about the specificity of neural activation. For instance, it's unclear if the vocal elements in music and speech engage identical neural circuits. Additionally, the study doesn't adequately address whether purely melodic elements in music correlate with intonations in speech at a neural level. A more granular analysis, dissecting stimuli into distinct features like pitch, phonetics, timbre, and linguistic elements, could unveil more nuanced shared, and unique neural processes between speech and music. Prior research indicates potential overlap in neural coding for certain intermediate features in speech and music (Sankaran et al. 2023), suggesting that a simple averaged response comparison might not fully capture the complexity of neural encoding. Further delineation of phonetic, melodic, linguistic, and other coding, along with an analysis of how different informational aspects (phonetic, linguistic, melodic, etc) are represented in shared neural activities, could enhance our understanding of these processes and strengthen the study's conclusions.

      We appreciate the reviewer's acknowledgment that delving into the intricate details of neural coding of speech and music was beyond the scope of this work. To address some of the more precise issues raised, we have clarified in the manuscript that our musical stimuli do not contain vocals and are purely instrumental. We apologize if this was not clear initially.

      “In the main experimental session, patients passively listened to ~10 minutes of storytelling (577 secs, La sorcière de la rue Mouffetard; Gripari, 2004) and ~10 minutes of instrumental music (580 secs, Reflejos del Sur; Oneness, 2006) separated by 3 minutes of rest.”

      Furthermore, we now acknowledge the importance of modeling melodic, phonetic, or linguistic features in the discussion, and we have referenced the work of Sankaran et al. (2023) and McCarty et al. (2023) in this regard. However, we would like to share an additional thought with the reviewer regarding model comparison for speech and music. Specifically, comparing results from a phonetic (or syntactic) model of speech to a pitch-melodic (or harmonic) model for music is not straightforward, as these models operate on fundamentally different dimensions. In other words, while assuming equivalence between phonemes and pitches may be reasonable, it relies in essence on a somewhat arbitrary choice. Consequently, comparing and interpreting neuronal population coding for one or the other model remains problematic. In summary, because the models for speech and music are different (except for acoustic models), direct comparison is challenging, although still commendable and of interest.

      “These selective responses, not visible in primary cortical regions, seem independent of both low-level acoustic features and higher-order linguistic meaning (Norman-Haignere et al., 2015), and could subtend intermediate representations (Giordano et al., 2023) such as domain-dependent predictions (McCarty et al., 2023; Sankaran et al., 2023).”

      References:

      McCarty, M. J., Murphy, E., Scherschligt, X., Woolnough, O., Morse, C. W., Snyder, K., Mahon, B. Z., & Tandon, N. (2023). Intraoperative cortical localization of music and language reveals signatures of structural complexity in posterior temporal cortex. iScience, 26(7), 107223.

      Sankaran, N., Leonard, M. K., Theunissen, F., & Chang, E. F. (2023). Encoding of melody in the human auditory cortex. bioRxiv. https://doi.org/10.1101/2023.10.17.562771

      The paper's emphasis on shared and overlapping neural activity, as observed through sEEG electrodes, provides valuable insights. It is probably true that domain-specificity for speech and music does not exist at such a macro scale. However, it's important to consider that each electrode records from a large neuronal population, encompassing thousands of neurons. This broad recording scope might mask more granular, non-overlapping feature representations at the single neuron level. Thus, while the study suggests shared neural underpinnings for speech and music perception at a macroscopic level, it cannot definitively rule out the possibility of distinct, non-overlapping neural representations at the microscale of local neuronal circuits for features that are distinctly associated with speech and music. This distinction is crucial for fully understanding the neural mechanisms underlying speech and music perception that merit future endeavors with more advanced large-scale neuronal recordings.

      We appreciate the reviewer's concern, but we do not view this as a weakness for our study's purpose. Every method inherently has limitations, and intracranial recordings currently offer the best possible spatial specificity and temporal resolution for studying the human brain. Studying cell assemblies thoroughly in humans is ethically challenging, and examining speech and music in non-human primates or rats raises questions about cross-species analogy. Therefore, despite its limitations, we believe intracranial recording remains the best option for addressing these questions in humans.

      Regarding the granularity of neural representation, while understanding how computations occur in the central nervous system is crucial, we question whether the single neuron scale provides the most informative insights. Single neurons seem more versatile (e.g., in terms of cell type or layer affiliation) than the local circuitry they contribute to, which appears to constitute the brain's building blocks (e.g., the laminar organization; see Mendoza-Halliday et al., 2024). Additionally, the population dynamics of these functional modules appear crucial for cognition and behavior (Safaie et al. 2023; Buzsáki and Vöröslakos, 2023). Therefore, we emphasize the need for multi-scale research, as we believe that a variety of approaches will compensate for one another's individual weaknesses. We clarified this in the introduction:

      “This approach rests on the idea that the canonical computations that underlie cognition and behavior are anchored in population dynamics of interacting functional modules (Safaie et al. 2023; Buzsáki and Vöröslakos, 2023) and bound to spectral fingerprints consisting of network- and frequency-specific coherent oscillations (Siegel et al., 2012).”

      Importantly, we focus on the macro-scale and conclude that, at the anatomical region level, no speech or music selectivity can be observed during natural stimulation. This is stated in the discussion, as follows:

      “In this context, in the current study we did not observe a single anatomical region for which speech-selectivity was present, in any of our analyses. In other words, 10 minutes of instrumental music was enough to activate cortical regions classically labeled as speech (or language) -selective. On the contrary, we report spatially distributed and frequency-specific patterns of shared, preferred, or selective neural responses and connectivity fingerprints. This indicates that domain-selective brain regions should be considered as a set of functionally homogeneous but spatially distributed voxels, instead of anatomical landmarks.”

      References:

      Mendoza-Halliday, D., Major, A.J., Lee, N. et al. A ubiquitous spectrolaminar motif of local field potential power across the primate cortex. Nat Neurosci (2024).

      Safaie, M., Chang, J.C., Park, J. et al. Preserved neural dynamics across animals performing similar behaviour. Nature 623, 765–771 (2023).

      Buzsáki, G., & Vöröslakos, M. (2023). Brain rhythms have come of age. Neuron, 111(7), 922-926.

      While classifying electrodes into 3 categories provides valuable insights, it may not fully capture the complexity of the neural response distribution to speech and music. A more nuanced and continuous approach could reveal subtler gradations in neural response, rather than imposing categorical boundaries. This could be done by computing continuous metrics, like unique variances explained by each category, or ratio-based statistics, etc. Incorporating such a continuum could enhance our understanding of the neural representation of speech and music, providing a more detailed and comprehensive picture of cortical processing.

      To clarify, the metrics we are investigating (coherence, power, linear correlations) are continuous. Additionally, we conduct a comprehensive statistical analysis of these results. The statistical testing, which includes assessing differences from baseline and between the speech and music conditions using a statistical threshold, yields three categories. Of note, ratio-based statistics (a continuous metric) are provided in Figures S9 and S10 (Figures S8 and S9 in the original version of the manuscript).
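
      To illustrate what a continuous, ratio-based measure could look like alongside the three categories, here is a minimal sketch. The normalized contrast and the name `contrast_index` are assumptions made for illustration; the formula behind the supplementary figures is not given here, so this should not be read as the authors' metric.

```python
def contrast_index(speech: float, music: float) -> float:
    """Normalized contrast in [-1, 1] between two non-negative magnitudes.

    Positive values favor speech, negative values favor music, and 0 means
    both responses are equal. (Hypothetical helper for illustration.)
    """
    if speech < 0 or music < 0:
        raise ValueError("response magnitudes must be non-negative")
    total = speech + music
    if total == 0:
        # No response to either stimulus: report no contrast.
        return 0.0
    return (speech - music) / total
```

      A graded index of this kind complements, rather than replaces, the thresholded categories: the index describes how large the difference is, while the statistical tests decide whether it is reliable.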

      Reviewer #3 (Public Review):

      Summary:

      Te Rietmolen et al. investigated the selectivity of cortical responses to speech and music stimuli using neurosurgical stereo EEG in humans. The authors address two basic questions: 1. Are speech and music responses localized in the brain or distributed; 2. Are these responses selective and domain-specific or rather domain-general and shared? To investigate this, the study proposes a nomenclature of shared responses (speech and music responses are not significantly different), domain selective (one domain is significant from baseline and the other is not), domain preferred (both are significant from baseline but one is larger than the other and significantly different from each other). The authors employ this framework using neural responses across the spectrum (rather than focusing on high gamma), providing evidence for a low level of selectivity across spectral signatures. To investigate the nature of the underlying representations they use encoding models to predict neural responses (low and high frequency) given a feature space of the stimulus envelope or peak rate (by time delay) and find stronger encoding for both in the low-frequency neural responses. The top encoding electrodes are used as seeds for a pair-wise connectivity (coherence) in order to repeat the shared/selective/preferred analysis across the spectra, suggesting low selectivity. Spectral power and connectivity are also analyzed on the level of the regional patient population to rule out (and depict) any effects driven by a select few patients. Across analyses the authors consistently show a paucity of domain selective responses and when evident these selective responses were not represented across the entire cortical region. The authors argue that speech and music mostly rely on shared neural resources.

      Strengths:

      I found this manuscript to be rigorous providing compelling and clear evidence of shared neural signatures for speech and music. The use of intracranial recordings provides an important spatial and temporal resolution that lends itself to the power, connectivity, and encoding analyses. The statistics and methods employed are rigorous and reliable, estimated based on permutation approaches, and cross-validation/regularization was employed and reported properly. The analysis of measures across the entire spectra in both power, coherence, and encoding models provides a comprehensive view of responses that no doubt will benefit the community as an invaluable resource. Analysis of the level of patient population (feasible with their high N) per region also supports the generalizability of the conclusions across a relatively large cohort of patients. Last but not least, I believe the framework of selective, preferred, and shared is a welcome lens through which to investigate cortical function.

      Weaknesses:

      I did not find methodological weaknesses in the current version of the manuscript. I do believe that it is important to highlight that the data is limited to passively listening to naturalistic speech and music. The speech and music stimuli are not completely controlled with varying key acoustic features (inherent to the different domains). Overall, I found the differences in stimulus and lack of attentional controls (passive listening) to be minor weaknesses that would not dramatically change the results or conclusions.

      Thank you for this positive review of our work. We added these points as limitations and future directions in the discussion section:

      “Finally, in adopting here a comparative approach of speech and music – the two main auditory domains of human cognition – we only investigated one type of speech and of music also using a passive listening task. Future work is needed to investigate for instance whether different sentences or melodies activate the same selective frequency-specific distributed networks and to what extent these results are related to the passive listening context compared to a more active and natural context (e.g. conversation).”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The concepts of activation and deactivation within the study's context of selectivity are not straightforward to comprehend. It would be beneficial for the authors to provide more detailed explanations of how these phenomena relate to the selectivity of neural responses to speech and music. Such elaboration would aid readers in better understanding the nuances of how certain brain regions are selectively activated or deactivated in response to different auditory stimuli.

      The reviewer is right that the reported results are quite complex to interpret. The concepts of activation and deactivation are generally difficult to comprehend, as they are in part defined by an approach (e.g., method and/or metric) and by the scale of observation (Pfurtscheller et al., 1999). The power (or magnitude) of a time-frequency estimate is by definition a positive value. Deactivation (or desynchronization) is therefore defined relative to the comparison used (e.g., baseline, control, condition). This is further complicated by the scale of the measurement. For instance, during a simple limb movement, some areas in sensorimotor cortex are activated, yet this phenomenon is accompanied at a finer scale by some desynchronization of the mu-activity, and such desynchronization is a relative measure (e.g., before/after movement). At a broader scale it is not rare to see some form of balance between brain networks, some being ‘inhibited’ to let others be activated, like the default mode network versus sensory-motor networks. In our case, when estimating selective responses, it is the strength of the signal that matters. The type of selectivity is then defined by the sign/direction of the comparison/subtraction. We now provide additional details about the sign of selectivity between domains and frequencies in the Methods and Results sections:

      Methods:

      “In order to explore the full range of possible selective, preferred, or shared responses, we considered both responses greater and smaller than the baseline. Indeed, as neural populations can synchronize or desynchronize in response to sensory stimulation, we estimated these categories separately for significant activations and significant deactivations compared to baseline.”

      Results:

      “We classified, for each canonical frequency band, each channel into one of the categories mentioned above, i.e. shared, selective, or preferred (Figure 1A), by examining whether speech and/or music differ from baseline and whether they differ from each other. We also considered both activations and deactivations, compared to baseline, as both index a modulation of neural population activity, and have been linked with cognitive processes (Pfurtscheller & Lopes da Silva, 1999; Proix et al., 2022). However, because our aim was not to interpret specific increase or decrease with respect to the baseline, we here simply consider significant deviations from the baseline. In other words, when estimating selectivity, it is the strength of the response that matters, not its direction (activation, deactivation).”

      “Both domains displayed a comparable percentage of selective responses across frequency bands (Figure 4, first values of each plot). When considering separately activation (Figure 2) and deactivation (Figure 3) responses, speech and music showed complementary patterns: for low frequencies (<15 Hz) speech selective (and preferred) responses were mostly deactivations and music responses activations compared to baseline, and this pattern reversed for high frequencies (>15 Hz).”
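As an illustration only (not the analysis code used in the study), the categorization described above can be sketched as a small decision function. The function name, its boolean inputs, and the extra "unresponsive" label for channels that never deviate from baseline are assumptions made for this sketch. The handling of a channel where the two domains do not differ is also an assumption, following the definition of shared responses given in the review summary.

```python
def classify_channel(speech_vs_baseline: bool,
                     music_vs_baseline: bool,
                     speech_vs_music: bool,
                     speech_stronger: bool) -> str:
    """Label one channel in one frequency band (hypothetical helper).

    speech_vs_baseline / music_vs_baseline: the domain differs
        significantly from baseline (activation or deactivation).
    speech_vs_music: the two domains differ significantly.
    speech_stronger: the speech response has the larger strength;
        only strength matters here, not direction.
    """
    if not (speech_vs_baseline or music_vs_baseline):
        # Assumption: channels with no significant deviation are set aside.
        return "unresponsive"
    if not speech_vs_music:
        # No significant difference between domains.
        return "shared"
    if speech_vs_baseline and music_vs_baseline:
        # Both deviate from baseline, but one more strongly.
        return "speech-preferred" if speech_stronger else "music-preferred"
    # Exactly one domain deviates from baseline and the domains differ.
    return "speech-selective" if speech_vs_baseline else "music-selective"
```

In this sketch a channel is selective only when both conditions hold (a between-domain difference and a single domain deviating from baseline), which matches the two-part criterion stated in the response to comment (3).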

      References:

      J.P. Lachaux, J. Jung, N. Mainy, J.C. Dreher, O. Bertrand, M. Baciu, L. Minotti, D. Hoffmann, P. Kahane, Silence Is Golden: Transient Neural Deactivation in the Prefrontal Cortex during Attentive Reading, Cerebral Cortex, Volume 18, Issue 2, February 2008, Pages 443–450

      Pfurtscheller, G., & Da Silva, F. L. (1999). Event-related EEG/MEG synchronization and desynchronization: basic principles. Clinical neurophysiology, 110(11), 1842-1857

      (2) The manuscript doesn't easily provide information about the control conditions, yet the conclusion significantly depends on these conditions as a baseline. It would be beneficial if the authors could clarify this information for readers earlier and discuss how their choice of control stimuli influences their conclusions.

      We added information in the Results section about the baseline conditions:

      “[...] with respect to two baseline conditions, in which patients passively listened to more basic auditory stimuli: one in which patients passively listened to pure tones (each 30 ms in duration), the other in which patients passively listened to isolated syllables (/ba/ or /pa/, see Methods).”

      Of note, while the choice of different ‘basic auditory stimuli’ as baseline can change the reported results in regions involved in low-level acoustic analyses (auditory cortex), it will have no impact on the results observed in higher-level regions, which predominantly also exhibit shared responses. We have now more clearly pointed out this reasoning in the Results section:

      “The spatial distribution of the spectrally-resolved responses corresponds to the network typically involved in speech and music perception. This network encompasses both ventral and dorsal auditory pathways, extending well beyond the auditory cortex and, hence, beyond auditory processing that may result from differences in the acoustic properties of our baseline and experimental stimuli.”

      (3) The spectral analyses section doesn't clearly explain how the authors performed multiwise correction. The authors' selectivity categorization appears similar to ANOVAs with posthoc tests, implying the need for certain corrections in the p values or categorization. Could the authors clarify this aspect?

      We apologize that this was not in the original version of the manuscript. In the spectral analyses, the selectivity categorization depended on both (1) the difference effects between the domains and the baseline, and (2) the difference effect between domains. Channels were marked as selective when there was (1) a significant difference between domains and (2) only one domain significantly differed from the baseline. All difference effects were estimated using the paired sample permutation tests based on the t-statistic from the mne-python library (Gramfort et al., 2014) with 1000 permutations and the built-in tmax method to correct for multiple comparisons over channels (Nichols & Holmes, 2002; Groppe et al. 2011). We have now more clearly explained how we controlled family-wise error in the Methods section:

      “For each frequency band and channel, the statistical difference between conditions was estimated with paired sample permutation tests based on the t-statistic from the mne-python library (Gramfort et al., 2014) with 1000 permutations and the tmax method to control the family-wise error rate (Nichols and Holmes 2002; Groppe et al. 2011). In tmax permutation testing, the null distribution is estimated by, for each channel (i.e. each comparison), swapping the condition labels (speech vs music or speech/music vs baseline) between epochs. After each permutation, the most extreme t-scores over channels (tmax) are selected for the null distribution. Finally, the t-scores of the observed data are computed and compared to the simulated tmax distribution, similar as in parametric hypothesis testing. Because with an increased number of comparisons, the chance of obtaining a large tmax (i.e. false discovery) also increases, the test automatically becomes more conservative when making more comparisons, as such correcting for the multiple comparison between channels.”
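The tmax procedure described above can be sketched in a few lines. This is a minimal, dependency-free illustration and not the mne-python implementation used in the study. The function names and the input layout (one list of paired per-epoch differences per channel) are assumptions. For paired data, swapping condition labels within an epoch is equivalent to flipping the sign of that epoch's difference, and the same sign flips are applied to all channels within a permutation.

```python
import math
import random


def t_stat(diffs):
    """One-sample t-statistic of paired differences against zero."""
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if var == 0:
        # Degenerate case: constant differences.
        return 0.0 if mean == 0 else math.copysign(math.inf, mean)
    return mean / math.sqrt(var / n)


def tmax_permutation_test(diffs_by_channel, n_permutations=1000, seed=0):
    """Return (t_obs, p_corrected), one value per channel.

    diffs_by_channel: list of channels, each a list of paired
    differences (condition A minus condition B) per epoch.
    """
    rng = random.Random(seed)
    n_epochs = len(diffs_by_channel[0])
    t_obs = [t_stat(ch) for ch in diffs_by_channel]

    null_tmax = []
    for _ in range(n_permutations):
        # One random label swap per epoch, shared across channels.
        signs = [rng.choice((-1, 1)) for _ in range(n_epochs)]
        t_perm = [t_stat([s * d for s, d in zip(signs, ch)])
                  for ch in diffs_by_channel]
        # Keep only the most extreme statistic over channels.
        null_tmax.append(max(abs(t) for t in t_perm))

    # Two-sided corrected p-value: share of null maxima at least as extreme.
    p_corrected = [
        (1 + sum(m >= abs(t) for m in null_tmax)) / (n_permutations + 1)
        for t in t_obs
    ]
    return t_obs, p_corrected
```

Because every observed statistic is compared with the distribution of the maximum over channels, adding channels can only push the null distribution toward larger values, which is why the test becomes more conservative as the number of comparisons grows.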

      References:

      Gramfort, A., Luessi, M., Larson, E., Engemann, D. A., Strohmeier, D., Brodbeck, C., Parkkonen, L., & Hämäläinen, M. S. (2014). MNE software for processing MEG and EEG data. NeuroImage, 86, 446–460.

      Groppe, D. M., Bickel, S., Dykstra, A. R., Wang, X., Mégevand, P., Mercier, M. R., Lado, F. A., Mehta, A. D., & Honey, C. J. (2017). iELVis: An open source MATLAB toolbox for localizing and visualizing human intracranial electrode data. Journal of Neuroscience Methods, 281, 40–48.

      Nichols, T. E., & Holmes, A. P. (2002). Nonparametric permutation tests for functional neuroimaging: a primer with examples. Human Brain Mapping, 15(1), 1–25.

      Reviewer #2 (Recommendations For The Authors):

      Other suggestions:

      (1) The authors need to provide more details on how the sEEG electrodes were localized and selected. Are all electrodes included or only the ones located in the gray matter? If all electrodes were used, how were the ones outside of gray matter localized and labeled? In Figures 1C & 1D it seems that a lot of the electrodes were located in depth locations; how were the anatomical labels assigned for these electrodes?

      We apologize that this was not clear in the original version of the manuscript. Our electrode localization procedure was based on several steps described in detail in Mercier et al., 2022. Once electrodes were localized in a post-implant CT-scan and the coordinates projected onto the pre-implant MRI, we were able to obtain the necessary information regarding brain tissues and anatomical region. That is, first, the segmentation of the pre-implant MRI with SPM12 provided both the tissue probability maps (i.e. gray, white, and cerebrospinal fluid (CSF) probabilities) and the indexed-binary representations (i.e., either gray, white, CSF, bone, or soft tissues) that allowed us to dismiss electrodes outside of the brain and select those in the gray matter. Second, the individual's brain was co-registered to a template brain, which allowed us to back-project atlas parcels onto the individual’s brain and assign anatomical labels to each electrode. The result of this procedure allowed us to group channels by anatomical parcels as defined by the Brainnetome atlas (Figure 1D), which informed the analyses presented in section Population Prevalence (Methods, Figures 4, 9-10, S4-5). Because this study relies on stereotactic EEG, and not electrocorticography, recording sites include both gyri and sulci, while depth structures were not retained.

      We have now updated the “General preprocessing related to electrodes localisation” section in the Methods. The relevant part now states:

      “To precisely localize the channels, a procedure similar to the one used in the iELVis toolbox and in the fieldtrip toolbox was applied (Groppe et al., 2017; Stolk et al., 2018). First, we manually identified the location of each channel centroid on the post-implant CT scan using the Gardel software (Medina Villalon et al., 2018). Second, we performed volumetric segmentation and cortical reconstruction on the pre-implant MRI with the Freesurfer image analysis suite (documented and freely available for download online http://surfer.nmr.mgh.harvard.edu/). This segmentation of the pre-implant MRI with SPM12 provides us with both the tissue probability maps (i.e. gray, white, and cerebrospinal fluid (CSF) probabilities) and the indexed-binary representations (i.e., either gray, white, CSF, bone, or soft tissues). This information allowed us to reject electrodes not located in the brain. Third, the post-implant CT scan was coregistered to the pre-implant MRI via a rigid affine transformation and the pre-implant MRI was registered to MNI152 space, via a linear and a non-linear transformation from SPM12 methods (Penny et al., 2011), through the FieldTrip toolbox (Oostenveld et al., 2011). Fourth, applying the corresponding transformations, we mapped channel locations to the pre-implant MRI brain that was labeled using the volume-based Human Brainnetome Atlas (Fan et al., 2016).”

      Reference:

      Mercier, M. R., Dubarry, A.-S., Tadel, F., Avanzini, P., Axmacher, N., Cellier, D., Vecchio, M. D., Hamilton, L. S., Hermes, D., Kahana, M. J., Knight, R. T., Llorens, A., Megevand, P., Melloni, L., Miller, K. J., Piai, V., Puce, A., Ramsey, N. F., Schwiedrzik, C. M., … Oostenveld, R. (2022). Advances in human intracranial electroencephalography research, guidelines and good practices. NeuroImage, 260, 119438.

      (2) From Figures 5 and 6 (and also S4, S5), is it true that aside from the shared response, lower frequency bands show more music selectivity (blue dots), while higher frequency bands show more speech selectivity (red dots)? I am curious how the authors interpret this.

      The reviewer is right in noticing the asymmetric selective response to music and speech in lower and higher frequency bands. However, while this effect is apparent in the analyses in which we inspected stronger synchronization (activation) compared to baseline (Figures 2 and S1), the pattern appears to reverse when examining deactivation compared to baseline (Figures 3 and S2). In other words, there seems to be an overall stronger deactivation for speech in the lower frequency bands and a relatively stronger deactivation for music in the higher frequency bands.

      We now provide additional details about the sign of selectivity between domains and frequencies in the Results section:

      “Both domains displayed a comparable percentage of selective responses across frequency bands (Figure 4, first values of each plot). When considering separately activation (Figure 2) and deactivation (Figure 3) responses, speech and music showed complementary patterns: for low frequencies (<15 Hz) speech selective (and preferred) responses were mostly deactivations and music responses activations compared to baseline, and this pattern reversed for high frequencies (>15 Hz).”

      Note, however, that this pattern of results depends on only a select number of patients, i.e. when ignoring regional selective responses that are driven by as few as 2 to 4 patients, the pattern disappears (Figures 5-6). More precisely, ignoring regions explored by a small number of patients almost completely clears the selective responses for both speech and music. For this reason, we do not feel confident interpreting a possible asymmetry in how low vs high frequency bands encode (through activation or deactivation) speech and music.

      Minor:

      (1) P9 L234: Why only consider whether these channels were unresponsive to the other domain in the other frequency bands? What about the responsiveness to the target domain?

      We thank the reviewer for their interesting suggestion. The primary objective of the cross-frequency analyses was to determine whether domain-selective channels for a given frequency band remain unresponsive (i.e. exclusive) to the other domain across frequency bands, or whether the observed selectivity is confined to specific frequency ranges (i.e. frequency-specific). In other words, does a given channel exclusively respond to one domain and never, in whichever frequency band, to the other domain? The idea behind this question is that, for a channel to be selectively involved in the encoding of one domain, it does not necessarily need to be sensitive to all timescales underlying that domain as long as it remains unresponsive to any timescale in the other domain. However, if the channel is sensitive to information that unfolds slowly in one domain and faster in the other domain, then the channel is no longer globally domain selective, but the selectivity is frequency-specific to each domain.

      The proposed analyses answer a slightly different, albeit also meaningful, question: how many frequencies (or frequency bands) do selective responses span? From the results presented below, the reviewer can appreciate the overall steep decline in selective responses beyond a single frequency band, with only a few channels remaining selectively responsive across at most four frequency bands. That is, selective responses globally span one frequency band.

      Author response image 1.

      Cross-frequency channel selective responses. The top figure shows the results for the spectral analyses (baselined against the tones condition, including both activation and deactivation). The bottom figure shows the results for the connectivity analyses. For each plot, the first (leftmost) value corresponds to the percentage (%) of channels displaying a selective response in a specific frequency band. In the next value, we remove the channels that no longer respond selectively to the target domain for the following frequency band. The black dots at the bottom of the graph indicate which frequency bands were successively included in the analysis.
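The successive filtering described in this caption can be sketched as follows. This is a hedged illustration under assumed inputs, not the code used to produce the figure: the function name, the dictionary layout (channel name mapped to the set of bands in which it is selective for the target domain), and the band labels in the usage note are all hypothetical.

```python
def successive_selectivity(selective_bands, band_order):
    """Percentage of channels still selective as bands are added.

    selective_bands: dict mapping channel name -> set of frequency
        bands in which that channel is selective for the target domain.
    band_order: bands in the order they are successively included.
    Returns one percentage (of all channels) per included band.
    """
    n_total = len(selective_bands)
    remaining = set(selective_bands)
    percentages = []
    for band in band_order:
        # Drop channels that are not selective in the newly added band.
        remaining = {ch for ch in remaining if band in selective_bands[ch]}
        percentages.append(100.0 * len(remaining) / n_total)
    return percentages
```

For example, with four channels selective in {theta, alpha}, {theta}, no band, and {theta, alpha, beta}, including theta, alpha, and beta in turn leaves 75%, 50%, and 25% of channels, mirroring the steep decline shown in the figure.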

      (2) P21 L623: "Population prevalence." The subsection title should be in bold.

      Done.

      Reviewer #3 (Recommendations For The Authors):

      The authors chose to use pure tones and syllables as baseline. I wonder if they also tried the rest period between tasks, and whether they could comment on how it differed and why they chose pure tones (above and beyond a more active auditory baseline).

      This is an interesting suggestion. The reason for not using the baseline between speech and music listening (or right after) is that it will be strongly influenced by the previous stimulus. Indeed, after listening to the story it is likely that patients keep thinking about the story for a while. Similarly after listening to some music, the music remains in “our head” for some time.

      This is why we did not use rest but other auditory stimulation paradigms. Concerning the choice of pure tones and syllables, these happen to be used for clinical purposes to assess functioning of auditory regions. They also corresponded to a passive listening paradigm, simply with more basic auditory stimuli. We clarified this in the Results section:

      “[...] with respect to two baseline conditions, in which patients passively listened to more basic auditory stimuli: one in which patients passively listened to pure tones (each 30 ms in duration), the other in which patients passively listened to isolated syllables (/ba/ or /pa/, see Methods).”

      Discussion - you might want to address phase information in contrast to power. Your encoding models map onto low-frequency (bandpassed) activity which includes power and phase. However, the high-frequency model includes only power. The model comparison is not completely fair and may drive part of the effects in Figure 7a. I would recommend discussing this, or alternatively ruling out the effect with modeling power separately for the low frequency.

      We thank the reviewer for their recommendation. First, we would like to emphasize that the chosen signal extraction techniques that we used are those most frequently reported in previous papers (e.g. Ding et al., 2012; Di Liberto et al., 2015; Mesgarani and Chang, 2012).

      Low-frequency (LF) phase and high-frequency (HFa) amplitude are also known to track acoustic rhythms in the speech signal in a joint manner (Zion-Golumbic et al., 2013; Ding et al., 2016). This is possibly due to the fact that HFa amplitude and LF phase dynamics have a somewhat similar temporal structure (see Lakatos et al., 2005; Canolty and Knight, 2010).

      Still, the reviewer is correct in pointing out the somewhat unfair model comparison and we appreciate the suggestion to rule out a potential confound. We now report in Supplementary Figure S8, a model comparison for LF amplitude vs. HFa amplitude to complement the findings displayed in Figure 7A. Overall, the reviewer can appreciate that using LF amplitude or phase does not change the results: LF (amplitude or phase) always better captures acoustic features than HFa amplitude.

      Author response image 2.

      TRF model comparison of low-frequency (LF) amplitude and high-frequency (HFa) amplitude. Models were investigated to quantify the encoding of the instantaneous envelope and the discrete acoustic onset edges (peakRate) by either the low frequency (LF) amplitude or the high frequency (HFa) amplitude. The ‘peakRate & LF amplitude’ model significantly captures the largest proportion of channels, and is, therefore, considered the winning model. Same conventions as in Figure 7A.

      References:

      Canolty, R. T., & Knight, R. T. (2010). The functional role of cross-frequency coupling. Trends in Cognitive Sciences, 14(11), 506–515.

      Di Liberto, G. M., O’sullivan, J. A., & Lalor, E. C. (2015). Low-frequency cortical entrainment to speech reflects phoneme-level processing. Current Biology, 25(19), 2457-2465.

      Ding, N., & Simon, J. Z. (2012). Emergence of neural encoding of auditory objects while listening to competing speakers. Proceedings of the National Academy of Sciences, 109(29), 11854-11859.

      Ding, N., Melloni, L., Zhang, H., Tian, X., & Poeppel, D. (2016). Cortical tracking of hierarchical linguistic structures in connected speech. Nature Neuroscience, 19(1), 158–164.

      Golumbic, E. M. Z., Ding, N., Bickel, S., Lakatos, P., Schevon, C. A., McKhann, G. M., ... & Schroeder, C. E. (2013). Mechanisms underlying selective neuronal tracking of attended speech at a “cocktail party”. Neuron, 77(5), 980-991.

      Lakatos, P., Shah, A. S., Knuth, K. H., Ulbert, I., Karmos, G., & Schroeder, C. E. (2005). An oscillatory hierarchy controlling neuronal excitability and stimulus processing in the auditory cortex. Journal of Neurophysiology, 94(3), 1904–1911.

      Mesgarani, N., & Chang, E. F. (2012). Selective cortical representation of attended speaker in multi-talker speech perception. Nature, 485(7397), 233-236.

      Similarly, the coherence analysis is affected by both power and phase, which are not dissociated; i.e., if the authors wished, they could repeat the coherence analysis with phase coherence (normalizing by the amplitude). Alternatively, this issue could be addressed in the discussion above.

      We agree with the Reviewer. We have now better clarified our choice in the Methods section:

      “Our rationale to use coherence as functional connectivity metric was threefold. First, coherence analysis considers both magnitude and phase information. While the absence of dissociation can be criticized, signals with higher amplitude and/or SNR lead to better time-frequency estimates (which is not the case with a metric that would focus on phase only and therefore would be more likely to include estimates of various SNR). Second, we chose a metric that allows direct comparison between frequencies. As phase angle changes more quickly at high frequencies, phase alignment/synchronization is less likely in comparison with lower frequencies. Third, we intended to align with previous work which, for the most part, used the measure of coherence, most likely for the reasons explained above.”
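To make the distinction between coherence and a phase-only metric concrete, the sketch below computes both from per-epoch complex spectral coefficients at a single frequency. It is a generic textbook formulation using only the standard library. The function names and the input format are assumptions, and the exact estimator used in the study (for instance, magnitude versus magnitude-squared coherence) is not specified here.

```python
import cmath
import math


def coherence(x, y):
    """Magnitude coherence across epochs at one frequency.

    x, y: sequences of complex spectral coefficients, one per epoch.
    High-amplitude epochs contribute more to the estimate.
    """
    cross = sum(a * b.conjugate() for a, b in zip(x, y))
    pxx = sum(abs(a) ** 2 for a in x)
    pyy = sum(abs(b) ** 2 for b in y)
    return abs(cross) / math.sqrt(pxx * pyy)


def phase_locking_value(x, y):
    """Phase-only analogue: amplitudes are normalized away."""
    unit = [cmath.exp(1j * (cmath.phase(a) - cmath.phase(b)))
            for a, b in zip(x, y)]
    return abs(sum(unit)) / len(unit)


# One strong, phase-aligned epoch plus two weak epochs with scattered lags.
x = [cmath.rect(10.0, 0.0), cmath.rect(1.0, 0.0), cmath.rect(1.0, 0.0)]
y = [cmath.rect(10.0, 0.0), cmath.rect(1.0, -2.0), cmath.rect(1.0, 2.0)]
print(coherence(x, y), phase_locking_value(x, y))
```

In this toy example coherence stays high because the high-amplitude epoch dominates the estimate, whereas the phase-locking value is low because every epoch counts equally regardless of amplitude. This is the trade-off the first point of the rationale refers to.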

    1. eLife assessment

      This important work substantially advances our understanding of episodic memory in individuals with aphantasia, and sheds light on the neural underpinnings of episodic memory and mental imagery. The evidence supporting the conclusions is convincing, including evidence from a well-established interview paradigm complemented with fMRI to assess neural activation during memory recall. The work will be of broad interest to memory researchers and mental imagery researchers alike.

    2. Reviewer #1 (Public Review):

      Summary:

      In this article, the authors investigate whether the connectivity of the hippocampus is altered in individuals with aphantasia, i.e. people who have reduced mental imagery abilities, some of whom describe having no imagery and others vague and dim imagery. The study investigated this question using an fMRI paradigm, where 14 people with aphantasia and 14 controls were tested, and the researchers were particularly interested in the key regions of the hippocampus and the visual-perceptual cortices. Participants were interviewed using the Autobiographical Interview regarding their autobiographical memories (AMs), and internal and external details were scored. In addition, participants were queried on their perceived difficulty in recalling memories, imagining, and spatial navigation, and their confidence regarding autobiographical memories was also measured. Results showed that participants with aphantasia reported significantly fewer internal details (but not external details) compared to controls; that they had lower confidence in their AMs; and that they reported finding remembering and imagining in general more difficult than controls. Results from the fMRI section showed that people with aphantasia displayed decreased hippocampal and increased visual-perceptual cortex activation during AM retrieval compared to controls. In contrast, controls showed strong negative functional connectivity between hippocampus and the visual cortex. Moreover, resting state connectivity between the hippocampus and visual cortex predicted better visualisation skills. The authors conclude that their study provides evidence for the important role of visual imagery in detail-rich vivid AM, and that this function is supported by the connectivity between the hippocampus and visual cortex. This study extends previous findings of reduced episodic memory details in people with aphantasia, and enables us to start theorising about the neural underpinnings of this finding.

      The data provided good support for the conclusion that the authors draw, namely that there is a 'tight link between visual imagery and our ability to retrieve vivid and detail-rich personal past events'. However, as the authors also point out, the exact nature of this relationship is difficult to infer from this study alone, as the slow temporal resolution of fMRI cannot establish the directionality between the hippocampus and the visual-perceptual cortex. This is an exciting future avenue to explore.

      Strengths:

      A great strength of this study is that it introduces a fMRI paradigm in addition to the autobiographical interview, paralleling work done on episodic memory in cognitive science (e.g. Addis and Schacter, 2007, https://doi.org/10.1016%2Fj.neuropsychologia.2006.10.016 ), which has examined episodic and semantic memory in relation to imagination (future simulation) in non-aphantasic participants as well as clinical populations. Future work could build on this study, and for example use the recombination paradigm (Addis et al. 2009, 10.1016/j.neuropsychologia.2008.10.026 ), which would shed further light on the ability of people with aphantasia to both remember and imagine events. Future work could also build on the interesting findings regarding spatial navigation, which together with previous findings in aphantasia (e.g. Bainbridge et al., 2021, https://doi.org/10.1016/j.cortex.2020.11.014 ) strongly suggests that spatial abilities in people with aphantasia are unaffected. This can shed further light on the different neural pathways of spatial and object memory in general. In general, this study opens up a multitude of new avenues to explore and is likely to have a great impact on the field of aphantasia research.

      Weaknesses:

      A weakness of the study is that some of the questions used are a bit vague, and no objective measure is used, which could have been more informative. For example, the spatial navigation question (reported as 'How difficult is it typically for you to orient you spatially?') could have been more nuanced to tap into whether participants relied mostly on cognitive maps (likely supported by the hippocampus) or landmarks. It would also have been interesting to conduct a spatial navigation task, as participants do not necessarily have insight into their spatial navigation abilities (they could have been overconfident or underconfident in their abilities). Secondly, the question 'how difficult is it typically for you to use your imagination?' could also be more nuanced, as imagination is used in a variety of ways, and we only have reason to hypothesise that people with aphantasia might have difficulties in some cases (i.e. sensory imagination involving perceptual details). It is unlikely that people with aphantasia would have more difficulty than controls in using their imagination to imagine counterfactual situations and engage in counterfactual thought (de Brigard et al., 2013, https://doi.org/10.1016%2Fj.neuropsychologia.2013.01.015) due to its non-sensory nature, but the question used does not distinguish between these types of imagination. Again, this is a ripe area for future research. The general phrasing of 'how difficult is [x]' could also potentially bias participants towards more negative answers, something which ought to be controlled for in future research.

    3. Reviewer #2 (Public Review):

      Summary:

      This study investigates to what extent neural processing of autobiographical memory retrieval is altered in people who are unable to generate mental images ('aphantasia'). Self-report as well as objective measures were used to establish that the aphantasia group indeed had lower imagery vividness than the control group. The aphantasia group also reported fewer sensory and emotional details of autobiographical memories. In terms of brain activity, compared to controls, aphantasics had a reduction in activity in the hippocampus and an increase in the activity in visual cortex during autobiographical memory retrieval. For controls, these two regions were also functionally connected during autobiographical memory retrieval, which did not seem to be the case for aphantasics. Finally, resting-state connectivity between visual cortex and hippocampus was positively related to autobiographical vividness in the control group but negatively in the aphantasia group. The results are in line with the idea that aphantasia is caused by an increase in noise within the visual system combined with a decrease in top-down communication from the hippocampus.

      Recent years have seen a lot of interest in the influence of aphantasia on other cognitive functions and one of the most consistent findings is deficits in autobiographical memory. This is one of the first studies to investigate the neural correlates underlying this difference, thereby substantially increasing our understanding of aphantasia and the relationship between mental imagery and autobiographical memory.

      Strengths:

      One of the major strengths of this study is the use of both self-report as well as objective measures to quantify imagery ability. Furthermore, the fMRI analyses are hypothesis-driven and reveal unambiguous results, with alterations in hippocampal and visual cortex processing seeming to underlie the deficits in autobiographical memory.

      Weaknesses:

      In terms of weaknesses, the control task, doing mathematical sums, also differs from the autobiographical memory task in aspects that are unrelated to imagery or memory, such as self-relevance and emotional salience, which makes it hard to conclude that the differences in activity are reflecting only the cognitive processes under investigation. However, given that the most important comparisons are between groups of participants, this does not diminish the main conclusions about aphantasia.

      Overall, I believe that this is a timely and important contribution to the field and will inspire novel avenues for further investigation.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this article, the authors investigate whether the connectivity of the hippocampus is altered in individuals with aphantasia, that is, people who have reduced mental imagery abilities, some of whom describe having no imagery while others describe having vague and dim imagery. The study investigated this question using an fMRI paradigm, where 14 people with aphantasia and 14 controls were tested, and the researchers were particularly interested in the key regions of the hippocampus and the visual-perceptual cortices. Participants were interviewed using the Autobiographical Interview regarding their autobiographical memories (AMs), and internal and external details were scored. In addition, participants were queried on their perceived difficulty in recalling memories, imagining, and spatial navigation, and their confidence regarding autobiographical memories was also measured. Results showed that participants with aphantasia reported significantly fewer internal details (but not external details) compared to controls; that they had lower confidence in their AMs; and that they reported finding remembering and imagining in general more difficult than controls. Results from the fMRI section showed that people with aphantasia displayed decreased hippocampal and increased visual-perceptual cortex activation during AM retrieval compared to controls. In contrast, controls showed strong negative functional connectivity between the hippocampus and the visual cortex. Moreover, resting state connectivity between the hippocampus and visual cortex predicted better visualisation skills. The authors conclude that their study provides evidence for the important role of visual imagery in detail-rich vivid AM, and that this function is supported by the connectivity between the hippocampus and visual cortex. This study extends previous findings of reduced episodic memory details in people with aphantasia, and enables us to start theorising about the neural underpinnings of this finding.

      The data provided good support for the conclusion that the authors draw, namely that there is a 'tight link between visual imagery and our ability to retrieve vivid and detail-rich personal past events'. However, as the authors also point out, the exact nature of this relationship is difficult to infer from this study alone, as the slow temporal resolution of fMRI cannot establish the directionality between the hippocampus and the visual-perceptual cortex. This is an exciting future avenue to explore.

      We thank the reviewer for highlighting our contributions and suggesting that the relationship between visual imagery and autobiographical memory recall is an exciting future avenue.

      Weaknesses:

      A weakness of the study is that some of the questions used are a bit vague, and no objective measure is used, which could have been more informative. For example, the spatial navigation question (reported as 'How difficult is it typically for you to orient you spatially?' - a question which is ungrammatical, but potentially reflects a typo in the manuscript) could have been more nuanced to tap into whether participants relied mostly on cognitive maps (likely supported by the hippocampus) or landmarks. It would also have been interesting to conduct a spatial navigation task, as participants do not necessarily have insight into their spatial navigation abilities (they could have been overconfident or underconfident in their abilities).

      Secondly, the question 'how difficult is it typically for you to use your imagination?' could also be more nuanced, as imagination is used in a variety of ways, and we only have reason to hypothesise that people with aphantasia might have difficulties in some cases (i.e. sensory imagination involving perceptual details). It is unlikely that people with aphantasia would have more difficulty than controls in using their imagination to imagine counterfactual situations and engage in counterfactual thought (de Brigard et al., 2013, https://doi.org/10.1016%2Fj.neuropsychologia.2013.01.015) due to its non-sensory nature, but the question used does not distinguish between these types of imagination. Again, this is a ripe area for future research. The general phrasing of 'how difficult is [x]' could also potentially bias participants towards more negative answers, something which ought to be controlled for in future research.

      The main goal of our study was to examine autobiographical memory recall. Therefore, we used the gold standard Autobiographical Interview, or AI (Levine et al. 2002), and an fMRI paradigm to explore autobiographical memory recall in as standardised, precise, and objective a manner as possible.

      In addition to these experimentally rigorous tasks, we employed some loosely formulated questions with the intention for people to reflect on how they perceive their own abilities to recall autobiographical memories, navigate spatially, and use their imagination. We agree with the reviewer that these questions are vague and do not meet the experimental standard required for an investigation into spatial cognition or imagination associated with aphantasia. Nonetheless, we believe that these questions provide important additional insights into what participants think about their own cognitive abilities. In order to put these questions into perspective, we argue in the discussion that spatial cognition and other cognitive functions should be investigated in more depth in individuals with aphantasia in the future.

      As an additional note, all tasks were conducted in German. Thus, we were able to correct the wording of the debriefing question in our revision. We thank the reviewer for bringing this to our attention.

      Strengths:

      A great strength of this study is that it introduces an fMRI paradigm in addition to the autobiographical interview, paralleling work done on episodic memory in cognitive science (e.g. Addis and Schacter, 2007, https://doi.org/10.1016%2Fj.neuropsychologia.2006.10.016), which has examined episodic and semantic memory in relation to imagination (future simulation) in non-aphantasic participants as well as clinical populations. Future work could build on this study, and for example use the recombination paradigm (Addis et al. 2009, 10.1016/j.neuropsychologia.2008.10.026), which would shed further light on the ability of people with aphantasia to both remember and imagine events. Future work could also build on the interesting findings regarding spatial navigation, which together with previous findings in aphantasia (e.g. Bainbridge et al., 2021, https://doi.org/10.1016/j.cortex.2020.11.014) strongly suggest that spatial abilities in people with aphantasia are unaffected. This can shed further light on the different neural pathways of spatial and object memory in general. Overall, this study opens up a multitude of new avenues to explore and is likely to have a great impact on the field of aphantasia research.

      We greatly appreciate the acknowledgment of our work on autobiographical memory employing both the autobiographical interview and fMRI. Furthermore, we hope that our work inspires future research along the lines the reviewer outlines and that we describe in our manuscript.

      Reviewer #2 (Public Review):

      Summary:

      This study investigates to what extent neural processing of autobiographical memory retrieval is altered in people who are unable to generate mental images ('aphantasia'). Self-report as well as objective measures were used to establish that the aphantasia group indeed had lower imagery vividness than the control group. The aphantasia group also reported fewer sensory and emotional details of autobiographical memories. In terms of brain activity, compared to controls, aphantasics had a reduction in activity in the hippocampus and an increase in activity in the visual cortex during autobiographical memory retrieval. For controls, these two regions were also functionally connected during autobiographical memory retrieval, which did not seem to be the case for aphantasics. Finally, resting-state connectivity between the visual cortex and hippocampus was positively related to autobiographical vividness in the control group but negatively in the aphantasia group. The results are in line with the idea that aphantasia is caused by an increase in noise within the visual system combined with a decrease in top-down communication from the hippocampus.

      Recent years have seen a lot of interest in the influence of aphantasia on other cognitive functions and one of the most consistent findings is deficits in autobiographical memory. This is one of the first studies to investigate the neural correlates underlying this difference, thereby substantially increasing our understanding of aphantasia and the relationship between mental imagery and autobiographical memory.

      We thank the reviewer for highlighting the importance of our findings.

      Strengths:

      One of the major strengths of this study is the use of both self-report as well as objective measures to quantify imagery ability. Furthermore, the fMRI analyses are hypothesis-driven and reveal unambiguous results, with alterations in hippocampal and visual cortex processing seeming to underlie the deficits in autobiographical memory.

      Once again, we thank the reviewer for highlighting the quality of our methods and our results.

      Weaknesses:

      In terms of weaknesses, the control task, doing mathematical sums, also differs from the autobiographical memory task in aspects that are unrelated to imagery or memory, such as self-relevance and emotional salience, which makes it hard to conclude that the differences in activity are reflecting only the cognitive processes under investigation.

      We agree with the reviewer that our control task differs from autobiographical memory in many different ways. In fact, for this first investigation of the neural correlates of autobiographical memory in aphantasia, this is precisely the reason why we chose this mental arithmetic (MA) task. We know from previous studies that MA is, as much as possible, not dependent on hippocampal memory processes (Addis et al. 2007, McCormick et al. 2015, 2017, Leelaarporn et al., 2024). The main goal of the current study was to establish whether there are any differences between individuals with aphantasia and controls. In the next investigation, we can now build on these findings to disentangle in more detail what this difference reflects.

      Overall, I believe that this is a timely and important contribution to the field and will inspire novel avenues for further investigation.

      This highly positive conclusion is much appreciated.

      References

      Addis, D. R., Wong, A. T., & Schacter, D. L. (2007). Remembering the past and imagining the future: Common and distinct neural substrates during event construction and elaboration. Neuropsychologia, 45(7), 1363-1377.

      Kriegeskorte, N., Simmons, W., Bellgowan, P. et al. Circular analysis in systems neuroscience: the dangers of double dipping. Nat Neurosci 12, 535–540 (2009). https://doi.org/10.1038/nn.2303

      Leelaarporn, P., Dalton, M. A., Stirnberg, R., Stöcker, T., Spottke, A., Schneider, A., & McCormick, C. (2024). Hippocampal subfields and their neocortical interactions during autobiographical memory. Imaging Neuroscience.

      Levine, B., Svoboda, E., Hay, J. F., Winocur, G., & Moscovitch, M. (2002). Aging and autobiographical memory: dissociating episodic from semantic retrieval. Psychology and Aging, 17(4), 677.

      McCormick, C., St-Laurent, M., Ty, A., Valiante, T. A., & McAndrews, M. P. (2015). Functional and effective hippocampal–neocortical connectivity during construction and elaboration of autobiographical memory retrieval. Cerebral Cortex, 25(5), 1297-1305.

      McCormick, C., Moscovitch, M., Valiante, T. A., Cohn, M., & McAndrews, M. P. (2018). Different neural routes to autobiographical memory recall in healthy people and individuals with left medial temporal lobe epilepsy. Neuropsychologia, 110, 26-36.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      This is a very interesting article that makes a substantial contribution to the field of the study of aphantasia as well as the neural mechanisms of autobiographical memory. I would strongly recommend this manuscript to be accepted (with these minor revisions), as it makes a substantial and well-evidenced contribution to the research, and it opens up many interesting avenues for researchers to explore. I was especially excited to see that the Autobiographical Interview had been paired with an fMRI paradigm, something which this field of research highly benefits from, as there are yet so few fMRI studies into aphantasia. I understand that it is the authors' decision whether to accept or reject any of the revisions I recommend here, but I would like to stress that I encourage accepting the recommended revisions, especially as there are some minor inaccuracies in the manuscript as it currently stands. Finally, I would like to stress that, although I am based in cognitive science, I am not trained in fMRI techniques and am therefore not in a position to comment on the methodology pertaining to this part of the study; I encourage the Editors to seek a second reviewer's opinion on this.

      Thank you for the positive evaluation of our manuscript as well as your comments. We have revised our manuscript according to your important suggestions as further explained below.

      Line 33: "aphantasia prohibits people from experiencing visual imagery". This characterisation of aphantasia is too strong, especially as the authors use 32 as a cut-off point on the VVIQ, which represents weak and dim imagery. I would recommend using language like 'people with aphantasia have reduced visual imagery abilities', as this more accurately captures the group of people studied. Please revise throughout the manuscript. Please consult Blomkvist and Marks (2023) on this point who have discussed this problem in the aphantasia literature.

      We agree that aphantasics may experience reduced visual imagery abilities. We have revised our wording throughout the manuscript.

      Line 49: The authors conclude that their results 'indicate that visual mental imagery is essential for detail-rich, vivid AM', but this seems to be a bit too strong, for example since AM can be detail-rich with external (rather than internal) detail, and a person could potentially use mnemonic tricks such as keeping a detail-rich diary in order to boost their memory. That visual imagery is 'essential' implies that it is the only way to achieve detail-rich vivid AM, and this does not seem to be supported by the findings. I would recommend rephrasing it as 'visual mental imagery plays an important role in detail-rich, vivid AM' or 'visual mental imagery mediated detail-rich vivid AM'.

      We altered the sentence in Line 49 using one of the recommended phrases:

      ‘Our results indicate that visual mental imagery plays an important role in detail-rich, vivid AM, and that this type of cognitive function is supported by the functional connection between the hippocampus and the visual-perceptual cortex.’

      Line 69: Blomkvist and Marks (2023) have warned against calling aphantasia a 'condition' and this moreover seems to fit with the authors' previous research (Monzel, 2022). Please consider instead calling aphantasia an 'individual difference' in mental imagery abilities.

      Thank you for the suggestion. We have revised our wording throughout the manuscript, avoiding the term ‘condition’.

      Line 72: Add reference for emotional strength which has also been researched (Wicken et al. 2021, https://doi.org/10.1016/j.cortex.2020.11.014).

      We have added the suggested reference in Line 75:

      ‘Indeed, a handful of previous studies report convergent evidence that aphantasics report less sensory AM details than controls (Bainbridge et al., 2021; Dawes et al., 2020, 2022; Milton et al., 2020; Zeman et al., 2020), which may also be less emotional (Monzel et al., 2023; Wicken et al., 2021).’

      72-73: 'absence of voluntary imagery' - too strong as many people with aphantasia report having weak/dim mental imagery on the VVIQ.

      We agree that aphantasics may experience reduced visual imagery. We have revised this notion throughout the manuscript.

      74: Add reference to Bainbridge study which found a difference between recall of object vs spatial memory. This would be relevant here.

      We have added the suggested reference in Line 76:

      ‘Spatial accuracy, on the other hand, was not found to be impaired (Bainbridge et al., 2021).’

      Lines 94-97: The authors mention 'a prominent theory' but it is unclear which theory is referred to here. The article cited by Pearson (2019) does not suggest the possibility that aphantasia is due to altered connectivity between the hippocampus and visual-perceptual cortices. It suggests that aphantasia is due to impairment in the ventral stream, and in fact says that the hippocampus is unlikely to be affected due to spared spatial abilities in people with aphantasia. Specifically, Pearson claims: "Accordingly, memory areas of the brain that process spatial properties, including the hippocampus, may not be the underlying cause of aphantasia." (page 631). The authors further come back to this point in the discussion section (see comment below), saying that the hypothesis attributed to Pearson is supported by their study. I do not disagree with the point that the hypothesis is supported by the data, but it is unclear to me why the hypothesis is attributed to Pearson.

      Thank you for pointing out this inaccuracy. We have edited the text to spell out our entire train of thought (see Lines 96-102):

      ‘A prominent theory posits that because of this hyperactivity, small signals elicited during the construction of mental imagery may not be detected (Pearson, 2019, Keogh et al., 2020). Pearson further speculates that since spatial abilities seem to be spared, the hippocampus may not be the underlying cause of aphantasia. In agreement, Bergmann and Ortiz-Tudela (2023) speculate that individuals with aphantasia might lack the ability to reinstate visually precise episodic elements from memory due to altered feedback from the visual cortex.’

      Line 97: Blomkvist reference should be 2022 (when first published online).

      The article ‘Aphantasia: In search of a theory’ by Blomkvist was first published on 1st July 2022. However, a correction was added on 13th March 2023. Therefore, we had cited the corrected version in this manuscript. However, we agree that the first publication date should be used and edited the reference accordingly.

      Line 116: 'one aphantasic' could be seen as offensive. I would suggest 'one aphantasic participant'.

      We have altered the paragraph according to your suggestion.

      Line 138: In line with the recommendations put forward by Blomkvist and Marks (2023), I would suggest removing the word 'diagnosed', as this medicalises aphantasia in a way that is not consistent with its not being a kind of mental disorder (Monzel et al., 2022). I would say that aphantasia is instead operationalised as a score between 16-32. However, note that Blomkvist (2022) and Blomkvist and Marks (2023, https://doi.org/10.1016/j.cortex.2023.09.004 ) point out that there is also a lot of inconsistency in this score and how it is used in different studies. In your manuscript, I would recommend removing all wording that indicates that people with aphantasia have no experience of mental imagery, as you have operationalised for a score up to 32 which indicates vague and dim imagery. Describing vague and dim imagery as no imagery/absence of imagery is inconsistent (but common practice in the literature).

      Thank you for your suggestion. We have revised the entire manuscript to eliminate any ambiguous meanings regarding the definition of aphantasia. Moreover, we replaced the word ‘diagnosed’ with ‘identified’ in Line 146.
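
      The operationalisation discussed in this comment (a VVIQ total between 16 and 32) can be made explicit in a few lines. The following is a minimal sketch only, assuming the standard 16-item VVIQ with each item rated on a 5-point scale (total range 16 to 80). The function and constant names are hypothetical, and this is not the scoring code used in the study.

```python
VVIQ_ITEMS = 16                 # assumed: standard 16-item VVIQ
MIN_RATING, MAX_RATING = 1, 5   # assumed: 5-point rating per item
APHANTASIA_CUTOFF = 32          # upper bound of the range quoted in the review


def vviq_total(ratings):
    """Sum the item ratings after checking count and range."""
    if len(ratings) != VVIQ_ITEMS:
        raise ValueError(f"expected {VVIQ_ITEMS} ratings, got {len(ratings)}")
    if any(not MIN_RATING <= r <= MAX_RATING for r in ratings):
        raise ValueError("each rating must lie between 1 and 5")
    return sum(ratings)


def in_aphantasia_range(total, cutoff=APHANTASIA_CUTOFF):
    """True if the total falls in the 16-to-cutoff range discussed above."""
    return VVIQ_ITEMS * MIN_RATING <= total <= cutoff


# A participant rating every item 2 ("vague and dim") reaches exactly the cut-off.
total = vviq_total([2] * VVIQ_ITEMS)
print(total, in_aphantasia_range(total))
```

      Note that a total of 32 corresponds to rating every item as vague and dim rather than absent, which is the reviewer's reason for preferring 'reduced imagery' over 'no imagery'.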

      Line 153: maybe 'correlated with imagery strength' rather than 'measures imagery strength'?

      We have altered the sentence according to your suggestion in Line 160:

      ‘Previous studies have shown that the binocular rivalry task validly correlated with mental imagery strength.’

      Line 162: "For participants who were younger than 34 years, the middle-age memory was replaced by another early adulthood memory". Is there precedence for this? Please add one sentence to explain/justify for the reader why a memory from this time period was chosen.

      To obtain five episodic autobiographical memories from five different life periods for every participant, we asked participants who were younger than 34 years at the time of the interview to provide another early adulthood memory in place of the middle-age memory, as they had not yet reached middle age. According to Levine et al. (2002), younger adults (age < 34 years) selected two events from the early adulthood period. For the last time period, all participants provided memories from the previous year. We have added an additional explanation in this section in Line 170:

      ‘In order to acquire five AMs in every participant, the middle age memory was replaced by another early adulthood memory for participants who were younger than 34 years old (see Levine et al., 2002). Hence, all participants provided the last time period with memories from their previous year.’

      Line 169: "During the general probe, the interviewer asked the participant encouragingly to promote any additional details." Consider a different word choice, 'promote' sounds odd.

      We have altered the sentence according to your suggestion in Line 180:

      ‘During the general probe, the interviewer asked the participant encouragingly to provide any additional details.’

      Line 196-198: the phrasing of these questions could have biased participants toward reporting it being more difficult. Did the authors control for this possibility in any way? The phrasing ‘How easy is it for you to [x]?’ might also be considered in a future study.

      Thank you for pointing this out. These debriefing questions were thought of as open questions to get people to talk about their experiences. They were not meant as rigorous scientific experiments. Framing it in a positive way is a good idea for future research.

      We have edited the manuscript in Lines 394-396:

      ‘The debriefing questions were employed as a way for participants to reflect on their own cognitive abilities. Of note, these were not meant to represent or replace necessary future experiments.’

      Line 197: This question is ungrammatical. Is this a typo, or was this how the question was actually posed? What language was the study conducted in?

      All interviews within this study were conducted in German. Hence, the questions listed in the current manuscript were all translated from German into English. We have added this information to the Materials and Methods section in Line 169 and rephrased the questions in Lines 208-210:

      ‘All interviews were conducted in German.’

      ‘(1) Typically, how difficult is it for you to recall autobiographical memories?

      (2) Typically, how difficult is it for you to orient yourself spatially?

      (3) Typically, how difficult is it for you to use your imagination?’

      Line 211: The authors write that participants were asked to "re-experience the chosen AM and elaborate as many details as possible in their mind's eye" was this the instruction used? I think stating the explicit instruction here would be relevant for the reader. If this is the word choice, it is also interesting as the autobiographical interview does not normally specify to re-experience details 'in one's mind's eye'.

      The instructions given to the participants were to choose an AM and re-experience/elaborate it in their mind with as many details as possible without explaining them out loud. We have clarified this in Lines 221-223:

      ‘For the rest of the trial duration, participants were asked to re-experience the chosen AM and try to recall as many details as possible without speaking out loud.’

      Line 213: Were ‘vivid’ and ‘faint’ the only two options? Why was a 5-point scale (like the VVIQ scale) not used to better be able to compare?

      During the scanning session, participants were given a button box with two buttons: 'vivid', pressed with the index finger, and 'faint', pressed with the middle finger. A 5-point scale was not used, in order to avoid confusion with the buttons during the scanning session. We have clarified this in Line 224:

      ‘We chose a simple two-button response in order to keep the task as easy as possible.’

      Line 347: Do the authors mean the same thing by 'imagery strength' and 'imagery vividness'? This would be good to clarify as it is not clear that these words mean the same thing.

      Imagery strength is often used to describe the results of the Binocular Rivalry Task, whereas vividness of mental imagery is often used to describe the results of the VVIQ. Although both tasks are correlated, the VVIQ measures vividness, whereas the dimension of the Binocular Rivalry Task is not clearly defined. We added this information in a footnote on page 10.

      Lines 353 - 356: When the authors first say that aphantasics described fewer memory details than controls, does this refer to external + internal details? Please clarify.

      Lines 353-360: The authors first say that aphantasics report "internal details (M = 43.59, SD = 17.91) were reported more often than external details (M = 20.64, SD = 8.94)" (line 355). But then they say: "a 2-way interaction was found between the type of memory details and group, F(1, 27)= 54.09, p < .001, ηp2 = .67, indicating that aphantasics reported significantly less internal memory details, t(27) = 5.07, p < .001, d = 1.83, but not significantly less external memory details, t(27) = 0.13, p = .898, compared to controls (see Figure 1b)" (line 358). This seems to first say that aphantasics didn't report fewer details than controls, but then that they did report fewer internal details than controls. Please clarify if this is correct.

      Line 383: Results from controls are not reported in this section.

      We first reported the main effects of the different factors: aphantasics reported fewer details than controls (regardless of memory period and type of memory details), internal details were reported more often than external details (regardless of group and memory period), and more details were reported for recent than remote memories (regardless of group and type of memory details). Subsequently, we report the simple effects for aphantasics and controls separately. To further clarify, we added the following segment in line 360:

      ‘Regarding the AI, we found significant main effects of memory period, F(1, 27) = 11.88, p = .002, ηp2 = .31, type of memory details, F(1, 27) = 189.03, p < .001, ηp2 = .88, and group, F(1, 27) = 9.98, p = .004, ηp2 = .27. When the other conditions were collapsed, aphantasics (M = 26.29, SD = 9.58) described less memory details than controls (M = 38.36, SD = 10.99). For aphantasics and controls combined, more details were reported for recent (M = 35.17, SD = 14.19) than remote memories (M = 29.06, SD = 11.12), and internal details (M = 43.59, SD = 17.91) were reported more often than external details (M = 20.64, SD = 8.94). More importantly, a 2-way interaction was found between type of memory details and group, F(1, 27) = 54.09, p < .001, ηp2 = .67, indicating that aphantasics reported significantly less internal memory details, t(27) = 5.07, p < .001, d = 1.83, but not significantly less external memory details, t(27) = 0.13, p = .898, compared to controls (see Figure 1b).’
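
      As a quick consistency check on the effect sizes quoted above, partial eta squared can be recovered from each F statistic and its degrees of freedom using ηp2 = (F × df_effect) / (F × df_effect + df_error). The sketch below is illustrative only; the function name and the dictionary layout are hypothetical, and this is not the authors' analysis code.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F, df_effect, df_error, reported partial eta squared), copied from the quoted passage
reported = {
    "memory period": (11.88, 1, 27, 0.31),
    "type of memory details": (189.03, 1, 27, 0.88),
    "group": (9.98, 1, 27, 0.27),
    "details x group interaction": (54.09, 1, 27, 0.67),
}

for label, (f_value, df_effect, df_error, eta_reported) in reported.items():
    recomputed = partial_eta_squared(f_value, df_effect, df_error)
    print(f"{label}: reported {eta_reported:.2f}, recomputed {recomputed:.2f}")
```

      Under this formula, each recomputed value rounds to the reported one at two decimal places.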

      Overall, the results were reported for aphantasics and controls separately in Lines 368-372.

      Line 386: The question does not specify that it's asking about using imagination in daily life, even though this is what results report. I'm not sure that the question implies the use of imagination in daily life, so I would recommend removing this reference here.

      We have removed the phrase “in daily life” since this was not part of the original debriefing question.

      Line 394: Could this slowness in response reflect uncertainty about the vividness?

      Since the reason for this slowness is not known, we have refrained from adding this to the discussion. However, we added this as a short insertion in line 406:

      ‘Moreover, aphantasics responded slower (M = 1.34 s, SD = 0.38 s) than controls (M = 1.00 s, SD = 0.29 s) when they were asked whether their retrieved memories were vivid or faint, t(28) = 2.78, p = .009, possibly reflecting uncertainty in their response.’

      Line 443: Graph E, significance not indicated on the graph.

      After preprocessing, the fMRI data were statistically analyzed using the GLM contrast AM versus MA. The resulting images were then thresholded at p < 0.001, so that the illuminated voxels in Fig. 3 A, B, C, and D show only voxels in which we already know there is a statistical difference between our conditions. Graph E illustrates only the descriptive means and variance of the significant differences in Fig. 3 C and D. This display is useful since the reader can more easily assess the difference between two conditions and two groups at a glance. For a general discussion of this topic, please also see the literature on circular analysis in fMRI (Kriegeskorte et al. 2009).

      Line 521-522: The authors claim that Pearson (2019) forwards the hypothesis that heightened activity of visual-perceptual cortices hinders aphantasics from detecting small imagery-related signals. However, I find no statement of this hypothesis in Pearson (2019). It is unclear to me why this hypothesis is attributed to Pearson (2019). Please remove this reference or provide a correct citation for where the hypothesis is stated. Further, it is not clear from what is written how the results support this hypothesis as this is rather brief - please elaborate on this.

      We attributed this hypothesis to Pearson (2019) according to his Fig. 4, which states: ‘A strong top-down signal and low noise (bottom left) gives the strongest mental image (square), whereas a high level of neural noise and a weak top-down imagery signal would produce the weakest imagery experience (top right).’

      We have edited our manuscript to reflect Pearson better in Lines 543-550:

      ‘In a prominent review, Pearson synthesizes evidence about the neural mechanism of imagery strength (Pearson, 2019). Indeed, activity metrics in the visual cortex predict imagery strength (Cui et al., 2007; Dijkstra et al., 2017). Interestingly, lower resting activity and excitability result in stronger imagery, and reducing cortical activity in the visual cortex via transcranial direct current stimulation (tDCS) increases visual imagery strength (Keogh et al., 2020). Thus, one potential mechanism of aphantasia-related AM deficits is that the heightened activity of the visual-perceptual cortices observed in our and previous work hinders aphantasics to detect weaker imagery-related signals.’

      Line 575: Consider citing Blomkvist (2022), who has argued that aphantasia is an episodic memory condition.

      We added the suggested reference in Line 601.

      Line 585: Consider citing Bainbridge et al (2021) https://doi.org/10.1016/j.cortex.2020.11.014

      We have added the suggested reference in Line 612.

      Line 581: It might be relevant here to also discuss non-visual details, which have indeed been investigated in your present study. E.g. the lower emotional details, temporal details, place details, etc.

      We have edited our discussion to reflect the non-visual details better in Line 605:

      ‘In fact, previous and the current study show that aphantasics and individuals with hippocampal damage report less internal details across several memory detail subcategories, such as emotional details and temporal details (Rosenbaum et al., 2008; St-Laurent et al., 2009; Steinvorth et al., 2005), and these deficits can be observed regardless of the recency of the memory (Miller et al., 2020). These similarities suggest that aphantasics are not merely missing the visual-perceptual details to specific AM, but they have a profound deficit associated with the retrieval of AM.’

      Place details are discussed on page 37 onwards.

      Line 605: I agree with this interesting suggestion for future research. It would also be relevant to reference Bainbridge (2021) here who tested spatial cognition in a drawing task and found that aphantasic participants correctly recalled spatial layouts of rooms but reported fewer objects than controls. It might also be worth pointing out that the present study does not actually test for accuracy in spatial cognition, so it could be the case that people with aphantasia feel confident that they can navigate well, but they might in fact not. Future studies relying on objective measures should test this possibility.

      We have added the suggested reference in Line 625.

      Lines 609-614: Is there any evidence that complex decision-making and complex empathy tasks depend on constructed scenes with visual-perceptual details? This hypothesis seems a bit far-fetched without any supporting evidence. In fact, it seems unlikely to be supported as we also know that people with aphantasia generally live normal lives, and often have careers that we can assume involve complex decision-making (see Zeman 2020 who report aphantasics who work as computer scientists, managers, etc). I would recommend that the authors provide evidence of the role of mental imagery in complex decision-making and complex empathy tasks, mediated by scene construction, to support this hypothesis as viable to test for future research. It is also unclear how this point connects to the argument made by Bergmann and Ortiz-Tudela (2023). In fact, Bergmann and Ortiz-Tudela seem to make the same argument as Pearson (2019) does - that aphantasia results from impairments in the ventral stream, but that the dorsal stream is unaffected. However, Blomkvist (2022) argues that this view is too simplistic to be able to account for the variety of deficits that we see in aphantasia. I would recommend either engaging more fully with this debate or cutting it, as it currently is too vague for a reader to follow.

      We have decided to leave the discussion about scene construction and its connection to complex decision making and empathy out of the current manuscript. We have included the argument of Bergmann & Ortiz-Tudela (2023) in the Introduction (Line 101):

      ‘In agreement, Bergmann and Ortiz-Tudela (2023) speculate that individuals with aphantasia might lack the ability to reinstate visually precise episodic elements from memory due to altered feedback from the visual cortex.’

      Reviewer #2 (Recommendations For The Authors):

      In general, I really enjoyed reading this paper.

      Thank you very much for the positive evaluation of our manuscript as well as your comments.

      There were only a few things that I had some concerns about. For example, it was unclear to me whether the whole-brain analysis (Figures 3 and 4) was corrected for multiple comparisons or why only a small volume correction was applied for the functional connectivity analysis. If these results are borderline significant, this should be made more explicit in the manuscript. I don't think this is a major issue as the investigation of both the hippocampus and visual cortex was strongly hypothesis-driven, but it would still be good to be explicit about the strength of the findings.

      For the whole-brain analysis, we applied a threshold of p < .001 with a voxel cluster of 10, but no other correction for multiple comparisons. The peak in the right hippocampus did survive the whole-brain threshold, but we decided to lower this threshold just for display purposes in Figure 3, so that readers can easily see the cluster.

      We have made the statistical thresholds more easily accessible to the reader on the following pages:

      Figure 3 (Page 27): ‘Images are thresholded at p < .001, cluster size 10, uncorrected, except (D) which is thresholded at p < .01, cluster size 10, for display purposes only (i.e., the peak voxel and adjacent 10 voxels also survived p < .001, uncorrected).’

      Figure 4 (Page 30): ‘Image is displayed at p < .05, small volume corrected, and a voxel cluster threshold of 10 adjacent voxels.’
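
      To make the phrase "p < .001, cluster size 10, uncorrected" concrete, the following is a minimal sketch of voxel-wise thresholding followed by a cluster-extent filter. It is an illustration only and not the analysis pipeline used in the study: the function name `cluster_threshold`, the sparse dictionary representation of the p-map, the choice of face connectivity (6 neighbours), and the toy data are all assumptions made for this example.

```python
from collections import deque


def cluster_threshold(p_map, p_thresh=0.001, min_cluster=10):
    """Return voxels that pass a voxel-wise p threshold AND sit in a
    face-connected cluster of at least `min_cluster` voxels.

    p_map: dict mapping (x, y, z) voxel coordinates to p-values
           (hypothetical sparse representation, chosen for brevity).
    """
    # Step 1: voxel-wise threshold, with no multiple-comparisons correction.
    candidates = {v for v, p in p_map.items() if p < p_thresh}

    # Assumption: voxels are neighbours only if they share a face.
    neighbours = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
                  (0, -1, 0), (0, 0, 1), (0, 0, -1)]

    surviving, seen = set(), set()
    for start in candidates:
        if start in seen:
            continue
        # Step 2: breadth-first search collects one connected cluster.
        cluster, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            x, y, z = queue.popleft()
            for dx, dy, dz in neighbours:
                nb = (x + dx, y + dy, z + dz)
                if nb in candidates and nb not in seen:
                    seen.add(nb)
                    cluster.add(nb)
                    queue.append(nb)
        # Step 3: keep the cluster only if it is large enough.
        if len(cluster) >= min_cluster:
            surviving |= cluster
    return surviving


# Toy data: a 12-voxel cluster, a 3-voxel cluster, and one weak voxel.
p_map = {(x, 0, 0): 0.0005 for x in range(12)}
p_map.update({(x, 5, 5): 0.0005 for x in range(3)})
p_map[(20, 20, 20)] = 0.02

kept = cluster_threshold(p_map)
```

      In this toy example only the 12-voxel cluster is retained; the 3-voxel cluster fails the extent criterion and the isolated voxel fails the voxel-wise threshold. Relaxing `p_thresh` to 0.01, as done for display in panel D, would enlarge the set of candidate voxels without changing which peaks pass the stricter threshold.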

      I was wondering whether it would be possible to use DCM to investigate the directionality of the connectivity. Given that there are only two ROIs and two alternative hypotheses (top-down versus bottom-up) this seems like an ideal DCM problem.

      We thank the reviewer for this suggestion and will consider testing the effective connectivity between both regions of interest in a future investigation. 

      Line 385: typo: 'great' should be 'greater'.

      We have corrected the typo from ‘great’ to ‘greater’ in Line 397.

      Line 400: absence of evidence of an effect is not evidence of absence of an effect.

      We agree with the reviewer that this was unclear. We changed the wording in Line 412:

      ‘In addition, aphantasics and controls did not differ significantly in their time searching for a memory in AM trials, t(19) = 1.03, p = .315.’
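
      As a consistency check on the reported statistic, a two-sided p-value can be recomputed from t and the degrees of freedom. The sketch below does this with the standard library only, integrating the Student t density numerically; the helper names `t_pdf` and `two_sided_p` and the number of integration steps are assumptions for illustration and are not taken from the manuscript. The result should land near the reported p = .315, with small differences expected because t is rounded to two decimals.

```python
import math


def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=10_000):
    """Two-sided p-value P(|T| > |t|), using Simpson's rule on [0, |t|].

    `steps` must be even for Simpson's rule.
    """
    a, b = 0.0, abs(t)
    h = (b - a) / steps
    total = t_pdf(a, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    central = total * h / 3          # P(0 < T < |t|)
    return 1 - 2 * central           # mass in both tails


p = two_sided_p(1.03, 19)
print(round(p, 3))
```

      A p-value of this size indicates only that the data do not show a significant group difference; as the reviewer notes, it does not by itself establish that the groups are equivalent.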

      Typo line 623: 'overseas'.

      We have altered the mistyped word from ‘overseas’ to ‘oversees’ in Line 647.

    1. Reviewer #1 (Public Review):

      Summary:

      This is a soundly designed experimental study and a very well-written manuscript. A very clear logic leads the reader from one experiment to the next, the experimental design is clearly explained throughout, and the acquired data are well analyzed and support the claims made by the authors. The authors made an evident effort to combine imaging, genetic, and molecular data to describe previously unknown early embryonic movement patterns and to identify regulatory mechanisms that control several aspects of them.

      Strengths:

      The authors develop a new method to analyze, quantitatively, the onset of movement during the latter embryonic stages of Drosophila development. This setup allows for a high throughput analysis of general movement dynamics based on the capture of variations of light intensity reflected by the embryo. This setup is capable of imaging several embryos simultaneously and provides a detailed measure of movement over time, which proves to be very useful for further discoveries in the manuscript. This setup already provides a thorough and quantifiable description of a process that is little known and identifies two different phases during late embryonic movements: a myogenic phase and a neurogenic phase, which they elegantly prove is dependent on neuronal activity by knocking down action potentials across the nervous system.

      However, in this system, movement is detected as a whole, and no further description of the type of movement is provided beyond frequency and amplitude; it would be interesting to know from the authors if a more precise description of the movements that take place at this stage can be achieved with this method (e.g. motion patterns across the A-P body axis).

      Importantly, this highly quantitative experimental setup is an excellent system for performing screenings of motion regulators during late embryonic development, and its use could be extended to search for different modulators of the process, beyond miRNAs (genetic mutants, drugs, etc.).

      Using their newly established motion detection pipeline, the authors identify miR-2b-1 as required for proper larval and embryonic motion, and identify an overall reduction in the quantity of both myogenic and neurogenic movements, as well as an increased frequency in neurogenic movement "pulses".

      Focusing on the neurogenic movement phenotype the authors use in situ probes and perform RT-PCR on FACS-sorted CNS cells to unambiguously detect miR-2b-1 expression in the embryonic nervous system. The neurogenic motion defects observed in miR-2b-1 mutant embryos and early larvae can be completely rescued by the expression of ectopic miR-2b-1 specifically in the nervous system, providing solid evidence of the requirement and sufficiency of miR-2b-1 expressed in the nervous system to regulate these phases of movement.

      To explore the mechanism through which miR-2b-1 impacts embryonic movement, the authors use a state-of-the-art bioinformatic approach to identify potential targets of miR-2b-1, and find that the expression levels of an uncharacterized gene, CG3638, are indeed regulated by miR-2b-1. Furthermore, they prove that by knocking down the expression of CG3638 in a miR-2b-1 mutant background, the neurogenic embryonic movement defects are rescued, indicating that the repression of CG3638 by miR-2b-1 is necessary for correct motion patterns in wild-type embryos. Therefore, this paper provides the first functional characterization of CG3638, and names this gene Motor.

      Finally, the authors aim to determine in which elements of the embryonic motor system miR-2b-1/Motor is required. Using directed overexpression of miR-2b-1 and Motor knockdown in the motor neurons and the chordotonal (sensory) organs, they prove that the miR-2b-1/Motor regulatory axis is specifically required in the sensory organs to promote normal embryonic and larval movement.

      Weaknesses:

      The initial screening to identify miRNAs involved in motion behaviors is performed in early larval movement. The logic presented by the authors is clear - it is assumed that early larval movement cannot proceed normally in the absence of previous embryonic motion - and ultimately helped them identify a miRNA required for modulation of embryonic movement. However, it is possible that certain miRNAs play a role in the modulation of embryonic movement while being dispensable for early L1 behaviors. Such regulators might have been missed with the current screening setup.

    2. Reviewer #2 (Public Review):

      Summary:

      The manuscript, "A microRNA that controls the emergence of embryonic movement" by Menzies, Chagas, and Alonso provides evidence that Drosophila miR-2b-1 is expressed in neurons and controls the expression of the predicted chloride channel CG3638, here named "Motor". Loss of the miRNA leads to movement phenotypes that can be rescued by downregulation of Motor; using specific drivers, the authors show that a larval movement phenotype (slower movement) can be rescued by knockdown of Motor in the chordotonal organs, suggesting that the increase in Motor found in the chordotonal organs is likely the root of the movement defects. Overall, I found the data presented in the manuscript to be of reasonable quality, and the conclusions are well enough supported by the presented data.

      The genetic and phenotypic analysis seems to be correct. The nicest part of the manuscript is the connection between the loss of a miRNA and finding its likely target in generating a phenotype. The authors also develop some protocols for the analysis of the movement phenotypes which may be useful for others.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Recommendations for the Authors:

      Reviewer 1:

      (1) Figure legends are too sparing, and often fail to describe with enough detail and accuracy the experiments presented. Especially in a work like this one, which uses plenty of different approaches and techniques and has a concise main text, description in the figure legends can really help the reader to understand the technical aspects of the experimental design. In my opinion, this will also help highlight the effort the authors put into exploring different and often new technical approaches. 

      We thank Reviewer 1 for highlighting this point and agree with them that the original figure legends lacked detailed information. In this revised version of our paper we have edited all figure legends to provide more detail on the experiments and the information displayed (see Main text p12-16, Supplementary Information p2-5). We hope this change will improve the clarity and accuracy of the description of our experiments.

      Reviewer 2:

      (1) Is there evidence that the early movement phenotype is actually linked to the larval movement phenotype? I noticed that the chordotonal driver experiment was only examined for larval movement. Is this driver not expressed earlier? Could the authors check the early phenotype using this driver? Are there early drivers that are expressed in chordotonal organ precursors (not panneuronal) and does the knockdown of CG3638 in these specific cells suppress the early phenotype?

      (2) More broadly, I would like to understand the function of the early embryonic movements. My concern is that they may only be a sign that the nervous system is firing up. If the rescue of the late miRNA mutant phenotype with chordotonal organ expression is only through a late change in the expression of CG3638, then the larval phenotype is probably not due to a developmental change, but a change in the immediate functioning of the neurons. Would this suggest that the early pulsing is not required for anything, at least at our level of understanding? If the driver is actually expressed early and late, then perhaps the authors could test later drivers to delimit the early and late functions of the miRNA? 

      The comments by Reviewer 2 in the points above are important and enquire about the biological role of early embryonic movements and whether these movements are linked to later larval activity or are somewhat irrelevant to the behaviour of the animal at later stages. 

      To address this important question, we conducted a new experiment in which we reduced neural activity specifically in the embryo (i.e. from 10 h AEL until the end of embryogenesis) and tested whether this treatment had any impact on larval movement. If – as put by Rev2 – the ‘early pulsing is not required for anything’ and the larval phenotype emerges from an acute change in neuronal physiology, then our experiment should show no effects at the larval stage. The results in Figure S4 (see Supplementary Information, p5) show that this is not the case: artificial reduction of neural activity during embryogenesis leads to a statistically significant reduction in larval speed, similar to that caused by the loss of miR-2b-1. This shows that modifications of embryonic activity impact larval movement.

      Furthermore, earlier work on the biological role of embryonic activity identified an activity-dependent ‘critical period’ during late embryogenesis (Giachello and Baines, 2015; Ackerman et al., 2021): manipulations at or around this critical period result in both locomotor and seizure phenotypes in larvae. We cite these papers in the main text (p7).

      In addition, two recent papers (Zeng et al., 2021; Carreira-Rosario et al., 2021) – which we cite in the main text (p5) – show that inhibition of muscle activity specifically during the embryonic period prevents the generation of normal neural activity patterns in both embryo and larva. Similar results are observed when proprioceptive sensory inputs to the central nervous system are blocked, with larval locomotion also disrupted.

      Altogether, the data already in the literature plus our new addition to the paper show that early embryonic movements play a key role in the development of the nervous system and larval locomotion.

      (3) Given the role in the larval chordotonal organs, have the authors also checked the adult movements? 

      The question of whether miR-2b-1 action in chordotonal organs affects behaviour at later stages of the Drosophila life cycle is interesting and was the reason why we assessed different genetic manipulations at the larval stage. However, we believe that assessing adult locomotor phenotypes is beyond the scope of this paper. 

      (4) The authors state that mir-2b-1 is a mirtron. I do not believe this is correct. It is not present in an intron in Btk from what I can see. Also, in the reference that the authors use when stating that mir-2b is a mirtron, I believe mir-2b-1 is actually used as a non-mirtron control miRNA. As mirtrons are processed slightly differently from regular hairpins and often use only the 3' end of the hairpin for miRNA creation, this may not be a trivial distinction. 

      We are grateful to Rev2 for highlighting this point: indeed, as they say, miR-2b-1 is located in the 3’UTR of host gene Btk, rather than in an intron. Accordingly, in this revision we have removed the comment on miR-2b-1 being a mirtron (p6) and deleted the corresponding citation.

      (5) For miRNA detection, the authors use in situ hybridization and QPCR. Both methods show that the gene is expressed but not that the mature miRNA is made. If the authors wanted a truly independent test for the presence of the miRNA, a miRNA sensor might be a better choice and it would hint at which part of the hairpin makes the functional miRNA. This is probably not necessary but could be a nice addition. 

      We thank Rev2 for drawing attention to this point and allowing this clarification. The qPCR protocol we used is based on the method developed by Balcells et al., 2011 (w/303 citations) (see Materials and Methods section in Supplementary Information, p14) which allows the specific amplification of mature miRNA transcripts, and not their precursors. This method for mature miRNA PCR is so robust that it has even been patented (WO2010085966A2). To ensure that the reader is clear about our methods, we state in the main text (p6) that we perform "RT-PCR for the mature miRNA transcript".  [NB: miRNA sensors provide a useful method to assess miRNA expression but can also act as competitive inhibitors of physiological miRNA functions, titrating away miRNA molecules from their real targets in tissue; therefore, results using this method are often difficult to interpret.]

      (6) Curious about mir-2b-1 and any overlap with the related mir2b-2 and the mir2a genes. I am just wondering about the similarity in their sequences/targets and if they might have similar phenotypes or enhance the phenotypes being scored by the authors. 

      This is an interesting point raised by Rev2, and indeed miR-2b-1 does belong to the largest family of microRNAs in Drosophila, the miR-2 family, discussed in detail by Marco et al., 2012. However, we consider that performing tests of additional miRNA mutations, both individually and in combination with miR-2b-1, is beyond the scope of this paper.

      (7) Related to this, the authors show that the reduction of a single miRNA target suppresses the miRNA loss of function phenotype. This indicates that this target is quite important for this miRNA. I wonder if the target site is conserved in the human gene that the authors highlight.

      This is another interesting comment by Rev2. To pursue their idea, we performed a BLAST search for the miR-2b-1 target site in the human orthologs of CG3638 and did not find a match, suggesting that the relationship between miR-2b-1 and CG3638 is not evolutionarily conserved between insects and mammals.

      Public Reviews:

      Reviewer #1:

      Weaknesses: 

      The authors do not describe properly how the miRNA screening was performed and just claim that only miR-2b-1 mutants presented a defective motion phenotype in early L1. How many miRNAs were tested, and how candidates were selected is never explicitly mentioned in the text or the Methods section.

      We identified miR-2b-1 as part of a genetic screen aimed at detecting miRNAs with impact on embryonic movement, but this full screen is not yet complete. Seeing the clear phenotype of miR-2b-1 in the embryo prompted us to study this miRNA in detail, which is what we report in this paper.

      The initial screening to identify miRNAs involved in motion behaviors is performed in early larval movement. The logic presented by the authors is clear - it is assumed that early larval movement cannot proceed normally in the absence of previous embryonic motion - and ultimately helped them identify a miRNA required for modulation of embryonic movement. However, it is possible that certain miRNAs play a role in the modulation of embryonic movement while being dispensable for early L1 behaviors. Such regulators might have been missed with the current screening setup. Although similar changes to those described for the neurogenic phase of embryonic movement are described for the myogenic phase in miR-2b-1 mutants (reduction in motion amplitude), this phenotype goes unexplored. This is not a big issue, as the authors convincingly demonstrate later that miR-2b-1 is specifically required in the nervous system for proper embryonic and larval movement, and the effects of miR-2b-1 on myogenic movement might as well be the focus of future work. However, it will be interesting to discuss here the implications of a reduced myogenic movement phase, especially as miR-2b-1 is specifically involved in regulating the activity of the chordotonal system - which precisely detects early myogenic movements. 

      We thank Rev1 for their interest in the observation that loss of miR-2b-1 results in a decrease in movement during the myogenic phase, in addition to the neurogenic phase. Indeed, two recent papers (Zeng et al., 2021; Carreira-Rosario et al., 2021) – which we cite in the main text (p5) – show that inhibition of muscle activity during a period that overlaps with the myogenic phase prevents the formation of normal neural activity patterns and larval locomotion. They also observe the same when inhibiting proprioceptive sensory inputs to the central nervous system. This could suggest that the effects of miR-2b-1 on the myogenic phase might have ‘knock-on’ effects upon the later neurogenic phase and larval movement. However, we note that genetic restoration of miR-2b-1 expression specifically to neurons completely rescues the larval speed phenotype (Fig. 3G), suggesting that the dominant effect of miR-2b-1 upon movements is through its action within neurons. To recognise Rev1’s comment we have added a short sentence to the text (p7) suggesting that ‘the effects of miR-2b-1 observed at earlier stages (myogenic phase) are possibly offset by normal neural expression of miR-2b-1’.

      FACS-sorting of neuronal cells followed by RT-PCR convincingly detects the presence of miR-2b-1 in the embryonic CNS. However, control of non-neuronal cells would be required to explore whether miR-2b-1 is not only present but enriched in the nervous system compared to other tissues. This is also the case in the miR-2b-1 and Janus expression analysis in the chordotonal organs: a control sample from the motor neurons would help discriminate whether miR-2b-1/Janus regulatory axis is specifically enriched in chordotonal organs or whether both genes are expressed throughout the CNS but operate under a different regulation or requirements for the movement phenotypes.

      The RNA in situ hybridisation data included in the paper (Fig. 3B) show that RNA probes for miR-2b-1 precursors reveal very strong signal in neural tissue – with very low signal detected in other tissues – strongly indicating that expression of miR-2b-1 is highly enriched in the nervous system.

      Reviewer #2:

      Weaknesses: 

      As I mentioned above, I felt the presentation was a bit overstated. The authors present their data in a way that focuses on movement, the emergence of movement, and how their miRNA of interest is at the center of this topic. I only point to the title and name that they wish to give the target of their miRNA to emphasize this point. "Janus" the GOD of movement and change. The results and discussion section starts with a paragraph saying, "Movement is the main output of the nervous system... how developing embryos manage to organise the necessary molecular, cellular, and physiological processes to initiate patterned movement is still unknown. Although it is clear that the genetic system plays a role, how genes control the formation, maturation and function of the cellular networks underlying the emergence of motor control remains poorly understood." While there is nothing inherently untrue about these statements, it is a question of levels of understanding. One can always argue that something in biology is still unknown at a certain level. However, one could also argue that much is known about the molecular nature of movement. Next, I am not sure how much this work impacts the area of study regarding the emergence of movement. The authors show that a reduction of a miRNA can affect something about certain neurons, that affects movement. The early movements, although slightly diminished, still emerge. Thus, their work only suggests that the function of some neurons, or perhaps the development of these neurons may impact the early movements. This is not new as it was known already from early work from the Bate lab.  Later larval movements were also shown to be modified in the miRNA mutants and were traced to "janus" overexpression in the chordotonal organs. As neurons are quite sensitive to the levels of Cl- and Janus is thought to be a Cl- channel, this could lead to a slight dysfunction of the chordotonal neurons. 
      So, based on this, the work suggests that dysfunction of the chordotonal organs could impact larval movement. This was, of course, already known. The novelty of this work is in the genes being studied (important or not). We now know that miR-2b-1 and Janus are expressed in the early neurons and larval chordotonal neurons and their removal is consistent with a role for these genes in the functioning of these neurons. This is not to trivialize these findings, simply to state that these results are not significantly changing our overall understanding of movement and the emergence of movement. I would call it a stretch to say that this miRNA CONTROLS the emergence of movement, as in the title.

      As already mentioned in our provisional response, on this point we politely – but strongly – disagree with Rev2’s suggestion that the findings are inflated by our language. We also note that they criticise our use of the verb ‘control’, yet this is a standard textbook term in molecular biology to describe biological processes regulated by genetic factors: given that miR-2b-1 regulates movement patterns during embryogenesis, to say that miR-2b-1 ‘controls’ embryonic movement in the Drosophila embryo is reasonable and in line with the language used in the field. 

      Finally, the name Janus should be changed as it is already being used. A quick scan of FlyBase shows that there is a Janus A and B in flies (phosphatases) and I am surprised the authors did not check this. I was initially worried about the Janus kinase (JAK) when I performed the search. While I understand that none are only called Janus, studies of the jan A and B genes refer to the locus as the janus region, which could lead to confusion. The completely different molecular functions of the genes relative to CG3638 add to the confusion. Thus, I ask that the authors change the name of CG3638 to something else.

      Thank you for spotting this omission. In the revised MS we propose a new name – Movement Modulator (Motor) – for the gene previously described as Janus (CG3638) to avoid annotation issues at FlyBase due to other, unrelated genes that include this word as part of their names. All instances where Janus was used are now replaced by Motor (abstract; main text pages 9-10; Figure 4).

    1. Author Response:

      Reviewer #1 (Public Review):

      Summary:

      This manuscript reports the substrate-bound structure of SiaQM from F. nucleatum, which is the membrane component of a Neu5Ac-specific Tripartite ATP-independent Periplasmic (TRAP) transporter. Until recently, there was no experimentally derived structural information regarding the membrane components of the TRAP transporter, limiting our understanding of the transport mechanism. Since 2022, there have been 3 different studies reporting the structures of the membrane components of Neu5Ac-specific TRAP transporters. While it was possible to narrow down the binding site location by comparing the structures to proteins of the same fold, a structure with substrate bound has been missing. In this work, the authors report the Na+-bound state and the Na+ plus Neu5Ac state of FnSiaQM, revealing information regarding substrate coordination. In previous studies, 2 Na+ ion sites were identified. Here, the authors also tentatively assign a 3rd Na+ site. The authors reconstitute the transporter to assess the effects of mutating the binding site residues they identified in their structures. Of the 2 positions tested, only one of them appears to be critical to substrate binding.

      Strengths:

      The main strength of this work is the capture of the substrate-bound state of SiaQM, which provides insight into an important part of the transport cycle.

      Weaknesses:

      The main weakness is the lack of experimental validation of the structural findings. The authors identified the Neu5Ac binding site, but only tested 2 residues for their involvement in substrate interactions, which was very limited. The authors tentatively identified a 3rd Na+ binding site, which if true would be an impactful finding, but this site was not tested for its contribution to Na+ dependent transport, and the authors themselves report that the structural evidence is not wholly convincing. This lack of experimental validation undermines the confidence of the findings. However, the reporting of these new data is important as it will facilitate follow-up studies by the authors or other researchers.

      The main concern, also mentioned by other reviewers, is the lack of mutational data and functional studies on the identified binding sites. Two other structures of TRAP transporters have been determined, one from Haemophilus influenzae (Hi) and the other from Photobacterium profundum (Pp). We will refer to the references in this paper as [1], Peter et al. as [2], and Davies et al. as [3]. The table below lists all the mutations made in the Neu5Ac binding site, including direct polar interactions between Neu5Ac and the side chains, as well as the newly identified metal sites.

      The structure from Fusobacterium nucleatum (Fn) that we have reported shows significant sequence identity with the previously reported Hi structure. When we superimpose the Pp and Fn structures, we observe that nearly all the residues that bind Neu5Ac and the third metal site are conserved. This suggests that mutagenesis and functional data from other studies can be related to the structure presented in our work.

      The table below shows that all three residues that directly interact with Neu5Ac have been tested by site-directed mutagenesis for their role in Neu5Ac transport. Both D521 and S300 are critical for transport, while S345 is not. We do not believe that a mutation of D521A in Fn, followed by transport studies, will provide any new information.

      However, Peter et al. have mutated only one of the 5 residues near the newly identified metal binding site, which resulted in no transport. The rest of the residues have not been functionally tested. We propose to mutate these residues into Ala, express and purify the proteins, and then carry out transport assays on those that show expression. We will include this information in the revised manuscript.

      Reviewer #2 (Public Review):

      In this exciting new paper from the Ramaswamy group at Purdue, the authors provide a new structure of the membrane domains of a tripartite ATP-independent periplasmic (TRAP) transporter for the important sugar acid, N-acetylneuraminic acid or sialic acid (Neu5Ac). While there have been a number of other structures in the last couple of years (the first for any TRAP-T) this is the first to trap the structure with Neu5Ac bound to the membrane domains. This is an important breakthrough as in this system the ligand is delivered by a substrate-binding protein (SBP), in this case, called SiaP, where Neu5Ac binding is well studied but the 'hand over' to the membrane component is not clear. The structure of the membrane domains, SiaQM, revealed strong similarities to other SBP-independent Na+-dependent carriers that use an elevator mechanism and have defined Na+ and ligand binding sites. Here they solve the cryo-EM structure of the protein from the bacterial oral pathogen Fusobacterium nucleatum and identify a potential third (and theoretically predicted) Na+ binding site but also locate for the first time the Neu5Ac binding site. While this sits in a region of the protein that one might expect it to sit, based on comparison to other transporters like VcINDY, it provides the first molecular details of the binding site architecture and identifies a key role for Ser300 in the transport process, which their structure suggests coordinates the carboxylate group of Neu5Ac. The work also uses biochemical methods to confirm the transporter from F. nucleatum is active and similar to those used by selected other human and animal pathogens and now provides a framework for the design of inhibitors of these systems.

      The strengths of the paper lie in the locating of Neu5Ac bound to SiaQM, providing important new information on how TRAP transporters function. The complementary biochemical analysis also confirms that this is not an atypical system and that the results are likely true for all sialic acid-specific TRAP systems.

      The main weakness is the lack of follow-up on the identified binding site in terms of structure-function analysis. While Ser300 is shown to be important, only one other residue is mutated and a much more extensive analysis of the newly identified binding site would have been useful.

      Please see the comments above.

      Reviewer #3 (Public Review):

      The manuscript by Goyal et al reports substrate-bound and substrate-free structures of a tripartite ATP-independent periplasmic (TRAP) transporter from a previously uncharacterized homolog, F. nucleatum. This is one of the most mechanistically fascinating transporter families, by means of its QM domain (the domain reported in this manuscript) operating as a monomeric 'elevator', and its P domain functioning as a substrate-binding 'operator' that is required to deliver the substrate to the QM domain; together, this is termed an 'elevator with an operator' mechanism. Remarkably, previous structures had not demonstrated the substrate Neu5Ac bound. In addition, they confirm the previously reported Na+ binding sites and report a new metal binding site in the transporter, which seems to be mechanistically relevant. Finally, they mutate the substrate binding site and use proteoliposomal uptake assays to show the mechanistic relevance of the proposed substrate binding residues.

      The structures are of good quality, the functional data is robust, the text is well-written, and the authors are appropriately careful with their interpretations. Determination of a substrate-bound structure is an important achievement and fills an important gap in the 'elevator with an operator' mechanism. Nevertheless, I have concerns with the data presentation, which in its current state does not intuitively demonstrate the discussed findings. Furthermore, the structural analysis appears limited, and even slight improvements in data processing and resulting resolution would greatly improve the authors' claims. I have several suggestions to hopefully improve the clarity and quality of the manuscript.

      We appreciate your feedback and will make the necessary modifications to the manuscript incorporating most of the suggestions. We will submit the revised version once the experiments are completed. We are also working on improving the quality of the figures and have made several attempts to enhance the resolution using CryoSPARC or RELION, but without success. We will continue to explore newer methods in an effort to achieve higher resolution and to model more lipids, particularly in the binding pocket.

    1. eLife assessment

      This manuscript is useful to researchers with an interest in cervical cancers because it provides scRNA-seq data from a diverse cohort of 15 early-stage cervical cancer patients. While the dataset could be of use to the research community, the key claims of the paper around the immunosuppressive microenvironment associated with specific tumour cell clusters (and the properties/importance of those clusters) are incomplete. Additional experiments will be required to substantiate these claims.

    2. Reviewer #1 (Public Review):

      Summary:

      The authors in this manuscript performed scRNA-seq on a cohort of 15 early-stage cervical cancer patients with a mixture of adeno- and squamous cell carcinoma, HPV status, and several samples that were upstaged at the time of surgery. From their analyses they identified differential cell populations in both immune and tumour subsets related to stage, HPV status, and whether a sample was adenocarcinoma or squamous cell. Putative microenvironmental signaling was explored as a potential explanation for their differential cell populations. Through these analyses the authors also identified SLC26A3 as a potential biomarker for later stage/lymph node metastasis which was verified by IHC and IF. The dataset is likely useful for the community, however, the strong claims made are not adequately supported by the data and would require additional functional validation.

      Strengths:

      The dataset could be useful for the community.

      SLC26A3 could potentially be a useful marker to predict lymph node metastasis with further study.

      Weaknesses:

      The link between the background in the introduction and the actual study and findings is often tenuous or not clearly explained. A re-working of the intro to better set up and link to the study questions would be beneficial.

      For the sequencing, which kit was used on the Novaseq6000?

      Additional details are needed for the analysis pipeline. How were batch effects identified/dealt with, what were the precise functions and settings for each step of the analysis, how was clustering performed and how were clusters validated etc. Currently, all that is given is software and sometimes function names which are entirely inadequate to be able to assess the validity of the analysis pipeline. This could alternatively be answered by providing annotated copies of the scripts used for analysis as a supplement.

      For Cell type annotation, please provide the complete list of "selected gene markers" that were used for annotation.

      No statistics are given for the claims on cell proportion differences throughout the paper (for cell types early, epithelial sub-clusters later, and immune cell subsets further on). This should be a multivariate analysis to account for ADC/SCC, HPV+/- and Early/Late stage.

      The Y-axis label is missing from the proportion histograms in Figure 2D. In these same panels, the bars change widths on the right side. If these are exclusively in ADC, show it with a 0 bar for SCC, not doubling the width which visually makes them appear more important by taking up more area on the plot.

      Throughout the manuscript, informatic predictions (differentiation potential, malignancy score, stemness, and trajectory) are presented as though they're concrete facts rather than the predictions they are. Strong conclusions are drawn on the basis of these predictions which do not have adequate data to support. These conclusions which touch on essentially all of the major claims made in the manuscript would need functional data to validate, or the claims need to be very substantially softened as they lack concrete support. Indeed, the fact that most of the genes examined that were characteristic of a given cluster did not show the expected expression patterns in IHC highlights the fact that such predictions require validation to be able to draw proper inferences.

      The cluster Epi_10_CYSTM1 which is the basis for much of the paper is present in a single individual (with a single cell coming from another person), and heavily unconnected from the rest of the epithelial populations. If so much emphasis is placed on it, the existence of this cluster as a true subset of cells requires validation.

      Claims based on survival analysis of TCGA for Epi_10_CYSTM1 are based on a non-significant p-value, though there is a slight trend in that direction.

      The claim "The identification of Epi_10_CYSTM1 as the only cell cluster found in patients with stage IIICp raises the possibility that this cluster may be a potential marker to diagnose patients with lymph node metastasis." This is incorrect according to the sample distributions which clearly show cells from the patient who has EPI_10_CYSTM1 in multiple other clusters. This is then used as justification for SLC26A3 which appears to be associated with late stage, however, in the images SLC26A3 appears to be broadly expressed in later tumours rather than restricted to a minor subset as it should be if it were actually related to the EPI_10_CYSTM1 cluster.

      The authors claim that cytotoxic T cells express KRT17, and KRT19. This likely represents a mis-clustering of epithelial cells.

      Multiple claims are made for specific activities based on GO term biological process analysis which while not contradictory to the data, certainly are by no means the only explanation for it, nor directly supported.

    3. Reviewer #2 (Public Review):

      Summary:

      Peng et al. present a study using scRNA-seq to examine phenotypic properties of cervical cancer, contrasting features of both adenocarcinomas (ADC) and squamous cell carcinoma (SCC), and HPV-positive and negative tumours. They propose several key findings: unique malignant phenotypes in ADC with elevated stemness and aggressive features, interactions of these populations with immune cells to promote an immunosuppressive TME, and SLC26A3 as a biomarker for metastatic (≥ Stage III) tumours.

      Strengths:

      This study provides a valuable resource of scRNA-seq data from a well-curated collection of patient samples. The analysis provides a high-level view of the cellular composition of cervical cancers. The authors introduce some mechanistic explanations of immunosuppression and the involvement of regulatory T cells that are intriguing.

      Weaknesses:

      I believe that many of the proposed conclusions are over-interpretations or unwarranted generalizations of the single-cell analysis. These conclusions are often based on populations in the scRNA-seq data that are described as enriched or specific to a given group of samples (eg. ADC). This conclusion is based on the percentage of cells in that population belonging to the given group; for example, a cluster of cells that dominantly come from ADC. The data includes multiple samples for each group, but statistical approaches are never used to demonstrate the reproducibility of these claims.

      This leads to problematic conclusions. For example, the "ADC-specific" Epi_10_CYSTM1 cluster, which is a central focus of the paper, only contains cells from one of the 11 ADC samples and represents only a small fraction of the malignant cells from that sample (Sample 7, Figure 2A). Yet, this population is used to derive SLC26A3 as a potential biomarker. SLC26A3 transcripts were only detected in this small population of cells (none of the other ADC samples), which makes me question the specificity of the IHC staining on the validation cohort.

      This is compounded by technical aspects of the analysis that hinder interpretation. For example, it is clear that the clustering does not perfectly segregate cell types. In Figures 2B and D, it is evident that C4 and C5 contain mixtures of cell type (eg. half of C4 is EPCAM+/CD3-, the other half EPCAM-/CD3+). These contaminations are carried forward into subclustering and are not addressed. Rather, it is claimed that there is a T cell population that is CD3- and EPCAM+, which does not seem likely.

    4. Author response:

      Reviewer #1 (Public review):

      (1) The link between the background in the introduction and the actual study and findings is often tenuous or not clearly explained. A re-working of the intro to better set up and link to the study questions would be beneficial.

      Response: upon revision, we plan to rewrite the introduction of the manuscript.

      (2) For the sequencing, which kit was used on the Novaseq6000?

      Response: for sequencing, we used the Chromium Controller and Chromium Single Cell 3’ Reagent Kits (v3 chemistry, CG000183) on the Novaseq6000. We apologize for omitting this important detail and will add it to the Methods.

      (3) Additional details are needed for the analysis pipeline. How were batch effects identified/dealt with, what were the precise functions and settings for each step of the analysis, how was clustering performed and how were clusters validated etc. Currently, all that is given is software and sometimes function names which are entirely inadequate to be able to assess the validity of the analysis pipeline. This could alternatively be answered by providing annotated copies of the scripts used for analysis as a supplement.

      Response: we apologize for the inadequacy of descriptions of data analysis process due to word count limit. We plan to provide more information, and if possible we also would like to provide scripts as supplementary data in the revised manuscript.

      (4) For Cell type annotation, please provide the complete list of "selected gene markers" that were used for annotation.

      Response: we will add the list of marker genes for cell type annotation in the revised manuscript.

      (5) No statistics are given for the claims on cell proportion differences throughout the paper (for cell types early, epithelial sub-clusters later, and immune cell subsets further on). This should be a multivariate analysis to account for ADC/SCC, HPV+/- and Early/Late stage.

      Response: to address this inadequacy, we plan to apply statistical approaches in further analyses to compare the differences between each set of groups upon revision.
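      As one illustration of what a sample-level test of cell proportion differences could look like, the sketch below runs an exact permutation test on per-sample cluster proportions. It is a simplified stand-in, not the multivariate analysis the reviewer requested and not the authors' method; the function name and all proportions are hypothetical.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_pvalue(group_a, group_b):
    """Two-sided exact permutation test on the difference in mean proportion.

    Each value is the fraction of one sample's cells that fall in a cluster,
    so the sample (not the cell) is the unit of replication.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = 0
    total = 0
    # Enumerate every way of relabelling the samples into two groups.
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical per-sample proportions of a cluster found in only one sample.
adc = [0.30, 0.00, 0.00, 0.00]
scc = [0.00, 0.00, 0.00, 0.00]
p_single_sample = exact_permutation_pvalue(adc, scc)

# Hypothetical proportions that differ consistently across samples.
p_consistent = exact_permutation_pvalue([0.5, 0.6, 0.7], [0.1, 0.2, 0.3])
```

      In this toy input, a cluster contributed by a single sample gives no evidence of a group-level difference, whereas a consistent shift across samples gives the smallest p-value that three samples per group allow.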

      (6) The Y-axis label is missing from the proportion histograms in Figure 2D. In these same panels, the bars change widths on the right side. If these are exclusively in ADC, show it with a 0 bar for SCC, not doubling the width which visually makes them appear more important by taking up more area on the plot.

      Response: we apologize for the imprecise presentation of histograms such as Fig. 2D and will add Y-axis labels. As for the bar widths, we used the histograms as originally generated by the data package; we did not intend to double the width to increase the visual importance of any group. We apologize for this and will correct similar mistakes throughout the manuscript.

      (7) Throughout the manuscript, informatic predictions (differentiation potential, malignancy score, stemness, and trajectory) are presented as though they're concrete facts rather than the predictions they are. Strong conclusions are drawn on the basis of these predictions which do not have adequate data to support. These conclusions which touch on essentially all of the major claims made in the manuscript would need functional data to validate, or the claims need to be very substantially softened as they lack concrete support. Indeed, the fact that most of the genes examined that were characteristic of a given cluster did not show the expected expression patterns in IHC highlights the fact that such predictions require validation to be able to draw proper inferences.

      Response: we agree that many conclusions, which were based on bio-informatic predictions, are written in an over-affirmative way. Upon revision, we will rewrite these conclusions more precisely.

      (8) The cluster Epi_10_CYSTM1 which is the basis for much of the paper is present in a single individual (with a single cell coming from another person), and heavily unconnected from the rest of the epithelial populations. If so much emphasis is placed on it, the existence of this cluster as a true subset of cells requires validation.

      Response: we are thankful for this suggestion. We think that each cluster of epithelial cells is distinct from the other clusters and is identified by its DEGs, but the clusters are not heavily unconnected from one another. Upon revision, we plan to add further validation for the existence of Epi_10_CYSTM1.

      (9) Claims based on survival analysis of TCGA for Epi_10_CYSTM1 are based on a non-significant p-value, though there is a slight trend in that direction.

      Response: from the data of TCGA survival analysis for Epi_10, we found a not-so-slight trend of difference between groups (with a small P value). As a result, we presented this data and hoped to add more strength to the clinical significance of this cluster. However, this indeed caused controversy because the P value is non-significant. We plan to rewrite the conclusion more precisely or delete this data in the revised manuscript.

      (10) The claim "The identification of Epi_10_CYSTM1 as the only cell cluster found in patients with stage IIICp raises the possibility that this cluster may be a potential marker to diagnose patients with lymph node metastasis." This is incorrect according to the sample distributions which clearly show cells from the patient who has EPI_10_CYSTM1 in multiple other clusters. This is then used as justification for SLC26A3 which appears to be associated with late stage, however, in the images SLC26A3 appears to be broadly expressed in later tumours rather than restricted to a minor subset as it should be if it were actually related to the EPI_10_CYSTM1 cluster.

      Response: we are thankful for this question. The conclusion “The identification of Epi_10_CYSTM1 as the only cell cluster found in patients with stage IIICp raises the possibility that this cluster may be a potential marker to diagnose patients with lymph node metastasis” was indeed stated too definitively given the sample distribution. We will correct the description in the upcoming revised manuscript. As for SLC26A3, we do not think it is “broadly” expressed; rather, it is specific to later tumors. When we presented the IHC data, we showed only the strongly positive area of each slide in order to emphasize the differences; however, this has caused misunderstanding. Thus, upon revision, we would like to show other areas of the same case, or even a scan of a whole slide, as supplementary data.

      (11) The authors claim that cytotoxic T cells express KRT17, and KRT19. This likely represents a mis-clustering of epithelial cells.

      Response: we apologize for neglecting further validation of the cytotoxic T cells. In Fig. 4B and 4C, the four different clusters of T cells were essentially identified based on canonical T cell markers. We then focused mainly on the validation and further analysis of Tregs, neglecting the other clusters. In Fig. 4D we intended only to show the top DEGs in each T cell cluster, hoping to find potential marker genes for next-step analysis. However, we did not notice that epithelial cells might have contaminated the cytotoxic T cell cluster during clustering. We will optimize this part of the analysis in our revision.

      (12) Multiple claims are made for specific activities based on GO term biological process analysis which while not contradictory to the data, certainly are by no means the only explanation for it, nor directly supported.

      Response: our initial purpose was to use GO analysis to support our conclusions. However, we recognize that these are only claims rather than evidence, which reflects the same writing problem raised in question (7). Therefore, in our revised manuscript, we plan to state the conclusions from the GO analysis more rigorously or delete these data.

      Reviewer #2 (Public review):

      (1) I believe that many of the proposed conclusions are over-interpretations or unwarranted generalizations of the single-cell analysis. These conclusions are often based on populations in the scRNA-seq data that are described as enriched or specific to a given group of samples (eg. ADC). This conclusion is based on the percentage of cells in that population belonging to the given group; for example, a cluster of cells that dominantly come from ADC. The data includes multiple samples for each group, but statistical approaches are never used to demonstrate the reproducibility of these claims.

      Response: we understand that many of the conclusions are stated too confidently and lack strong supporting evidence; we will therefore revise the writing in the revised manuscript. More importantly, to strengthen the validity of our data, we will try to use statistical approaches for further analysis.

      (2) This leads to problematic conclusions. For example, the "ADC-specific" Epi_10_CYSTM1 cluster, which is a central focus of the paper, only contains cells from one of the 11 ADC samples and represents only a small fraction of the malignant cells from that sample (Sample 7, Figure 2A). Yet, this population is used to derive SLC26A3 as a potential biomarker. SLC26A3 transcripts were only detected in this small population of cells (none of the other ADC samples), which makes me question the specificity of the IHC staining on the validation cohort.

      Response: we are sincerely grateful for the questions on the validity, appropriateness and real potential of SLC26A3. We plan to add more explanation of the importance of SLC26A3 in the Discussion. We also apologize for some overconfident conclusions about ADC-specific cell clusters and the marker gene SLC26A3. However, we do not think these conclusions are problematic. Given the heterogeneity among individuals, and even among different sampling sites within one individual, we think that a “small fraction” is not necessarily meaningless. Also, these ADC-specific clusters (including Epi_10_CYSTM1) do account for certain proportions when compared with the “big fraction” groups (Fig. 2D). Furthermore, considering that these DEGs are specific to ADC and not to SCC, we think the genes of these ADC-specific clusters may play a central role in distinguishing ADC from SCC, and we used a validation experiment to support this hypothesis. Lastly and most importantly, SLC26A3 came from sample 7, whose clinical stage is FIGO IIIC (late stage) and whose pathological type is ADC. Among the 15 cases, only 4 are late stage (of which 3 are ADC). From this point of view, we think that 1 in 3 (33%) expressing SLC26A3 (or containing cluster Epi_10_CYSTM1) should be considered a potential candidate. Samples from early-stage and SCC patients do not contain Epi_10_CYSTM1 cells, which likewise indicates the specificity of this cell cluster to ADC. Therefore, in our revised manuscript, we plan to add a more in-depth discussion of this question.

      (3) This is compounded by technical aspects of the analysis that hinder interpretation. For example, it is clear that the clustering does not perfectly segregate cell types. In Figures 2B and D, it is evident that C4 and C5 contain mixtures of cell type (eg. half of C4 is EPCAM+/CD3-, the other half EPCAM-/CD3+). These contaminations are carried forward into subclustering and are not addressed. Rather, it is claimed that there is a T cell population that is CD3- and EPCAM+, which does not seem likely.

      Response: do you mean Figure 1B and D? In the revised manuscript, we will list the canonical marker genes used to cluster the different cell types, to at least show that the clustering of cell types matches most of the published references. To further avoid contamination of cells in each cluster, we will apply quality controls and re-analyze these data upon revision.
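      The kind of marker-based contamination check discussed in this exchange can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the gene symbol CD3E is assumed as the CD3 readout, the 25% threshold is arbitrary, and the toy cells and function names are hypothetical.

```python
from collections import defaultdict


def marker_composition(cells, epithelial="EPCAM", t_cell="CD3E"):
    """Count, per cluster, cells positive for only one of two lineage markers."""
    counts = defaultdict(lambda: {"epithelial": 0, "t_cell": 0, "other": 0})
    for cell in cells:
        is_epi = cell.get(epithelial, 0) > 0
        is_t = cell.get(t_cell, 0) > 0
        if is_epi and not is_t:
            label = "epithelial"
        elif is_t and not is_epi:
            label = "t_cell"
        else:
            label = "other"  # double-positive or double-negative cells
        counts[cell["cluster"]][label] += 1
    return {cluster: dict(c) for cluster, c in counts.items()}


def mixed_clusters(composition, min_fraction=0.25):
    """Return clusters in which both lineages exceed min_fraction of cells."""
    flagged = []
    for cluster, c in composition.items():
        total = sum(c.values())
        if (total
                and c["epithelial"] / total >= min_fraction
                and c["t_cell"] / total >= min_fraction):
            flagged.append(cluster)
    return sorted(flagged)


# Hypothetical expression counts: C4 is half EPCAM+/CD3-, half EPCAM-/CD3+.
cells = [
    {"cluster": "C4", "EPCAM": 5, "CD3E": 0},
    {"cluster": "C4", "EPCAM": 3, "CD3E": 0},
    {"cluster": "C4", "EPCAM": 0, "CD3E": 4},
    {"cluster": "C4", "EPCAM": 0, "CD3E": 2},
    {"cluster": "C1", "EPCAM": 6, "CD3E": 0},
    {"cluster": "C1", "EPCAM": 2, "CD3E": 0},
]
composition = marker_composition(cells)
flagged = mixed_clusters(composition)
```

      Clusters flagged this way would be candidates for re-clustering or removal before any subclustering, so that mixed populations are not carried forward.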

    1. eLife assessment

      This important work illuminates the dynamics of BRAF in both its monomeric and dimeric forms, with or without inhibitors, combining traditional techniques and sophisticated computational analyses. The evidence presented is convincing and suggests a potential allosteric effect, though substantiating the exact mechanism will require further studies. The work has implications for understanding kinase signaling and the development of potential drug candidates. This study will be of interest to structural biologists, medicinal chemists, and pharmacologists.

    2. Reviewer #1 (Public Review):

      Summary:

      This manuscript from Clayton and co-authors, entitled "Mechanism of dimer selectivity and binding cooperativity of BRAF inhibitors", aims at clarifying the molecular mechanism of BRAF dimer selectivity. Indeed, first generation BRAF inhibitors, targeting monomeric BRAFV600E, are ineffective in treating resistant dimeric BRAF isoforms. Here, the authors employed molecular dynamics simulations to study the conformational dynamics of monomeric and dimeric BRAF, in the presence and absence of inhibitors. Multi-microseconds MD simulations showed an inward shift of the αC helix in the BRAFV600E mutant dimer. This helped identify a hydrogen bond between the inhibitors and the BRAF residue Glu501 as critical for dimer compatibility. The stability of the aforementioned interaction seems to be important to distinguish between dimer-selective and equipotent inhibitors.

      Strengths:

      The study is overall valuable and robust. The authors used the recently developed particle mesh Ewald constant pH molecular dynamics, a state-of-the-art method, to investigate the correct histidine protonation considering the dynamics of the protein. Then, multi-microsecond simulations showed differences in the flexibility of the αC helix and DFG motif. The dimerization restricts the αC position in the inward conformation, in agreement with the result that dimer-compatible inhibitors are able to stabilize the αC-in state. Noteworthy, the MD simulations were used to study the interactions between the inhibitors and the protein, suggesting a critical role for a hydrogen bond with Glu501. Finally, simulations of a mixed state of BRAF (one protomer bound to the inhibitor and the other apo) indicate that the ability to stabilize the inward αC state of the apo protomer could be at the basis of the positive cooperativity of PHI1.

      Weaknesses:

      Regarding the analyses of the mixed state simulations, the DFG dihedral probability densities for the apo protomer (Fig. 5a right) are highly overlapping. It is not convincing that a slight shift can support the conclusion that the binding in one protomer is enough to shift the DFG motif outward allosterically. Moreover, the DFG dihedral time-series for the apo protomer (Supplementary Figure 9) clearly shows that the measured quantities are affected by significant fluctuations and poor consistency between the three replicates. The apo protomer of the mixed state simulations could be affected by the same problem that the authors pointed out in the case of the apo dimer simulations, where the amount of sampling is insufficient to model the DFG-out/-in transition properly. There is similar concern with the Lys483-Glu501 salt bridge measured for the apo protomers of the mixed simulations. As it can be observed from the probabilities bar plot (Fig. 5a middle), the standard deviation is too high to support a significant role for this interaction in the allosteric modulation of the apo protomer.

    3. Reviewer #2 (Public Review):

      Summary:

      The authors employ molecular dynamics simulations to understand the selectivity of FDA approved inhibitors within dimeric and monomeric BRAF species. Through these comprehensive simulations, they shed light on the selectivity of BRAF inhibitors by delineating the main structural changes occurring during dimerization and inhibitor action. Notably, they identify the two pivotal elements in this process: the movement and conformational changes involving the alpha-C helix and the formation of a hydrogen bond involving the Glu-501 residue. These findings find support in the analyses of various structures crystallized from dimers and co-crystallized monomers in the presence of inhibitors. The elucidation of this mechanism holds significant potential for advancing our understanding of kinase signalling and the development of future BRAF inhibitor drugs.

      Strengths:

      The authors employ a diverse array of computational techniques to characterize the binding sites and interactions between inhibitors and the active site of BRAF in both dimeric and monomeric forms. They combine traditional and advanced molecular dynamics simulation techniques such as CpHMD (All-atom continuous constant pH molecular dynamics) to provide mechanistic explanations. Additionally, the paper introduces methods for identifying and characterizing the formation of the hydrogen bond involving the Glu501 residue without the need for extensive molecular dynamics simulations. This approach facilitates the rapid identification of future BRAF inhibitor candidates.

      Weaknesses:

      Although the use of molecular dynamics yields crucial structural insights and outlines a mechanism to elucidate dimer selectivity and cooperativity in these systems, the authors could consider adopting free energy methods to estimate the values of hydrogen bond energies and hydrophobic interactions, thereby enhancing the depth of their analysis.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Comment 1: This manuscript from Clayton and co-authors, entitled “Mechanism of dimer selectivity and binding cooperativity of BRAF inhibitors”, aims to clarify the molecular mechanism of BRAF dimer selectivity. Indeed, first-generation BRAF inhibitors, targeting monomeric BRAFV600E, are ineffective in treating resistant dimeric BRAF isoforms. Here, the authors employed molecular dynamics simulations to study the conformational dynamics of monomeric and dimeric BRAF, in the presence and absence of inhibitors. Multi-microsecond MD simulations showed an inward shift of the αC helix in the BRAFV600E mutant dimer. This helped in identifying a hydrogen bond between the inhibitors and the BRAF residue Glu501 as critical for dimer compatibility. The stability of the aforementioned interaction seems to be important to distinguish between dimer-selective and equipotent inhibitors.

      The study is overall valuable and robust. The authors used the recently developed particle mesh Ewald constant pH molecular dynamics, a state-of-the-art method, to investigate the correct histidine protonation considering the dynamics of the protein. Then, multi-microsecond simulations showed differences in the flexibility of the αC helix and DFG motif. The dimerization restricts the αC position in the inward conformation, in agreement with the result that dimer-compatible inhibitors can stabilize the αC-in state. Noteworthy, the MD simulations were used to study the interactions between the inhibitors and the protein, suggesting a critical role for a hydrogen bond with Glu501. Finally, simulations of a mixed state of BRAF (one protomer bound to the inhibitor and the other apo) indicate that the ability to stabilize the inward αC state of the apo protomer could be at the basis of the positive cooperativity of PHI1.

      Response: We thank the reviewer for the positive evaluation of our work.

      Comment 2: One potential weakness in the manuscript is the lack of reported uncertainties related to the analyzed quantities. Providing this information would significantly enhance the clarity regarding the reliability of the analyses and the confidence in the claims presented.

      Response and revision: We agree with the reviewer that reporting uncertainties will clarify and strengthen our arguments. Following this suggestion, we have added error bars to Figures 3 and 5 representing the standard deviation of the K-E salt bridge probability. These show the deviation across replicas in how often the salt bridge is present. They thus better support our claim that this salt bridge is promoted by the presence of PHI1, as the deviation of the salt bridge probability is minimal for protomers containing PHI1. In addition to these error bars, we have also added a table to the Supplementary Information (Supplementary Table 2) containing the mean and standard deviation of the αC position, K-E distance, and DFG pseudo dihedral for each protomer in our dimer simulations.
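
      The kind of error bar described in this response can be sketched as follows. This is a minimal illustration, not the authors' analysis script: the function names, the 4.0 Å distance cutoff used to define the salt bridge, and the distance values are all hypothetical assumptions.

```python
from statistics import mean, stdev

def salt_bridge_probability(distances, cutoff=4.0):
    """Fraction of frames whose K-E distance is at or below the cutoff (in Å).

    The 4.0 Å cutoff is an assumed value for illustration only.
    """
    return sum(d <= cutoff for d in distances) / len(distances)

def summarize_replicas(replica_distances, cutoff=4.0):
    """Return (mean, sample standard deviation) of the per-replica probabilities."""
    probs = [salt_bridge_probability(d, cutoff) for d in replica_distances]
    return mean(probs), stdev(probs)

# Hypothetical K-E distances (Å) for three replicas, four frames each.
replicas = [
    [3.0, 3.5, 5.0, 3.2],  # 3 of 4 frames within the cutoff
    [3.1, 3.3, 3.4, 3.9],  # 4 of 4 frames within the cutoff
    [6.0, 3.0, 3.5, 7.0],  # 2 of 4 frames within the cutoff
]
avg, sd = summarize_replicas(replicas)
print(f"P(salt bridge) = {avg:.2f} +/- {sd:.2f}")
```

      A small spread across replicas indicates that the salt bridge occupancy is reproducible between independent simulations, which is the sense in which the error bars support the claim.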

      Reviewer #2 (Public review):

      Comment 1: The authors employ molecular dynamics simulations to understand the selectivity of FDA-approved inhibitors within dimeric and monomeric BRAF species. Through these comprehensive simulations, they shed light on the selectivity of BRAF inhibitors by delineating the main structural changes occurring during dimerization and inhibitor action. Notably, they identify the two pivotal elements in this process: the movement and conformational changes involving the alpha-C helix and the formation of a hydrogen bond involving the Glu-501 residue. These findings find support in the analyses of various structures crystallized from dimers and co-crystallized monomers in the presence of inhibitors. The elucidation of this mechanism holds significant potential for advancing our understanding of kinase signaling and the development of future BRAF inhibitor drugs.

      The authors employ a diverse array of computational techniques to characterize the binding sites and interactions between inhibitors and the active site of BRAF in both dimeric and monomeric forms. They combine traditional and advanced molecular dynamics simulation techniques such as CpHMD (all-atom continuous constant pH molecular dynamics) to provide mechanistic explanations. Additionally, the paper introduces methods for identifying and characterizing the formation of the hydrogen bond involving the Glu501 residue without the need for extensive molecular dynamics simulations. This approach facilitates the rapid identification of future BRAF inhibitor candidates.

      Response: We thank the reviewer for the positive evaluation of our work.

      Comment 2: The use of molecular dynamics yields crucial structural insights and outlines a mechanism to elucidate dimer selectivity and cooperativity in these systems. However, the authors could consider the adoption of free energy methods to estimate the values of hydrogen bond energies and hydrophobic interactions, thereby enhancing the depth of their analysis.

      Response: The current free energy methods are capable of giving accurate estimates of the relative binding free energies of similar ligands; however, accurate calculations of the absolute free energies of hydrogen bond and hydrophobic interactions are not feasible yet. Thus, we decided not to pursue the calculations.

      Reviewer #1 (Suggestions to author)

      Comment 1: The general recommendation is to give more details about the procedure for the analyses performed and, when possible, show the uncertainties relative to the analyzed quantities. This would clearly indicate the reliability of the analyses and the confidence of the claims. Moreover, it is not always clear how the analyses were performed.

      Response and revision: As previously mentioned, we have added uncertainties to our bar graphs in Figures 3 and 5 as well as Supplemental Table 2. With regard to the clarity of our analysis, we added more detail on how the probability distributions were created, which we discuss in our response to Comment 3.

      Comment 2: It is not clear why the authors decided to titrate only the histidines without considering the other charged residues. In particular, the authors show in Supplementary Figure 2 a network of which Asp595 (protomer A) is a part and that, given the direct interaction, could affect the protonation state of His477 (protomer B).

      Response: The reviewer is correct in that Asp595 directly interacts with His477 on the opposite protomer. This is exactly the reason why we did not consider titrating Asp595 – the interaction with His477 should further stabilize the charged state of Asp595 and downshift its pKa from the solution value of about 3.8. Thus, Asp595 will be charged at physiological pH and does not need to be titrated in the CpHMD simulations.

      Comment 3: Regarding the probability density plots (Figures 3 and 5), clarify if you used all the data from all the replicas and all the protomers. If possible, show a comparison between each replica in the Supplementary Figures. A Supplementary Table with the probability values for the measured K-E salt bridge could be helpful since the bar plots are hard to compare. Also in this case please report the uncertainty or a comparison between the replicas.

      Response and revision: To clarify how we created the probability density plots, the following line was added to the Methods section:

      On page 15, third paragraph: All probability distributions were created by combining the last three µs of each replica for each system, with each distribution consisting of 50 bins. Unless specified, distributions contain quantities from both protomers in dimeric simulations.
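
      The procedure quoted above (pool the last three µs of every replica, then build a 50-bin distribution) can be sketched as below. This is an illustrative reconstruction under stated assumptions: the function names, the time grid, and the αC position values are hypothetical, and the authors' actual scripts may differ.

```python
def last_window(times_us, values, window_us=3.0):
    """Keep the values recorded in the final `window_us` microseconds of one replica."""
    t_end = times_us[-1]
    return [v for t, v in zip(times_us, values) if t > t_end - window_us]

def density_histogram(values, n_bins=50):
    """Return (bin_edges, densities); densities integrate to 1 over the data range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)  # v == hi falls in the last bin
        counts[idx] += 1
    total = len(values)
    densities = [c / (total * width) for c in counts]
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, densities

# Hypothetical data: two replicas of 5 µs each, one frame every 0.5 µs.
times = [0.5 * i for i in range(1, 11)]           # 0.5 ... 5.0 µs
replica_a = [18.0 + 0.1 * i for i in range(10)]   # made-up αC positions (Å)
replica_b = [19.0 + 0.2 * i for i in range(10)]

# Pool the last 3 µs of every replica before histogramming.
pooled = last_window(times, replica_a) + last_window(times, replica_b)
edges, dens = density_histogram(pooled, n_bins=50)
```

      For a dimer, the values from both protomers would be appended to the same pooled list unless the distribution is meant to be protomer-specific.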

      As previously mentioned, we have included Supplemental Table 2 which contains the mean and standard deviation of the K-E distance across systems. For comparison between replicas, we found the time series of the K-E distance in the inhibitor-bound monomer and dimer systems in Supplemental Figure 7 to be sufficient.

      Comment 4: It would be better to define the claim: “it is clear that the timescale of the DFG-out to DFG-in transition is longer than our simulation timeframe of a few microseconds” (lines 208-209). To me it is not obvious why this should be “clear”.

      Response and revision: Our original statement was to convey that, as DFG-in is sampled very rarely, our simulations cannot accurately represent DFG transitions. We have revised the manuscript to the following:

      On page 6, fourth paragraph: While this does suggest dimerization loosens the DFG motif, our simulations do not appropriately model the DFG-out/-in transition as the DFG-in state is only occasionally sampled.

      Comment 5: In the case of the inhibited monomer simulations, the authors state: “the PHI1-Glu501 interaction can become completely disrupted, with the distance moving beyond 6 Å to as high as 12 Å; correlated with the disruption of the PHI1-Glu501 interaction, the αC position is shifted out to the range of 21 Å-24 Å” (lines 241-244). However, the plot of the PHI1-Glu501 interaction time-series (Supplementary Figure 7) shows that just in one replica of one protomer (Protomer A), the interaction is disrupted, and the αC position never exceeds 21 Å (time-series reported in Supplementary Figure 6). None of the fluctuations of the αC position appear to be correlated with the disruption of the ligand-Glu501 interaction. The time-series reported in Supplementary Figures 6 and 7 suggest that the two events are uncorrelated. Please explain this aspect or quantify the correlation to support your claim.

      Response: We believe this confusion arose because we did not include a time series of αC for the inhibited monomer simulations; Supplementary Figure 6 mentioned in the comment shows dimeric BRAF. Thus, we have added Supplementary Figure 8, a time-series plot of the αC position for inhibited monomer and dimer protomers.

      Comment 6: Regarding the analyses of the positive cooperativity, the DFG dihedral probability densities for the apo protomer (Figure 5a) are highly overlapping. Thus, it is hard to believe that these small differences support the claim that “PHI1 binding in one protomer can allosterically shift the DFG motif outward, making it favorable for binding a second inhibitor” (lines 300-302). The authors should show that the differences in the DFG distributions (in particular, apo dimer vs PHI1 mixed) are statistically significant. Only in this case, the data could support the claim that PHI1 bound to one protomer modulates the DFG conformation in the second one. In my opinion, the overlap between the DFG dihedral probability (Figure 5a) is too high to support the claim that PHI1 is able to allosterically modulate this region in the second apo protomer. Please provide an appropriate statistical test that demonstrates that those distributions are significantly different.

      Response and revision: We have adjusted this statement based on the new Supplementary Table 2 to read as the following:

      On page 9, third paragraph: Although the shift is small (the difference between means is approximately one standard deviation, see Supplementary Table 2), it suggests that PHI1 binding in one protomer can allosterically shift the DFG motif outward, making it favorable for binding a second inhibitor. In contrast, the DFG dihedral of the apo protomer in the LY-bound mixed dimer appears to be slightly smaller than that of the apo dimer, with a difference between means of approximately one standard deviation (Supplementary Table 2), which is unfavorable for binding the second inhibitor (orange and grey, Figure 5a right).
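
      The phrase “approximately one standard deviation” can be made concrete with a standardized mean difference. The sketch below is an assumption-laden illustration rather than the authors' calculation: it uses a pooled sample standard deviation (Cohen's d) and made-up dihedral values. Because consecutive MD frames are correlated, this number describes the size of the shift and does not by itself establish statistical significance.

```python
from math import sqrt
from statistics import mean, stdev

def standardized_mean_difference(sample_a, sample_b):
    """Difference of means (b minus a) in units of the pooled sample standard deviation."""
    na, nb = len(sample_a), len(sample_b)
    pooled_var = (
        (na - 1) * stdev(sample_a) ** 2 + (nb - 1) * stdev(sample_b) ** 2
    ) / (na + nb - 2)
    return (mean(sample_b) - mean(sample_a)) / sqrt(pooled_var)

# Hypothetical DFG pseudo dihedral values for two systems (arbitrary units).
apo_dimer = [1.0, 2.0, 3.0]
mixed_dimer = [2.0, 3.0, 4.0]
d = standardized_mean_difference(apo_dimer, mixed_dimer)
print(f"shift = {d:.2f} pooled standard deviations")
```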

      Comment 7: Regarding the dimer holo simulations, I agree that in the LY-bound dimer simulations, the hydrogen bond between the ligand and the E501 is weaker, but I do not understand the sentence “as seen from the local density maximum centered at ∼3.4 Å” at line 233, since the 2D density plot (Figure 3h) shows that the highest peak is close to 5 Å. Also, it would be useful to clarify how these 2D density plots reported in Figure 3 were obtained.

      Response and revision: While the highest peak in Figure 3h is close to 5 Å, we were more interested in the local peak close to 3.4 Å. To avoid confusion we have modified the line to separate both peaks:

      On page 7, second paragraph: In the LY-bound dimer simulations, however, the LY–Glu501 h-bond is weaker and less stable than the counterpart of the PHI1-bound dimer, as seen from the local density maximum centered at ∼3.4 and the global maximum near ∼4.5 Å (Figure 3g,h).

      Comment 8: I have a comment on the strategy suggested to empirically classify the inhibitors by comparing the Glu501-Lys483 distance and the αC position in the two protomers of the crystal structures (in the Concluding Discussion section). The authors suggest that differences below 1 Å could determine whether the flexibility of these regions is restricted or not (and whether the inhibitor is equipotent or dimer-selective). However, differences below 1 Å, in structures where the average resolution is 2.5 Å, might be highly unreliable. In fact, as the authors pointed out, LY and Ponatinib would be classified (erroneously) as dimer-selective inhibitors according to these criteria.

      Response and revision: We agree that this proposed method could be unreliable; we intend this strategy to be used as a “quick and dirty” method for analyzing future structures in order to assess selectivity for dimeric BRAF. To convey this, we added the following sentence:

      On page 12, second paragraph: Given that the resolution of a resolved structure is often ∼2–3 Å, this proposed assessment is not intended to replace more rigorous tests, i.e. utilizing MD simulations.

      Comment 9: Including representative snapshots of the MD simulation in the GitHub repository could allow the reader to better appreciate the results described in the present study.

      Response and revision: In order to convey the difference between induced effects of PHI1 and LY, we have added a new folder named snapshots to the GitHub repository which contains the snapshots from the simulations of one LY or one PHI1 bound BRAF (visualized in Figure 5c) in the form of PDB files.

    1. Reviewer #1 (Public Review):

      Summary:

      In the manuscript by Tie et al., the authors couple the methodology they developed to measure the LQ (localization quotient) of proteins within the Golgi apparatus with RUSH-based cargo release to quantify the speed of different cargos traveling through Golgi stacks in nocodazole-induced Golgi ministacks, in order to differentiate between the cisternal progression and stable compartment models of the Golgi apparatus. The debate between the cisternal progression model and the stable compartment model has been intense and ongoing for decades, and resolving it is important for understanding the basic function and organization of the Golgi apparatus. In the stable compartment model, cisternae are stable structures and cargo moves along the Golgi apparatus in vesicular carriers. In the cisternal progression model, by contrast, Golgi cisternae themselves mature, acquiring new identity from the cis face to the trans face, and act as transport carriers themselves. In this work, the authors provide a missing piece regarding the intra-Golgi transport speed of different cargoes as well as the speed of TGN exit and, based on the differences in transport velocities among the cargoes tested, favor a stable compartment model. The authors argue that if there were cisternal progression, all cargoes should have a similar intra-Golgi transport speed, which is essentially the rate at which the Golgi cisternae mature. Furthermore, using a combination of BFA and nocodazole treatments, the authors show that the compartments remain stable in cells for at least 30-60 minutes after BFA treatment.

      Strengths:

      The method to accurately measure the localization of a protein within the Golgi stack was rigorously tested in previous publications from the same authors and, in combination with pulse-chase approaches, has been used to quantify transport velocities of cargoes through the Golgi. This is a novel aspect of this paper, and the differences in intra-Golgi velocities among the cargoes tested make a case for a stable compartment model.

      Weaknesses:

      Experiments were performed in only one cell line (HeLa cells) and are predominantly derived from an experimental paradigm using RUSH assays, where a secretory cargo is released in a wave (not the most physiological condition). Additional approaches would therefore make a more compelling case for the model.

    2. eLife assessment

      This important study sheds new light on cargo movement within the Golgi apparatus, challenging the cisternal progression model by providing convincing evidence for a velocity decrease from cis to trans Golgi and variable speeds within cisternae, suggesting a more stable compartmental nature. While these findings propose refinements to the classic model, they prompt further exploration of recent models like rapid partitioning and rim progression, necessitating additional experimental approaches to account for cargo expression variations and HeLa cell-specific effects.

    3. Reviewer #2 (Public Review):

      Summary:

      This manuscript describes the use of quantitative imaging approaches, which have been a key element of the lab's work over the past years, to address one of the major unresolved discussions in trafficking: intra-Golgi transport. The approach used has been described in the lab's previous papers and is clearly presented here. The authors clearly address the weaknesses in this manuscript and do not overstate the conclusions drawn from the data. The only weakness not addressed is the concept of blocking COPI transport with BFA, which is a strong inhibitor and causes general disruption of the system. This is an interesting element of the paper, which I think could be improved upon by using more specific COPI inhibitors instead, although I understand that this is not necessarily straightforward.

      I commend the authors on their clear and precise presentation of this body of work, incorporating mathematical modelling with a fundamental question in cell biology. In all, I think that this is a very robust body of work, that provides a sound conclusion in support of the stable compartment model for the Golgi.

      General points:

      The manuscript contains a lot of background in its results sections, and the authors may wish to consider rebalancing the text: The section beginning at Line 175 is about 90% background and 10% data. Could some data currently in supplementary be included here to redress this balance, or this part combined with another?

    4. Reviewer #3 (Public Review):

      The manuscript by Tie et al. provides a quantitative assessment of intra-Golgi transport of diverse cargos. Quantitative approaches using fluorescence microscopy of RUSH synchronized cargos, namely GLIM and measurement of Golgi residence time, previously developed by the authors' team (publications from 2016 to 2022), are being used here.

      Most of the results have been already published by the same team in 2016, 2017, 2020 and 2021. In this manuscript, very few new data have been added. The authors have put together measurements of intra-Golgi transport kinetics and Golgi residence time of many cargos. The quantitative results are supported by a large number of Golgi mini-stacks/cells analyzed. They are discussed with regard to the intra-Golgi transport models being debated in the field, namely the cisternal maturation/progression model and the stable compartments model. However, over the past decades, the cisternal progression model has been mostly accepted thanks to many experimental data.

      The authors show that different cargos have distinct intra-Golgi transport kinetics and that the Golgi residence time of glycosyltransferases is high. From this and the experiment using brefeldin A, the authors suggest that the rim progression model, adapted from the stable compartments model, fits with their experimental data.

      Strengths:

      The major strength of this manuscript is to put together many quantitative results that the authors previously obtained and to discuss them to give food for thought about the intra-Golgi transport mechanism.

      The analysis by fluorescence microscopy of intra-Golgi transport is tough and is a tour de force of the authors, even if their approach shows limitations, which are clearly stated. Their work is remarkable with regard to the number of Golgi markers and secretory cargos that have been analyzed.

      Weaknesses:

      As previously mentioned, most of the data provided here were already published and thus accessible to the community. Is there a need to publish them again?

      The authors' discussion about the intra-Golgi transport model is rather simplistic. In the introduction, there is no mention of the most recent models, namely the rapid partitioning and the rim progression models. In my opinion, the tubular connections between cisternae and the diffusion/biochemical properties of cargos are not sufficiently taken into account to interpret the results. Indeed, tubular connections and biochemical properties of the cargos may affect their transit through the Golgi and the kinetics with which they reach the TGN for Golgi exit.

      Nocodazole is being used to form Golgi mini-stacks, which are necessary to allow intra-Golgi measurement. The use of nocodazole might affect cellular homeostasis, but this is clearly stated by the authors and is acceptable as we need to perturb the system to conduct this analysis. However, the manual selection of the Golgi mini-stacks being analyzed raises a major concern. As far as I understood, the authors select the mini-stacks where the cargo and the Golgi reference markers are clearly detectable and separated, which might introduce a bias in the analysis.

      The term 'Golgi residence time' is used, but it corresponds to the residence time in the trans-cisterna only, as the cargo has been accumulated in the trans-Golgi thanks to a 20 °C block. The kinetics of disappearance of the protein of interest is then monitored after a 20 °C to 37 °C switch.

      Another concern also lies in the differences that would be introduced by different expression levels of the cargo on the kinetics of their intra-Golgi transport and of their packaging into post-Golgi carriers.

    1. eLife assessment

      This study presents valuable findings on the ligand- and ion-dependent structural dynamics of a transcriptional riboswitch. The single-molecule data presented are solid and prompt intriguing hypotheses and models, which will undoubtedly stimulate future structural analyses. These findings are of considerable interest to biochemists and biophysicists engaged in the study of RNA structure and riboswitch mechanisms.

    2. Reviewer #1 (Public Review):

      Summary:

      This work presents an in-depth characterization of the factors that influence the structural dynamics of the Clostridium botulinum guanidine-IV riboswitch (riboG). Using single-molecule FRET, the authors demonstrate that riboG undergoes ligand- and Mg2+-dependent conformational changes consistent with dynamic formation of a kissing loop (KL) in the aptamer domain. Formation of the KL is attenuated by Mg2+ and Gua+ ligand at physiological concentrations as well as the length of the RNA. Interestingly, the KL is most stable in the context of just the aptamer domain compared to longer RNAs capable of forming the terminator stem. To attenuate transcription, binding of Gua+ and formation of the KL must occur rapidly after transcription of the aptamer domain but before transcription of the rest of the terminator stem.

      Strengths:

      (1) Single-molecule FRET microscopy is well suited to unveil the conformational dynamics of KL formation and the authors provide a wealth of data to examine the effect of the ligand and ions on riboswitch dynamics. The addition of complementary transcriptional readthrough assays further supports the authors' proposed model of how the riboswitch dynamics contribute to function.

      (2) The single-molecule data strongly support that the effect of Gua+ ligand and Mg2+ influence the RNA structure differently for varying lengths of the RNA. The authors also demonstrate that this is specific for Mg2+ as Na+ and K+ ions have little effect.

      (3) The PLOR method utilized is clever and well adapted for both dual labeling of RNAs and examining RNA at various lengths to mimic co-transcriptional folding. Using PLOR, they demonstrate that a change in the structural dynamics and ligand binding can occur after extension of the RNA transcript by a single nucleotide. Such a tight window of regulation has intriguing implications for kinetically controlled riboswitches.

      (4) In the revised version, the authors utilized multiple destabilizing and compensatory mutations to strengthen their structural interpretation of the KL structure and dynamics and to cement their conclusions.

    3. Reviewer #2 (Public Review):

      Summary:

      Gao et al. used single-molecule FRET and step-wise transcription methods to study the conformations of the recently reported guanidine-IV class of bacterial riboswitches that upregulate transcription in the presence of elevated guanidine. Using three riboswitch lengths, the authors analyzed the distributions and transitions between different conformers in response to different Mg2+ and guanidine concentrations. These data led to a three-state kinetic model for the structural switching of this novel class of riboswitches whose structures remain unavailable. Using the PLOR method that the authors previously invented, they further examined the conformations, ligand responses, and gene-regulatory outcomes at discrete transcript lengths along the path of vectorial transcription. These analyses uncover that the riboswitch exhibits differential sensitivity to ligand-induced conformational switching at different steps of transcription, and identify a short window where the regulatory outcome is most sensitive to ligand binding.

      Strengths:

      Dual internal labeling of long RNA transcripts remains technically very challenging, but essential for smFRET analyses of RNA conformations. The authors should be commended for achieving very high quality and purity in their labelled RNA samples. The data are extensive, robust, thorough, and meticulously controlled. The interpretations are logical and conservative. The writing is reasonably clear and illustrations are of high quality. The findings are significant because the paradigm uncovered here for this relatively simple riboswitch class is likely also employed in numerous other kinetically regulated riboswitches. The ability to quantitatively assess RNA conformations and ligand responses at multiple discrete points along the path towards the full transcript provides a rare and powerful glimpse into co-transcriptional RNA folding, ligand-binding, and conformational switching.

      Weaknesses:

      The use of T7 RNA polymerase instead of a near cognate bacterial RNA polymerase in the termination/antitermination assays is a significant caveat. It is understandable as T7 RNA polymerase is much more robust than its bacterial counterparts, which probably will not survive the extensive washes required by the PLOR method. The major conclusions should still hold, as the RNA conformations are probed by smFRET at static, halted complexes instead of on the fly. However, potential effects of the cognate RNA polymerase cannot be discerned here, including transcriptional rates, pausing, and interactions between the nascent transcript and the RNA exit channel, if any. The authors should refrain from discussing potential effects from the DNA template or the T7 RNA polymerase, as these elements are not cognate with the riboswitch under study.

    4. Reviewer #3 (Public Review):

      Summary:

      In this article, Gao et al. use single-molecule FRET (smFRET) and position-specific labelling of RNA (PLOR) to dissect the folding and ligand-sensing behavior of the Guanidine-IV riboswitch in the presence and absence of the ligand guanidine and the cation Mg2+. The results provide valuable information on the mechanistic aspects of the riboswitch, including confirmation that the kissing loop present in the structure is essential for folding and riboswitch activity. Co-transcriptional investigations of the system provided key information on the ligand-sensing behavior and ligand-binding window of the riboswitch. A plausible folding model of the Guanidine-IV riboswitch was proposed as a final result. The evidence presented here sheds additional light on the mode of action of transcriptional riboswitches.

      Strengths:

      The investigations were very thorough, providing data that support the conclusions. The use of smFRET and PLOR to investigate RNA folding has been shown to be a valuable tool for understanding the folding and behavior of these structured RNA molecules. The co-transcriptional analysis brought important information on how the riboswitch works, including the ligand sensing and the binding window that promotes the structural switch. The fact that investigations were done with the aptamer domain, the aptamer domain + terminator/anti-terminator region, and the full-length riboswitch was essential to inform how each domain contributes to the final structural state in the presence of the ligand and Mg2+.

      Weaknesses:

      The system has its own flaws when compared to physiological conditions. The RNA polymerase used (the study uses T7 RNA polymerase) differs from the bacterial RNA polymerase not only in complexity but also in transcriptional speed, which can directly interfere with folding and ligand sensing. Additionally, rNTP concentrations were much lower than physiological concentrations during transcription, likely causing a change in the polymerase transcriptional speed. These important aspects, and how they could interfere with the results, should be addressed for the broad audience. Another point to be aware of is that the bulky fluorophores attached to the nucleotides can interfere with folding to some extent.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This work presents an in-depth characterization of the factors that influence the structural dynamics of the Clostridium botulinum guanidine-IV riboswitch (riboG). Using single-molecule FRET, the authors demonstrate that riboG undergoes ligand- and Mg2+-dependent conformational changes consistent with the dynamic formation of a kissing loop (KL) in the aptamer domain. Formation of the KL is attenuated by Mg2+ and Gua+ ligand at physiological concentrations as well as the length of the RNA. Interestingly, the KL is most stable in the context of just the aptamer domain compared to longer RNAs capable of forming the terminator stem. To attenuate transcription, binding of Gua+ and formation of the KL must occur rapidly after transcription of the aptamer domain but before transcription of the rest of the terminator stem.

      Strengths:

      (1) Single-molecule FRET microscopy is well suited to unveil the conformational dynamics of KL formation and the authors provide a wealth of data to examine the effect of the ligand and ions on riboswitch dynamics. The addition of complementary transcriptional readthrough assays provides further support for the authors' proposed model of how the riboswitch dynamics contribute to function.

      (2) The single-molecule data strongly support that the effect of Gua+ ligand and Mg2+ influence the RNA structure differently for varying lengths of the RNA. The authors also demonstrate that this is specific for Mg2+ as Na+ and K+ ions have little effect.

      (3) The PLOR method utilized is clever and well adapted for both dual labeling of RNAs and examining RNA at various lengths to mimic co-transcriptional folding. Using PLOR, they demonstrate that a change in the structural dynamics and ligand binding can occur after the extension of the RNA transcript by a single nucleotide. Such a tight window of regulation has intriguing implications for kinetically controlled riboswitches.

      Weaknesses:

      (1) The authors use only one mutant to confirm that their FRET signal indicates the formation of the KL. Importantly, this mutation does not involve the nucleotides that are part of the KL interaction. It would be more convincing if the authors used mutations in both strands of the KL and performed compensatory mutations that restore base pairing. Experiments like this would solidify the structural interpretation of the work, particularly in the context of the full-length riboG RNA or in the cotranscriptional mimic experiments, which appear to have more conformational heterogeneity.

      We thank the reviewer for describing our work as an “in-depth characterization” of riboG. We agree with the reviewer and have added two more mutants, G71C and U72C, with the mutations located at the KL (Figure 2–figure supplement 8A, 8B, 9A, 9B, Figure 3–figure supplement 6A, 6B, 7A, 7B, and Figure 4–figure supplement 6A, 6B, 7A, 7B). Furthermore, we have tested compensatory mutations, C30G-G71C and A29G-U72C, intended to restore base pairing in the KL (Figure 2–figure supplement 8C, 8D, 9C, 9D, Figure 3–figure supplement 6C, 6D, 7C, 7D, and Figure 4–figure supplement 6C, 6D, 7C, 7D). We added the experimental results to the revised manuscript accordingly as “The highly conserved nucleotides surrounding the KL are crucial for its formation (Lenkeit et al., 2020). To test our hypothesis that the state with EFRET ~ 0.8 corresponds to the conformation with the KL, we performed smFRET analysis on several mutations at these crucial nucleotides (Figure 2–figure supplement 8–10). Consistent with our expectations, the peak with EFRET ~ 0.8 was significantly diminished in the riboG-G71C mutant, which features a single nucleotide mutation at site 71 (with 97% nucleotide conservation) in the KL (Figure 2–figure supplement 8A and 8B). It is worth noting that the C30G-G71C double mutant, which was initially expected to restore a base pair in the KL, did not successfully bring about the anticipated peak of EFRET ~ 0.8 (Figure 2–figure supplement 8C and 8D). On the other hand, the riboG-U72C mutant exhibited a lower proportion of the state with EFRET ~ 0.8 than riboG-apt. However, the A29G and U72C mutations restored a base pair in the KL, as well as the formation of the KL (Figure 2–figure supplement 9). Furthermore, our investigation revealed that the G77C mutant, involving a single nucleotide mutation at a highly conserved site, 77 (with 97% nucleotide conservation), also hindered the formation of the KL (Figure 2–figure supplement 10). This finding aligns with previous research (Lenkeit et al., 2020) and the secondary structure of the G77C mutant predicted by Mfold (Zuker, 2003)” (page 7), “In contrast to riboG-term, both its G71C and C30G-G71C mutants displayed a reduced proportion of the state with EFRET ~ 0.8. Remarkably, the fractions of EFRET ~ 0.8 remained unaffected by the addition of 1.0 mM Gua+ in these mutants. Distinct from riboG-term, no structural transitions between states were observed in the two mutants (Figure 3–figure supplement 6). Regarding the U72C mutant of riboG-term, the mutation at site 72 had a reduced impact on the KL conformation in the presence of 1.0 mM Gua+ and 2.0 mM Mg2+. However, the increased proportion of EFRET ~ 0.8 in the A29G-U72C mutant of riboG-term suggests that these mutations can restore the base-pairing between sites 29 and 72, as well as facilitate the formation of the KL (Figure 3–figure supplement 7)” (page 8), and “Upon comparing the G71C and C30G-G71C mutants of the full-length riboG with their wild-type counterpart, it was observed that the wild-type adopted higher proportions of the state with EFRET ~ 0.8 (Figure 4–figure supplement 6). Regarding the U72C and A29G-U72C mutants of the full-length riboG, their behaviors with regard to the peak with EFRET ~ 0.8 were similar to those of their counterparts in riboG-term (Figure 4–figure supplement 7)” (page 9).

      (2) The existence of the pre-folded state (intermediate FRET ~0.5) is not well supported in their data and could be explained by an acquisition artifact. The dwell times are very short often only a single frame indicating that there could be a very fast transition (< 0.1s) from low to high FRET that averages to a FRET efficiency of 0.5. To firmly demonstrate that this intermediate FRET state is metastable and not an artifact, the authors need to perform measurements with a faster frame rate and demonstrate that the state is still present.

      We thank the reviewer for the great comment. We added smFRET experiments at a higher time resolution (20 ms) as well as a lower time resolution (Figure 2– figure supplement 3). Based on our experimental results, the intermediate state (EFRET ~0.5) is present in smFRET data collected at time resolutions of 20 ms, 100 ms, and 200 ms.

      (3) The PLOR method employs a non-biologically relevant polymerase (T7 RNAP) to mimic transcription elongation and folding near the elongation complex. T7 RNAP has a shorter exit channel than bacterial RNAPs and therefore, folding in the exit channel may be different between different RNAPs. Additionally, the nascent RNA may interact with bacterial RNAP differently. For these reasons, it is not clear how well the dynamics observed in the T7 ECs recapitulate riboswitch folding dynamics in bacterial ECs where they would occur in nature. 

      We thank the reviewer for the comment. We agree that bacterial and T7 RNAPs may behave differently owing to differences in transcriptional speed, dynamics, and interactions. We have added the following statement to the Discussion: “It is worth noting that the RNAP utilized in our study is T7 RNAP, which exhibits distinct characteristics compared to bacterial RNAP in terms of transcriptional speed, dynamics, and interactions. However, Xue et al. have reported similarities between T7 and E. coli RNAP in the folding of nascent RNA. Additionally, Lou and Woodson have provided valuable insights into the co-transcriptional folding of the glmS ribozyme using T7 RNAP (Xue et al., 2023; Lou & Woodson, 2024)” (pages 13–14).

      Reviewer #2 (Public Review):

      Summary:

      Gao et al. used single-molecule FRET and step-wise transcription methods to study the conformations of the recently reported guanidine-IV class of bacterial riboswitches that upregulate transcription in the presence of elevated guanidine. Using three riboswitch lengths, the authors analyzed the distributions and transitions between different conformers in response to different Mg2+ and guanidine concentrations. These data led to a three-state kinetic model for the structural switching of this novel class of riboswitches whose structures remain unavailable. Using the PLOR method that the authors previously invented, they further examined the conformations, ligand responses, and gene-regulatory outcomes at discrete transcript lengths along the path of vectorial transcription. These analyses uncover that the riboswitch exhibits differential sensitivity to ligand-induced conformational switching at different steps of transcription, and identify a short window where the regulatory outcome is most sensitive to ligand binding.

      Strengths:

      Dual internal labeling of long RNA transcripts remains technically very challenging but essential for smFRET analyses of RNA conformations. The authors should be commended for achieving very high quality and purity in their labelled RNA samples. The data are extensive, robust, thorough, and meticulously controlled. The interpretations are logical and conservative. The writing is reasonably clear and the illustrations are of high quality. The findings are significant because the paradigm uncovered here for this relatively simple riboswitch class is likely also employed in numerous other kinetically regulated riboswitches. The ability to quantitatively assess RNA conformations and ligand responses at multiple discrete points along the path towards the full transcript provides a rare and powerful glimpse into cotranscriptional RNA folding, ligand-binding, and conformational switching.

      Weaknesses:

      The use of T7 RNA polymerase instead of a near-cognate bacterial RNA polymerase in the termination/antitermination assays is a significant caveat. It is understandable as T7 RNA polymerase is much more robust than its bacterial counterparts, which probably will not survive the extensive washes required by the PLOR method. The major conclusions should still hold, as the RNA conformations are probed by smFRET at static, halted complexes instead of on the fly. However, potential effects of the cognate RNA polymerase cannot be discerned here, including transcriptional rates, pausing, and interactions between the nascent transcript and the RNA exit channel, if any. The authors should refrain from discussing potential effects from the DNA template or the T7 RNA polymerase, as these elements are not cognate with the riboswitch under study.

      We thank the reviewer for describing our work as follows: “The data are extensive, robust, thorough, and meticulously controlled. The interpretations are logical and conservative. The writing is reasonably clear and the illustrations are of high quality”. We agree that bacterial and T7 RNAPs may behave differently owing to differences in transcriptional speed, dynamics, and interactions. We have added the following statement to the Discussion: “It is worth noting that the RNAP utilized in our study is T7 RNAP, which exhibits distinct characteristics compared to bacterial RNAP in terms of transcriptional speed, dynamics, and interactions. However, Xue et al. have reported similarities between T7 and E. coli RNAP in the folding of nascent RNA. Additionally, Lou and Woodson have provided valuable insights into the co-transcriptional folding of the glmS ribozyme using T7 RNAP (Xue et al., 2023; Lou & Woodson, 2024)” (page 14).

      Reviewer #3 (Public Review):

      Summary:

      In this article, Gao et al. use single-molecule FRET (smFRET) and position-specific labelling of RNA (PLOR) to dissect the folding and ligand-sensing behavior of the Guanidine-IV riboswitch in the presence and absence of the ligand guanidine and the cation Mg2+. The results provide valuable information on the mechanistic aspects of the riboswitch, including confirmation that the kissing loop present in the structure is essential for folding and riboswitch activity. Co-transcriptional investigations of the system provided key information on the ligand-sensing behavior and ligand-binding window of the riboswitch. A plausible folding model of the Guanidine-IV riboswitch was proposed as a final result. The evidence presented here sheds additional light on the mode of action of transcriptional riboswitches.

      Strengths:

      The investigations were very thorough, providing data that support the conclusions. The use of smFRET and PLOR to investigate RNA folding has been shown to be a valuable tool for understanding the folding and behavior of these structured RNA molecules. The co-transcriptional analysis brought important information on how the riboswitch works, including the ligand-sensing and the binding window that promotes the structural switch. Performing the investigations with the aptamer domain, the aptamer domain + terminator/anti-terminator region, and the full-length riboswitch was essential to show how each domain contributes to the final structural state in the presence of the ligand and Mg2+.

      Weaknesses:

      The system has its own flaws when compared to physiological conditions. The RNA polymerase used (the study uses T7 RNA polymerase) differs from the bacterial RNA polymerase not only in complexity but also in transcriptional speed, which can directly interfere with folding and ligand-sensing. Additionally, rNTP concentrations during transcription were much lower than physiological concentrations, likely changing the transcriptional speed of the polymerase. These aspects, and how they could affect the results, should be addressed for a broad audience. Another point to be aware of is that the bulky fluorophores attached to the nucleotides can interfere with folding to some extent.

      We thank the reviewer for describing our work as “The investigations were very thorough, providing data that supports the conclusions”. We agree that bacterial and T7 RNAPs may behave differently owing to differences in transcriptional speed, dynamics, and interactions. We have added the following statement to the Discussion: “It is worth noting that the RNAP utilized in our study is T7 RNAP, which exhibits distinct characteristics compared to bacterial RNAP in terms of transcriptional speed, dynamics, and interactions. However, Xue et al. have reported similarities between T7 and E. coli RNAP in the folding of nascent RNA. Additionally, Lou and Woodson have provided valuable insights into the co-transcriptional folding of the glmS ribozyme using T7 RNAP (Xue et al., 2023; Lou & Woodson, 2024)” (page 14). We also agree that the lower NTP concentrations may affect the transcriptional speed. Regarding the fluorophores, we purposely placed them away from the KL to avoid their influence on the formation of the KL.

      Reviewer #1 (Recommendations For The Authors):

      Related to weakness 1

      - The authors cite a paper that investigated mutations in the KL duplex but do not include these mutations in their analysis. It is unclear why the authors chose the G77C mutation and not the other mutants previously tested. Can the authors explain their choice of mutation in detail in the text? I also did not see the proposed secondary structure for the G77C mutant shown in Figure 2 -supp 3A in the cited paper, is this a predicted structure? Please explain how this structure was determined. 

      We thank the reviewer for the comment. We chose the G77C mutation based on a previous report that G77C can disturb the formation of the KL, as stated in the manuscript: “Furthermore, our investigation revealed that the G77C mutant, involving a single nucleotide mutation at a highly conserved site, 77 (with 97% nucleotide conservation), also hindered the formation of the KL (Figure 2– figure supplement 10). This finding aligns with previous research (Lenkeit et al., 2020) and the secondary structure of the G77C mutant predicted by Mfold (Zuker, 2003)” (page 7). The secondary structure of the G77C mutant was predicted by Mfold, which is cited in the manuscript and added to the reference list as “Zuker, M. (2003). Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Research, 31(13), 3406-3415”.

      - It is not clear to me that the structural interpretation of their FRET states is correct and that the FRET signal reports on the base pairing of the KL in only the high FRET state. The authors should perform experiments with additional mutations in the KL duplex to confirm that their construct reports on KL duplex formation alone and not other structural dynamics. 

      We thank the reviewer for the comment. We have included additional mutations to establish a connection between the high-FRET state and the formation of the KL. The results have been added to the manuscript as “The highly conserved nucleotides surrounding the KL are crucial for its formation (Lenkeit et al., 2020). To test our hypothesis that the state with EFRET ~ 0.8 corresponds to the conformation with the KL, we performed smFRET analysis on several mutations at these crucial nucleotides (Figure 2– figure supplement 8–10). Consistent with our expectations, the peak with EFRET ~ 0.8 was significantly diminished in the riboG-G71C mutant, which features a single nucleotide mutation at site 71 (with 97% nucleotide conservation) in the KL (Figure 2– figure supplement 8A and 8B). It is worth noting that the C30G-G71C mutant, which was initially expected to restore a base pair in the KL, did not successfully bring about the anticipated peak of EFRET ~ 0.8 (Figure 2– figure supplement 8C and 8D). On the other hand, the riboG-U72C mutant exhibited a lower proportion at the state with EFRET ~ 0.8 than riboG-apt. However, the A29G and U72C mutations restored a base pair in the KL, as well as the formation of the KL (Figure 2– figure supplement 9). Furthermore, our investigation revealed that the G77C mutant, involving a single nucleotide mutation at a highly conserved site, 77 (with 97% nucleotide conservation), also hindered the formation of the KL (Figure 2– figure supplement 10). This finding aligns with previous research (Lenkeit et al., 2020) and the secondary structure of the G77C mutant predicted by Mfold (Zuker, 2003)” (page 7), “In contrast to riboG-term, both its G71C and C30G-G71C mutants displayed a reduced proportion of the state with EFRET ~ 0.8. Remarkably, the fractions of EFRET ~ 0.8 remained unaffected by the addition of 1.0 mM Gua+ in these mutants. Distinct from riboG-term, no structural transitions between states were observed in the two mutants (Figure 3– figure supplement 6). Regarding the U72C mutant of riboG-term, the mutation at the site 72 had a reduced impact on the KL conformation in the presence of 1.0 mM Gua+ and 2.0 mM Mg2+. However, the increased proportion of EFRET ~ 0.8 in the A29G-U72C mutant of riboG-term suggests that these mutations can restore the base-pairing between sites 29 and 72, as well as facilitate the formation of the KL (Figure 3– figure supplement 7)” (page 8), and “Upon comparing the G71C and C30G-G71C mutants of the full-length riboG with their wild-type counterpart, it was observed that the wild-type adopted higher proportions of the state with EFRET ~ 0.8 (Figure 4– figure supplement 6). Regarding the U72C and A29G-U72C mutants of the full-length riboG, their behaviors with regard to the peak with EFRET ~ 0.8 were similar to that of their counterparts in riboG-term (Figure 4– figure supplement 7)” (page 9).

      - For the full-length riboG-136 (Cy3Cy5 riboG in Figure 4), the authors have clearly defined peaks at 0.6 and 0.4. However, the authors do not explain their structural interpretation of these states. Do the authors believe that the KL is forming in these states? It would be helpful to have data on mutations in the KL in the context of the full-length riboG to better understand the structural transitions of these intermediate states. 

      Based on our mutation studies, we proposed that the peak with EFRET ~0.8 corresponds to the conformation with the KL, while the states with EFRET ~0.4 and 0.6 are the states without a stable KL. 

      Related to weakness 2:

      - For the riboG-apt and riboG-term RNAs, the proposed intermediate FRET state (EFRET = 0.5) is poorly fit by a Gaussian and the dwell times in the state are almost entirely single-frame dwells. It is likely that this state is the result of a camera blurring artifact, in which RNAs undergo a FRET transition between two frames giving an apparent FRET efficiency which is between that of the two transitioning states. This artifact arises when the average dwell times of the true states (Elow and Ehigh) are comparable to the frame duration (within a factor of ~5-10; see https://doi.org/10.1021/acs.jpcb.1c01036). To confirm the presence of the intermediate state, the authors should perform at least a few experiments with higher time resolution to support the existence of the 0.5 state with a lifetime of 0.1 s. Alternatively, the data should be refit to a two-state HMM and the authors could explain in the text that the density in the FRET histogram between the two states is likely due to transitions that are faster than the time resolution of the experiment. 

      We thank the reviewer for the great comment. Taking the suggestion into consideration, we performed smFRET experiments with a higher time resolution of 20 ms. As a result, we still detected the intermediate state, supporting that it is not an artifact. The new data has been included in the revised manuscript (Figure 2-figure supplement 3).  
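
      The frame-rate argument in this exchange can be illustrated with a toy simulation. The sketch below is not the authors' analysis; it assumes a hypothetical molecule with only two true states (the FRET values 0.2 and 0.8, the rates, and the 0.35-0.65 "intermediate" band are all illustrative choices) and averages the signal over each camera frame. In such a purely two-state system, apparent intermediate values arise only from frames that contain a transition, so their share should drop when the frame time is shortened. An intermediate population that persists at 20 ms is therefore harder to explain as blurring alone.

```python
import random


def simulate_apparent_fret(e_low, e_high, k_up, k_down, frame_s, n_frames, seed=0):
    """Simulate a two-state molecule and return the frame-averaged FRET values.

    k_up is the rate (1/s) of leaving the low-FRET state, k_down the rate of
    leaving the high-FRET state. Each frame reports the time-weighted mean
    FRET over its exposure, which is how camera blurring arises.
    All parameter values used with this function are hypothetical.
    """
    rng = random.Random(seed)
    state_high = False                      # start in the low-FRET state
    t = 0.0                                 # current time within the trajectory
    next_switch = rng.expovariate(k_up)     # time of the next state change
    frames = []
    for i in range(n_frames):
        frame_end = (i + 1) * frame_s
        time_high = 0.0
        # Process every transition that falls inside this frame.
        while next_switch < frame_end:
            if state_high:
                time_high += next_switch - t
            t = next_switch
            state_high = not state_high
            rate = k_down if state_high else k_up
            next_switch = t + rng.expovariate(rate)
        # Remainder of the frame is spent in the current state.
        if state_high:
            time_high += frame_end - t
        t = frame_end
        frac_high = time_high / frame_s
        frames.append(e_low + (e_high - e_low) * frac_high)
    return frames


def fraction_intermediate(frames, lo=0.35, hi=0.65):
    """Fraction of frames whose apparent FRET falls in the intermediate band."""
    return sum(1 for e in frames if lo < e < hi) / len(frames)


if __name__ == "__main__":
    for frame_s in (0.2, 0.1, 0.02):
        trace = simulate_apparent_fret(0.2, 0.8, 2.0, 2.0, frame_s, 20000, seed=1)
        share = fraction_intermediate(trace)
        print(f"frame = {frame_s * 1000:.0f} ms, apparent intermediate share = {share:.3f}")
```

      Running the script prints the apparent intermediate share at 200 ms, 100 ms, and 20 ms frame times for this hypothetical two-state molecule. A genuine third state would add a contribution that does not vanish as the frame time shrinks.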

      Related to weakness 3:

      - The authors depict the polymerase footprint differently in some of the figures and it is unclear if this is part of their model. Is the cartoon RNAP supposed to indicate the RNA:DNA hybrid or the footprint of T7 RNAP on the RNA? For example, in Figure 8a there are 8 nts (left) and 9 nts (right) covered by RNAP, and only 6nts in Figure 6 - supp 2A. This is particularly misleading for the EC-87 and EC-88 in Figure 6 - supp 2, where it is likely that this stem is not formed at all and the KL strand is single-stranded. The authors should clarify and at least indicate in the figure legend if the RNAP cartoon is part of the model or only a representation. 

      We thank the reviewer for bringing these issues to our attention. Due to space limitations, we chose to represent the polymerase footprint differently in Figure 8. However, we have included the statement “DNA templates from EC-87 to EC-105 are not displayed in the model” in the legend of Figure 8 to avoid confusion.

      Moreover, we have corrected the 6-nt error in Figure 6–figure supplement 2.

      - With a correct 9 bp RNA:DNA hybrid, the EC-88 construct would not be able to form the top part of the P2 stem and the second half of the KL RNA would be single-stranded. In this case, an interaction between the KL nucleotides would resemble a pseudoknot and not a kissing loop interaction. Can the authors explain if this could explain the heterogeneity they observe in the EC-88 construct compared to the riboG-apt RNA?

      We thank the reviewer for the comment. We have added the following statement to the revised manuscript: “The T7 RNA polymerase (RNAP) sequestered about 8 nt of the nascent RNA, preventing the EC-88 construct from forming the P2 stem (Durniak et al., 2008; Huang & Sousa, 2000; Lubkowska et al., 2011; Tahirov et al., 2002; Wang et al., 2022; Yin & Steitz, 2002). Consequently, a pseudoknot structure potentially formed instead of the expected KL. This distinction may account for the observed heterogeneity between EC-88 and riboG-apt” (page 11).

      Other comments:

      (1) It appears that the FRET histograms in the PLOR experiments (Figure 6 and related figures) only show the fits presumably to highlight the overlays. However, this makes it impossible to determine the goodness of the fit. The authors should instead show the outline of the raw histogram with the fit, or at least show the raw histograms with fits in the supplement. 

      We have replaced Figure 6- figure supplements 2-4 to enhance the clarity of the raw and fitted smFRET histograms.  

      (2) The authors should consider including a concluding paragraph to put the results into a larger context. How does the kinetic window compare to other transcriptional riboswitches? Would the authors comment on how the transcription speed compares to the kinetics for the formation of the KL? 

      We thank the reviewer for the comment. We have added a comparison of riboG with other transcriptional riboswitches to the manuscript as “Nevertheless, the ligand-sensitive windows of riboswitches during transcription vary. In a study conducted by Helmling et al. using NMR spectroscopy, they proposed a broad transcriptional window for deoxyguanosine-sensing riboswitches, whereby the ligand binding capability gradually diminishes over several nucleotide lengths (Helmling et al., 2017). However, more recent research by Binas et al. and Landgraf et al. on riboswitches sensing ZMP, c-di-GMP, and c-GAMP revealed a narrow window with a sharp transition in binding capability, even with transcript lengths differing by only one or three nucleotides (Binas et al., 2020; Landgraf et al., 2022). In line with the findings for the c-GAMP-sensing riboswitch, our study on the guanidine-IV riboswitch also demonstrated a sharp transition in binding capability with just a single nucleotide extension” (page 14).

      We appreciate the reviewer’s suggestion to compare the transcription speed with the kinetics of KL formation. However, the kinetic data in this study are too limited to support such a comparison with confidence.

      (3) Cy3Cy5 RiboG is a confusing name because it implies that the others are not also Cy3Cy5 labeled. The authors should consider changing the names and being consistent throughout. I suggest full-length riboG or riboG-136. 

      We have changed “Cy3Cy5 riboG” to “Cy3Cy5-full-length riboG” (pages 15 and 16).

      (4) The transcriptional readthrough experiment should be explained when first mentioned in line 109. 

      We have added the citation (Chien et al., 2023) for the transcriptional read-through experiment to the manuscript as “we noted that the transcriptional read-through of the guanidine-IV riboswitch during the single-round PLOR reaction was sensitive to Gua+, exhibiting an apparent EC50 value of 68.7 ± 7.3 μM (Figure 1D) (Chien et al., 2023)” (page 5).

      (5) Kd values in text should have uncertainties, and the way these uncertainties are obtained should be explained.

      We have added the uncertainties of the Kd values to the revised manuscript (page 6) and to the legend of Figure 2-supplement 6 as “The percentages of the folded state (EFRET ~ 0.8) of Cy3Cy5-riboG-apt were plotted with the concentrations of Gua+ at 0.5 mM Mg2+, with an apparent Kd of 286.0 ± 18.1 μM in three independent experiments”.
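
      One common way to obtain a Kd with an uncertainty from replicate titrations is sketched below. This is a minimal illustration, not the authors' fitting procedure, which the response does not specify beyond "three independent experiments". It assumes a simple one-site binding model with no baseline term, f = A * L / (Kd + L), fits each replicate separately by a grid search over Kd, and then reports the mean and sample standard deviation of the fitted values. The function names, the concentration series, and the replicate Kd values are hypothetical.

```python
import statistics


def binding_fraction(conc, kd, amplitude):
    """One-site binding model: folded fraction at ligand concentration conc."""
    return amplitude * conc / (kd + conc)


def fit_kd(concs, fractions, kd_min=1.0, kd_max=2000.0, step=1.0):
    """Least-squares fit of Kd by grid search (same units as concs).

    For each trial Kd the best amplitude has a closed form, because the
    model is linear in the amplitude once Kd is fixed.
    """
    best_kd, best_sse = None, float("inf")
    kd = kd_min
    while kd <= kd_max:
        x = [c / (kd + c) for c in concs]
        sxx = sum(v * v for v in x)
        amp = sum(v * f for v, f in zip(x, fractions)) / sxx
        sse = sum((f - amp * v) ** 2 for v, f in zip(x, fractions))
        if sse < best_sse:
            best_kd, best_sse = kd, sse
        kd += step
    return best_kd


def summarize(kds):
    """Mean and sample standard deviation across independent experiments."""
    return statistics.mean(kds), statistics.stdev(kds)


if __name__ == "__main__":
    # Hypothetical ligand concentrations in micromolar.
    concs = [10.0, 50.0, 100.0, 300.0, 1000.0, 3000.0]
    # Synthetic, noise-free data generated from the model itself.
    data = [binding_fraction(c, 286.0, 0.7) for c in concs]
    print("fitted Kd (uM):", fit_kd(concs, data))
    # Hypothetical per-replicate Kd values from three independent titrations.
    mean_kd, sd_kd = summarize([270.0, 286.0, 302.0])
    print(f"Kd = {mean_kd:.1f} +/- {sd_kd:.1f} uM")
```

      Reporting the standard deviation across replicates is only one convention. A standard error, or the parameter error from a single global fit, would give a different number, which is why the reviewer asks for the method to be stated.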

      (6) The authors mention "strategies" on line 306, but it is unclear what they are referring to. Are the strategies referring to the constructs (EC-87, etc) or Steps 1-8 in the supplemental figure? Please clarify. 

      We have resolved the confusion by adding “The detailed procedures of strategies 1-8 were shown in Figure 7–figure supplement 1” to the manuscript (page 12).

      (7) What are the fraction of dynamic traces versus static traces in the cases for the full-length riboG? This would help depict the structural heterogeneity in the population. 

      We have added the fractions of dynamic single-molecule traces of the full-length riboG to Figure 4-supplements 1-5. 

      (8) The labels in Figure 4 (A-E) don't match the caption (A-H). 

      We have corrected the error. 

      (9) The coloring of the RNA strands in Figure 4A should be explained in the figure legend. It could be interpreted as multiple strands annealed instead of a continuous strand. 

      We have revised the legend of Figure 4A by adding “The full-length riboG contains the aptamer domain (black), terminator (red) and the extended sequence (blue). Cy3 and Cy5 are shown by green and red sparkles, respectively”.

      (10) Reported quantities and uncertainties should have the same number of decimal places. In many places, the uncertainties likely have too many significant figures, for example, in Figure 5 and related figures. 

      We have corrected the significant figures of the uncertainties. 

      (11) In Figure 5, A and B should have the same vertical scale to facilitate comparison. 

      We have adjusted Figure 5A to match the vertical scale of Figure 5B in the revised manuscript.

      (12) In Figure 5C-D, the construct from which those trajectories come should be indicated in the legend. 

      We have added the construct to the legend of Figures 5C and D.  

      (13) In Figure 6J, the splines between data points are confusing and can be misleading. They suggest that the data has been fit to a model, but I am not sure if it represents a model. The data points should be colored instead and lines removed. 

      We thank the reviewer for the comment. We have changed Figure 6J by coloring the data points and removing the lines to avoid confusion. 

      (14) Line 330 mentions a P2 structure in Figure 8, but there is no such label in Figure. Please clarify. 

      We thank the reviewer for the comment and have added P2 to Figure 8. 

      Reviewer #2 (Recommendations For The Authors):

      (1) Figure 1B. The authors don't seem to address the role of the blue stem-loop following Stems 1 and 2. Is this element needed at all for gene regulation? Does it impact the conformations or folding of the preceding Stems 1 and 2? It seems feasible to disrupt the stem and see whether there is an impact on riboswitch function. 

      We thank the reviewer for the comment. The presence of the sequence that forms the blue stem-loop indicates the formation of an anti-terminator conformation in riboG during transcription. Our smFRET data show that including the stem-loop sequence induces additional peaks in the full-length riboG compared with riboG-term. This indicates that the stem-loop influences the folding of the kissing loop (KL) and potentially also affects stems 1 and 2.

      (2) Figure 7 supplement 1, C &D. Maybe I am missing something, but it seems to me in reaction #8 (EC-105, last two lanes), the readthrough percentage is close to 50% based on the gel but plotted in D as 20%. Further, there is a strong effect of guanidine in reaction #8 but that is not reflected in the quantitation in panel D. 

      We thank the reviewer for the comment. The observed discrepancy between reaction 8 in (C) and (D) is from the differential handling of the crude product at the last step (step 17) in gel loading for (C), contrasted with the combination of crude products from steps 16 and 17 to calculate the read-through percentage in (D). We have corrected the discrepancy by replacing Figure 7-Supplement figure 1C (now Figure 7C), and revised the legend to include the following clarification: “Taking into consideration that the 17 step-PLOR reaction exhibited a pause within the terminator region, resulting in a significant amount of terminated product at step 16, crude products from steps 16 and 17 were collected for (C) and (D) of the 17 step-PLOR reaction (Lanes 15 and 16 in C)”.
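
      The arithmetic behind this discrepancy can be made concrete with a short sketch. The band intensities below are hypothetical and were chosen only to show how a lane loaded with the step-17 product alone can look close to 50% read-through, while pooling the products of steps 16 and 17 yields about 20%. The function names are illustrative and do not come from the manuscript.

```python
def readthrough_percent(readthrough, terminated):
    """Read-through percentage from two band intensities (arbitrary units)."""
    total = readthrough + terminated
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    return 100.0 * readthrough / total


def combined_readthrough_percent(steps):
    """Pool (readthrough, terminated) intensities from several steps first,
    then compute a single read-through percentage."""
    readthrough = sum(r for r, _ in steps)
    terminated = sum(t for _, t in steps)
    return readthrough_percent(readthrough, terminated)


if __name__ == "__main__":
    # Hypothetical intensities: the pause in the terminator region leaves a
    # large amount of terminated product at step 16 and none read through.
    step16 = (0.0, 600.0)
    step17 = (200.0, 200.0)
    print("step 17 only:", readthrough_percent(*step17))
    print("steps 16 + 17:", combined_readthrough_percent([step16, step17]))
```

      With these illustrative numbers, the step-17 lane alone gives 50% and the pooled calculation gives 20%, mirroring the mismatch the reviewer noticed between the gel and the plot.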

      (3) Figure 7C is a control that shows the quality of the elongation complexes, which probably should be in the supplement. Instead, in Figure 7 supplement 1, panels C and D are actual experiments and could be moved into the main figure.  

      We thank the reviewer for the comment. We made the adjustment.  

      (4) Figure S7D. I would suggest not labelling the RNA polymerase halt/stoppage sites due to NTP deprivation as "pausing sites" because transcriptional pausing has previously been defined as natural sites where the RNA polymerase transiently halts itself, but not due to the lack of the next NTPs. In this case, the elongating complexes were artificially halted, which is technically not "pausing", as it will not restart/resume on its own without intervention. 

      We have changed the “pausing” to “halting”.  

      (5) Figure 7 is titled "In vitro transcriptional performance of riboG." But the data is actually not about the performance of the riboswitch, or how well it functions. I would suggest the authors revise the title. This is mostly about the observed sensitivity window of the riboswitch to ligand-mediated conformational switching. 

      We have changed the title of Figure 7 to “Ligand-mediated conformational switching of riboG during transcription”.

      (6) Figure 7A, the illustration gives the visual impression that there are multiple RNA polymerases on the same DNA template, which is not the case. 

      We have revised Figure 7A by adding arrows between RNA polymerases to illustrate the movement of a single RNAP, rather than multiple RNAP on the same template.

      (7) It could be informative to compare the guanidine-IV riboswitch with the first three classes (I, II, III), to see how their architectures or gene regulatory mechanisms are similar or different. 

      We thank the reviewer for the comment. We have added a comparison of the guanidine-IV riboswitch with the other three guanidine riboswitches to the manuscript as “The guanidine-IV riboswitch exhibits similarities to the guanidine-I riboswitch in gene regulatory mechanism, functioning as a transcriptional riboswitch. Structurally, it resembles the guanidine-II riboswitch through the formation of loop-loop interactions upon binding to guanidine (Battaglia & Ke, 2018; L. Huang et al., 2017; Lin Huang et al., 2017; Lenkeit et al., 2020; Nelson et al., 2017; Reiss & Strobel, 2017; Salvail et al., 2020)” (page 12).

      Reviewer #3 (Recommendations For The Authors):

      In addition to the public review items, I provide the following recommendations:

      (1) As a second language speaker, I understand that writing a compelling and concise story may be hard, and we tend to write more than needed or more repetitively. That being said, I do think that the writing could be improved to make it more concise, clear, and avoid repetitions.

      We thank the reviewer for the comment. We have rewritten the abstract and revised several sentences in the manuscript.

      (2) In the abstract, instead of saying that "...This lack of understanding has impeded the application of this riboswitch", which makes the statement too strong, perhaps, stating something along the lines of "this understanding would assist the application of this riboswitch", would be a better fit. 

      We have rewritten the abstract and revised the sentence.

      (3) Methods should state which RNA polymerase was used. PLOR uses T7 RNA pol, so I assume it was the same. 

      We have added the statement “T7 RNAP was utilized in the PLOR and in vitro transcription reactions unless otherwise noted” to the Methods (page 15).

      (4) The impact statement says comprehensive structure-function, where perhaps comprehensive folding-function would be more appropriate. We are still missing a lot of structural information about this particular riboswitch. 

      We agree with the reviewer and have changed “comprehensive structure-function” to “folding-function” in the Impact statement (page 2).

      (5) Higher Mg2+ concentrations implicated in a lesser extent of the switch of RiboGapt, a sentence talking about it would be useful (how Mg2+ could have promiscuous interaction and interfere with folding). 

      We have added a description of the effect of higher Mg2+ concentrations to the manuscript as “However, the pre-folded and unfolded conformations were more prevalent at 50.0 mM Mg2+ than at 20.0 mM Mg2+. This suggests that an excess of Mg2+ may promote the pre-folded and even unfolded conformations” (page 6).

      (6) In the investigations of RiboG-term and RiboG, it seems that monovalent ions from the buffer are sufficient to promote secondary structure. A statement commenting on this would benefit the paper and the audience.

      We agree with the reviewer and have revised the manuscript accordingly by adding “This indicates that monovalent ions in the buffer can facilitate the formation of stable guanidine-IV riboswitch” (page 8).

      (7) Figure 3. The figure goes up to panel E, but the legend goes up to panel H. The colors described for G and H do not correspond to the actual figure colors.

      We made the correction.  

      (8) Figure 4. The same as Figure 3, the panels and figures are divergent.  

      We made the correction.  

      (9) During the discussion, stating that the DNA and RNA pol play a role in folding and ligand binding may be excessive. This could be an indirect effect of the transcriptional bubble hindering part of the nascent RNA from folding, which is something intrinsic to any transcription and not specific to this system. 

      We agree with the reviewer and deleted the statement that the DNA and RNAP play a role in folding and ligand binding.

      (10) PLOR is not properly cited. When introduced in the manuscript, please cite the original PLOR paper (Liu et al., Nature 2015) and additional related papers.

      We cited the original PLOR paper (Liu et al., Nature 2015) and the related papers (Liu et al., Nature Protocols 2018) (pages 4 and 15).

      (11) The kinetics race of folding and binding could be a little more emphasized in discussion, particularly from the perspective of its physiological importance. 

      We agree with the reviewer and deleted the kinetic race of folding and binding from the Discussion.

    1. eLife assessment

      This study presents a valuable finding on the relationship between brain activity related to sustained attention and substance use in adolescence/early adulthood with a large longitudinal dataset. The evidence supporting the claims of the authors is solid, although the inclusion of more details of methods, results, and data analyses would have strengthened the study. The work will be of interest to cognitive neuroscientists, psychologists, and clinicians working on substance use or addiction.

    2. Reviewer #1 (Public Review):

      This study explored the relationship between sustained attention and substance use from ages 14 to 23 in a large longitudinal dataset. The authors found that behaviour and brain connectivity associated with poorer sustained attention at age 14 predicted subsequent increases in cannabis use and cigarette smoking from ages 14-23. They concluded that the brain network of sustained attention is a robust biomarker for vulnerability to substance use. The main strengths of the study are its substantial sample size and the validation of generalization in an external dataset. In addition, various methods/models were used to test the relationship between sustained attention and substance use over time.

    3. Reviewer #2 (Public Review):

      Weng and colleagues investigated the relationship between sustained attention and substance use in a large cohort across three longitudinal visits (ages 14, 19, and 23). They employed a stop signal task to assess sustained attention and utilized the Timeline Followback self-report questionnaire to measure substance use. They assessed the linear relationship between sustained attention-associated functional connections and substance use at an earlier visit (age 14 or 19). Subsequently, they utilized this relationship along with the functional connection profile at a later age (age 19 or 23) to predict substance use at those respective ages. The authors found that connections in association with reduced sustained attention predicted subsequent increases in substance use, a conclusion validated in an external dataset. Altogether, the authors suggest that sustained attention could serve as a robust biomarker for predicting future substance use.

      This study by Weng and colleagues focused on an important topic of substance use prediction in adolescence/early adulthood. While the study largely achieves its aims, several points merit further clarification:

      (1) Regarding connectome-based predictive modeling, an assumption is that connections associated with sustained attention remain consistent across age groups. However, this assumption might be challenged by observed differences in the sustained attention network profile (i.e., connections and related connection strength) across age groups (Figures 2 G-I, Figure 3 G-I). It's unclear how such differences might impact the prediction results.

      (2) Another assumption of the connectome-based predictive modeling is that the relationship between the sustained attention network and substance use is linear, and remains linear over development. Evidence of such linearity, from either the literature or the authors' data, would be helpful.

      (3) Heterogeneity in results suggests individual variability that is not fully captured by group-level analyses. For instance, Figure 1A shows decreasing ICV (better-sustained attention) with age on the group level, while there are both increasing and decreasing patterns on the individual level via visual inspection. Figure 7 demonstrates another example in which the group with a high level of sustained attention has a lower risk of substance use at a later age compared to that in the group with a low level of sustained attention. However, there are individuals in the high sustained attention group who have substance use scores as high as those in the low sustained attention group. This is important to take into consideration and could be a potential future direction for research.

      The above-mentioned points might partly explain the significant but low correlations between the observed and predicted ICV as shown in Figure 4. Addressing these limitations would help enhance the study's conclusions and guide future research efforts.

    4. Reviewer #3 (Public Review):

      Summary:

      Weng and colleagues investigated the association between attention-related connectivity and substance use. They conducted a study with a sizable sample of over 1,000 participants, collecting longitudinal data at ages 14, 19, and 23. Their findings indicate that behaviors and brain connectivity linked to sustained attention at age 14 forecasted subsequent increases in cigarette and cannabis use from ages 14 to 23. However, early substance use did not predict future attention levels or attention-related connectivity strength.

      Strengths:

      The study's primary strength lies in its large sample size and longitudinal design spanning three time-points. A robust predictive analysis was employed, demonstrating that diminished sustained attention behavior and connectivity strength predict substance use, while early substance use does not forecast future attention-related behavior or connectivity strength.

      Weaknesses:

      It's questionable whether the prediction approach (i.e., CPM), even when combined with longitudinal data, can establish causality. I recommend removing the term 'consequence' in the abstract and replacing it with 'predict'. Additionally, the paper could benefit from enhanced rigor through additional analyses, such as testing various thresholds and conducting lagged effect analyses with covariate regression.

    1. eLife assessment

      This interesting study reports that muscle contains fibro-adipogenic progenitor cells (FAPs) that promote regeneration following injury of peripheral neurons. These novel results indicate that several known growth factors are involved in the process of regeneration. This is an important contribution; however, the analysis is incomplete since additional experimental data are needed to support the main conclusions.

    2. Reviewer #1 (Public Review):

      In this manuscript, Yoo et al. describe the role of a specialized cell type found in muscle, fibro-adipogenic progenitors (FAPs), in promoting regeneration following sciatic nerve injury. Using single-cell transcriptomics, they characterize the expression profiles of FAPs at various times after nerve crush or denervation. Their results reveal that a population of these muscle-resident mesenchymal progenitors up-regulates the receptors for GDNF, which is secreted by Schwann cells following crush injury, suggesting that FAPs respond to this growth factor. They also find that FAPs increase expression of BDNF, which promotes nerve regeneration. The authors demonstrate that FAP production of BDNF in vivo is upregulated in response to injection of GDNF and that conditional deletion of BDNF in FAPs results in delayed nerve regeneration after crush injury, primarily due to lagging remyelination. Finally, they also find reduced BDNF expression following crush injury in aged mice, suggesting a potential mechanism to explain the decrease in peripheral nerve regenerative capability in aged animals. These results are very interesting and novel and provide important insights into the mechanisms regulating peripheral nerve regeneration, which has important clinical implications for understanding and treating nerve injuries. However, there are a few concerns that the authors need to address.

      Given that only a fraction of the FAPs express BDNF after injury, the authors need to demonstrate the specificity of the Prrx1-Cre for FAPs. This is particularly important because muscle stem cells also express GDNF receptors (Fig. 3C & D) and myogenic progenitors/satellite cells produce BDNF after nerve injury (Griesbeck et al., 1995 (PMID 8531223); Omura et al., 2005 (PMID 16221288)). Moreover, as the authors point out, there are multipotent mesenchymal precursor cells in the nerve that migrate into the surrounding tissue following nerve injury and contribute to regeneration (Carr et al., PMID 30503141). Therefore, there are multiple possible sources of BDNF, highlighting the need to clearly demonstrate that FAP-derived BDNF is essential.

      Similarly, the authors should provide some evidence that BDNF protein is produced by FAPs. All of their data for BDNF expression is based on mRNA expression and that appears to only be increased in a small subset of FAPs. Perhaps an immunostaining could be done to demonstrate up-regulation of BDNF in FAPs after injury.

      The suggestion that Schwann cell-derived GDNF is responsible for up-regulation of BDNF in the FAPs is indirect, based largely on the data showing that injection of GDNF into the muscle is sufficient to up-regulate BDNF (Fig. 4F & G). However, to more directly connect the 2 observations in a causal way, the authors should inject a Ret/GDNF antagonist, such as a Ret-Fc construct, then measure the BDNF levels.

      In assessing the regeneration after nerve crush, the authors focus on remyelination, for example, assessing CMAP and g-ratios. However, they should also quantify axon regeneration, which can be done distal to the crush injury at earlier time points, before the 6 weeks scored in their study. Evaluating axon regeneration, which occurs prior to remyelination, would be especially useful because BDNF can act on both Schwann cells, to promote myelination, and axons, enhancing survival and growth. They could also evaluate the stability of the neuromuscular junctions, particularly if a denervation was done with the conditional knock outs, although that may be a bit beyond the scope of this study.

    3. Reviewer #2 (Public Review):

      Summary:

      Yoo and colleagues studied the cellular mechanism allowing fibro-adipogenic progenitors (FAPs), muscle resident mesenchymal progenitors, to contribute to nerve regeneration upon regenerative injury. In addition to their expected role in the maintenance of muscle tissue, FAPs also contribute to the maturation and maintenance of neural tissue. After nerve injury, they prevent dying back loss of motor neurons. Consistently, muscle denervation activates FAPs, suggesting that FAPs can sense the injured distal peripheral nerve.

      A transcriptomic database was established using flow cytometry protocols and single-cell RNA-seq. FAPs were isolated from sciatic nerve crush (SNC), considered a regenerative condition, and compared to a non-regenerative condition consisting of denervation-affected muscles (DEN) at different time points after injury: early (3 and 7 days post-injury, dpi) and late (14 and 28 dpi), when the regeneration process has started to resolve. Transcriptome changes of the nine different conditions were compared: non-injured, 3, 7, 14, and 28 days after injury. Bioinformatic analysis and other filters were applied, including UMAP plots, hierarchical clustering analysis using differentially expressed genes (DEGs), volcano plots, and RNA velocity analysis. In addition to most of the supplementary material, the first three and a half central figures consist of the analysis of the transcriptome changes comparing the different conditions. Overall, the data indicate similar DEGs after both types of injury at early stages. However, only after SNC does the gene expression pattern return to levels similar to the non-injured condition, meaning the injury process is resolved. For example, the Interleukin6/Stat3 pathway is upregulated in both injury models but downregulated at 28 days only in SNC. Comparing the two types of injury at 28 dpi indicates a role of FAPs in the resolution of inflammation in SNC and participation of FAPs in fibrosis and inflammation in DEN. Genes related to wound healing were enriched in both.

      With the question in mind of how FAPs are sensing injury, the authors identified a subset of FAPs relevant to regeneration in the SNC model. The unsupervised clustering of FAP cells considering the nine different types of samples resulted in seven clusters of FAPs. Cluster one was exclusive to non-injured animals or regenerated samples. Clusters two and three were exclusive to the early injured or denervated nerve, suggesting that cluster one senses injury and clusters two and three are derived from it. Among the highest DEGs in cluster one were the GDNF receptors Ret and Gfra1. It is known from the literature that GDNF is released by Schwann cells after nerve injury. Also, gene expression analysis in clusters two and three predicts RTK involvement and GDNF signaling. Altogether, transcriptomic data suggest that GDNF is the mechanism by which FAPs sense nerve injury.

      On the other hand, they found BDNF expression limited to cluster two of injured FAPs, suggesting that FAPs respond to GDNF by secreting BDNF. Although the specific role of secreted BDNF by FAPs in nerve regeneration is unknown, BDNF is known to have a regenerative influence on injured sciatic nerves by promoting both axonal growth and myelination. Consistent with their hypothesis, the analysis of gene expression in Schwann cells (sorted using the Plp1CreER Rosatd tomato mouse) and FAPs after injury indicates an initial increase in GDNF gene expression in early time points after injury in Schwann cells, followed by increased expression of BDNF in FAPs. Using conditional knock-out of BDNF in low limb FAPs (Prrx1Cre; Bdnffl/fl), they were able to demonstrate that nerve regeneration is impaired in Prrx1Cre; Bdnffl/fl mice, owing to delayed myelination of axons.

      Strengths:

      I found the article well written; it cleverly maximizes the interpretation and analysis of single-cell transcriptome data. The findings illuminate how growth factors allow communication between cells responding to injury to promote regeneration. I find the data generated by the authors sufficient to support their model and claims.

      Weaknesses:

      Although I find the data the authors generated sufficient for their claims, I do see them as relatively weak; a complementary analysis of protein expression, through immunostaining of the different genes mentioned for FAPs and Schwann cells, would strengthen the paper. The model is entirely supported by measuring mRNA levels and negative regulation of gene expression in specific cells. Additionally, what happens to the structure of the neuromuscular junction after regeneration when GDNF or BDNF expression is reduced? The finding of decreasing levels of FAP BDNF mRNA during aging is interesting; would a gain of BDNF expression in FAPs revert the phenotype?

    4. Reviewer #3 (Public Review):

      Summary:

      The manuscript by Kyusang Yoo et al., "Muscle-resident mesenchymal progenitors sense and repair peripheral nerve injury via the GDNF-BDNF axis", investigates the role and mechanisms of fibro-adipogenic progenitors (FAPs), which are muscle-resident mesenchymal progenitors, in the maturation and maintenance of the neuromuscular system. There is earlier evidence that the absence of FAPs or their functional decline with age causes smaller regenerated myofibers. The role of FAPs in peripheral nerve regeneration is very poorly studied. This study has translational importance because traumatic injury to the peripheral nerve can cause lifelong paralysis of the injured limb.

      This manuscript provides data indicating that the GDNF-BDNF axis plays an important role in peripheral nerve regeneration and function.

      Strengths:

      Because the role of FAPs in peripheral nerve regeneration is very poorly studied, this investigation is a major step towards understanding the mechanisms by which FAPs act. The authors use scRNA-seq, animal models, and cKO mice, which is also important. This study has translational importance because traumatic injury to the peripheral nerve can cause lifelong paralysis of the injured limb. This is an interesting and original study focusing on the role of FAPs and indicating that the GDNF-BDNF axis plays an important role in peripheral nerve regeneration and function.

      Weaknesses:

      In Fig. 1 and 2, the authors provide scRNA-seq data, and this is important information reporting the finding of RET and GFRa1 transcripts in a subpopulation of FAP cells. However, the authors provide no data on the expression of RET and GFRa1 proteins in FAP cells. Another problem is the lack of information showing that GDNF secreted by Schwann cells can activate RET and its downstream signaling in FAP cells. There is no direct experimental proof that GDNF activating GFRa1-RET signaling triggers BDNF upregulation in FAP cells. The data indicating that GDNF signaling induces the synthesis and secretion of BDNF are also not conclusive.

    1. eLife assessment

      Cancer treatments are not just about the tumor - there is an ever-increasing need for treating pain, fatigue, and anhedonia resulting from the disease. Using an implantable oral tumor model in the mouse, the authors provide valuable information showing that nerve fibers are transmitting sensory signals to the brain that reduce pleasure and motivation. These findings are in part supported by anatomical and transcript changes in the tumor that suggest sensory innervation, neural tracing, and neural activity measurements; however, the study is incomplete in its current form.

    2. Reviewer #1 (Public Review):

      Summary:

      Using a mouse model of head and neck cancer, Barr et al show that tumor-infiltrating nerves connect to brain regions via the ipsilateral trigeminal ganglion, and they demonstrate the effect this has on behavior. The authors show that there are neurites surrounding the tumors using a WGA assay and show that the brain regions that are involved in this tumor-containing circuit have elevated Fos and FosB expression and increased calcium response. Behaviorally, tumor-bearing mice have decreased nest building and wheel running and increased anhedonia. The behavior, Fos expression, and heightened calcium activity were all decreased in tumor-bearing mice following nociceptor neuron elimination.

      Strengths:

      This paper establishes that sensory neurons innervate head and neck cancers and that these tumors impact select brain areas. This paper also establishes that behavior is altered following these tumors and that drugs to treat pain restore some but not all of the behavior. The results from the experiments (predominantly gene and protein expression assays, cFos expression, and calcium imaging) support their behavioral findings both with and without drug treatment.

      Weaknesses:

      The study suggests that the effects of the tumor model on mouse behavior are largely non-specific to the tumor, as most behaviors are rescued by analgesic treatment. So, most of the changes were likely due to site-specific pain and not a unique signal from the tumor.

    3. Reviewer #2 (Public Review):

      Summary:

      Cancer treatments are not just about the tumor - there is an ever-increasing need for treating pain, fatigue, and anhedonia resulting from the disease as patients are undergoing successful but prolonged bouts with cancer. Using an implantable oral tumor model in the mouse, Barr et al describe neural infiltration of tumors, and posit that these nerve fibers are transmitting pain and other sensory signals to the brain that reduce pleasure and motivation. These findings are in part supported by anatomical and transcriptional changes in the tumor that suggest sensory innervation, neural tracing, and neural activity measurements. Further, the authors conduct behavior assays in tumor-bearing animals and inhibit/ablate pain sensory neurons to suggest the involvement of local sensory innervation of tumors in mediating cancer-induced malaise.

      Strengths:

      • This is an important area of research that may have implications for improving the quality of life of cancer patients.

      • The studies use a combination of approaches (tracing and anatomy, transcriptional, neural activity recordings, behavior assays, loss-of-function) to support their claims.

      • Tracing experiments suggest that tumor-innervating afferents are connected to brain nuclei involved in oral pain sensing. Consistent with this, the authors observed increased neural activity in those brain areas of tumor-bearing animals. It should be noted that some of these brain nuclei have also been implicated in cancer-induced behavioral alterations in non-head and neck tumor models.

      • Experiments are for the most part well-controlled, and approaches are validated.

      • The paper is well-written and the layout was easy to follow.

      Weaknesses:

      • The main claim is that tumor-infiltrating nerves underlie cancer-induced behavioral alterations, but the experimental interventions are not specific enough to support this. For example, all TRPV1 neurons, including those innervating the skin and internal organs, are ablated to examine sensory innervation of the tumor. Within the context of cancer, behavioral changes may be due to systemic inflammation, which may alter TRPV1 afferents outside the local proximity of tumor cells. A direct test of the claims of this paper would be to selectively inhibit/ablate nerve fibers innervating the tumor or mouth region.

      • Behavioral results from TRPV1 neuron ablation studies are in part confounded by differing tumor sizes in ablated versus control mice. Are the differences in behavior potentially explained by the ablated animals having significantly smaller tumors? The differences in tumor sizes are not negligible. One way to examine this possibility might be to correlate behavioral outcomes with tumor size.

    4. Reviewer #3 (Public Review):

      Summary:

      The authors have tested for and demonstrated a physical (i.e., sensory nerves to the brain) connection between tumors and parts of the brain. This can explain why there is an increase in depressive disorders in HNSCC patients. While connections such as this have been suspected, this is a novel demonstration pointing to sensory neurons that is accompanied by a remarkable amount of complementary data.

      Strengths:

      There is substantial evidence provided for the hypotheses tested. The data are largely quite convincing.

      Weaknesses:

      The authors mention in their Discussion the need for additional experiments. Could they also include / comment on the potential impact on the anti-tumor immune system in their model?

      Minor:

      The authors mention the importance of inflammation contributing to pain in cancer but do not clearly highlight how this may play a role in their model. Can this be clarified?

      The tumor model apparently requires isoflurane injection prior to tumor growth measurements. This is different from most other transplantable types of tumors used in the literature. Was this treatment also given to control (i.e., non-tumor) mice at the same time points? If not, can the authors comment on the impact of isoflurane (if any) in their model?

      The authors emphasize in several places that this is a male mouse model. They mention this as a limitation in the Discussion. Was there an original reason why they only tested male mice?

    1. eLife assessment

      The authors show that short bouts of chemical ischemia lead to presynaptic changes in glutamate release and long-term potentiation, whereas longer bouts of chemical ischemia lead to synaptic failure and presumably cell death (which could be confirmed experimentally). This solid work relies on rigorous electrophysiology/imaging experiments and data analysis. It is valuable as it provides new mechanistic details on chemical ischemia, though its implications for ischemic stroke in vivo remain to be determined.

    2. Reviewer #1 (Public Review):

      Summary:

      This work by Passlick and colleagues set out to reveal the mechanism by which short bouts of ischemia perturb glutamate signalling. This manuscript builds upon previous work in the field that reported a paradoxical increase in synaptic transmission following acute, transient ischemia, termed ischemic or anoxic long-term potentiation. Despite these observations, how this occurs and the involvement of glutamate release and uptake mechanisms remain unanswered.

      Here the authors employed two distinct chemical ischemia models, one lasting 2 minutes, the other 5 minutes. Recording evoked field excitatory postsynaptic potentials in acute brain slices, the authors revealed that shorter bouts of ischemia resulted in a transient decrease in postsynaptic responses followed by an overshoot and long-term potentiation. Longer bouts of chemical ischemia (5 minutes), however, resulted in synaptic failure that did not return to baseline levels over 50 minutes of recording (Figure 1).

      Two-photon imaging of the fluorescent glutamate sensor iGluSnFR expressed in astrocytes matched postsynaptic responses, with shorter ischemia resulting in a transient dip before an increase in extracellular glutamate, which was not the case with prolonged ischemia (Figure 2).

      Mechanistically, the authors show that these increased glutamate levels and postsynaptic responses were not due to changes in glutamate clearance (Figure 3). Next, using a competitive antagonist for postsynaptic AMPA receptors, the authors show that synaptic glutamate release was enhanced by 2-minute chemical ischemia.

      Taken together, these data reveal the underlying mechanism regarding ischemic long-term potentiation, highlighting presynaptic release as the primary culprit. Additionally, the authors show relative insensitivity of glutamate uptake mechanisms during ischemia, highlighting the resilience of astrocytes to this metabolic challenge.

      Strengths:

      This manuscript uses robust and modern techniques to address the mechanism by which ischemia influences synaptic transmission in the hippocampus.

      The data are of high quality, with adequately powered sample sizes to address their hypotheses.

      Weaknesses:

      The question of the physiological relevance of short bouts of ischemia remains.

      The precise mechanisms underlying the shift between ischemia-induced long-term potentiation and long-term failure of synaptic responses were not addressed. Could this be cell death?

      Sex differences are not addressed or considered.

    3. Reviewer #2 (Public Review):

      Summary:

      To investigate the impact of chemical ischemia induced by blocking mitochondrial function and glycolysis, the authors measured extracellular field potentials, performed whole-cell patch-clamp recordings, and measured glutamate release with optical techniques. They found that shorter two-minute-lasting blockade of energy production initially blocked synaptic transmission but subsequently caused a potentiation of synaptic transmission due to increased glutamate release. In contrast, longer five-minute-lasting blockage of energy production caused a sustained decrease of synaptic transmission. A correlation between the increase of intracellular potassium concentration and the response upon chemical ischemia indicates that the severity of the ischemia determines whether synapses potentiate or depress upon chemical ischemia. A subsequent mechanistic analysis revealed that the speed of uptake of glutamate is unchanged. An increase in the duration of the fiber volley reflecting the extracellular voltage of the action potentials of the axon bundle was interpreted as an action potential broadening, which could provide a mechanistic explanation. In summary, the data convincingly demonstrate that synaptic potentiation induced by chemical ischemia is caused by increased glutamate release.

      Strengths:

      The manuscript is well-written and the experiments are carefully designed. The results are exciting, novel, and important for the field. The main strength of the manuscript is the combination of electrophysiological recordings and optical glutamate imaging. The main conclusion of increased glutamate release was furthermore supported with an independent approach relying on a low-affinity competitive antagonist of glutamate receptors. The data are of exceptional quality. Several important controls were carefully performed, such as the stability of the recordings and the size of the extracellular space. The number of experiments is sufficient for the conclusions. The careful data analysis justifies the classification of two types of responses, namely synaptic potentiation and depression after chemical ischemia. Except for the duration of the presynaptic action potentials (see Weaknesses below), the data are carefully discussed and the conclusions are justified.

      Weaknesses:

      The weaknesses are minor and only relate to the interpretation of some of the data regarding the presynaptic mechanisms causing the potentiation of release. The authors measured the fiber volley, which reflects the extracellular voltage of the compound action potential of the fiber bundle. The half-duration of the fiber volley was increased, which could be due to action potential broadening in the individual axons but could also be due to differences in conduction velocity. We are therefore skeptical whether the conclusion of action potential broadening is justified.

    4. Reviewer #3 (Public Review):

      Summary:

      This valuable study shows that shorter episodes (2 minutes duration) of energy depletion, as it occurs in ischemia, could lead to long-lasting dysregulation of synaptic transmission with presynaptic alterations of glutamate release at the CA3-CA1 synapses. A longer duration of chemical ischemia (5 minutes) permanently suppresses synaptic transmission. By using electrophysiological approaches, including field and patch clamp recordings, combined with imaging studies, the authors demonstrated that 2 minutes of chemical ischemia leads to a prolonged potentiation of synaptic activity with a long-lasting increase of glutamate release from presynaptic terminals. This was observed as an increase in iGluSnFR fluorescence, a sensor for glutamate expressed selectively on hippocampal astrocytes by viral injection. The increase in iGluSnFR fluorescence upon 2-minute chemical ischemia could not be ascribed to an altered glutamate uptake, which is unaffected by both 2-minute and 5-minute chemical ischemia. The presynaptic increase in glutamate release upon short episodes of chemical ischemia is confirmed by a reduced inhibitory effect of the competitive antagonist gamma-D-glutamylglycine on AMPA receptor-mediated postsynaptic responses. Fiber volley durations in field recording are prolonged in slices exposed to 2 min chemical ischemia. The authors interpret these data as an indication that the increase in glutamate release could be ascribed to a prolongation of the presynaptic action potential possibly due to inactivation of voltage-dependent K+ channels. However, more direct evidence is needed to support this hypothesis fully. This research highlights an important mechanism by which altered ionic homeostasis underlying metabolic failure can impact neuronal activity. Moreover, it also shows a different vulnerability of mechanisms involved in glutamatergic transmission with a marked resilience of glutamate uptake to chemical ischemia.

      Strengths:

      (1) The authors use a variety of experimental techniques ranging from electrophysiology to imaging to study the contribution of several mechanisms underlying the effect of chemical ischemia on synaptic transmission.

      (2) The experiments are appropriately designed and clearly described in the figures and in the text.

      (3) The controls are appropriate.

      Weaknesses:

      - The data on fiber volley duration should be supported by more direct measurements to prove that chemical ischemia increases presynaptic Ca2+ influx due to a presynaptic broadening of action potentials. Given the influence that positioning of the stimulating and recording electrode can have on the fiber volley properties, I found this data insufficient to support the assumption of a relationship between increased iGluSnFR fluorescence, action potential broadening, and increased presynaptic Ca2+ levels.

      - The results were obtained in an ex vivo preparation; it would be interesting to assess whether they can be replicated in in vivo models of cerebral ischemia.

      Impact:

      This study provides a more comprehensive view of the long-term effects of energy depletion during short episodes of experimental ischemia, leading to the notion that not only postsynaptic changes, as reported by others, but also presynaptic changes are responsible for long-lasting modification of synaptic transmission. Interestingly, the synaptic changes are bidirectional and depend on the duration of chemical ischemia, indicating that different mechanisms involved in synaptic transmission are differentially affected by energy depletion.

    1. eLife assessment

      This important work provides interesting datasets of myofiber differentiation. The evidence supporting the involvement of SRSF2 in selected biological processes is convincing; however, additional evidence to pinpoint the major action of SRSF2 during muscle differentiation would be appreciated. The work will be of broad interest to developmental biologists in general and molecular biologists in the field of gene regulation.

    2. Reviewer #1 (Public Review):

      Summary

      The work by She et al. investigates the role of SRSF2 in MyoD+ progenitor cells during development. Deletion of SRSF2 in MyoD+ progenitor cells resulted in a defect in the directional migration of these cells and in the presence of MyoD+ progenitors in both nonmuscle and muscle tissues. Using scRNA-seq, the authors showed defects in the gene programs regulating ECM, cell migration, cytoskeletal organization, and skeletal muscle differentiation. The authors further showed that many of these processes are regulated by a downstream target of SRSF2, the serine-threonine kinase Aurka. Finally, the authors showed that SRSF2 acts as a splicing factor and could contribute to differentiation by controlling the splicing of muscle-specific transcripts. This study addresses an important question in skeletal muscle development by focusing on the pathways and factors that regulate the migration of MyoD+ progenitors and the impact of this process on skeletal muscle differentiation. This work is interesting but requires experimental evidence to support the findings.

      Strengths

      The regulators of MyoD+ progenitor migration during skeletal muscle development are not completely understood. This work demonstrates that SRSF2 and Aurora kinase are key players in the process. Combining knockout and reporter lines in mice, the authors perform a detailed analysis of skeletal muscle cells to demonstrate the specific defects of SRSF2 deletion in skeletal muscle development.

      Weaknesses

      This work explores an interesting question on how SRSF2 regulates MyoD+ progenitors and how defects in this process affect skeletal muscle differentiation, but it spreads out in many directions rather than focusing on the key defects. A number of approaches are used, but they lack a robust mechanistic analysis of the defects that result in impaired muscle differentiation. Specifically, the role of SRSF2 in splicing appears to be a misfit here and does not explain the primary defects in the migration of MyoD+ progenitors. There are concerns about the scRNA-seq data, as many of the muscle-biology transcripts discussed are not expressed in muscle cells. Focusing on the main defects and providing additional experimental evidence to distinguish a fusion defect from precocious or reduced differentiation will strengthen this work.

      (1) The analysis of RNA-seq data (Figure 2) is limited, and it is unclear how it relates to the work presented in this manuscript. The GO enrichment analysis is combined for both up- and down-regulated DEGs, making it difficult to understand the impact in each direction. Stac2 is predominantly a neuronal isoform (while Stac3 is the muscle isoform), and the Symm gene is not found in the HGNC or other databases. Could the authors provide the approved name for this gene? The premise of this work is based on defects in ECM processes resulting in the mis-targeting of the muscle progenitors to the nonmuscle regions. Which ECM proteins are differentially expressed?

      (2) Could the authors quantify the muscle progenitors dispersed in nonmuscle regions before their differentiation? In which nonmuscle tissues are MyoD+ progenitors seen? Most of the tdT staining in the enlarged sections appears to be punctate, without any nuclear staining seen in these cells (Figure 3B, D, E-F). Could the authors provide high-resolution images? Also, in the diaphragm cross-sections in mutants, tdT labeling appears to be missing in some areas within the myofibers, defined as cavities by the authors (marked by white arrows, Figure 3H). Could this polarized localization of tdT be contributing to specific defects?

      (3) Is there a difference in the levels of tdT between the MyoD+ muscle progenitors that are mis-targeted and those that are present in the muscle tissues?

      (4) scRNA-seq is unsuitable for myotubes and myofibers because their size excludes them from microfluidic devices. Could the authors explain the rationale for using scRNA-seq rather than snRNA-seq in this work? How are SKM defined in the scRNA-seq data in Figure 4? As the myofibers are small in the KO, could the increased level of late differentiation markers be due to the enrichment of these small myotubes/myofibers in the scRNA-seq data? A different approach, such as ISH/IF with the myogenic markers at E9.5-10.5, may be able to resolve whether these markers are prematurely induced.

      (5) TNC is a marker for tenocytes and is absent in skeletal muscle cells. The authors mention a downregulation of TNC in the KO SKM-derived clusters. This suggests contamination by tenocytes in the control cells. Despite the downregulation of multiple ECM genes shown by the scRNA-seq data, the ECM staining by laminin in the KO in Figure 3 appears to be similar to that in controls.

      (6) The expression of many fusion genes, such as myomaker and myomerger, is reduced in the KO, suggesting a primary fusion defect rather than a primary differentiation defect. Many mature myofiber proteins exhibit increased expression in disease states, suggesting that this is a compensatory mechanism. The authors need to provide additional experimental evidence supporting precocious differentiation as the primary defect.

      (7) The fusion defects in KO are also evident in siRNA knockdown for SRSF2 and Aurka in C2C12, which mostly exhibits mononucleated myocytes in knockdowns. Also, a fusion index needs to be provided.

      (8) The last section, on the role of SRSF2 in splicing, appears to be a misfit in this study. The authors describe the Bin1 isoforms in centronuclear myopathy, but exon 17 is not involved in myopathy. Is exon 17 exclusion seen in other diseases or splicing studies?

    3. Reviewer #2 (Public Review):

      Summary:

      This study aimed to investigate the role of SRSF2 in directing MyoD progenitors to specific muscle regions. The results confirmed the role of SRSF2 in controlling myogenic differentiation through the regulation of targeted genes and alternative splicing during skeletal muscle development.

      Strengths:

      The study used different methods and techniques to achieve its aims and support its conclusions, such as RNA sequencing analysis, Gene Ontology analysis, and immunostaining analysis.<br /> This study provides novel findings that SRSF2 controls the myogenic differentiation of MyoD+ progenitors, using a transgenic mouse model and in vitro studies.

      Weaknesses:

      Although unbiased sequencing methods were used, the findings that SRSF2 serves as a transcriptional regulator and functions in alternative splicing events are not novel.<br /> The introduction and discussion are not clearly written. The authors did not raise clear scientific questions in the introduction. The last paragraph is merely a copy of the abstract. The discussion mainly repeats the results without clear discussion.

    1. Reviewer #3 (Public Review):

      Summary:

      This study employs an optogenetic approach aimed at activating oncogene (KRASG12V) expression in a single somatic cell, with a focus on following the progression of the activated cell to examine the probability of tumourigenesis under altered tissue environments. The research explores the role of stemness factors (VENTX/NANOG/OCT4) in facilitating oncogenic RAS (KRASG12V)-driven malignant transformations. Although the evidence provided is incomplete, the authors propose an important mechanism whereby reactivation of re-programming factors correlates with the increased likelihood of a mutant cell undergoing malignant transformation.

      Strengths:

      · Innovative Use of Optogenetics: The application of optogenetics for precise activation of KRAS in a single cell is valuable to the field of cancer biology, offering an opportunity to uncover insight into cellular responses to oncogenic mutations.<br /> · Important Observations: The findings concerning stemness factors' role in promoting oncogenic transformation are important, contributing data to the field of cancer biology.

      Weaknesses:

      Lack of Methodological Clarity: The manuscript lacks detailed descriptions of methodologies, making it difficult to fully evaluate the experimental design and reproducibility and leaving the evidence supporting the conclusions incomplete. Improving methodological transparency and data presentation would substantially strengthen the paper's contributions to understanding the complex processes of tumourigenesis.<br /> Sub-optimal Data Presentation and Quality:

      The resolution of the images throughout the manuscript is too low. The images presented in Figure 2 and Figure 4 are of very low resolution, and it is very hard to distinguish individual cells and the tissue in which they might reside.<br /> The lack of quantitative data and of control-condition data obtained from higher-magnification images limits the ability to robustly support the conclusions.

      Here are some details:<br /> · Tissue specificity of the cells expressing the KRASG12V oncogene: In this study, the ubiquitin promoter was used to drive oncogenic KRASG12V expression. Despite this, the authors claim to activate KRAS in a single brain cell based on their localized photo-activation strategy. However, the methods section states that 'Localized uncaging was performed by illumination for 7 minutes on a Nikon Ti microscope equipped with a light source peaking at 405 nm, Figure 1. The size of the uncaging region was controlled by an iris that defines a circular illumination with a diameter of approximately 80 μm.' It is surprising that an epi-fluorescent microscope with an illumination diameter of around 80 μm can induce activation in a single brain cell beneath skin tissue. Additionally, given that the half-life for mTFP maturation is around 60 minutes, it is likely that more cells from a variety of different lineages could be activated, but the fluorescence would not be visible until more than 1 hour post-illumination. The authors might want to provide more evidence to support their claim of single-cell KRAS activation.<br /> · Stability of cCYC: The manuscript does not provide information on the half-life and stability of cCYC. Understanding these properties is crucial for evaluating the system's reliability and the likelihood of leakiness, which could significantly influence the study's outcomes.<br /> · Metastatic Dissemination claim: Typically, metastatic cancer cells migrate to and proliferate within specific niches that are conducive to outgrowth, such as the caudal hematopoietic tissue (CHT) or liver. In Figure 3A, an image shows mTFP-expressing cells in both the head and tail regions of the larva, with additional positive dots located at the fin fold. This is interpreted as "metastasis" by the authors. 
However, the absence of a supportive cellular compartment within the fin-fold tissue makes the presence of mTFP-positive metastatic cells there particularly puzzling. This distribution raises concerns about the spatial specificity of the optogenetic activation protocol.<br /> The unexpected locations of these signals suggest potential ectopic activation of the KRAS oncogene, which could be occurring alongside or instead of targeted activation. This issue is critical as it could affect the interpretation of whether the observed mTFP signal expansion over time is due to actual cell proliferation and infiltration, or merely a result of ectopic RAS transgene activation.<br /> · Image Resolution Concerns: The cells depicted in Figure 3C β, which appear to be near the surface of the yolk sac and not within the digestive system as suggested in the manuscript, underscore the necessity for higher-resolution imaging. Without clearer images, it is challenging to ascertain the exact locations and states of these cells, thus complicating the assessment of experimental results.<br /> · The cell transplantation experiment lacks protocol details: The manuscript does not adequately describe the experimental protocols used for cell transplantation, particularly concerning the origin and selection of cells used for injection into individual larvae. This omission makes it difficult to evaluate the reliability and reproducibility of the results. One example is the source of the transplanted cells:<br /> • If the cells are derived from hyperplastic growths in larvae where RAS and VX (presumably VENTX) were locally activated, the manuscript fails to mention any use of fluorescence-activated cell sorting (FACS) to enrich mTFP-positive cells. 
Such a method would be crucial for ensuring the specificity of the cells being studied and the validity of the results.<br /> • If the cells are obtained from whole larvae with induced RAS + VX expression, it is notable and somewhat surprising that the larvae survived up to six days post-induction (6dpi) before cells were harvested for transplantation. This survival rate and the subsequent ability to obtain single cell suspensions raise questions about the heterogeneity of the RAS + VX expressing cells that were transplanted.<br /> · Unclear Experimental Conditions in Figure S3B: The images in Figure S3B lack crucial details about the experimental conditions. It is not specified whether the activation of KRAS was targeted to specific cells or involved whole-body exposure. This information is essential for interpreting the scope and implications of the results accurately.<br /> · Contrasting Data in Figure S3C compared to literature: The graph in Figure S3C indicates that KRAS or KRAS + DEX induction did not result in any form of hyperplastic growth. This observation starkly contrasts with previous literature where oncogenic KRAS expression in zebrafish led to significant hyper-proliferation and abnormal growth, as evidenced by studies such as those published in Neoplasia (2018), DOI: 10.1016/j.neo.2018.10.002; Molecular Cancer (2015), DOI: 10.1186/s12943-015-0288-2; Disease Models & Mechanisms (2014) DOI: 10.1242/dmm.007831. The lack of expected hyperplasia raises questions about the experimental setup or the specific conditions under which KRAS was expressed. The authors should provide detailed descriptions of the conditions under which the experiments in Figure S3B were conducted, clarify the reasons for the discrepancies observed in Figure S3C, and discuss potential reasons for the deviation from previous reports.

      Further comments:

      Throughout the study, KRAS-activated cell expansion and metastasis are two key phenotypes discussed that Ventx is promoting. However, the authors did not perform any experiments to directly show that KRAS+ cells proliferate only in Ventx-activated conditions. The authors also did not show any morphological features or time-lapse videos demonstrating that KRAS+ cells are motile, even though zebrafish is an excellent model for in vivo live imaging. This seems to be a missed opportunity for providing convincing evidence to support the authors' conclusions.

      Minimal experimental details were provided for the qPCR data presented in supplementary Figures S5 and S6; it is therefore hard to evaluate the results obtained.

    2. eLife assessment

      This study provides valuable initial characterization of a vertebrate embryonic system that demonstrates aspects of an optogenetically inducible hyperplasia model. Although the evidence provided is incomplete to conclude that the system demonstrates quantitatively assessable tumor initiation from a single cell that then metastasizes, the authors propose a mechanism whereby reactivation of re-programming factors correlates with the increased likelihood of a mutant cell undergoing malignant transformation. This work will be of interest to developmental and cancer biologists mainly for the novel genetic tools described.

    3. Reviewer #1 (Public Review):

      Scerbo et al. developed an approach based on the oncogene kRasG12V and a reprogramming factor to induce deterministic and reproducible malignant transformation in a single cell. The activation of kRasG12V alone is not sufficient in their hands to initiate carcinogenesis, but when combined with the transient activation of a reprogramming factor (such as Ventx, Nanog, or Oct4), it significantly increases the probability of malignant transformation. This combination of oncogene and reprogramming factor may alter the epigenetic and functional state of the cell, leading to the development of tumors within a short period of time. The use of these two factors allows for the controlled manipulation of a single cell to study the cellular and molecular events involved in the early stages of tumorigenesis. The authors then performed allotransplantations of allegedly single fluorescent TICs in recipient larvae and found a large number of fluorescent cells in distant locations, claiming that these cells all originated from the single transplanted TIC and migrated away. The number of fluorescent cells shown in the recipient larvae after just two days is not compatible with a normal cell cycle length and more likely represents the progeny of more than one transplanted cell. The ability to migrate from the injection site should be documented by time-lapse microscopy. Then, the authors conclude that "By allowing for specific and reproducible single cell malignant transformation in vivo, their optogenetic approach opens the way for a quantitative study of the initial stages of cancer at the single cell level". However, the evidence for these claims is weak and further characterization should be performed to address the following points:

      (1) The authors should show that they are actually activating the oncogene in a single cell (the magnification is too low and it is difficult to distinguish a single nucleus; labelling of the cell membrane may help to demonstrate that they are effectively activating the oncogene in, or transplanting, a single cell).<br /> (2) The expression of the genes used as markers of tumorigenesis is measured in whole larvae, which contain only a few transformed cells. Changes should be confirmed in FACS-sorted fluorescent cells.<br /> (3) The histology of the so-called "tumor masses" does not show malignant transformation, but at most hyperplasia. In the brain, the sections are not perfectly symmetrical and the increase of cellularity on one side of the optic tectum is compatible with this asymmetry.<br /> (4) The number of fluorescent cells found dispersed in the larvae transplanted with a single TIC after 48 hours would require a very fast cell cycle to generate over 50 cells. Do we have an idea of the cell cycle features of the transplanted TICs?
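
The arithmetic behind point (4) can be made explicit. The following is a minimal sketch, assuming a single founder cell, synchronous exponential doubling, and no cell death; these assumptions and the function name are introduced here for illustration and are not taken from the manuscript.

```python
import math


def max_cycle_length_hours(n_cells, elapsed_hours, n_founders=1):
    """Longest cell-cycle length that still yields n_cells in elapsed_hours.

    Assumes pure exponential doubling from n_founders cells with no cell
    death (an illustrative simplification, not the authors' model).
    """
    if n_cells <= n_founders:
        raise ValueError("n_cells must exceed n_founders")
    doublings = math.log2(n_cells / n_founders)
    return elapsed_hours / doublings


# 50 cells observed 48 hours after transplanting one cell
cycle = max_cycle_length_hours(50, 48)
print(f"required cell cycle: at most {cycle:.1f} h")
```

Under these assumptions, 50 cells after 48 hours imply about 5.6 doublings, that is, a cell cycle of roughly 8.5 hours or shorter; starting from more founder cells relaxes this requirement.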

    4. Reviewer #2 (Public Review):

      Summary:

      In the work by Scerbo et al, the authors aim to better understand the open question of what factors constrain cells that are genetically predisposed to form cancer (e.g. those with a potentially cancer-causing mutation like activated Ras) to only infrequently undergo this malignant transformation, with a focus on the influence of embryonic or pluripotency factors (e.g. VENTX/NANOG). Using genetically defined zebrafish models, the authors can inducibly express the KRASG12V oncogene using a combination of Cre/Lox transgenes further controlled by optogenetically inducible Cre-activated (CreER fusion that becomes active with light-induced uncaging of a tamoxifen-analogue in a targeted region of the zebrafish embryo). They further show that transient expression and activation of a pluripotency factor (e.g. Ventx fused to a GR receptor that is activated with addition of dexamethasone) must occur in the model in order for overgrowth of cells to occur. This paper describes a genetically tractable and modifiable system for studying the requirements for inducing cellular hyperplasia in a whole organism by combining overexpression of canonical genetic drivers of cancer (like Ras) with epigenetic modifiers (like specific transcription factors), which could be used to study an array of combinations and temporal relationships of these cancer drivers/modifiers.

      Strengths:

      The combination of Cre/lox inducible gene expression with potentially localized optogenetic induction (CreER and uncaging of tamoxifen analogues) of recombination as well as inducible activation of a transcription factor expressed via mRNA injection (GR-fusion to the TF and dex induction) offers a flexible system for manipulating cell growth, identity, and transcriptional programs. With this system, the authors establish that Ras activation and at least transient Ventx overexpression are together required to induce a hyperproliferative phenotype in zebrafish tissues.

      The ability to live image embryos over the course of days with inducible fluorophores indicating recombination events and transgene overexpression offers a tractable in vivo system for studying hyperplastic cells in the context of a whole organism.

      The transplant experiments demonstrate the ability of the induced hyperplastic cells to grow upon transfer to a new host.

      Weaknesses:

      There is minimal quantitation of key aspects of the system, most critically of the efficiency of activation of the Ras-TFP fusion (Fig 1) in, purportedly, a single cell. The authors note "On average the oncogene is then activated in a single cell, identified within ~1h by the blue fluorescence of its nuclear marker" but no additional quantitative information is provided. For a system that is aimed at "a statistically relevant single-cell tracking and characterization of the early stages of tumorigenesis", such information seems essential.

      The authors indicate that a single cell is "initiated" (Fig 2) using the laser optogenetic technique, but without definitive genetic lineage tracing, it is not possible to conclude that cells expressing TFP distant from the target site near the ear are daughter cells of the claimed single "initiated" cell. Plausible alternative explanations are 1) that the optogenetic targeting is more diffuse (i.e. some of the light of the appropriate wavelength hits other cells nearby due to reflection/diffraction), so these adjacent cells are additional independent "initiated" cells, or 2) that the uncaged tamoxifen analogue can diffuse to nearby cells and allow for CreER activation and recombination. In Fig 2B, the claim is made that "the activated cell has divided, giving rise to two cells" - unless continuously imaged or genetically traced, this is unproven. In addition, it appears that Figures S3 and S4 are showing that hyperplasia can arise in many different tissues (including intestine, pancreas, and liver, S4C) with broad Ras + Ventx activation (while unclear from the text, it appears these embryos were broadly activated and were not "single cell activated" using the set-up in Fig 1E; this should be clarified in the manuscript). In Fig S7, where single cell activation and potential metastasis are discussed, similar gut tissues have TFP+ cells that are called metastatic, but this seems consistent with the possibility that multiple independent sites of initiation are occurring even when focal activation is attempted.

      Although the hyperplastic cells are transplantable (Fig 4), the use of the term "cells of origin of cancer" or metastatic cells should be viewed with care in the experiments showing TFP+ cells (Fig 1, 2, 3) in embryos with targeted activation for the reasons noted above.

    1. eLife assessment

      This study describes the application of machine learning and Markov state models to characterize the binding mechanism of alpha-Synuclein to the small molecule Fasudil. The results suggest that entropic expansion can explain such binding. However, the simulations and analyses in their present form are inadequate.

    2. Reviewer #1 (Public Review):

      Summary:

      This is a well-conducted study about the mechanism of binding of a small molecule (fasudil) to a disordered protein (alpha-synuclein). Since this type of interaction has puzzled researchers for the last two decades, the results presented are welcome as they offer relevant insight into the physical principles underlying this interaction.

      Strengths:

      The results show convincingly that the mechanism of entropic expansion can explain the previously reported binding of fasudil to alpha-synuclein. In this context, the analysis of the changes in the entropy of the protein and of water is highly relevant. The combined use of machine learning for dimensional reduction and of Markov State Models could become a general procedure for the analysis of other systems where a compound binds a disordered protein.

      Weaknesses:

      It would be important to underscore the computational nature of the results, since the experimental evidence that fasudil binds alpha-synuclein is not entirely clear, at least to my knowledge.

    3. Reviewer #2 (Public Review):

      The manuscript by Menon et al describes a set of simulations of alpha-Synuclein (aSYN) and analyses of these and previous simulations in the presence of a small molecule.

      While I agree with the authors that the questions addressed are interesting, I am not sure how much we learn from the present simulations and analyses. In parts, the manuscript reads more like an attempt to apply a whole range of tools rather than with a goal of answering any specific questions.

      There's a lot going on in this paper, and I am not sure it is useful for the authors, readers or me to spell out all of my comments in detail. But here are at least some points that I found confusing/etc

      Major concerns

      p. 5 and elsewhere:<br /> I lack a serious discussion of convergence and the statistics of the differences between the two sets of simulations. On p. 5 it is described how the authors ran multiple simulations of the ligand-free system for a total of 62 µs; that is about 25 times less than for the ligand system. I acknowledge that running 1.5 ms is unfeasible, but at a bare minimum the authors should discuss and analyse the consequences of the relatively small amount of sampling. Here it is important to say that while 62 µs may sound like a lot, it is probably not enough to sample the relevant properties of a 140-residue long disordered protein.

      p. 7:<br /> The authors make it sound like a bad thing that some methods are deterministic. Why is that the case? What kind of uncertainty in the data do they mean? One can certainly have deterministic methods and still deal with uncertainty. Again, this seems like a somewhat ad hoc argument for the choice of the method used.

      p. 8:<br /> The authors should make it clear (i) what the reconstruction loss and KL divergence are calculated over and (ii) what the RMSD is calculated over.

      p. 9/figure 1:<br /> The authors select a beta value that may be the minimum, but it lies just below a big jump in the cross-validation error. Why does the error jump so much, and isn't it slightly dangerous to pick a value close to such a large jump?

      p. 10:<br /> Why was a 2-dimensional representation used in the VAE? What evidence do the authors have that the representation is meaningful? The authors state "The free energy landscape represents a large number of spatially close local minima representative of energetically competitive conformations inherent in αS" but they do not say what they mean by "spatially close". In the original space? If so, where is the evidence?

      p. 10:<br /> It is not clear from the text whether the VAEs are the same for both aSYN and aSYN-Fasudil. I assume they are. Given that the Fasudil dataset is 25x larger, presumably the VAE is mostly driven by that system. Is the VAE an equally good representation of both systems?

      p. 10/11:<br /> Do the authors have any evidence that the latent space representation preserves relevant kinetic properties? This is a key point because the entire analysis is built on this. The choice of using z1 and z2 to build the MSM seems somewhat ad hoc. What do the auto-correlation functions of z1 and z2 look like? Are they related to the dynamics of some key structural properties like Rg or transient helical structure?
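
The auto-correlation check requested above could be sketched as follows. This is a minimal pure-Python illustration, assuming each latent coordinate is available as a time-ordered list of values from a single continuous trajectory; the function name and the toy input are hypothetical and not taken from the manuscript.

```python
def autocorrelation(series, max_lag):
    """Normalized autocorrelation of a 1D time series for lags 0..max_lag.

    Each lag uses the mean- and variance-normalized average product of
    values separated by that lag within one continuous trajectory.
    """
    n = len(series)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the series length")
    mean = sum(series) / n
    centered = [value - mean for value in series]
    variance = sum(c * c for c in centered) / n
    if variance == 0:
        raise ValueError("series has zero variance")
    acf = []
    for lag in range(max_lag + 1):
        pairs = n - lag
        covariance = sum(centered[i] * centered[i + lag] for i in range(pairs)) / pairs
        acf.append(covariance / variance)
    return acf


# Toy stand-in for a latent coordinate such as z1: a strictly alternating signal
z1 = [1.0, -1.0] * 50
print(autocorrelation(z1, 2))
```

Comparing the decay of such curves for z1 and z2 with those of Rg or helical content would indicate whether the latent coordinates track the slow structural dynamics.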

      p. 11:<br /> What's the argument for not building an MSM with states shared for aSYN +- Fasudil?

      p. 12:<br /> Fig. 3b/c show quite clearly that the implied timescales are not converged at the chosen lag time (incidentally, it would have been useful to show the timescales in physical time). The CK test is stated to be validated with "reasonable accuracy", though it is unclear what that means.
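
For reference, the implied timescale of an MSM process is conventionally obtained from the transition-matrix eigenvalue λ at lag time τ as t = -τ / ln|λ|. A minimal sketch of reporting it in physical time follows; the frame-saving interval `step_ns` and the example numbers are assumptions for illustration, not values from the manuscript.

```python
import math


def implied_timescale_ns(eigenvalue, lag_steps, step_ns):
    """Implied timescale t = -tau / ln|lambda|, returned in nanoseconds.

    lag_steps is the MSM lag time in trajectory frames and step_ns is the
    (assumed) time between saved frames in nanoseconds.
    """
    magnitude = abs(eigenvalue)
    if not 0.0 < magnitude < 1.0:
        raise ValueError("eigenvalue magnitude must lie strictly between 0 and 1")
    lag_time_ns = lag_steps * step_ns
    return -lag_time_ns / math.log(magnitude)


# Hypothetical example: eigenvalue 0.9 at a lag of 10 frames saved every 0.2 ns
print(implied_timescale_ns(0.9, lag_steps=10, step_ns=0.2))
```

Convergence is usually judged by whether this quantity plateaus as the lag time is increased.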

      p. 12:<br /> In Fig. 3d, what are the authors bootstrapping over? What are the errors if the authors analyse sampling noise (e.g. bootstrap over simulation blocks)?
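
Bootstrapping over simulation blocks, as suggested above, could look like the following minimal sketch. It assumes the observable has already been evaluated per frame and grouped into contiguous blocks (for example, one block per independent trajectory); the function name, defaults, and toy data are hypothetical.

```python
import random
import statistics


def block_bootstrap_error(blocks, statistic=statistics.fmean, n_resamples=1000, seed=0):
    """Standard error of `statistic` estimated by resampling whole blocks.

    Blocks are drawn with replacement and pooled, so correlations between
    frames inside a block are preserved in every resample.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        resampled = [rng.choice(blocks) for _ in blocks]
        pooled = [value for block in resampled for value in block]
        estimates.append(statistic(pooled))
    return statistics.stdev(estimates)


# Toy data: three blocks with clearly different means
toy_blocks = [[0.0, 0.1, 0.0], [1.0, 0.9, 1.1], [2.0, 2.1, 1.9]]
print(block_bootstrap_error(toy_blocks))
```

Resampling individual frames instead of blocks would ignore time correlation and typically gives error bars that are too small.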

      p. 13:<br /> I appreciate that the authors build an MSM using only a subset of the fasudil simulations. Here, it would be important that this analysis includes the entire workflow so that the VAE is also rebuilt from scratch. Is that the case?

      p. 18:<br /> I don't understand the goal of building the CVAE and DCVAE. Am I correct that the authors are building a complex ML model using only 3/6 input images? What is the goal of this analysis? As it stands, it reads a bit like simply wanting to apply some ML method to the data. Incidentally, the table in Fig. 6C is somewhat opaque.

      p. 22:<br /> "Our results indicate that the interaction of fasudil with αS residues governs the structural features of the protein."<br /> What results indicate this?

      p. 23:<br /> The authors should add some (realistic) errors to the entropy values quoted. Fig. 8 has some error bars, though they seem unrealistically small. Also, is the water value quoted from the same force field and conditions as for the simulations?

      p. 23:<br /> Has PDB2ENTROPY been validated for use with disordered proteins?

      p. 23/24:<br /> It would be useful to compare (i) the free energies of the states (from their populations), (ii) the entropies (as calculated) and (iii) the enthalpies (as calculated e.g. as the average force field energy). Do they match up?
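      The consistency check suggested here can be written out as below. This is a sketch under assumptions not stated in the review: $p_i$ is the stationary population of macrostate $i$, $\langle U \rangle_i$ is the average force field potential energy of the full system in that state (the $pV$ term is neglected), and $\Delta S$ is the total entropy difference including both protein and water contributions.

```latex
\Delta G_{i \to j} = -k_B T \ln\frac{p_j}{p_i}, \qquad
\Delta H_{i \to j} \approx \langle U \rangle_j - \langle U \rangle_i, \qquad
\Delta G_{i \to j} \overset{?}{=} \Delta H_{i \to j} - T\,\Delta S_{i \to j}
```

      If the left and right sides of the last relation disagree by much more than their statistical errors, at least one of the three estimates is unreliable.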

      p. 31:<br /> It is unclear which previous simulation the new aSYN simulations were launched from. What is the size of the box used?

    4. Reviewer #3 (Public Review):

      Summary:

      In this manuscript Menon, Adhikari, and Mondal analyze explicit solvent molecular dynamics (MD) computer simulations of the intrinsically disordered protein (IDP) alpha-synuclein in the presence and absence of a small molecule ligand, Fasudil, previously demonstrated to bind alpha-synuclein by NMR spectroscopy without inducing folding into more ordered structures. In order to provide insight into the binding mechanism of Fasudil the authors analyze an unbiased 1500us MD simulation of alpha-synuclein in the presence of Fasudil previously reported by Robustelli et al. (Journal of the American Chemical Society, 144(6), pp.2501-2510). The authors compare this simulation to a very different set of apo simulations: 23 separate 1-4us simulations of alpha-synuclein seeded from different apo conformations taken from another simulation previously reported by Robustelli et al. (PNAS, 115 (21), E4758-E4766), for a total of ~62us.

      To analyze the conformational space of alpha-synuclein - the authors employ a variational auto-encoder (VAE) to reduce the dimensionality of Ca-Ca pairwise distances to 2 dimensions, and use the latent space projection of the VAE to build Markov state models. The authors utilize k-means clustering to cluster the sampled states of alpha-synuclein in each condition into 180 microstates on the VAE latent space. They then coarse grain these 180 microstates into a 3-macrostate model for apo alpha-synuclein and a 6-macrostate model for alpha-synuclein in the presence of fasudil using the PCCA+ coarse graining method. Few details are provided to explain the hyperparameters used for PCCA+ coarse graining and the rationale for selecting the final number of macrostates.

      The authors analyze the properties of each of the alpha-synuclein macrostates from their final MSMs - examining intramolecular contacts, secondary structure propensities, and in the case of alpha-synuclein:Fasudil holo simulations - the contact probabilities between Fasudil and alpha-synuclein residues.

      The authors utilize an additional variational autoencoder (a denoising convolutional VAE) to compare denoised contact maps of each macrostate, and project onto an additional latent space. The authors conclude that their apo and holo simulations are sampling distinct regions of the conformational space of alpha-synuclein projected on the denoising convolutional VAE latent space.

      Finally, the authors calculate water entropy and protein conformational entropy for each microstate. To facilitate water entropy calculations - the authors take a single structure from each macrostate - and run a 20ps simulation at a finer timestep (4 femtoseconds), which is analyzed using a previously published method (DoSPT) that computes thermodynamic properties of water from MD simulations using autocorrelation functions of water velocities. The authors report that the water entropies calculated from these individual 20ps simulations are very similar.

      For each macrostate the authors compute protein conformational entropy using a previously published Maximum Information Spanning tree approach based on torsion angle distributions - and observe that the estimated protein conformational entropy is substantially more negative for the macrostates of the holo ensemble.

      The authors calculate mean first passage times from their Markov state models and report a strong correlation between the protein conformational entropy of each state and the mean first passage time from each state to the highest populated state.

      As the authors observe that the conformational entropy estimated from macrostates of the holo alpha-synuclein:Fasudil ensemble is greater than that estimated from macrostates of apo alpha-synuclein - they suggest that the driving force of Fasudil binding is an increase in the conformational entropy of alpha-synuclein. No consideration/quantification of the enthalpy of alpha-synuclein Fasudil binding is presented.

      Strengths:

      The authors utilize MD simulations run with an appropriate force field for IDPs (a99SB-disp and a99SB-disp water; Robustelli et al., PNAS, 115 (21), E4758-E4766) - which has previously been used to perform MD simulations of alpha-synuclein that have been validated with extensive NMR data.

      The contact probability between Fasudil and each alpha-synuclein residue observed in the previously performed 1500us MD simulation of alpha-synuclein in the presence of Fasudil (Robustelli et al., Journal of the American Chemical Society, 144(6), pp.2501-2510) was previously found to be in good agreement with experimental NMR chemical shift perturbations upon Fasudil binding - suggesting that this simulation is a reasonable choice for understanding IDP:small molecule interactions.

      Weaknesses:

      Major Weakness 1: Simulations of apo alpha-synuclein and holo simulations of alpha-synuclein and fasudil are not comparable.

      The most robust way to determine how the presence of Fasudil affects the conformational ensemble of alpha-synuclein is to run apo and holo simulations of the same length from the same starting structures using the same simulation parameters.

      The 23 independent 1-4us simulations of apo alpha-synuclein and the long unbiased 1500us simulation of alpha-synuclein in the presence of fasudil are not directly comparable. The starting structures of simulations used to build a Markov state model to describe apo alpha-synuclein were taken from a previously reported 73us MD simulation of alpha-synuclein (run with the a99SB-disp force field and water model, with 100mM NaCl; Robustelli et al., PNAS, 115 (21), E4758-E4766). As the holo simulation of alpha-synuclein and Fasudil was run in 50mM NaCl, snapshots from the original apo alpha-synuclein simulation were re-solvated with 50mM NaCl - and new simulations were run.

      No justification is offered for how starting structures were selected. We have no sense of the conformational variability of the starting structures selected and no sense of how these conformations compare to the alpha-synuclein conformations sampled in the holo simulation in terms of standard structural descriptors such as tertiary contacts, secondary structure, radius of gyration (Rg), solvent exposed surface area etc. (we only see a comparison of projections on an uninterpretable non-linear latent-space and average contact maps). Additionally, 1-4 us is a relatively short timescale for a simulation of a 140 residue IDP - and one is unlikely to see substantial evolution for many structural properties of interest (i.e., secondary structure, radius of gyration, tertiary contacts) in simulations this short. Without any information about the conformational space sampled in the 23 apo simulations (aside from a projection on an uninterpretable latent space) - we have no way to determine if we observe transitions between distinct states in these short simulations, and therefore if it is possible to construct a meaningful MSM from these simulations.

      If the structures used for apo simulations are on average more compact or contain more tertiary contacts - then it is unsurprising that in short independent simulations they sample a smaller region of conformational space. Similarly, if the starting structures have similar dimensions - but we only observe extremely local sampling around starting structures in apo simulations in the short simulation times - it would also not be surprising that we sample a smaller amount of conformational space. By only presenting comparisons of conformational states on an uninformative VAE latent space - it is not possible for a reader to ask simple questions about how the conformational ensembles compare.

      It is noted that the authors attempt to address questions about sampling by building an MSM of a single contiguous 60us portion of the holo simulation of alpha-synuclein and Fasudil - noting that:

      "the MSM built using lesser data (and same amount of data as in water) also indicated the presence of six states of alphaS in presence of fasudil, as was observed in the MSM of the full trajectory. Together, this exercise invalidates the sampling argument and suggests that the increase in the number of metastable macrostates of alphaS in fasudil solution relative to that in water is a direct outcome of the interaction of alphaS with the small molecule."

      However, the authors present no data to support this assertion - and readers have no sense of how the conformational space sampled in this portion of the trajectory compares to the conformational space sampled in the independent apo simulations or the full holo simulation. As the analyzed 60us portion of the holo trajectory may have no overlap with conformational space sampled in the independent apo simulations - it is unclear if this control provides any information. There is no quantification of the conformational entropy of the 6 states obtained from this portion of the holo trajectory or the full conformational space sampled. No information is presented to determine if we observe similar states in the shorter portion of the holo trajectory. Furthermore - as the authors provide almost no justification for the criteria used to select the final number of macrostates for any of the MSMs reported in this work - and the number of macrostates is effectively a free parameter in the PCCA+ method, arriving at an MSM with 6 macrostates does not convey any information about the conformational entropy of alpha-synuclein in the presence or absence of ligands. Indeed - the implied timescale plot for the 60us holo MSM (Figure S2) - shows that at least 10 processes are resolved in the 120 microstate model - and no information is provided explaining/justifying how a final 6-macrostate model was determined. The authors also do not project the conformations sampled in this sub-trajectory onto the latent space of the final VAE.

      One certainly expects that an MSM built with 1/20th of the simulation data should have substantial differences from an MSM built from the full trajectory - so failing additional information and hyperparameter justification - one wonders if the emergence of a 6-state model could be the direct result of hardcoded VAE and MSM construction hyperparameter choices.

      Required Controls For Supporting the Conclusions of the Study: The authors should initiate apo and holo simulations from the same starting structures - using the same simulation software and parameters. This could be done by adding a Fasudil ligand to the apo structures - or by removing the Fasudil ligand from a subset of holo structures. This would enable them to make apples-to-apples comparisons about the effect of Fasudil on alpha-synuclein conformational space.

      Failing to add direct apples-to-apples comparisons, which would be required to truly support the study's conclusions, the authors should at least compare the conformational space sampled in the independent apo simulations and holo simulations using standard interpretable IDP order parameters (i.e., Rg, end-to-end distance, secondary structure order parameters) and/or principal components from PCA or tICA obtained from the holo simulation. The authors should quantify the number of transitions observed between conformational states in their apo simulations. The authors could also perform more appropriate holo controls, without additional calculations, by taking batches of a similar number of short 1-4us segments to the simulations used to compute the apo MSMs and examining how the parameters/macrostates of the holo MSMs vary with random selections of the input.

      Major Weakness 2: There is little justification of how the MSM hyperparameters were selected. It is unclear if the results of the study depend on arbitrary hyperparameter selections such as the final number of macrostates in each model.

      It is unclear what criteria were used to determine the appropriate number of microstates and macrostates for each MSM. Most importantly - as all analyses of water entropy and conformational entropy are restricted to the final macrostates - the criteria used to select the final number of macrostates with PCCA+ are extremely important to the conclusions of the study. From examining the ITS plots in Figure 3 - it seems both MSMs show the same number of resolved processes (at least 11) - suggesting that a 10-state model could be appropriate for both systems. If one were to simply select a large number of macrostates for the 20x longer holo simulation - do these states converge to the same conformational entropy as the states seen in the short apo simulations? Is there some MSM quality metric used to determine what number of macrostates is more appropriate?

      Required Controls For Supporting the Conclusions of the Study: The authors should specify the criteria used to determine the appropriate number of microstates and macrostates for their MSMs and present controls that demonstrate that the conformational entropies calculated for their final states are not simply a function of the ratio of the number of macrostates chosen to represent very disparate amounts of conformational sampling.
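      For readers unfamiliar with how implied timescales inform the choice of macrostate number, the relation t_i = -τ / ln|λ_i(τ)| can be sketched as below. The 3-state transition matrix is a toy example and the function name `implied_timescales` is hypothetical; none of this is taken from the manuscript's models.

```python
import numpy as np

def implied_timescales(T, lag, k):
    """First k implied timescales t_i = -lag / ln|lambda_i| of a row-stochastic matrix T."""
    eigvals = np.sort(np.abs(np.linalg.eigvals(T)))[::-1]   # descending; eigvals[0] is 1
    return -lag / np.log(eigvals[1 : k + 1])                 # skip the stationary eigenvalue

# Toy transition matrix (assumption): states 0 and 1 exchange quickly, state 2 is long-lived
T = np.array([
    [0.90, 0.09, 0.01],
    [0.09, 0.90, 0.01],
    [0.01, 0.01, 0.98],
])
its = implied_timescales(T, lag=1.0, k=2)
```

      A large gap between consecutive timescales is the usual argument for lumping microstates into that many macrostates; where many processes are resolved without a clear gap, the macrostate number is less well determined.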

      Major Weakness 3: The use of variational autoencoders (VAEs) obscures insights into the underlying conformational ensembles of apo and holo alpha-synuclein rather than providing new ones.

      No rationale is offered for the selection of the VAE architecture or hyperparameters used to reduce the dimensionality of alpha-synuclein conformational space.

      It is not clear that the VAEs employed in this study are providing any new insight into the conformational ensembles and binding mechanisms of Fasudil to alpha-synuclein, or if the underlying latent spaces of the VAEs are more informative or kinetically meaningful than standard linear dimensionality reduction techniques like PCA and tICA. The initial VAE is used to reduce the dimensionality of alpha-synuclein conformational ensembles to 2 degrees of freedom - but it is unclear if this projection is structurally or kinetically meaningful. It is not clear why the authors chose to use a 2-dimensional projection instead of a higher number of dimensions to build their MSMs. Can they produce a more kinetically and structurally meaningful model using a higher dimensional VAE latent space?

      Additionally - it is not clear what insights are provided by the Denoising Convolutional Variational Autoencoder. The authors appear to be noising-and-denoising the contact maps of each macrostate, and then projecting the denoised values onto a new latent space - and commenting that they are different. Does this provide additional insight that looking at the contact maps in Figures 4&5 does not? Is this more informative than examining the distribution of the Radii of gyration or the secondary structure propensities of each ensemble? It is not clear what insight this analysis adds to the manuscript.

      Suggested controls to improve the study: The authors should project interpretable IDP structural descriptors (i.e., radius of gyration, secondary structure content, # of intramolecular contacts, # of intermolecular contacts between alpha-synuclein and Fasudil) onto this latent space to illustrate if any of these properties are meaningfully separated by the VAE projection. The authors should compare these projections, and MSMs built from these projections, to projections and MSMs built using standard linear dimensionality reduction techniques like PCA and tICA.

      Major Weakness 4: The MSMs produced in this study have large discrepancies with MSMs previously produced on the same dataset by the same authors that are not discussed.

      Previously - two of the authors of this manuscript (Menon and Mondal) authored a preprint titled "Small molecule modulates α-synuclein conformation and its oligomerization via Entropy Expansion" (https://www.biorxiv.org/content/10.1101/2022.10.20.513005v1.full) that analyzed the same 1500us holo simulation of alpha-synuclein binding Fasudil. In this study - they utilized the variational approach to Markov processes (VAMP) to build an MSM using a 1D order parameter as input (the radius of gyration), first discretizing the conformational space into 300 microstates before similarly building a 6 macrostate model. From examining the contact maps and secondary structure propensities of the holo MSMs from the current study and the previous study - some of the macrostates appear similar, however there appear to be orders of magnitude differences in the timescales of conformational transitions between the two models. The timescales of conformational transitions in the previous MSM are on the order of 10s of microseconds, while the timescales of transitions in this manuscript are 100s-1000s of microseconds. In the previous manuscript, a 3 state MSM is built for apo α-synuclein from a continuous 73us unbiased MD simulation of alpha-synuclein run at a different salt concentration (100mM) and an additional 33 us of shorter simulations. The apo MSM from the previous study similarly reports very fast timescales of transitions between apo states (on the order of ~1us) - while the timescales in the MSM reported in the current study (Figure 9) are on the order of 10s-100s of microseconds.

      These discrepancies raise further concerns that the properties of the MSMs built on these systems are extremely sensitive to the chosen projection methods and MSM modeling choices and hyperparameters, and that neither model may be an accurate description of the true underlying dynamics.

      Suggestions to improve the study: The authors should discuss the discrepancies with the MSMs reported in their previous studies.

    1. eLife assessment

      This valuable study establishes a method for live-cell imaging, tracking, and quantification of Alu elements marking euchromatic regions of the nucleus. The method will help characterize the relationship between chromatin dynamics and transcriptional activity. While the findings are largely consistent with previous reports, characterization of the technique is incomplete and could benefit from additional controls.

    2. Reviewer #1 (Public Review):

      The manuscript from Chang et al. presents a new technique to track chromatin locus mobility in live cells, by specifically tracking Alu rich sequences using a CRISPR based technique. The experiments in Fig. 1-2 provide extensive validation of the reagent, and the experiments in Figs. 3-4 yield new insights into chromatin dynamics and its relationship to transcription. While the findings in this manuscript are interesting, some points need to be addressed to support the central claims.

      One item of consideration is the use of bulk PIV methods to monitor chromatin mobility. While these whole genome methods certainly are useful for studying chromatin mobility at a diffraction limited (or higher scale) as well as tracking correlations at the micron scale, these methods obscure dynamics at the TAD/nucleosomal level (~200 nm). Since the studies use fluorescently labeled H2B to study chromatin dynamics, some consideration should be given to using Halo-tagged variants of H2B to get a single molecule view within specific chromatin contexts. A few recent studies (Saxton et al. 2023, Daugird et al. 2023) have used these methods to show how histone dynamics at the single molecule level depends on the chromatin context.

      Secondly, there should be additional discussion of how the mean-squared network displacement relates to single locus and histone mobility at the sub-diffraction level. While it is reassuring to see that MSND and single particle tracking MSD exponents roughly agree at the sub-second time scale, how these relate at longer time scales is not clear. Figure S5A shows MSD for individual loci, but only time lags up to 1 s are shown. It should be possible to track loci considerably longer than that. MSD exponents in the literature are quite varied beyond the second time scale, and the authors have an excellent system to shed light on this question.

      Finally, some additional discussion about why the transcriptional inhibition results shown here differ from other studies in the literature (e.g. Daugird et al. 2023) would better place these findings in context.

    3. Reviewer #2 (Public Review):

      Summary:

      Chromatin organization and dynamics are critical for eukaryotic genome functions, but how are they related to each other? To address this question, Chang et al. developed a euchromatic labeling method using CRISPR/dCas9 targeting Alu elements. These elements are highly enriched in the A compartment, which is closely associated with transcriptionally active and gene-rich regions. Labeling Alu elements allowed live-cell imaging of the gene-rich A compartment (euchromatin). Using the developed system, Chang et al. found that while Alu-rich chromatin is depleted in regions with high chromatin density (putative heterochromatin), Alu density and chromatin density are not correlated in the euchromatin. Combining the live-cell imaging of Alu elements with bulk chromatin labeling (fluorescent histone H2B), the authors showed that transcriptionally active chromatin (A compartment) has an increased mobility. Treatment with the transcription inhibitors flavopiridol and 𝛼-amanitin increased the mobility of Alu-rich chromatin, and ActD had the opposite effect on chromatin mobility.

      Strengths:

      Alu labeling is a valuable euchromatin labeling method, and measuring its mobility would contribute to a comprehensive understanding of the relationship between chromatin dynamics and transcriptional activity.

      Weaknesses:

      Some of the findings are consistent with the previous reports and not new. There are some issues to be addressed. My specific comments are the following:

      Line 58. "these methods generally lack information regarding the local chromatin environment (e.g., epigenetic state) and genomic context (e.g., A/B compartments and TADs)." This description is not accurate because Nozaki et al. (2023) performed euchromatin-specific nucleosome labeling/imaging (Hi-C contact domains with active histone marks, A-compartment). More recently, Semeigazin et al. (2024) (https://www.researchsquare.com/article/rs-3953132/v1) also did euchromatin-specific nucleosome labeling/imaging in living cells.

      Line 154. "we defined the euchromatin regions in our images by excluding heterochromatin (top 5% pixel intensity) and nucleolar areas."<br /> I am not so sure that this definition is reasonable. How were the top 5% H2B intensity regions distributed? Did they include the nuclear periphery region, which is also heterochromatin-rich? Could the authors show the ΔPCC between whole H2B (including both euchromatin and heterochromatin) and dCas9-sgAlu?

      Line 214. "our data suggests that Alu-rich (gene-rich) regions have increased chromatin mobility compared to Alu-poor (gene-poor) regions." A similar finding on nucleosome motion has already been published by Nozaki et al. 2023 and Semeigazin et al. 2024 (described above).

      Line 282. A recent important paper on the relationship between histone acetylation, transcription initiation, and nucleosome mobility (PMID: 37792937) is missing and should be discussed.

      Line 303. "Alu-rich chromatin may be more sensitive upon flavopiridol and 𝛼-amanitin treatments compared to Alu-poor chromatin (Figure 5)." Nagashima et al. (2019) also revealed that 𝛼-amanitin treatment did not increase the chromatin dynamics in heterochromatin-rich nuclear periphery regions.

    4. Reviewer #3 (Public Review):

      The manuscript by Chang, Quinodoz and Brangwynne describes the results of live cell imaging of fluorescently labeled Alu element genomic sites in combination with H2B-GFP marked chromatin in human cancer cells. The study includes dCas9 based genomic engineering for Suntag enhanced Alu element labeling. The motion of Alu elements and chromatin was analyzed in real time at 500 ms intervals over 1 min at high resolution. Advanced image analysis algorithms were developed.

      The main objective of the study is to understand how motion of euchromatin or active chromatin relates to chromatin density. Alu elements, which are spread throughout the genome, are used as a proxy for euchromatin or A compartments. The study finds that Alu-rich chromatin is more mobile than Alu-poor chromatin and that actinomycin, but not flavopiridol or alpha-amanitin, causes some decrease in the determined mobility. The authors emphasize the heterogeneity of motion, Alu clustering and chromatin density, underscoring the complexity of the problem.

      Although the topic is important and the imaging well performed, the study lacks depth and does not provide any truly new insights into our understanding of the link between genome activity and mobility nor diffusive behavior of the chromatin fiber in situ. Although the approach to record context dependent dynamics based on segmentation of pixels of varying intensity is elegant, the analysis of the trajectories requires further explanation and justification to be able to interpret the results. Important information on the biology of the engineered cell lines is lacking. Presented results are not discussed with respect to existing literature and knowledge.

      Major concerns:<br /> - Are Alu elements a good proxy for A compartments? What consequences do massive dCas9 tags have on the genome and the engineered cells? How does the bulky dCas9-Suntag label impact mobility and transcription of Alu elements themselves? How many off target sites are potentially labeled?

      (1) The authors should state the size of the dCas9-Suntag construct and perform FRAP analysis to compare the tag's behavior to that of H2B-GFP.<br /> (2) dCas9 locally unwinds DNA and is strongly bound to its target sequence, impeding polymerase progression.<br /> (3) The authors need to check if DNA breaks are induced. An immunofluorescence using a gH2AX antibody is a minimum in all conditions tested. DNA breaks largely affect chromatin mobility, which is a topic of major debate (see PMC5769766, PMID: 33061931).<br /> (4) The authors need to confirm that in dCas/sgAlu cells Alu elements are still transcribed similarly to wt cells (transcriptome or at least some qPCR).<br /> (5) Please compare H2B-GFP mobility of sgAlu tagged and untagged cells.<br /> (6) Figure 1D shows significant background in the Cut&run sgAlu line compared to the H3K4me3 line. Are these off target sites? Was the H3K4me3 Cut&run performed in the engineered cell line? Did the authors test another guide RNA? Non-specific binding could also contribute to the observed heterogeneity in the measured dynamics.<br /> (7) Figure 3G shows that H2B MSND at tau=5s is high for high H2B density independently of Alu density, questioning the value of using Alu sg tagging as a proxy for euchromatin.

      - What are the physical principles of the measured motion? What is the rationale for the MSND analyses deployed in this study?<br /> (1) Please provide the equation used for MSND (seems to be different from the standard MSD one).<br /> (2) Figure 3: all MSD curves have a slope suggesting an alpha exponent significantly smaller than 0.5, reminiscent of subdiffusion (for example, in panels A and E compare the thick line to the slope of the triangle at bottom right). Please explain. Is it Gaussian noise? Confinement? This was seen before for faster acquisition rates, but still requires explanation and interpretation.<br /> (3) What is the rationale for choosing the value at τ = 5 s? Figure 3 panel E shows large variations in the MSND at all time points, and the curves do not start at the same lag time.<br /> (4) Figure S5 shows that for Alu elements, alpha is close to 0.5 at τ ≤ 1 s but lower for larger tau; the relationship to intensity is inverse as well. Please explain.<br /> (5) It would be important to show the D values of your estimations. Plots of MSD curves on a non-log scale should be presented to show if there are different diffusion regimes (such as in Figure 4).<br /> (6) It is mentioned that "Our measurements of total chromatin dynamics at lag time τ = 5 s are typically on the order of 10^-2 μm^2 (Figure 3 A, B), in agreement with past studies (Shaban et al., 2020; Zidovska et al., 2013)". This is inaccurate as both cited studies were performed at a different time lag (0.2 s). A change in time lag is expected to show different diffusion behaviour. For consistency, the comparison should be done at the same time lag and the same number of analyzed video frames.<br /> (7) The study applies the MSND analysis for different time lags starting from 0.5 s to 11 s for videos of 60 s. A change in the number of data points affects the accuracy with which the diffusion coefficient is calculated. What is the impact of this uncertainty on the results and conclusions?
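      For reference, the standard time-averaged MSD and the power-law fit MSD(τ) ≈ K·τ^α that these questions refer to can be sketched as below. This is not the manuscript's MSND definition, which the reviewer asks the authors to provide; the function names are hypothetical, and the trajectory is a synthetic 2D random walk for which α should be close to 1.

```python
import numpy as np

def time_averaged_msd(traj, max_lag):
    """Time-averaged MSD of one trajectory of shape (n_frames, n_dims) for lags 1..max_lag frames."""
    traj = np.asarray(traj, dtype=float)
    lags = np.arange(1, max_lag + 1)
    msd = np.array([np.mean(np.sum((traj[k:] - traj[:-k]) ** 2, axis=1)) for k in lags])
    return lags, msd

def fit_anomalous_exponent(lags, msd, dt):
    """Fit log MSD = alpha * log(tau) + log(K); returns (alpha, K)."""
    slope, intercept = np.polyfit(np.log(lags * dt), np.log(msd), 1)
    return slope, np.exp(intercept)

# Synthetic 2D Brownian trajectory (assumption); dt matches the 500 ms interval noted in the review
rng = np.random.default_rng(1)
dt = 0.5
traj = np.cumsum(rng.normal(scale=0.05, size=(20000, 2)), axis=0)
lags, msd = time_averaged_msd(traj, max_lag=20)
alpha, prefactor = fit_anomalous_exponent(lags, msd, dt)
```

      Because the number of displacement pairs shrinks as the lag grows, estimates at long lags in a short movie are noisier, which is the concern raised in point (7).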

      - Inhibition of polymerase 2 activity increases mobility as was shown before.<br /> (1) Figure 4: the changes in motion following alpha-amanitin and flavopiridol treatments recapitulate results from the Maeshima group (Nagashima 2019). Data shown for actinomycin-treated cells appear extreme, with a huge drop in H2B MSND (panels B and D). Please ensure that the cells are still alive after 4-6h exposure to ActD. ActD also affects the cytoskeleton and replication, so different conclusions may be drawn if cells are still alive.<br /> (2) Treatment effects could also be enhanced should dCas9/sgAlu induce massive DNA damage (see above). Check H2B-GFP motion in cells (both treated and not) not labeled with sgAlu.

      - Positioning with respect to the literature:<br /> (1) The first paragraph of the introduction is oversimplified; please review the literature, citing work performed by many groups in the field using H2B-GFP, telomere or single site labeling in the past 10 years. Give details on the cell type used (mouse or human normal or cancer cells, amplified signals or single genes, same cell or cells at different stages of development, methodologies from whole genome to single particle tracking etc.).<br /> (2) The manuscript claims to introduce a novel mapping of the spatiotemporal dynamics of the A compartment in living cells. However, the authors did not discuss other previous approaches that were developed for the same purpose. The dynamic motion of actively transcribed chromatin domains/A compartment over the whole nucleus was investigated in different studies that used Mintbody labeling; please check PMCID: PMC7926250, PMCID: PMC8647360, PMID: 27534817, PMCID: PMC8491620.<br /> (3) PIV applies a relatively large interrogation window size of micrometers to estimate the displacement vectors. Dynamic changes within the set window can include both A and B compartments, where the contribution of genomic processes to local chromatin motion, typically taking place at the nanometer scale, is missed. The Hi-D method (PMCID: PMC7168861) introduced an Optical Flow approach to overcome this limitation of PIV (PMCID: PMC6061878). Could the authors test whether analyzing the movies recorded in this study with the Hi-D method confirms their conclusions?

      Heterogeneity of chromatin dynamics independent of chromatin density was shown by previous studies such as PMCID: PMC7775763 and PMCID: PMC7168861. Could the authors discuss their findings in the context of these studies?

    5. Author response:

      We thank the reviewers for their positive feedback and helpful suggestions for improving our manuscript.

      We appreciate the reviewers highlighting areas where we can improve clarity, particularly in the analysis methodologies and details. We agree that additional control experiments and expansion on single-molecule tracking analysis will provide additional support for our interpretations. 

      We acknowledge the reviewers' suggestion to describe our work's relationship to other studies. While some of our findings are similar to those in past studies, our work introduces a new approach for labeling euchromatin with direct sequence specificity on a genome-wide scale, enabling a deeper understanding of euchromatin organization and dynamics. We will provide more context on the novelty of our work and incorporate a more comprehensive discussion of our work’s relation to other studies in the manuscript.

    1. Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors use the model organism Drosophila to explore the sex and age impacts of a TBI method. They find age and sex differences: older flies are susceptible to mild TBI, and females are more susceptible than males. In particular, they pursue a finding that virgin vs mated females show different responses: virgins are largely protected, but mated females succumb to TBI with climbing deficits. They discover that this is associated with exposure of the females to Sex Peptide in the reproductive neurons of the female reproductive tract. When they extend to RNAseq of brains, they show that there are very few genes in common between males, mated females, virgins and females mated with males lacking Sex Peptide. The few chronic genes associated with mated females seem associated with the immune system. These findings suggest that mated females have a compromised immune system, which might make them more vulnerable.

      Strengths:

      This is an interesting paper that allows a detailed comparison of sex and age in TBI which is largely only possible in such a simple model, where large numbers and many variations can be addressed. Overall the findings are interesting.

      Weaknesses:

      Although the findings beyond Sex Peptide are observational, the work sets the stage for more detailed studies to pursue the role of the genes they find by RNAseq and whether for example, boosting the innate immune system would protect the mated females, among other experiments.

    2. eLife assessment

      In the current study, the authors describe how sex and age affect the consequence of traumatic brain injury in Drosophila. They find that females are more sensitive than males, and mated females are sensitive whereas virgin females are not. This fundamental work substantially advances our understanding of how sex-dependent response to traumatic brain injury occurs, by identifying the Sex Peptide and the immune system as modulators of sex differences. The authors provide a compelling set of results, showing that female Sex Peptide signaling in Drosophila adversely affects late-life neurodegeneration after early-life exposure to repetitive mild head injury.

    3. Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors use the Drosophila model system to study the impact of mild head trauma on sex-dependent brain deficits. They identify Sex Peptide as a modulator of greater negative outcome in female flies. Additionally, they observe that increased age at the time of injury results in worse outcomes, especially in females, and that this is due to chronic suppression of innate immune defense networks in mated females. The results demonstrate a novel signaling pathway that promotes age- and sex-dependent outcomes after head injury.

      Strengths:

      The authors have modified their previously reported TBI model in flies to mimic mild TBI, which is novel. Methods are explained in detail, allowing for reproducibility. Experiments are rigorous with appropriate statistics. A number of important controls are included. The work tells a complete mechanistic story and adds important data to increase our understanding of sex-dependent differences in recovery after TBI. The discussion is comprehensive and puts the work in the context of the field.

      Weaknesses:

      A very minor weakness is that exact n values should be included in the figure legends. There should also be confirmation of knockdown by RNAi in female flies either by immunohistochemistry or qRT-PCR if possible.

    4. Reviewer #3 (Public Review):

      Summary:

      In this manuscript, the authors used a Drosophila model to show that exposure to repetitive mild TBI causes neurodegenerative conditions that emerge late in life and disproportionately affect females. In addition to well-known age-dependent impact, the authors identified Sex Peptide (SP) signaling as a key factor in female susceptibility to post-injury brain deficits.

      Strengths:

      The authors have presented a compelling set of results showing that female Sex Peptide signaling adversely affects late-life neurodegeneration after early-life exposure to repetitive mild head injury in Drosophila. They have (1) compared the phenotypes of adult male and female flies sustaining TBI at different ages, and the phenotypes of virgin females and mated females, (2) compared the phenotypes of eliminating SP signaling in mating females and introducing SP-signaling into virgin females, (3) compared transcriptomic changes of different groups in response to TBI. The results are generally consistent and robust.

      Weaknesses:

      The authors have made their claims largely based on assaying climbing index and vacuole formation as the only indicators of late-life neurodegeneration after TBI. However, these phenotypes are not really specific to TBI-related neurodegeneration, and the significance and mechanisms of vacuole formation in particular are not clear. The authors should perform additional analyses on TBI-related neurodegeneration in flies, which have been shown before (Genetics. 2015 Oct; 201(2): 377-402). Furthermore, it is also really surprising to see so few DEGs even in wild-type males and mated females, and to see that none of the DEGs overlapped among groups or are even related to the SP-signaling. This raises questions about the validity of the RNA-seq analysis. It is critical to independently verify their RNA-sequencing results and to add some more molecular evidence to support their conclusion. Finally, it is unknown what the implications of female fly mating and its associated Sex Peptide signaling would be for mammals or humans, and what mechanisms underlie the sexual dimorphism.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors use the model organism Drosophila to explore the sex and age impacts of a TBI method. They find age and sex differences: older flies are susceptible to mild TBI, and females are more susceptible than males. In particular, they pursue a finding that virgin vs mated females show different responses: virgins are largely protected, but mated females succumb to TBI with climbing deficits. They discover that this is associated with exposure of the females to Sex Peptide in the reproductive neurons of the female reproductive tract. When they extend to RNAseq of brains, they show that there are very few genes in common between males, mated females, virgins and females mated with males lacking Sex Peptide. The few chronic genes associated with mated females seem associated with the immune system. These findings suggest that mated females have a compromised immune system, which might make them more vulnerable.

      Strengths:

      This is an interesting paper that allows a detailed comparison of sex and age in TBI which is largely only possible in such a simple model, where large numbers and many variations can be addressed. Overall the findings are interesting.

      Weaknesses:

      Although the findings beyond Sex Peptide are observational, the work sets the stage for more detailed studies to pursue the role of the genes they find by RNAseq and whether for example, boosting the innate immune system would protect the mated females, among other experiments.

      We thank the reviewer for their time and effort in evaluating our manuscript. We agree that future studies are needed to further determine the role of the genes that we have identified through RNA sequencing in the late life emergence of neurodegenerative conditions after the exposure to mild head trauma. We would like to investigate whether elevating mated female immunity can mitigate the risk for age-dependent neurodegeneration after mild head trauma.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors use the Drosophila model system to study the impact of mild head trauma on sex-dependent brain deficits. They identify Sex Peptide as a modulator of greater negative outcome in female flies. Additionally, they observe that increased age at the time of injury results in worse outcomes, especially in females, and that this is due to chronic suppression of innate immune defense networks in mated females. The results demonstrate a novel signaling pathway that promotes age- and sex-dependent outcomes after head injury.

      Strengths:

      The authors have modified their previously reported TBI model in flies to mimic mild TBI, which is novel. Methods are explained in detail, allowing for reproducibility. Experiments are rigorous with appropriate statistics. A number of important controls are included. The work tells a complete mechanistic story and adds important data to increase our understanding of sex-dependent differences in recovery after TBI. The discussion is comprehensive and puts the work in the context of the field.

      Weaknesses:

      A very minor weakness is that exact n values should be included in the figure legends. There should also be confirmation of knockdown by RNAi in female flies either by immunohistochemistry or qRT-PCR if possible.

      We thank the reviewer for the evaluation of our manuscript and for the suggestion to include the exact n values in the figure legends. We will include the n values in our revision.

      Regarding RNAi knockdown of sex peptide receptors (SPRs), we agree that confirmation of the knockdown by IHC or qRT-PCR will further strengthen our findings. It should be noted, however, that the RNAi line we used has been extensively validated by Yapici et al., 2007 and several subsequent publications. Moreover, the effectiveness of SPR knockdown is evident in female flies, as they exhibit dramatically reduced egg laying and lack the typical post-mating behaviors (such as rejection of male flies after initial mating) observed in wild-type mated female flies. In fact, female flies with RNAi-mediated SPR knockdown behave identically to females mated with SP-null male flies, confirming the effective disruption of the SP-SPR signaling pathway. We will revise the manuscript to make these points clear.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, the authors used a Drosophila model to show that exposure to repetitive mild TBI causes neurodegenerative conditions that emerge late in life and disproportionately affect females. In addition to well-known age-dependent impact, the authors identified Sex Peptide (SP) signaling as a key factor in female susceptibility to post-injury brain deficits.

      Strengths:

      The authors have presented a compelling set of results showing that female Sex Peptide signaling adversely affects late-life neurodegeneration after early-life exposure to repetitive mild head injury in Drosophila. They have (1) compared the phenotypes of adult male and female flies sustaining TBI at different ages, and the phenotypes of virgin females and mated females, (2) compared the phenotypes of eliminating SP signaling in mating females and introducing SP-signaling into virgin females, (3) compared transcriptomic changes of different groups in response to TBI. The results are generally consistent and robust.

      Weaknesses:

      The authors have made their claims largely based on assaying climbing index and vacuole formation as the only indicators of late-life neurodegeneration after TBI. However, these phenotypes are not really specific to TBI-related neurodegeneration, and the significance and mechanisms of vacuole formation in particular are not clear. The authors should perform additional analyses on TBI-related neurodegeneration in flies, which have been shown before (Genetics. 2015 Oct; 201(2): 377-402). Furthermore, it is also really surprising to see so few DEGs even in wild-type males and mated females, and to see that none of the DEGs overlapped among groups or are even related to the SP-signaling. This raises questions about the validity of the RNA-seq analysis. It is critical to independently verify their RNA-sequencing results and to add some more molecular evidence to support their conclusion. Finally, it is unknown what the implications of female fly mating and its associated Sex Peptide signaling would be for mammals or humans, and what mechanisms underlie the sexual dimorphism.

      We thank the reviewer for the thorough evaluation of our manuscript. The reviewer raised a very important question: whether the neurodegeneration observed in our model is specific to TBI. As the reviewer rightly pointed out, the neurodegenerative phenotypes are unlikely to be specific to TBI-related neurodegeneration. Throughout the manuscript, we have tried to convey the notion that mild physical impacts to the head represent one form of environmental insult, which in combination with other risk factors such as aging can lead to the emergence of neurodegenerative conditions. It should be noted that the negative geotaxis assay and vacuolation quantification are two well-established approaches to assess sensorimotor deficits and frank brain degeneration in fly brains.

      It is important to emphasize that the head-specific impacts delivered to the flies in our study are much milder than those used in previous studies. As shown in Figure 1, this very mild form of head trauma (referred to as vmHT) did not cause any death, nor did it affect the lifespan of the injured flies. Our supplemental data also show very minimal structural neuronal damage and essentially no acute or chronic apoptosis induced by vmHT exposure. Consistently, we did not observe any exoskeletal or eye damage immediately following injuries, nor did we observe any retinal degeneration or pseudopupil loss at the chronic stage in these flies. We will incorporate these important points in the revision.

      We agree that future studies are needed to independently validate our RNA sequencing results. We believe that the small number of DEGs is likely due to two unique features of our study, which distinguish it from previous fly TBI studies: (1) the very mild nature of our injury paradigm and (2) the chronic examination timepoint, which was long after the head injury and SP exposure. As pointed out in the manuscript, our study aimed to understand how early-life exposure to repetitive head traumatic insults could lead to the late-life onset of neurodegenerative conditions. We hope to further validate our results in our next phase of experiments using single-cell RNA sequencing and RT-qPCR.

      As the reviewer pointed out, it would be very interesting to explore the possible roles of sex peptide signaling in other animals and humans. As far as we know, there is no known mammalian ortholog of the insect sex peptide, so it would be difficult to study SP or an SP-like molecule in mammalian models. However, we believe that prolonged post-mating changes associated with reproduction in female fruit flies contribute to their elevated vulnerability to neurodegeneration. In this regard, drastic reproduction-associated changes in the biology of female mammals could potentially lead to vulnerability to neurodegeneration. We agree that this demands further study, which may be done with future collaborators using rodent or large animal models. We have discussed this point in the manuscript, but will revise it to further clarify the discussion.

    1. eLife assessment

      The authors show in vitro that TAK1 overexpression reduces tumor cell migration and invasion, while TAK1 knockdown promotes a mesenchymal phenotype and enhances migration and invasion. The work is a valuable addition to the field of tumor biology of esophageal squamous cell carcinoma. Although minor limitations exist, the overall evidence is solid. The data aligns with previous findings by the same researchers and others.

    2. Reviewer #1 (Public Review):

      Summary:

      In previously published work, the authors found that Transforming Growth Factor β Activated Kinase 1 (TAK1) may regulate esophageal squamous cell carcinoma (ESCC) tumor cell proliferation via the RAS/MEK/ERK axis. They explore the mechanisms for TAK1 as a possible tumor suppressor, demonstrating phospholipase C epsilon 1 as an effector of tumor cell migration, invasion and metastatic potential.

      Strengths:

      The authors show in vitro that TAK1 overexpression reduces tumor cell migration and invasion while TAK1 knockdown promotes a mesenchymal phenotype (epithelial-mesenchymal transition) and enhances migration and invasion. To explore possible mechanisms of action, the authors focused on phospholipase C epsilon 1 (PLCE1) as a potential effector, having identified this protein in co-immunoprecipitation experiments. Further, they demonstrate that TAK1-mediated phosphorylation of PLCE1 is inhibitory. Each of the observations is supported by different experimental strategies, e.g. use of different approaches for knockdown (pharmacologic, RNA inhibition, CRISPR/Cas). Xenograft experiments showed that suppression/loss of TAK1 is associated with more frequent metastases and conversely that PLCE1 is associated positively with xenograft metastases. A considerable amount of experimental data is presented for review, including supplemental data, that show that TAK1 regulation may be important in ESCC development.

      Weaknesses:

      As noted by the authors, immunoprecipitation (IP) experiments identified a number (24) of proteins as potential targets for the TAK1 ser/thr kinase. Prior work (cited as Shi et al, 2021) focused on a different phosphorylation target for TAK1, Ras association domain family 9 (RASSF9), but a more comprehensive discussion of the co-IP experiments would help place this work in a better context.

    3. Reviewer #2 (Public Review):

      Summary:

      In this study, Ju Q et al performed both in vitro and in vivo experiments to test the effect of TAK1 on cancer metastasis. They demonstrated that TAK1 is capable of directly phosphorylating PLCE1 and this modification represses its enzyme activity, leading to suppression of PIP2 hydrolysis and subsequently signal transduction in the PKC/GSK-3β/β-Catenin axis.

      Strengths:

      The quality of data is good, and the presentation is well organized in a logical way.

      Weaknesses:

      The study is missing some key links connecting TAK1-mediated phosphorylation of PLCE1 to its effect on cancer metastasis.

    4. Reviewer #3 (Public Review):

      Summary:

      The research by Qianqian Ju et al. found that the knockdown of TAK1 promoted ESCC migration and invasion, whereas overexpression of TAK1 resulted in the opposite outcome. These in vitro findings could be recapitulated in a xenograft metastasis mouse model.

      Mechanistically, TAK1 phosphorylates PLCE1 at S1060 in the cells, decreasing PLCE1 enzyme activity and repressing PIP2 hydrolysis. This reduces DAG and IP3 levels, thereby suppressing signal transduction through the PKC/GSK-3β/β-Catenin axis. Consequently, the expression of cancer metastasis-related genes is impeded by TAK1.

      Overall, this study offers some intriguing observations and provides a potential druggable target for developing agents to treat ESCC.

      The strengths of this research are:

      (1) The research always uses different experimental approaches to address one question. The experiments are largely convincing and appear to be well executed.

      (2) The phenotypes were observed from different angles: at the mouse model, cellular level, and molecular level.

      (3) The molecular mechanism was down to a single amino acid modification on PLCE1.

      The weaknesses of this research are:

      (1) Most of the phenotypes are only observed in the ECA-109 cell line. Whether TAK1-PLCE1 S1060 is a common pathway in other ESCC cells or just specific to the ECA-109 cell line is unclear. Using more cell lines to see whether this is a common mechanism of ESCC metastasis would greatly amplify the impact of this finding.

      (2) Most of the experiments were done under protein overexpression conditions, with the protein level increasing several hundred-fold in the cell, producing an artificial environment that can sometimes generate false positive results.

      (3) Whether TAK1 can directly phosphorylate PLCE1 S1060 needs more tests, especially in vitro biochemical evidence.

    1. eLife assessment

      This manuscript describes an AI-automated microscopy-based approach to characterize both bacterial and host cell responses associated with Shigella infection of epithelial cells. The methodology is compelling and should be helpful for investigators studying a variety of intracellular pathogens. The authors have acquired valuable findings regarding host and bacterial responses in the context of infection, which should be followed up with further mechanistic-based studies.

    2. Reviewer #1 (Public Review):

      Summary:

      In this study, López-Jiménez and colleagues demonstrated the utility of using high-content microscopy in dissecting host and bacterial determinants that play a role in the establishment of infection using Shigella flexneri as a model. The manuscript nicely identifies that infection with Shigella results in a block to DNA replication and protein synthesis. At the same time, the host responds, in part, via the entrapment of Shigella in septin cages.

      Strengths:

      The main strength of this manuscript is its technical aspects. They nicely demonstrate how an automated microscopy pipeline coupled with artificial intelligence can be used to gain new insights regarding elements of bacterial pathogenesis, using Shigella flexneri as a model system. Using this pipeline enabled the investigators to enhance the field's general understanding regarding the role of septin cages in responding to invading Shigella. This platform should be of interest to those who study a variety of intracellular microbial pathogens.

      Another strength of the manuscript is the demonstration, using cell biology-based approaches, that infection with Shigella blocks DNA replication and protein synthesis. These observations nicely dovetail with the prior findings of other groups. Nevertheless, their clever click-chemistry-based approaches provide visual evidence of these phenomena and should interest many.

      Weaknesses:

      There are two main weaknesses of this work. First, the studies are limited to findings obtained using a single immortalized cell line. It is appreciated that HeLa cells serve as an excellent model for studying aspects of Shigella pathogenesis and host responses. However, it would be nice to see that similar observations are observed with an epithelial cell line of intestinal, preferably colonic origin, and eventually, with a non-immortalized cell line, although it is appreciated that the latter studies are beyond the scope of this work.

      The other weakness is that the studies are minimally mechanistic. For example, the investigators have data to suggest that infection with Shigella leads to an arrest in DNA replication and protein synthesis; however, no follow-up studies have been conducted to determine how these host cell processes are disabled. Interestingly, Zhang and colleagues recently identified that the Shigella OspC effectors target eukaryotic translation initiation factor 3 to block host cell translation (PMID: 38368608). This paper should be discussed and cited in the discussion.

    3. Reviewer #2 (Public Review):

      Summary:

      Septin caging has emerged as one of the innate immune responses of eukaryotic cells to infections by intracellular bacteria. This fascinating assembly of eukaryotic proteins into complex structures restricts bacteria motility within the cytoplasm of host cells, thereby facilitating recognition by cytosolic sensors and components of the autophagy machinery. Given the different types of septin caging that have been described thus far, a single-cell, unbiased approach to quantify and characterise septin recruitment at bacteria is important to fully grasp the role and function of caging. Thus, the authors have developed an automated image analysis pipeline allowing bacterial segmentation and classification of septin cages that will be very useful in the future, applied to study the role of host and bacterial factors, compare different bacterial strains, or even compare infections by clinical isolates.

      Strengths:

      The authors developed a solid pipeline that has been thoroughly validated. When tested on infected cells, automated analysis corroborated previous observations and allowed the unbiased quantification of the different types of septin cages as well as the correlation between caging and bacterial metabolic activity. This approach will prove an essential asset in the further characterisation of septin cages for future studies.

      Weaknesses:

      As the main aim of the manuscript is to describe the newly developed analysis pipeline, the results illustrated in the manuscript are essentially descriptive. The developed pipeline seems exceptionally efficient in recognising septin cages in infected cells but its application for a broader purpose or field of study remains limited.

    4. Reviewer #3 (Public Review):

      Summary:

      The manuscript uses high-content imaging and advanced image-analysis tools to monitor the infection of epithelial cells by Shigella. The authors perform some analysis on the state of the cells (through measurements of DNA and protein synthesis), and then they focus on differential recruitment of Sept7 to the bacteria. They link this recruitment with the activity of the bacterial T3SS, which is a very interesting discovery. Overall, I found numerous exciting elements in this manuscript, and I have a couple of reservations. Please see below for more details on my reservations. Nevertheless, I think that these issues can be addressed by the authors, and doing so will help to make it a convincing and interesting piece for the community working on intracellular pathogens. The authors should also carefully re-edit their manuscript to avoid overselling their data (see below for issues I see there). I would consider taking out the first figure and starting with Figure 3 (Figure 2 could be re-organized in the later parts); that could help to improve the flow of the manuscript.

      Strengths:

      The high-content analysis, including the innovative analytical workflows, is very promising and could be used by a large number of scientists working on intracellular bacteria.

      The finding that Septins (through SEPT7) are differentially regulated through actively secreting bacteria is very exciting and can steer novel research directions.

      Weaknesses:

      The manuscript makes a connection between two research lines (1: Shigella infection and DNA/protein synthesis, 2: regulation of septins around invading Shigella) that are not fully developed - this makes it sometimes difficult to understand the take-home messages of the authors.

      It is not clear whether the analysis that was done on projected images actually reflects the phenotypes of the original 3D data. This issue needs to be carefully addressed.

    5. Author response:

      Reviewer #1 (Public Review):

      Summary:

      In this study, López-Jiménez and colleagues demonstrated the utility of using high-content microscopy in dissecting host and bacterial determinants that play a role in the establishment of infection using Shigella flexneri as a model. The manuscript nicely identifies that infection with Shigella results in a block to DNA replication and protein synthesis. At the same time, the host responds, in part, via the entrapment of Shigella in septin cages.

      Strengths:

      The main strength of this manuscript is its technical aspects. They nicely demonstrate how an automated microscopy pipeline coupled with artificial intelligence can be used to gain new insights regarding elements of bacterial pathogenesis, using Shigella flexneri as a model system. Using this pipeline enabled the investigators to enhance the field's general understanding regarding the role of septin cages in responding to invading Shigella. This platform should be of interest to those who study a variety of intracellular microbial pathogens.

      Another strength of the manuscript is the demonstration, using cell biology-based approaches, that infection with Shigella blocks DNA replication and protein synthesis. These observations nicely dovetail with the prior findings of other groups. Nevertheless, their clever click-chemistry-based approaches provide visual evidence of these phenomena and should interest many.

      We thank the Reviewer for their enthusiasm on the technical aspects of this paper, regarding both the automated microscopy pipeline coupled with artificial intelligence and the click-chemistry based approaches to dissect DNA replication and protein synthesis by microscopy.

      Weaknesses:

      There are two main weaknesses of this work. First, the studies are limited to findings obtained using a single immortalized cell line. It is appreciated that HeLa cells serve as an excellent model for studying aspects of Shigella pathogenesis and host responses. However, it would be nice to see that similar observations are observed with an epithelial cell line of intestinal, preferably colonic origin, and eventually, with a non-immortalized cell line, although it is appreciated that the latter studies are beyond the scope of this work.

      The immortalized cell line HeLa is widely regarded as a paradigm to study infection by Shigella and other intracellular pathogens. However, we agree that future studies beyond the scope of this work should include other cell types (e.g., epithelial cells of colonic origin, macrophages, primary cells).

      The other weakness is that the studies are minimally mechanistic. For example, the investigators have data to suggest that infection with Shigella leads to an arrest in DNA replication and protein synthesis; however, no follow-up studies have been conducted to determine how these host cell processes are disabled. Interestingly, Zhang and colleagues recently identified that the Shigella OspC effectors target eukaryotic translation initiation factor 3 to block host cell translation (PMID: 38368608). This paper should be discussed and cited in the discussion.

      We appreciate the Reviewer’s concern about the lack of follow up work on observations of host DNA and protein synthesis arrest upon Shigella infection, which will be the focus of future studies. We acknowledge the recent work of Zhang et al. (Cell Reports, 2024) considering their similar results on protein translation arrest, and we fully agree that this reference should be more fully discussed in a revised version of the manuscript.

      Reviewer #2 (Public Review):

      Summary:

      Septin caging has emerged as one of the innate immune responses of eukaryotic cells to infections by intracellular bacteria. This fascinating assembly of eukaryotic proteins into complex structures restricts bacteria motility within the cytoplasm of host cells, thereby facilitating recognition by cytosolic sensors and components of the autophagy machinery. Given the different types of septin caging that have been described thus far, a single-cell, unbiased approach to quantify and characterise septin recruitment at bacteria is important to fully grasp the role and function of caging. Thus, the authors have developed an automated image analysis pipeline allowing bacterial segmentation and classification of septin cages that will be very useful in the future, applied to study the role of host and bacterial factors, compare different bacterial strains, or even compare infections by clinical isolates.

      Strengths:

      The authors developed a solid pipeline that has been thoroughly validated. When tested on infected cells, automated analysis corroborated previous observations and allowed the unbiased quantification of the different types of septin cages as well as the correlation between caging and bacterial metabolic activity. This approach will prove an essential asset in the further characterisation of septin cages for future studies.

      We thank the Reviewer for their positive comments, and for highlighting the strength of our imaging and analysis pipeline to analyse Shigella-septin interactions.

      Weaknesses:

      As the main aim of the manuscript is to describe the newly developed analysis pipeline, the results illustrated in the manuscript are essentially descriptive. The developed pipeline seems exceptionally efficient in recognising septin cages in infected cells but its application for a broader purpose or field of study remains limited.

      The main objective of this manuscript is the development of imaging and analysis tools to study Shigella infection, and in particular, Shigella interactions with the septin cytoskeleton. In future work we will provide more mechanistic insight with novel experiments and broader applicability, using different cell lines (in agreement with Reviewer 1), mutants or clinical isolates of Shigella and different bacterial species (e.g. Listeria, Salmonella, mycobacteria).

      Reviewer #3 (Public Review):

      Summary:

      The manuscript uses high-content imaging and advanced image-analysis tools to monitor the infection of epithelial cells by Shigella. They perform some analysis on the state of the cells (through measurements of DNA and protein synthesis), and then they focus on differential recruitment of Sept7 to the bacteria. They link this recruitment with the activity of the bacterial T3SS, which is a very interesting discovery. Overall, I found numerous exciting elements in this manuscript, and I have a couple of reservations. Please see below for more details on my reservations. Nevertheless, I think that these issues can be addressed by the authors, and doing so will help to make it a convincing and interesting piece for the community working on intracellular pathogens. The authors should also carefully re-edit their manuscript to avoid overselling their data (see below for issues I see there). I would consider taking out the first figure and starting with Figure 3 (Figure 2 could be re-organized in the later parts)- that could help to make the flow of the manuscript better.

      Strengths:

      The high-content analysis including the innovative analytical workflows are very promising and could be used by a large number of scientists working on intracellular bacteria. The finding that Septins (through SEPT7) are differentially regulated through actively secreting bacteria is very exciting and can steer novel research directions.

      We thank the Reviewer for their constructive feedback and the excitement for our results, including our findings on T3SS activity and Shigella-septin interactions. In accordance with the Reviewer’s comments, we agree to carefully re-edit our manuscript to avoid overselling our data in a future version of the manuscript. We will also consider rearranging figures depending on new results.

      Weaknesses:

      The manuscript makes a connection between two research lines (1: Shigella infection and DNA/protein synthesis, 2: regulation of septins around invading Shigella) that are not fully developed - this makes it sometimes difficult to understand the take-home messages of the authors.

      We agree that the manuscript is mostly technical and therefore some of our experimental observations would benefit from follow-up mechanistic studies in the future. We highlight our vision for broader applicability in response to weaknesses raised by Reviewer 2.

      It is not clear whether the analysis that was done on projected images actually reflects the phenotypes of the original 3D data. This issue needs to be carefully addressed.

      We agree with the Reviewer that characterizing 3D data using 2D projected images has limitations.

      We observe an increase in cell and nuclear surface that does not strictly imply a change in volume. This is why we measure Hoechst intensity in the nucleus using SUM-projection (as it can be used as a proxy of DNA content of the cell). However, we agree that future use of other markers (such as fluorescent labelled histones) would make our conclusions more robust.

      Regarding the different orientations of intracellular bacteria, we agree that investigation of septin recruitment is more challenging when bacteria are placed perpendicular to the acquisition plane. In a first step, we trained a Convolutional Neural Network (CNN) using 2D data, as it is easier/faster to train and requires fewer annotated images. In doing so, we already managed to correctly identify 80% of Shigella interacting with septins, which enabled us to observe higher T3SS activity in this population. In future studies, we will maximize the 3D potential of our data and retrain a CNN that will allow more precise identification of Shigella-septin interactions and in-depth characterization of volumetric parameters.

    1. eLife assessment

      This valuable work characterized a new set of small molecules targeting the interaction between ELF3-MED23, with one of the reported compounds representing a promising novel therapeutic strategy. The evidence supporting the conclusions is solid, although including characterization with breast and lung cancer cell models would strengthen the study. This article will be of interest to medical and cell biologists working on cancer and, particularly, on HER2-overexpressing cancers.

    2. Reviewer #1 (Public Review):

      Summary:

      Soo-Yeon Hwang et al. synthesized and characterized a new set of small molecules targeting the interaction between ELF3 and MED23, a transcription factor and a coactivator for HER2 transcription, respectively. The authors used a combination of biochemical analysis, cell-based assays, and an in vivo xenograft model to show that the lead compound 10 inhibits HER2 transcription and protein expression, subsequently inducing anticancer activity in a gastric cancer cell line and a xenograft model, and particularly in a trastuzumab-resistant cell line. The experimental data are solid and support the model for the anticancer potency of the compound in HER2+ gastric cancer. Although the compound showed promising data for its potential antitumor activity in HER2+ cancers, the scope is somewhat narrow for the HER2+ cancer field, since the most relevant HER2+ cancer model is HER2+ breast cancer and Herceptin resistance; indeed, the authors also discuss this point in the manuscript. Therefore, additional data with a HER2+ breast cancer cell model would increase the impact of the work in the field.

      Strengths:

      The current manuscript proposed a potential alternative strategy targeting HER2 overexpression cancers by attenuating HER2 transcription levels. The study provides solid evidence that the lead compound 10 can interrupt the binding of ELF3 to MED23, leading to the inhibition of HER2 transcription. Remarkably, the following cell-based assays and xenograft model revealed the promising antitumor activity of the compound in the gastric cancer model.

      Weaknesses:

      While the novel compound showed promising potency against HER2-positive gastric cancer cells and the xenograft model, it would be great for it to also be evaluated in HER2-positive breast cancer cell models. The authors did not compare the current compounds with other therapeutic strategies targeting HER2 expression at the genetic level. It is unclear why the EGFR inhibitors gefitinib and canertinib, but not HER2-specific inhibitors (e.g. tucatinib), were used as controls in the manuscript.

    3. Reviewer #2 (Public Review):

      Summary:

      The findings highlight the importance of targeting the ELF3-MED23 protein-protein interaction (PPI) as a potential therapeutic strategy for HER2-overexpressing cancers, notably gastric cancers, as an alternative to trastuzumab. The evidence, including the strong potency of compound 10 in inhibiting ELF3-MED23 PPI, its capacity to lower HER2 levels, induce apoptosis, and impede proliferation both in laboratory settings and animal models, indicates that compound 10 holds promise as a novel therapeutic option, even for cases resistant to trastuzumab treatment.

      Strengths:

      The experiments conducted are robust and diverse enough to address the hypothesis posed.

      Weaknesses:

      The rationale behind the proposed structural modifications for the three groups of compounds is not clear.

    4. Reviewer #3 (Public Review):

      Summary:

      The authors synthesized a compound which can inhibit ELF3 and MED23 interaction which leads to inhibition of HER2 expression in gastric cancer.

      Strengths:

      Enough evidence shows the potency of compound 10 in inhibiting ELF3 and MED23 interaction.

      Weaknesses:

      Compound 10 potency as PPI inhibitor has been shown in only one cell line NCI-N87.

    1. eLife assessment

      This manuscript describes a method for genetic manipulation of Leishmania species which should be sufficiently efficient to enable genome-wide genetic screens. The authors improved numerous aspects of their previously described method, which is based on sequence-specific genome editing to introduce premature stop codons using a CAS9-cytidine deaminase variant. The work is thoroughly described, with convincing data, and will be very important for Leishmania researchers, as well as perhaps suggesting the use of similar approaches in other organisms in which genetic manipulation is challenging.

    2. Reviewer #1 (Public Review):

      While CRISPR/Cas technology has greatly facilitated the ability to perform precise genome edits in Leishmania spp., the lack of a non-homologous DNA end-joining (NHEJ) pathway in Leishmania has prevented researchers from performing large-scale Cas-based perturbation screens. With the introduction of base editing technology to the Leishmania field, the Beneke lab has begun to address this challenge (Engstler and Beneke, 2023).

      In this study, the authors build on their previously published protocols and develop a strategy that:

      (1) allows for very high editing efficiency. The cell editing frequency of 1 edit per 70 cells reported in this study represents a 400-fold improvement over the previously published protocol,

      (2) reduces the negative effects of high sgRNA levels on parasite growth by using a weaker T7 promoter to drive sgRNA transcription.

      The combination of these two improvements should open the door to exciting large-scale screens and thus be of great interest to researchers working with Leishmania and beyond.

    3. Reviewer #2 (Public Review):

      Summary:

      Previously, the authors published a Leishmania cytosine base editor (CBE) genetic tool that enables the generation of functionally null mutants. This works by utilising a CAS9-cytidine deaminase variant that is targeted to a genetic locus by a small guide RNA (sgRNA) and causes cytosine to thymine conversion. This has the potential to generate a premature stop codon and therefore a loss of function mutant.
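
      The stop-codon logic described above can be sketched in a few lines. This is a minimal illustration under the standard genetic code, not the authors' guide design tool: it only enumerates which sense codons can become a stop codon through C-to-T changes, either directly on the coding strand or on the opposite strand (which reads as G-to-A on the coding strand). The function names are hypothetical, and the sketch ignores the editing window, PAM requirements and sequence context.

```python
from itertools import combinations, product

STOP_CODONS = {"TAA", "TAG", "TGA"}


def edited_variants(codon, target, replacement):
    """Yield every codon reachable by converting one or more `target` bases."""
    positions = [i for i, base in enumerate(codon) if base == target]
    for n in range(1, len(positions) + 1):
        for subset in combinations(positions, n):
            bases = list(codon)
            for i in subset:
                bases[i] = replacement
            yield "".join(bases)


def stop_convertible_codons(target, replacement):
    """Return the sense codons that the given base change can turn into a stop codon."""
    hits = set()
    for codon in ("".join(c) for c in product("ACGT", repeat=3)):
        if codon in STOP_CODONS:
            continue  # already a stop codon, nothing to gain
        if any(v in STOP_CODONS for v in edited_variants(codon, target, replacement)):
            hits.add(codon)
    return sorted(hits)


# C-to-T on the coding strand.
coding_strand = stop_convertible_codons("C", "T")
# C-to-T on the opposite strand appears as G-to-A on the coding strand.
opposite_strand = stop_convertible_codons("G", "A")

print("Coding-strand targets:", coding_strand)
print("Opposite-strand targets:", opposite_strand)
```

      Only glutamine (CAA, CAG), arginine (CGA) and tryptophan (TGG) codons qualify, which is why not every position in a gene can be targeted for a premature stop.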

      CBE has advantages over existing CAS-based knockout tools because it allows the targeting of multicopy gene families and, potentially, the easier generation of pooled loss of function mutants in complex population experiments. Although successful, the first generation of this genetic tool had several limitations that may have prevented its wider adoption, especially in complex genome-wide screens. These include nonspecific toxicity of the sgRNAs, low transfection efficiencies, low editing efficiencies, a proportion of transfectants that express multiple different sgRNAs, and insufficient effectiveness in some Leishmania species.

      Here, the authors set out to systematically solve each of these limitations. By trialling different transfection conditions and different CAS12a cut sites to promote sgRNA expression cassette integration, they increase the transfection efficiency 400-fold and ensure that only a single sgRNA expression cassette integrates that edits with high efficiencies. By trialling different T7 promoters, they significantly reduce the non-specific toxicity of sgRNA expression whilst retaining high editing efficiencies in several Leishmania species (Leishmania major, L. mexicana and L. donovani). By improving the sgRNA design, the authors predict that null mutants will be more efficiently produced after editing.

      This tool will find adoption for producing null mutants of single-copy genes, multicopy gene families, and potentially genome-wide mutational analyses.

      Strengths:

      This is an impressive and thorough study that significantly improves the previous iteration of the CBE. The approach is careful and systematic and reflects the authors' excellent experience developing CRISPR tools. The quality of data and analysis is high and data are clearly presented.

      Weaknesses:

      Figure 4 shows that editing of PF16 is 'reversed' between day 6 and day 16 in L. mexicana WTpTB107 cells. The authors reasonably conclude that in drug-selected cells there is a mixed population of edited and non-edited cells, possibly due to mis-integration of the sgRNA expression construct, and non-edited cells outcompete edited cells due to a growth defect in PF16 loss of function mutants. However, this suggests that the CBE tool will not work well for producing mutants with strong fitness phenotypes without incorporating a limiting dilution cloning step (at least in L. mexicana and quite possibly other Leishmania species). Furthermore, it suggests it will not be possible to incorporate genes associated with a growth defect into a pooled drop-out screen as described in the paper. This issue is not well explored in the paper and the authors have not validated their tool on a gene associated with a severe growth defect, or shown that their tool works in a mixed population setting.

      Although welcome, the improvements to the crRNA CBE design tool are hypothetical and untested.

      The Sanger and Oxford Nanopore Technology analyses on integration sites of the sgRNA expression cassette integration will not detect the mis-integration of the sgRNA expression construct into an entirely different locus.

    4. Reviewer #3 (Public Review):

      Genetic manipulation of Leishmania has some challenges, including some limitations in the DNA repair strategies that are present in the organism and the absence of RNA interference in many species. The senior author has contributed significantly to expanding the available routes towards Leishmania genetic manipulation by developing and adapting CRISPR-Cas9 tools to allow gene manipulation via DNA double-strand break repair and, more recently, base modification. This work seeks to improve on some limitations in the tools previously described for the latter approach of base modification leading to base change.

      The work in the paper is meticulously described, with solid evidence for most of the improvements that are claimed: Figure 1 clearly describes reduced impairment in the growth of parasites expressing sgRNAs via changes in promoters; Figures 2 and 3 compellingly document the usefulness of using AsCas12a for integration after transformation; and Figures 1 and 4 demonstrate the capacity of the combined modifications to efficiently edit a gene in three different Leishmania species. There is little doubt these new tools will be adopted by the Leishmania community, adding to the growing arsenal of approaches for genetic manipulation.

      There are two weaknesses the authors may wish to address, one smaller and one larger.

      (1) The main advance claimed here is in this section title: 'Integration of CBE sgRNA expression cassettes via AsCas12a ultra-introduced DSBs increase editing rates', with the evidence for this presented in Figure 4. It is hard work in the submission to discern what direct evidence there is for editing rates being improved relative to earlier, Cas9-based approaches. Did they directly compare the editing by the new and old approach? If not, can they more clearly explain how they are able to make this claim, either by adding text or a new figure? A side-by-side comparison would emphasise the advance of the new approach more clearly.

      (2) The ultimate, stated goal of this work is (abstract) to 'enable a variety of loss-of-function screens', as the older approach had some limitations. This goal is not tested for the new tools that have been developed here; the experiment in Figure 5 merely shows that they can, not unexpectedly, make a gene mutant, which was already possible with available tools. Thus, to what extent is this paper describing a step forward? Why have the authors not run an experiment - even the same one that was described previously in Engstler and Beneke (2023) - to show that the new approach improves on previous tools in such a screen, either in scale or accuracy?

    5. Author response:

      We would like to thank all reviewers and editors for their thorough peer review and valuable suggestions. In these provisional responses, we summarize the main concerns raised by the reviewers and outline our planned revisions to address them in the manuscript.

      Overall, we are pleased to note that the reviewers agree on the potential value of our updated toolbox for gene editing, highlighting its various applications. However, they also raised several valid concerns, which we have summarized and responded to as follows:

      (1) Mutant phenotypes in transfected populations can be occasionally reversed or escaped. This suggests it will not be possible to detect growth-associated phenotypes in pooled screens. An experiment with a pooled loss-of-function screen to test this is missing.

      Escapes or reversals of mutant phenotypes have been observed with other genetic tools used for loss-of-function screening, including lentiviral CRISPR approaches in mammalian systems and RNAi in Trypanosoma brucei. Cells can escape phenotypes through various mechanisms, such as promoter silencing or selection of non-deleterious mutations. Additionally, not every CRISPR guide is efficient in generating a mutant phenotype, and RNAi constructs can also vary in their effectiveness. Despite these challenges, genome-wide loss-of-function screens have been successfully carried out in mammalian cells and Trypanosoma parasites. Therefore, we believe that the observed escape of one mutant phenotype does not preclude the detection of growth-associated or other phenotypes in pooled screens. Moreover, we did not observe a reversal of the mutant phenotype in L. mexicana, L. donovani, and L. major parasites expressing tdTomato from an expression cassette integrated into the 18S rRNA SSU locus (Figure 4). However, the reviewers are rightfully requesting a pooled loss-of-function screen to validate this. Since submitting this manuscript, we have conducted multiple pooled loss-of-function screens, which have confirmed the ability of the method presented here to detect a range of mutant phenotypes in pooled screening formats. We will include these results in our revised manuscript.

      (2) The possibility of mis-integration of the CBE sgRNA expression construct into an entirely different locus is not explored.

      We plan to reanalyze our ONT sequencing data to verify whether the CBE sgRNA expression construct was integrated into unintended loci. If we detect any mis-integration events, we will evaluate their potential negative impacts and discuss these findings in the revised manuscript.

      (3) The achieved increase in editing efficiency compared to the previous base editing method could be more clearly presented.

      We have directly compared our improved method to our previous base editing method in Figures 1E and 4, demonstrating higher editing rates in a much shorter time. In the revised manuscript, we will present and describe the increase in editing rate more clearly.

      (4) The improvements on CBE sgRNA guide design are hypothetical and untested.

      We agree that the improvements to the CBE sgRNA design are currently hypothetical. We plan to systematically test our guide design principles in future studies. Since this will require testing hundreds of guides to draw robust conclusions, we believe that this aspect is beyond the scope of the current study. However, we will discuss our plans for future validation in the revised manuscript.

      Overall, we appreciate the reviewers' insights and are committed to addressing their concerns thoroughly. We believe that the planned revisions and additional experiments will significantly strengthen our manuscript and provide a more comprehensive evaluation of our updated gene editing toolbox.

    1. Reviewer #1 (Public Review):

      Summary:

      Moir, Merheb et al. present an intriguing investigation into the pathogenesis of Pol III variants associated with neurodegeneration. They established an inducible mouse model to overcome developmental lethality, administering 5 doses of tamoxifen to initiate the knock-in of the mutant allele. Subsequent behavioral assessments and histological analyses revealed potential neurological deficits. Robust analyses of the tRNA transcriptome, conducted via northern blotting and RNA sequencing, suggested a selective deleterious effect of the variant on the cerebrum, in contrast to the cerebellum and non-cerebral tissues. Through this work, the authors identified molecular changes caused by Pol III mutations, particularly in the tRNA transcriptome, and demonstrated its relative progression and selectivity in brain tissue. Overall, this study provides valuable insights into the neurological manifestations of certain genetic disorders and sheds light on transcripts/products that are constitutively expressed in various tissues.

      Strengths:

      The authors utilize an innovative mouse model to constitutively knock in the gene, enhancing the study's robustness. Behavioral data collection using a spectrometer reduces experimenter bias and effectively complements the neurological disorder manifestations. Transcriptome analyses are extensive and informative, covering various tissue types and identifying stress response elements and mitochondrial transcriptome patterns. Additionally, metabolic studies involving pancreatic activity and glucose consumption were conducted to eliminate potential glucose dysfunction, strengthening the histological analyses.

      Weaknesses:

      The study could have explored identifying the extent of changes in the tRNA transcriptome among different cell types in the cerebrum. Although the authors attempted to show the temporal progression of tRNA transcriptome changes between P42 and P75 mice, the causal link was not established. A subsequent rescue experiment in the future could address this gap.

      Nonetheless, the claims and conclusions are supported by the presented data.

    2. Reviewer #2 (Public Review):

      Summary:

      The study "Molecular basis of neurodegeneration in a mouse model of Polr3 related disease" by Moir et al. showed how an RNA Pol III mutation affects the production, maturation and transport of tRNAs. Furthermore, their study suggested that the RNA Pol III mutation leads to behavioural deficits that are commonly observed in neurodegeneration. Although this study used a mouse model to establish these aspects, it seems to lack a clear direction and a mechanism for how the altered level of tRNA affects locomotor behaviour. They should have used a conditional mouse to delete the gene in a specific brain area to test their hypothesis. Otherwise, this study shows a more generalized developmental effect rather than a specific function of the altered tRNA level. This is very evident from their bulk RNA sequencing study. This study provides some discrete information rather than a coherent story.

      Strengths:

      The study created a mouse model to investigate the role of RNA Pol III transcription. Furthermore, the study provided RNA-seq analysis of the mutant mice and highlighted the expression of specific transcripts affected by the RNA Pol III mutation.

      Weaknesses:

      1) The abstract is not clearly written. It is hard to interpret what the objectives of the study are and why they are important to investigate. For example: "The molecular basis of disease pathogenesis is unknown." Which disease? 4H leukodystrophy? All neurodegenerative diseases?

      2) How are cerebral pathology and exocrine pancreatic atrophy related? How do altered tRNA levels connect these two axes?

      3) The authors mention that the previously observed reduction in mature tRNA levels is also recapitulated in their study. Why is this study novel then?

      4) It is very intuitive that a deficit in Pol III transcription would severely affect protein synthesis in all brain areas as well as other organs. Hence, the growth defect observed in Polr3a mutant mice is not very specific but rather a general phenomenon.

      5) The authors observed a specific myelination defect in the cortex and hippocampus but not in the cerebellum. This is an interesting observation. It is important to find the link between tRNA removal and myelin depletion in the hippocampus or cortex. Why is myelination not affected in the cerebellum?

      6) How was the locomotor activity measured? A detailed description is missing. Also, locomotion is primarily cerebellum dependent. There is no change in terms of growth rate and myelination in cerebellar neurons. I do not understand why locomotor activity was measured.

      7) The correlation between behavioural changes and RNA-seq data is missing. A number of transcripts are affected, and most are very general factors for cellular metabolism. Most of them are transcribed by RNA Pol II. How does a Pol III mutation influence RNA Pol II-driven transcription? I did not find differential expression of any specific transcripts associated with behavioural changes. What is the motivation for the transcriptomics analysis? None of these transcripts are very specific for myelination. It is rather a general cellular metabolism effect that indirectly influences myelination.

      8) Which genes identified by the transcriptomics analysis regulate the maturation of tRNA? The authors should at least perform an RNAi study to identify possible factors and analyze their importance in the maturation of tRNA.

      9) What factors influence tRNA transport to the cytoplasm? It may be possible that the Polr3a mutation affects cytoplasmic transport of tRNA. The authors should study this aspect using an imaging experiment.

      10) Does alteration of the cytoplasmic level of tRNA affect translation? The authors should perform a translation assay using bio-orthogonal amino acid (AHA) labelling.

    1. eLife assessment

      This important chronobiological study in mice suggests that light-modulated Cdk5 activity on the PKA-CaMK-CREB signaling pathway provides missing molecular mechanistic details to understand light-induced circadian clock phase delays during the early night, but not for phase advances in the morning. The authors provide overall convincing evidence bridging from behavioral to molecular/cellular experiments to neural activity imaging.

    2. Reviewer #1 (Public Review):

      In the manuscript "Cyclin-dependent kinase 5 (Cdk5) activity is modulated by light and gates rapid phase shifts of the circadian clock", Brenna et al study the role of Cdk5 on circadian rhythms and they conclude that the CDK5 gates the activity of light on phase shifts at ZT by showing that the behavioural shifts to light as a result of CDK5 silencing only affect light-induced phase shifts at ZT/CT 14 but not at other times.

      Further, they delineate the mechanism behind this phenotype and demonstrate that 1) CDK5 activity is downregulated following a light pulse via a loss of interaction with p35, as shown by an activity assay; 2) knock-down of CDK5 increases CREB and CaMKII/IV phosphorylation, likely via increased calcium levels along with altered localisation of Cav3.1; and 3) knock-down of CDK5 reduces the light-induced response in vivo at ZT14 in the SCN.

      They suggest this mechanism involves light 'silencing' the CDK5 pathway (possibly by disrupting the p35 interaction and dysregulating this pathway), which under basal conditions phosphorylates DARPP-32, leading to PKA inhibition and, by extension, reduced activation of calcium-calmodulin kinase and reduced CREB activity. The authors finally evaluate gene expression changes of previously described light-responsive genes in the SCN at ZT14.

      This is an interesting piece of work that explains how circadian responses to light could be gated and is generally well supported by a wealth of data. Whilst I found the overall involvement of CDK5 in gating light response interesting and convincing, I have some concerns about their interpretation of the data surrounding the mechanism, which I have detailed below. I also think this manuscript could be improved with a slightly different structure and concise discussion for the benefit of a broader scientific audience.

    3. Reviewer #2 (Public Review):

      Summary:

      Definition of the role of Cdk5 in circadian locomotor activity and light-induced neural activity in the mouse SCN in vivo, revealing its mode of action through the PKA-CaMK-CREB signaling pathway.

      Strengths:

      The experimental approaches span the in vivo, cellular and molecular levels and provide the first evidence for the specific involvement of Cdk5 in light-induced phase advance of the free-running rhythm.

      Weaknesses:

      The behavioral analyses are limited to some selected parameters.

      Downstream effects on circadian oscillation of gene expression and physiological functions in other brain regions and organs are missing.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #2 (Public Review):

      Weaknesses:

      The comparison of affinity predictions derived from AlphaFold2 and H3-opt models, based on molecular dynamics simulations, should have been discussed in depth. In some cases, there are huge differences between the estimations from H3-opt models and those from experimental structures. It seems that the authors obtained average differences of the real delta, instead of average differences of the absolute value of the delta. This can be misleading, because high negative differences might be compensated by high positive differences when computing the mean value. Moreover, it would have been good for the authors to disclose the trajectories from the MD simulations.
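
      The cancellation effect described here is easy to reproduce. The following is a minimal sketch with made-up numbers (the delta values are hypothetical and are not taken from the manuscript); it only shows how the mean of signed differences can hide large errors that the mean of absolute differences exposes.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Hypothetical differences (kcal/mol) between the affinity computed for a
# predicted structure and for the experimental structure of the same complex.
deltas = [-12.0, 11.0, -9.5, 10.5]

signed_mean = mean(deltas)                       # errors of opposite sign cancel
absolute_mean = mean([abs(d) for d in deltas])   # magnitude of the typical error

print(f"Mean of signed deltas:   {signed_mean:.2f} kcal/mol")
print(f"Mean of absolute deltas: {absolute_mean:.2f} kcal/mol")
```

      With these illustrative values the signed mean is 0.00 while the mean absolute difference is 10.75, so a method can look unbiased on average and still be far from the reference in every individual case.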

      Thanks for your careful checks. We fully understand your concerns about the large differences in the calculated affinities. To understand the source of these huge differences, we carefully analyzed the trajectories of the input structures during MD simulations. We found that the antigen-antibody complex shifted as it transitioned from NVT to NPT during pre-equilibration, even when restraints were applied to the protein structure. To address this issue, we consulted the solution provided on Amber's mailing list (http://archive.ambermd.org/202102/0298.html) and modified the ATOMS_MOLECULE item in the top file of the simulation system to merge the antigen-antibody complex into one molecule. As a result, the number of SOLVENT_POINTERS was also adjusted. Finally, we performed all MD simulations and calculated the affinities of all complexes.

      We have changed “Afterwards, a 25000-step NVT simulation with a time step of 1 fs was performed to gradually heat the system from 0 K to 100 K. A 250000-step NPT simulation with a time step of 2 fs was carried out to further heat the system from 100 K to 298 K.” to “Afterwards, a 400-ps NVT simulation with a time step of 2 fs was performed to gradually heat the system from 0 K to 298 K (0–100 K: 100 ps; 100–298 K: 200 ps; hold 298 K: 100 ps), and a 100-ps NPT simulation with a time step of 2 fs was performed to equilibrate the density of the system. During heating and density equilibration, we constrained the antigen-antibody structure with a restraint value of 10 kcal·mol⁻¹·Å⁻².” and added the following sentence in the Method section of our revised manuscript: “The first 50 ns restrains the non-hydrogen atoms of the antigen-antibody complex, and the last 50 ns restrains the non-hydrogen atoms of the antigen, with a constraint value of 10 kcal·mol⁻¹·Å⁻².”
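      The old and revised equilibration protocols quoted above can be compared with a little arithmetic. The following is a minimal sketch, not the authors' simulation input: the names `n_steps`, `steps_to_ps`, and `target_temperature` are hypothetical, and the linear ramp within each heating stage is an assumption, since the response gives only the temperature range and duration of each stage.

```python
FS_PER_PS = 1000  # femtoseconds per picosecond


def n_steps(duration_ps, timestep_fs):
    """Number of integration steps needed to cover duration_ps."""
    return round(duration_ps * FS_PER_PS / timestep_fs)


def steps_to_ps(steps, timestep_fs):
    """Simulated time in picoseconds covered by a number of steps."""
    return steps * timestep_fs / FS_PER_PS


# Revised heating stages from the quoted text: (start_ps, end_ps, start_K, end_K).
HEATING_STAGES = [
    (0, 100, 0.0, 100.0),      # 0-100 K over 100 ps
    (100, 300, 100.0, 298.0),  # 100-298 K over 200 ps
    (300, 400, 298.0, 298.0),  # hold at 298 K for 100 ps
]


def target_temperature(t_ps):
    """Target temperature at time t_ps, assuming a linear ramp per stage."""
    for start, end, temp_start, temp_end in HEATING_STAGES:
        if start <= t_ps <= end:
            fraction = (t_ps - start) / (end - start)
            return temp_start + (temp_end - temp_start) * fraction
    raise ValueError("time outside the 0-400 ps heating window")


if __name__ == "__main__":
    # Old protocol: step counts converted to simulated time.
    print("old NVT:", steps_to_ps(25000, 1), "ps")
    print("old NPT:", steps_to_ps(250000, 2), "ps")
    # Revised protocol: durations converted to step counts at 2 fs.
    print("new NVT:", n_steps(400, 2), "steps")
    print("new NPT:", n_steps(100, 2), "steps")
    print("target T at 200 ps:", target_temperature(200), "K")
```

      Under these assumptions, the original heating stage covered only 25 ps, whereas the revised one spans 400 ps with a final hold at 298 K before density equilibration.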

      In addition, we have corrected the calculation of mean deltas to use absolute values and have demonstrated that the average affinities of structures predicted by H3-OPT were closer to those of experimentally determined structures than the values obtained through AF2. These results have been updated in the revised manuscript. However, significant differences still exist between the estimations from H3-OPT models and those derived from experimental structures in a few cases. We found that antibodies moved away from antigens in both AF2 and H3-OPT predicted complexes during simulations, resulting in RMSDbackbone (RMSD of antibody backbone) exceeding 20 Å. These deviations led to significant structural changes in the complexes and consequently to notable differences in affinity calculations. Thus, we removed three samples (PDBID: 4qhu, 6flc, 6plk) from the benchmark because these predicted structures moved away from the antigen structure during MD simulations, resulting in huge energy differences from the native structures.

      Author response table 1.

      We also appreciate your reminder, and we have calculated all RMSDbackbone during production runs (SI Fig. 5).

      Author response image 1.

      Reviewer #3 (Public Review):

      Weaknesses:

      The proposed method lacks a confidence score or a warning to help guide users in moderate to challenging cases.

      We are sorry for our mistakes. We have updated our GitHub code and added the following sentences to the Method section to clarify how we trained this confidence score module: “Confidence score prediction module

      We apply an MSE loss for confidence prediction; the label error was calculated as the Cα deviation of each residue after alignment. The inputs of this module are the same as those used for H3-OPT, and it generates a confidence score ranging from 0 to 100. The dropout rates of H3-OPT were set to 0.25. The learning rate and weight decay of the Adam optimizer are set to 1 × 10⁻⁵ and 1 × 10⁻⁴, respectively.”
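      The label and loss described in the quoted passage can be sketched as follows. This is an illustrative sketch only, not the code in the authors' GitHub repository: the function names are hypothetical, the coordinates are assumed to be already aligned, and the mapping from predicted error to the 0 to 100 confidence score is not specified in the response and is therefore left out.

```python
import math


def ca_deviations(predicted_ca, reference_ca):
    """Per-residue C-alpha deviation (in Å) between two aligned structures.

    Each argument is a list of (x, y, z) tuples, one per residue.
    """
    if len(predicted_ca) != len(reference_ca):
        raise ValueError("structures must have the same number of residues")
    return [math.dist(p, r) for p, r in zip(predicted_ca, reference_ca)]


def mse_loss(predicted_errors, label_errors):
    """Mean squared error between predicted and true per-residue errors."""
    if len(predicted_errors) != len(label_errors):
        raise ValueError("inputs must have the same length")
    squared = [(p - t) ** 2 for p, t in zip(predicted_errors, label_errors)]
    return sum(squared) / len(squared)


if __name__ == "__main__":
    # Toy two-residue loop (coordinates in Å, illustrative only).
    predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    reference = [(0.0, 0.0, 0.0), (1.0, 3.0, 4.0)]
    labels = ca_deviations(predicted, reference)
    print("label errors:", labels)
    print("loss:", mse_loss([0.0, 4.0], labels))
```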

      Reviewer #2 (Recommendations For The Authors):

      I would strongly suggest that the authors deepen their discussion of the affinity prediction based on molecular dynamics. In particular, why do the authors think that some structures exhibit huge differences between the predictions from the experimental structure and those from the structure predicted by H3-opt? Also, please compute the mean deltas using the absolute value and not the real value; the latter can be extremely misleading and can hide very large differences in opposite directions that cancel out when averaged.

      I would also advise including graphical results of the MD trajectories, at least as supplementary material.

      We thank you for your feedback and fully understand your concerns. We found the source of these huge differences and solved the problem by changing the MD simulation protocol. We then calculated all affinities and corrected the calculation of mean deltas to use the absolute value. The RMSDbackbone values were also measured during production runs to support accurate affinity predictions (SI Fig. 5). There are still large differences between the estimations from H3-OPT models and those from experimental structures in some cases. We found that antibodies moved away from antigens in both AF2 and H3-OPT predicted complexes during simulations, resulting in RMSDbackbone exceeding 20 Å. These deviations led to significant structural changes in the complexes and consequently to notable differences in affinity calculations. Thus, we removed three samples (PDBID: 4qhu, 6flc, 6plk) from the benchmark.

      Thanks again for your professional advice.

      Reviewer #3 (Recommendations For The Authors):

      (1) I am pleased with most of the answers provided by the authors to the first review. In my humble opinion, the new manuscript has greatly improved. However, I think some answers to the reviewers are worth including in the main text or supporting information for the benefit of general readers. In particular, the requested statistics (i.e. p-values for Cα-RMSD values across the modeling approaches, p-values and error bars in Fig 5a and 5b, etc.) should be introduced in the manuscript.

      We sincerely appreciate your advice. We have added the statistical values to Fig. 4 and Fig. 5 of our manuscript.

      Author response image 2.

      Author response image 3.

      (2) Similarly, the authors state in the answers that "we have trained a separate module to predict the confidence score of the optimized CDR-H3 loops". That sounds like a great improvement to H3-OPT! However, I couldn't find any reference to that new module in the reviewed version of the manuscript, nor in the available GitHub code. That is the reason for me to hold the weakness "The proposed method lacks a confidence score".

      We are very sorry for our careless mistakes. Thank you for the reminder. We have updated our GitHub code and added the following sentences to the Method section to clarify how we trained this confidence score module:

      “Confidence score prediction module

      We apply an MSE loss for confidence prediction; the label error was calculated as the Cα deviation of each residue after alignment. The inputs of this module are the same as those used for H3-OPT, and it generates a confidence score ranging from 0 to 100. The dropout rates of H3-OPT were set to 0.25. The learning rate and weight decay of the Adam optimizer are set to 1 × 10⁻⁵ and 1 × 10⁻⁴, respectively.”

      (3) I acknowledge all the efforts made to solve new mutant/designed nanobody structures. Judging from the solved structures, mutants Y95F and Q118N seem critical to either crystallographic or dimerization contacts stabilizing the CDR-H3 loop, hence preventing the formation of crystals. Clearly, solving a molecular structure is a challenge, hence including the following comment in the manuscript is relevant for readers to correctly assess the magnitude of the validation: "The sequence identities of the VH domain and H3 loop are 0.816 and 0.647, respectively, comparing with the best template. The CDR-H3 lengths of these nanobodies are both 17. According to our classification strategy, these nanobodies belong to Sub1. The confidence scores of these AlphaFold2 predicted loops were all higher than 0.8, and these loops were accepted as the outputs of H3-OPT by CBM."

      We appreciate your kind recommendations and have revised “Although Mut1 (E45A) and Mut2 (Q14N) shared the same CDR-H3 sequences as WT, only minor variations were observed in the CDR-H3. H3-OPT generated accurate predictions with Cα-RMSDs of 1.510 Å, 1.541 Å and 1.411 Å for the WT, Mut1, and Mut2, respectively.” into “Although Mut1 (E45A) and Mut2 (Q14N) shared the same CDR-H3 sequences as WT (LengthCDR-H3 = 17), only minor variations were observed in the CDR-H3. H3-OPT generated accurate predictions with Cα-RMSDs of 1.510 Å, 1.541 Å and 1.411 Å for the WT, Mut1, and Mut2, respectively (The confidence scores of these AlphaFold2 predicted loops were all higher than 0.8, and these loops were accepted as the outputs of H3-OPT by CBM).”. In addition, we have added the following sentence to the legend of Figure 4 to ensure that readers can appropriately evaluate the significance and reliability of our validations: “The sequence identities of the VH domain and H3 loop are 0.816 and 0.647, respectively, comparing with the best template.”.

      (4) As pointed out in the first review, I think the work https://doi.org/10.1021/acs.jctc.1c00341 is worth acknowledging in section "2.2 Molecular dynamics (MD) simulations could not provide accurate CDR-H3 loop conformations" of the supplementary material, as it constitutes a clear reference (and probably one of the few) to the MD simulations that the authors aim to perform. Similarly, the work https://doi.org/10.3390/molecules28103991 introduces a former benchmark of AI algorithms for predicting antibody and nanobody structures that readers may find interesting to contrast with the present work. Indeed, this latter reference is used by the authors to answer a reviewer comment.

      Thanks a lot for your valuable comments. We have added these references in the proper positions in our manuscript.

    2. eLife assessment

      This paper presents H3-OPT, a deep learning method that effectively combines existing techniques for the prediction of antibody structure. This work, supported by convincing experiments for validation, is important because the method can aid in the design of antibodies, which are key tools in many research and industrial applications.

    3. Reviewer #2 (Public Review):

      This work provides a new tool (H3-Opt) for the prediction of antibody and nanobody structures, based on the combination of AlphaFold2 and a pre-trained protein language model, with a focus on predicting the challenging CDR-H3 loops with greater accuracy than previously developed approaches. This task is of high value for the development of new therapeutic antibodies. The paper provides an external validation consisting of 131 sequences, with further analysis of the results by segregating the test sets into three subsets of varying difficulty and comparison with other available methods. Furthermore, the approach was validated by comparing three experimentally solved 3D structures of anti-VEGF nanobodies with the H3-Opt predictions.

      Strengths:

      The experimental design to train and validate the new approach has been clearly described, including the dataset compilation and its representative sampling into training, validation and test sets, and structure preparation. The results of the in silico validation are quite convincing and support the authors' conclusions.

      The datasets used to train and validate the tool and the code are made available by the authors, which ensures transparency and reproducibility, and allows future benchmarking exercises with incoming new tools.

      Compared to AlphaFold2, the authors' optimization seems to produce better results for the most challenging subsets of the test set.

      Weaknesses:

      None

    4. Reviewer #3 (Public Review):

      Summary:

      The manuscript introduces a new computational framework for choosing 'the best method' according to the case for getting the best possible structural prediction for the CDR-H3 loop. The authors show their strategy improves on average the accuracy of the predictions on datasets of increasing difficulty in comparison to several state-of-the-art methods. They also show the benefits of improving the structural predictions of the CDR-H3 in the evaluation of different properties that may be relevant for drug discovery and therapeutic design.

      Strengths:

      The authors introduce a novel framework, which can be easily adapted and improved. The authors use a well-defined dataset to test their new method. A modest average accuracy gain is obtained in comparison to other state-of-the-art methods for the same task, while avoiding testing different prediction approaches. Although the accuracy gain is mainly ascribed to easy cases, the accuracy and precision for moderate to challenging cases are comparable to the best PLM methods (see Fig. 4b and Extended Data Fig. 2), reflecting the present methodological limit in the field. The proposed method includes a confidence score for guiding users about the accuracy of the predictions.

    1. eLife assessment

      This important study, using three bioactive compounds as a model, demonstrates that estimating the intake of food components based on food composition databases and self-reported dietary data is highly unreliable. The authors present convincing data showing the differences in the estimated quantile of intake of three bioactive compounds between biomarker and 24-hour dietary recall with food-composition database. The work will be of broad interest to the clinical nutrition research community.

    2. Joint Public Review:

      Identifying dietary biomarkers, in particular, has become a main focus of nutrition research in the drive to develop personalized nutrition.

      The aim of this study was to determine the accuracy of using food composition databases to assess the association between dietary intake and health outcomes. The authors found that using food composition data to assess dietary intake of specific bioactives and the impact consumption has on systolic blood pressure provided vastly different outcomes depending on the method used. These findings demonstrate the difficulty in elucidating the relationship between diet and health outcomes and the need for more stringent research in the development of dietary biomarkers.

      The primary strength of the study is the use of a large cohort in which dietary data and the measurement of three specific bioactives and blood pressure were collected on the same day. The bioactives selected have been extensively researched for their health effects. Another strength is that the authors controlled for as many variables as possible when running the simulations to get a more accurate account of how the variability in food composition can impact research findings that associate the intake of certain food components with health outcomes.

    3. Author response:

      The following is the authors’ response to the original reviews.

      We would like to thank the editors and reviewers for their encouraging comments. Reviewer 1 raises an important question regarding the translation of biomarker-derived data into dietary recommendations, taking the high variability in food composition into consideration. Unfortunately, there is no straightforward answer, as the high variability in food composition means that the number of cups of tea for 200 mg of flavan-3-ols will depend on the flavanol content of the tea. A probabilistic modelling approach, as we have used to investigate the impact of food content variability on estimated associations with health outcomes, would be a possible solution. This could provide food-based recommendations that would meet a defined intake with a certain probability. However, developing and exploring such models is beyond the scope of this manuscript and we have therefore decided not to include this in our response. We have stated in the manuscript that such a method needs to be developed.

      We have addressed the typographical errors and the other comments as follows:

      •   Line 126 - this is the first mention of DR-FCT and as such it needs to be defined. This was a typo and it was corrected throughout the manuscript. The actual abbreviation is DD-FCT and it is defined in line 78.

      •   Figure 4 - what exactly is this figure trying to convey to the reader? A better explanation about this figure is needed. The figure legend was updated and extended to increase clarity.

      •   Figure 5 - Why are the graphs presented differently, meaning why are the data for the flavan-3-ols and epicatechin differentiated for men and women and not nitrate? The sample size for nitrate was too small to stratify in the same way as for flavan-3-ols.

      •   Line 365 - more information is needed, I am assuming the authors are stating “The tableone package for R ...”. As requested by the reviewer, additional details are now included.

      We have also revised the abstract, the conclusion and the discussion of limitations of the biomarker approach to improve the readability of the manuscript.

    1. eLife assessment

      This useful article provides evidence of the potential neuropathogenicity of Bacillus cereus biovar anthracis in wild chimpanzees. The authors provide an extensive characterization of four chimpanzees that died acutely from anthrax. The study provides incomplete traditional histopathologic evidence of neuroinvasion since the meninges could not be evaluated, which weakens the authors' conclusions. The work will be of interest to infectious disease researchers.

    2. Reviewer #1 (Public Review):

      Summary:

      Gräßle et al. provide a series of four post-mortem cases of chimpanzees with PCR-proven Bacillus cereus biovar anthracis (Bcbva), who reportedly died of this infection. One control case is also provided. Compelling post-mortem magnetic resonance imaging scans of the highest technical standards are presented. Lastly, the authors provide some histopathology of the brains aimed at showing the neuroinfective potential of Bcbva.

      Strengths:

      The merits of this study are highly acknowledged. This reviewer deems it very important to implement the latest methodology in such veterinary observational studies, in order to investigate what is going on in wildlife regarding zoonoses. The scans of five whole post-mortem chimpanzee brains with exquisite MRI technology (extremely good scan quality) represent such an implementation.

      Weaknesses:

      The conclusions from the necropsies are, unfortunately, on weak grounds:

      (1) The authors claim that all 4 infected individuals have suffered from meningitis. However, I do not see evidence for that, not even in the gross macroscopic images provided in Figure 1. The authors claim congestion of superficial veins, at least in cases 1-3, and interpret this as pointing towards meningitis. I do not see major superficial vein congestion in any of the cases. Furthermore, vessel congestion here would rather indicate brain swelling and subsequent inhibition of venous blood outflow from the skull, which would relate to brain edema. Bacterial meningitis would itself display as clouding of the meninges, while the meninges presented in all 4 cases are perfectly translucent and gracile.

      (2) The authors show bacterial overgrowth of the brains, which was most severe in cases 1 and 2, less so in case 3, and least in case 4 (Table 1). This correlates very well with post-mortem intervals (Supplementary Table 1). The amount of bacteria is remarkable, while there is practically no brain inflammation, only moderate microglia activation. Also, the authors do not convincingly prove the proposed meningitis at the histological level, since Figure 6 does not show it in a convincing manner. Also, the moderate superficial gliosis shown in Figure 6 g+h is for me not evidence of meningitis. I would expect masses of granulocytes and lymphocytes, given the amount of bacteria shown.

      (3) The pattern of bacterial invasion, i.e. first confined to vessels as in case 4 with a short post-mortem interval, and then overgrowing the brain with practically no glial or inflammatory reaction, is very typical of post-mortem putrefaction. It is conceivable that the chimpanzees had severe bacteremia, which, after death, quickly led to bacterial invasion into the brain parenchyma. While the authors state the post-mortem intervals in hours, they do not state whether bodies were immediately cooled after death.

      (4) I find it difficult to see evidence of superficial siderosis in any of the images. In particular, case 2 in Figure 1 does not convincingly display leptomeningeal hemorrhage. Dark granules, e.g. those shown in Figure 4 e, are very typical of so-called formalin pigment. If that were hemosiderin or some other form of iron, it would be expected to show up much more strongly in the DAB-enhanced Perls stain (Figure 4 c).

    3. Reviewer #2 (Public Review):

      In "Neuroinfectiology of an atypical anthrax-causing pathogen in wild chimpanzees" Tobias et al. provide a detailed histologic characterization of B. cereus biovar anthracis in the brain of four wild chimpanzees in comparison to an uninfected age-matched chimp. The authors present a combination of special stains, radiography (MRI), bacterial culture, and immunohistochemistry including some quantitative image analysis to support the assessment of the neuropathogenicity of Bcbva. However, the study has major limitations that detract from the conclusions presented regarding the neurovirulence of this strain. Namely, there is a near complete lack of traditional histopathological and radiographic interpretation by qualified experts in which to frame the detailed tissue studies. The authors mention that facultative anaerobes are capable of post-mortem replication. Pathologists use comprehensive pathological assessments to determine the extent of disease caused by the primary infection, none of which is mentioned in this study (spleen, heart, lungs), which makes it difficult to determine if the findings in the brain align with the rest of the post-mortem assessment. If these were not included due to severe post-mortem autolysis, it heightens the risk of post-mortem bacterial replication in the CNS. The most important limitation is the fact that the meninges were removed and were not available for assessment; therefore, comparisons with existing data on the neuropathogenicity of B. anthracis are not possible. An advantage of the study is the inclusion of the control age-matched chimp, but the controls are not shown for many of the IHC and special stains, limiting interpretations. In general, the article is difficult to follow with the figures since many panels are only discussed and interpreted in the figure legends and not the text. In some cases, the results are overly technical with limited clinical insight, which makes the article less easy to interpret next to human clinical reports.

    4. Author response:

      We are thankful to the expert reviewers and the editorial team for their assessment of our manuscript and valuable comments, which will help us to improve our manuscript. While Reviewer #1 appreciated the comprehensive assessment using advanced methods, Reviewer #2 asked for an extension of traditional neuropathological and neuroradiological assessments. Both reviewers identified limitations of the study like the inability to provide direct histopathological evidence for meningitis due to missing meninges tissue, resulting in the conclusions being based on indirect evidence. The reviewers raised concerns about potential post mortem penetration of bacteria into the brain parenchyma. Reviewer #1 also questioned the evidence for cortical siderosis based on the intensity of histological stains.

      We agree with both reviewers and the editorial comment that a traditional neuropathological assessment of meningeal status would have strongly boosted the study's conclusions. Please note that the opportunistic sampling approach after a wild animal’s “natural” death, which is the only ethical method to study infection biology in great apes, is intrinsically accompanied by some limitations such as the lack of standardized post mortem intervals or incomplete sampling. In the revised version of the manuscript, we will complement the advanced MRI and histology already presented by extended traditional neuroradiological and neuropathological assessments as recommended by Reviewer #2, including a report on the status of other organs. However, it is important to note that the interpretation of post mortem MRI of brain material collected in the field differs substantially from conventional in vivo MRI and requires tailored analysis and interpretation. Below we comment on three aspects addressed by reviewers:

      *Missing meninges*: The meninges and associated vessels had to be removed to reduce blood-related artifacts in previously performed MRI measurements. We are aware that this poses a major limitation of this study, and thus rely on the evidence derived from the material at hand. Neuropathological assessment is in agreement with the reviewer's comments that no overt acute bacterial meningitis with e.g. turbid appearance, purulent exudates or frank hemorrhages is apparent in the macroscopic inspection of the presented material. However, the macroscopic changes should be evaluated in the light of the brief time interval between bacterial colonization and death. Meningeal bacterial invasion was visualized on a few meningeal residues we found in case 1, proving the invasion of the subarachnoid space. Based on the reviewer's suggestions, the microscopic neuropathological evaluation will be expanded to identify further regions with meningeal residues in order to 1) reduce potential sampling bias and 2) better characterize the leptomeningeal infiltrates, focusing on early inflammatory markers. However, an extensive assessment of the histopathological inflammatory status will have to await future studies on specimens with remaining meninges.

      *Putrefaction/Post mortem bacterial proliferation*: Reviewers raised important points by remarking that the tissue alterations could be due to putrefaction/post mortem effects. Classical bacterial putrefaction is unlikely, since no mixed flora of opportunistic bacteria was detected, suggesting that time before fixation was sufficient to prevent secondary bacterial invasion in the presented specimens. Moreover, it has been shown that for the post mortem interval of <24 hours bacterial invasion of the brain is rare even at higher temperatures (Ith et al 2011, https://doi.org/10.1002/nbm.1623). The possibility of post mortem tissue propagation of Bcbva must be considered, since there is a lack of experimental data on the pathogen’s growth after host death, which has been discussed by us in the "Limitations" section in the original manuscript. Although it seems plausible that post mortem multiplication in the brain does occur to a certain extent, several observations suggest that this is not the only mechanism at play in the presented cases. We observed early microglial activation and astrogliosis indicating a beginning inflammatory reaction in the brain parenchyma. Taken together, the data presented suggest a short time interval between bacterial colonization and death. Under this premise, further analyses for the revision of the manuscript will more closely investigate pathological in vivo tissue alterations.

      *Siderosis*: Signs of cortical siderosis were evident in the MRI images of all adult cases (1, 3, and 4), appearing as a hyperintense rim in quantitative R2* maps, indicating substantially elevated levels of iron on the brain surface. These findings were confirmed by Perls' stain for iron. Such rims in R2* are a typical sign of cortical iron deposition due to siderosis, as observed in conditions like angiopathies. Meningeal bleedings are the most probable source of the elevated iron levels in the cortex. Importantly, such signs were never observed in the post mortem brains of chimpanzees not infected with anthrax (about 30 cases analyzed so far). Reviewer #1 noted that the intensity of the Perls' stain seemed too low for siderosis. However, this intensity can vary depending on staining procedure and may be lower for the acute and short disease course of Bcbva-induced anthrax compared to the chronic human cases Reviewer #1 may be referring to. Taken together, we believe that the evidence of cortical siderosis is compelling, speaking in favor of pre mortem meningeal hemorrhage.

      In summary, in the revised version of the manuscript, we plan to: (1) add a traditional neuroradiological assessment of all scans; (2) present an extended traditional neuropathological assessment of all cases; (3) report results on the status of early inflammatory markers; and (4) discuss the limitations of the study in more detail.

    1. eLife assessment

      This valuable biomechanical analysis of kangaroo kinematics and kinetics across a range of hopping speeds and masses is a step towards understanding a long-standing problem in locomotion biomechanics: the mechanism for how, unlike other mammals, kangaroos are able to increase hopping speed without a concomitant increase in metabolic cost. Based on their suggestion that kangaroo posture changes with speed increase tendon stress/strain and hence elastic energy storage/return, the authors imply (but do not show quantitatively or qualitatively) that the greater tendon elastic energy storage/return counteracts the increased cost of generating muscular force at faster speeds and allows for the invariance in metabolic cost. The methods are impressive, but there is currently only limited evidence for increased tendon stress/strain at faster speeds, and the support for any conclusion about metabolic energy expenditure is inadequate.

    2. Reviewer #1 (Public Review):

      Summary:

      The study explored the biomechanics of kangaroo hopping across both speed and animal size to try and explain the unique and remarkable energetics of kangaroo locomotion.

      Strengths:

      The study brings kangaroo locomotion biomechanics into the 21st century. It is a remarkably difficult project to accomplish. There is excellent attention to detail, supported by clear writing and figures.

      Weaknesses:

      The authors oversell their findings, but the mystery still persists. The manuscript lacks a big-picture summary with pointers to how one might resolve the big question.

      General Comments

      This is a very impressive tour de force by an all-star collaborative team of researchers. The study represents a tremendous leap forward (pun intended) in terms of our understanding of kangaroo locomotion. Some might wonder why such an unusual species is of much interest. But, in my opinion, the classic study by Dawson and Taylor in 1973 of kangaroos launched the modern era of running biomechanics/energetics and applies to varying degrees to all animals that use bouncing gaits (running, trotting, galloping and of course hopping). The puzzling metabolic energetics findings of Dawson & Taylor (little if any increase in metabolic power despite increasing forward speed) remain a giant unsolved problem in comparative locomotor biomechanics and energetics. It is our "dark matter problem".

      This study is certainly a hop towards solving the problem. But, the title of the paper overpromises and the authors make little attempt to provide an overview of the remaining big issues. The study clearly shows that the ankle and to a lesser extent the mtp joint are where the action is. They clearly show in great detail by how much and by what means the ankle joint tendons experience increased stress at faster forward speeds. Since these were zoo animals, direct measures were not feasible, but the conclusion that the tendons are storing and returning more elastic energy per hop at faster speeds is solid. The conclusion that net muscle work per hop changes little from slow to fast forward speeds is also solid. Doing less muscle work can only be good if one is trying to minimize metabolic energy consumption. However, to achieve greater tendon stresses, there must be greater muscle forces. Unless one is willing to reject the premise of the cost of generating force hypothesis, that is an important issue to confront. Further, the present data support the Kram & Dawson finding of decreased contact times at faster forward speeds. Kram & Taylor and subsequent applications of (and challenges to) their approach support the idea that shorter contact times (tc) require recruiting more expensive muscle fibers and hence greater metabolic costs. Therefore, I think that it is incumbent on the present authors to clarify that this study has still not tied up the metabolic energetics across speed problems and placed a bow atop the package. Fortunately, I am confident that the impressive collective brain power that comprises this author list can craft a paragraph or two that summarizes these ideas and points out how the group is now uniquely and enviably poised to explore the problem more using a dynamic SIMM model that incorporates muscle energetics (perhaps à la Umberger et al.). Or perhaps they have other ideas about how they can really solve the problem.

      I have a few issues with the other half of this study (i.e. animal size effects). I would enjoy reading a new paragraph by these authors in the Discussion that considers the evolutionary origins and implications of such small safety factors. Surely, it would need to be speculative, but that's OK.

    3. Reviewer #2 (Public Review):

      Summary

      This is a fascinating topic that has intrigued scientists for decades. I applaud the authors for trying to tackle this enigma. In this manuscript, the authors primarily measured hopping biomechanics data from kangaroos and performed inverse dynamics. While these biomechanical analyses were thorough and impressively incorporated collected anatomical data and an OpenSim model, I'm afraid that they did not satisfactorily address how kangaroos can hop faster and not consume more metabolic energy, unique from other animals. Noticeably, the authors did not collect metabolic data nor did they model metabolic rates using their modelling framework. Instead, they performed a somewhat traditional inverse dynamics analysis from multiple animals hopping at a self-selected speed. Within these analyses, the authors largely focused on ankle EMA, discussing its potential importance (because it affects tendon stress, which affects tendon strain energy, which affects muscle mechanics) on the metabolic cost of hopping. However, EMA was roughly estimated (CoP was fixed to the foot, not measured) and did not detectably associate with hopping speed (see results). Yet, the authors interpret their EMA findings as though it systematically related with speed to explain their theory on how metabolic cost is unique in kangaroos vs. other animals. These speed vs. biomechanics relationships were limited by comparisons across different animals hopping at different speeds and could have been strengthened using repeated measures design. There are also multiple inconsistencies between the authors' theory on how mechanics affect energetics and the cited literature, which leaves me somewhat confused and wanting more clarification and information on how mechanics and energetics relate. My apologies for the less-than-favorable review, I think that this is a neat biomechanics study - but am unsure if it adds much to the literature on the topic of kangaroo hopping energetics in its current form.

    4. Reviewer #3 (Public Review):

      Summary:

      The goal of this study is to understand how, unlike other mammals, kangaroos are able to increase hopping speed without a concomitant increase in metabolic cost. They use a biomechanical analysis of kangaroo hopping data across a range of speeds to investigate how posture, effective mechanical advantage, and tendon stress vary with speed and mass. The main finding is that a change in posture leads to increasing effective mechanical advantage with speed, which ultimately increases tendon elastic energy storage and returns via greater tendon strain. Thus kangaroos may be able to conserve energy with increasing speed by flexing more, which increases tendon strain.

      Strengths:

      The approach and effort invested into collecting this valuable dataset of kangaroo locomotion is impressive. The dataset alone is a valuable contribution.

      Weaknesses:

      Despite these strengths, I have concerns regarding the strength of the results and the overall clarity of the paper and methods used (which likely influences how convincingly the main results come across).

      (1) The paper seems to hinge on the finding that EMA decreases with increasing speed and that this contributes significantly to greater tendon strain estimated with increasing speed. It is very difficult to be convinced by this result for a number of reasons:

      • It appears that kangaroos hopped at their preferred speed. Thus the variability observed is across individuals not within. Is this large enough of a range (either within or across subjects) to make conclusions about the effect of speed, without results being susceptible to differences between subjects? In the literature cited, what was the range of speeds measured, and was it within or between subjects?

      • Assuming that there is a compelling relationship between EMA and velocity, how reasonable is it to extrapolate to the conclusion that this increases tendon strain and ultimately saves metabolic cost? They correlate EMA with tendon strain, but this would still not suggest a causal relationship (incidentally the p-value for the correlation is not reported). Tendon strain could be increasing with ground reaction force, independent of EMA. Even if there is a correlation between strain and EMA, is it not a mathematical necessity in their model that all else being equal, tendon stress will increase as EMA decreases? I may be missing something, but nonetheless, it would be helpful for the authors to clarify the strength of the evidence supporting their conclusions.

      • The statistical approach is not well-described. It is not clear what the form of the statistical model used was and whether the analysis treated each trial individually or grouped trials by the kangaroo. There is also no mention of how many trials per kangaroo, or the range of speeds (or masses) tested. Related to this, there is no mention of how different speeds were obtained. It seems that kangaroos hopped at a self-selected pace, thus it appears that not much variation was observed. I appreciate the difficulty of conducting these experiments in a controlled manner, but this doesn't exempt the authors from providing the details of their approach.

      • Some figures (Figure 2 for example) present means for one of three speeds, yet the speeds are not reported (except in the legend) nor how these bins were determined, nor how many trials or kangaroos fit in each bin. A similar comment applies to the mass categories. It would be more convincing if the authors plotted the main metrics vs. speed to illustrate the significant trends they are reporting.

      (2) The significance of the effects of mass is not clear. The introduction and abstract suggest that the paper is focused on the effect of speed, yet the effects of mass are reported throughout as well, without a clear understanding of the significance. This weakness is further exaggerated by the fact that the details of the subject masses are not reported.

      (3) The paper needs to be significantly re-written to better incorporate the methods into the results section. Since the results come before the methods, some of the methods must necessarily be described such that the study can be understood at some level without turning to the dedicated methods section. As written, it is very difficult to understand the basis of the approach, analysis, and metrics without turning to the methods.

    5. Author response:

      Public Reviews:

      We thank the reviewers for their overall positive assessments and constructive feedback.

      Reviewer #1 (Public Review):

      Summary:

      The study explored the biomechanics of kangaroo hopping across both speed and animal size to try and explain the unique and remarkable energetics of kangaroo locomotion.

      Strengths:

      The study brings kangaroo locomotion biomechanics into the 21st century. It is a remarkably difficult project to accomplish. There is excellent attention to detail, supported by clear writing and figures.

      Weaknesses:

      The authors oversell their findings, but the mystery still persists.

      The manuscript lacks a big-picture summary with pointers to how one might resolve the big question.

      General Comments

      This is a very impressive tour de force by an all-star collaborative team of researchers. The study represents a tremendous leap forward (pun intended) in terms of our understanding of kangaroo locomotion. Some might wonder why such an unusual species is of much interest. But, in my opinion, the classic study by Dawson and Taylor in 1973 of kangaroos launched the modern era of running biomechanics/energetics and applies to varying degrees to all animals that use bouncing gaits (running, trotting, galloping and of course hopping). The puzzling metabolic energetics findings of Dawson & Taylor (little if any increase in metabolic power despite increasing forward speed) remain a giant unsolved problem in comparative locomotor biomechanics and energetics. It is our "dark matter problem".

      Thank you for the kind words.

      This study is certainly a hop towards solving the problem. But, the title of the paper overpromises and the authors present little attempt to provide an overview of the remaining big issues.

      We will modify the title to reflect this comment.  

      The study clearly shows that the ankle and to a lesser extent the mtp joint are where the action is. They clearly show in great detail by how much and by what means the ankle joint tendons experience increased stress at faster forward speeds.

      Since these were zoo animals, direct measures were not feasible, but the conclusion that the tendons are storing and returning more elastic energy per hop at faster speeds is solid.

      The conclusion that net muscle work per hop changes little from slow to fast forward speeds is also solid.

      Doing less muscle work can only be good if one is trying to minimize metabolic energy consumption. However, to achieve greater tendon stresses, there must be greater muscle forces. Unless one is willing to reject the premise of the cost of generating force hypothesis, that is an important issue to confront.

      Further, the present data support the Kram & Dawson finding of decreased contact times at faster forward speeds. Kram & Taylor and subsequent applications of (and challenges to) their approach support the idea that shorter contact times (tc) require recruiting more expensive muscle fibers and hence greater metabolic costs. Therefore, I think that it is incumbent on the present authors to clarify that this study has still not tied up the metabolic energetics across speed problems and placed a bow atop the package.

      Fortunately, I am confident that the impressive collective brain power that comprises this author list can craft a paragraph or two that summarizes these ideas and points out how the group is now uniquely and enviably poised to explore the problem more using a dynamic SIMM model that incorporates muscle energetics (perhaps à la Umberger et al.). Or perhaps they have other ideas about how they can really solve the problem.

      You have raised important points, thank you for this feedback. We will add a paragraph discussing the limitations of our study and ensure the revised manuscript makes it clear which mysteries remain. We intend to address muscle forces, contact time, and energetics in future work when we have implemented all hindlimb muscles within the musculoskeletal model.  

      I have a few issues with the other half of this study (i.e. animal size effects). I would enjoy reading a new paragraph by these authors in the Discussion that considers the evolutionary origins and implications of such small safety factors. Surely, it would need to be speculative, but that's OK.

      We will integrate this into the discussion.

      Reviewer #2 (Public Review):

      Summary

      This is a fascinating topic that has intrigued scientists for decades. I applaud the authors for trying to tackle this enigma. In this manuscript, the authors primarily measured hopping biomechanics data from kangaroos and performed inverse dynamics.

      While these biomechanical analyses were thorough and impressively incorporated collected anatomical data and an OpenSim model, I'm afraid that they did not satisfactorily address how kangaroos can hop faster and not consume more metabolic energy, unique from other animals.

      Noticeably, the authors did not collect metabolic data nor did they model metabolic rates using their modelling framework. Instead, they performed a somewhat traditional inverse dynamics analysis from multiple animals hopping at a self-selected speed.

      We aimed to provide a joint-level explanation, but we will address the limitations of not modelling the energy consumers themselves (the skeletal muscles) in the revised manuscript. We plan to expand upon muscle level energetics in the future with a more detailed MSK model.

      Within these analyses, the authors largely focused on ankle EMA, discussing its potential importance (because it affects tendon stress, which affects tendon strain energy, which affects muscle mechanics) on the metabolic cost of hopping. However, EMA was roughly estimated (CoP was fixed to the foot, not measured)…

      As noted in our methods, EMA was not calculated from a fixed centre of pressure (CoP). We did fix the medial-lateral position, owing to the fact that both feet contacted the force plate together, but the anteroposterior movement of the CoP was recorded by the force plate and thus allowed to move. We report the movement (or lack of movement) in our results. The anterior-posterior axis is the most relevant to lengthening or shortening the distance of the ‘out-lever’ R, and thereby EMA.

      It is necessary to assume fixed medial-lateral position because a single force trace and CoP is recorded when two feet land on the force plate. The medial-lateral forces on each foot cancel out so there is no overall medial-lateral movement if the forces are symmetrical (e.g. if the kangaroo is hopping in a straight path and one foot is not in front of the other). We only used symmetrical trials so that the anterior-posterior movement of the CoP would be reliable.
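
      To make the geometry concrete, here is a minimal sketch (not the authors' code) of how an out-lever R and EMA = r/R could be computed in the sagittal plane from a joint centre, a CoP position, and a GRF vector. The function names, the 2D simplification, and all numbers are assumptions chosen for illustration only.

```python
import math


def out_lever(joint_xy, cop_xy, grf_xy):
    """Perpendicular distance (m) from a joint centre to the GRF line of action.

    Coordinates are 2D sagittal-plane values: x is anteroposterior, y is vertical.
    The GRF is assumed to act through the centre of pressure (CoP).
    """
    jx, jy = joint_xy
    cx, cy = cop_xy
    fx, fy = grf_xy
    magnitude = math.hypot(fx, fy)
    if magnitude == 0:
        raise ValueError("GRF magnitude is zero")
    # 2D cross product of (joint - CoP) with the force, divided by |force|
    return abs((jx - cx) * fy - (jy - cy) * fx) / magnitude


def ema(r_in, r_out):
    """Effective mechanical advantage: muscle moment arm r over out-lever R."""
    return r_in / r_out


if __name__ == "__main__":
    ankle = (0.0, 0.10)   # hypothetical ankle position (m)
    grf = (0.0, 400.0)    # hypothetical, purely vertical GRF (N)
    r = 0.04              # hypothetical muscle moment arm (m)
    for cop_x in (0.12, 0.16):  # CoP moved anteriorly along the foot
        R = out_lever(ankle, (cop_x, 0.0), grf)
        print(f"CoP x = {cop_x:.2f} m, R = {R:.3f} m, EMA = {ema(r, R):.3f}")
```

      In this toy geometry, only the anteroposterior CoP coordinate enters R, which is why that axis is the one that matters for the out-lever.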

      and did not detectably associate with hopping speed (see results).

      Yet, the authors interpret their EMA findings as though it systematically related with speed to explain their theory on how metabolic cost is unique in kangaroos vs. other animals.

      Indeed, the relationship between R and speed (and therefore EMA and speed) was not significant. However, the significant change in ankle height with speed, combined with no systematic change in CoP at midstance, demonstrates that R would get longer at faster speeds. If we consider the nonsignificant relationship between R and speed to indicate that there is no change in R, then these two results conflict. We could not find a flaw in our methods, so instead concluded that the nonsignificant relationship between R and speed may be due to a small change in R being undetectable in our data. Taking both results into account, we think it is more likely that there is a non-detectable change in R, rather than no change in R with speed, but we presented both results for transparency.

      These speed vs. biomechanics relationships were limited by comparisons across different animals hopping at different speeds and could have been strengthened using repeated measures design.

      There is significant variation in speed within individuals, not just between individuals. The preferred speed of kangaroos is 2-4.5 m/s, but most individuals show a wide range within this. Eight of our 16 kangaroos had a maximum speed that was 1-2 m/s faster than their slowest trial. Repeated measures of these eight individuals comprise 78 out of the 100 trials.

      It would be ideal to collect data across the full range of speeds for all individuals, but it is not feasible in this type of experimental setting. Interference such as chasing is dangerous to kangaroos as they are prone to strong adverse reactions to stress.

      There are also multiple inconsistencies between the authors' theory on how mechanics affect energetics and the cited literature, which leaves me somewhat confused and wanting more clarification and information on how mechanics and energetics relate.

      We will ensure that this is clearer in the revised manuscript.

      My apologies for the less-than-favorable review, I think that this is a neat biomechanics study - but am unsure if it adds much to the literature on the topic of kangaroo hopping energetics in its current form.

      Reviewer #3 (Public Review):

      Summary:

      The goal of this study is to understand how, unlike other mammals, kangaroos are able to increase hopping speed without a concomitant increase in metabolic cost. They use a biomechanical analysis of kangaroo hopping data across a range of speeds to investigate how posture, effective mechanical advantage, and tendon stress vary with speed and mass. The main finding is that a change in posture leads to increasing effective mechanical advantage with speed, which ultimately increases tendon elastic energy storage and returns via greater tendon strain. Thus kangaroos may be able to conserve energy with increasing speed by flexing more, which increases tendon strain.

      Strengths:

      The approach and effort invested into collecting this valuable dataset of kangaroo locomotion is impressive. The dataset alone is a valuable contribution.

      Thank you!

      Weaknesses:

      Despite these strengths, I have concerns regarding the strength of the results and the overall clarity of the paper and methods used (which likely influences how convincingly the main results come across).

      (1) The paper seems to hinge on the finding that EMA decreases with increasing speed and that this contributes significantly to greater tendon strain estimated with increasing speed. It is very difficult to be convinced by this result for a number of reasons:

      • It appears that kangaroos hopped at their preferred speed. Thus the variability observed is across individuals not within. Is this large enough of a range (either within or across subjects) to make conclusions about the effect of speed, without results being susceptible to differences between subjects?

      Apologies, this was not clear in the manuscript. Kangaroos hopping at their preferred speed means that, to comply with ethics and enclosure limitations, we did not chase or startle them into high speeds. Thus we did not record the full range of speeds that kangaroos are capable of (up to 12 m/s), but within the range we did measure (~2-4.5 m/s), hopping speed varied within each individual kangaroo. Out of 16 individuals, eight had a difference of 1-2 m/s between their slowest and fastest trials, and these kangaroos accounted for 78 out of 100 trials. Of the remainder, six individuals had three or fewer trials each, and two individuals had highly repeatable speeds (3 out of 4, and 6 out of 7 trials were within 0.5 m/s). We will ensure this is clear in the revised manuscript.
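
      As a rough illustration of the kind of tabulation described above, the sketch below counts trials and the fastest-minus-slowest speed difference per individual. It is not the authors' analysis; the function name, the identifiers, and the trial values are invented for the example.

```python
from collections import defaultdict


def speed_range_by_individual(trials):
    """Return {animal_id: (n_trials, max_speed - min_speed)}.

    `trials` is an iterable of (animal_id, speed_m_per_s) pairs, one per trial.
    """
    speeds = defaultdict(list)
    for animal_id, speed in trials:
        speeds[animal_id].append(speed)
    return {
        animal_id: (len(values), max(values) - min(values))
        for animal_id, values in speeds.items()
    }


if __name__ == "__main__":
    # Made-up trials for two hypothetical animals (speeds in m/s)
    example_trials = [
        ("K01", 2.4), ("K01", 3.1), ("K01", 3.9),
        ("K02", 3.0), ("K02", 3.2),
    ]
    for animal_id, (n, spread) in speed_range_by_individual(example_trials).items():
        print(f"{animal_id}: {n} trials, range {spread:.1f} m/s")
```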

      In the literature cited, what was the range of speeds measured, and was it within or between subjects?

      For other literature, to our knowledge the highest speed measured is ~9.5 m/s (see supplementary Fig. 1b) and there were multiple measures for several individuals (see the methods of Kram & Dawson 1998).

      • Assuming that there is a compelling relationship between EMA and velocity, how reasonable is it to extrapolate to the conclusion that this increases tendon strain and ultimately saves metabolic cost?

      They correlate EMA with tendon strain, but this would still not suggest a causal relationship (incidentally the p-value for the correlation is not reported).

      We will add supporting literature on the relationship between metabolic cost and tendon stress (or strain), to elaborate on why the correlation between EMA and stress is important.

      Tendon strain could be increasing with ground reaction force, independent of EMA.

      Even if there is a correlation between strain and EMA, is it not a mathematical necessity in their model that all else being equal, tendon stress will increase as EMA decreases? I may be missing something, but nonetheless, it would be helpful for the authors to clarify the strength of the evidence supporting their conclusions.

      Yes, GRF also contributes to the increase in tendon stress in the mechanism we propose. We have illustrated this in Fig 6; however, we will make this clearer in the revised discussion.
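
      The two contributions can be separated with a simple quasi-static moment balance about the ankle, in which tendon force times the muscle moment arm r equals GRF times the out-lever R, so that tendon force is GRF / EMA and tendon stress is that force divided by tendon cross-sectional area. The sketch below is an illustrative simplification under that assumption (segment inertia and weight are ignored), not the authors' model; the function names and all numbers are hypothetical.

```python
def tendon_force(grf_n, ema):
    """Tendon force (N) from a quasi-static moment balance: F_t * r = GRF * R."""
    if ema <= 0:
        raise ValueError("EMA must be positive")
    return grf_n / ema


def tendon_stress(grf_n, ema, area_m2):
    """Tendon stress (Pa) = tendon force / cross-sectional area."""
    if area_m2 <= 0:
        raise ValueError("area must be positive")
    return tendon_force(grf_n, ema) / area_m2


if __name__ == "__main__":
    area = 1.0e-4  # hypothetical tendon cross-sectional area (m^2)
    cases = [
        ("baseline", 400.0, 0.25),
        ("lower EMA, same GRF", 400.0, 0.20),
        ("same EMA, higher GRF", 500.0, 0.25),
    ]
    for label, grf, ema_value in cases:
        stress_mpa = tendon_stress(grf, ema_value, area) / 1e6
        print(f"{label}: {stress_mpa:.1f} MPa")
```

      Under this simplification, stress rises both when EMA falls at a fixed GRF and when GRF rises at a fixed EMA, so a correlation between EMA and stress does not by itself show how much of the increase each factor explains.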

      • The statistical approach is not well-described. It is not clear what the form of the statistical model used was and whether the analysis treated each trial individually or grouped trials by the kangaroo. There is also no mention of how many trials per kangaroo, or the range of speeds (or masses) tested.

      The methods include the statistical model with the variables that we used, as well as the kangaroo masses (13.7 to 26.6 kg, mean: 20.9 ± 3.4 kg). We will move the range of speeds from the supplementary material to the results or figure captions. We will add information on the number of trials per kangaroo to the methods.

      We did not group the data in the analysis, e.g. by using an average speed per individual across all their trials or by comparing fast and slow groups (the groups appear in our figures for display purposes only, which we will make clearer in the methods).

      Related to this, there is no mention of how different speeds were obtained. It seems that kangaroos hopped at a self-selected pace, thus it appears that not much variation was observed. I appreciate the difficulty of conducting these experiments in a controlled manner, but this doesn't exempt the authors from providing the details of their approach.

      • Some figures (Figure 2 for example) present means for one of three speeds, yet the speeds are not reported (except in the legend) nor how these bins were determined, nor how many trials or kangaroos fit in each bin. A similar comment applies to the mass categories. It would be more convincing if the authors plotted the main metrics vs. speed to illustrate the significant trends they are reporting.

      Thank you for this comment. The bins are used only for display purposes and not within the analysis. In the revised manuscript, we will ensure this is clear.

      (2) The significance of the effects of mass is not clear. The introduction and abstract suggest that the paper is focused on the effect of speed, yet the effects of mass are reported throughout as well, without a clear understanding of the significance. This weakness is further exaggerated by the fact that the details of the subject masses are not reported.

      Indeed, the primary aim of our study was to explore the influence of speed, given the uncoupling of energy from hopping speed in kangaroos. We included mass to ensure that the effects of speed were not driven by body mass (i.e., that larger kangaroos hopped faster).

      (3) The paper needs to be significantly re-written to better incorporate the methods into the results section. Since the results come before the methods, some of the methods must necessarily be described such that the study can be understood at some level without turning to the dedicated methods section. As written, it is very difficult to understand the basis of the approach, analysis, and metrics without turning to the methods.

      We agree, and in the revised manuscript will incorporate some of the methodological details within the results.

      Author response image 1.

    1. Reviewer #1 (Public Review):

      Summary:

      Major findings or outcomes include a genome for the wasp, characterization of the venom constituents and teratocyte and ovipositor expression profiles, as well as information about Trichopria ecology and parasitism strategies. It was found that Trichopria cannot discriminate among hosts by age, but can identify previously parasitized hosts. The authors also investigated whether superparasitism by Trichopria wasps improved parasitism outcomes (it did), presumably by increasing venom and teratocyte concentrations/densities. Elegant use of Drosophila ectopic expression tools allowed for functional characterization of venom components (Timps), and showed that these proteins are responsible for parasitoid-induced delays in host development. After finding that teratocytes produce a large number of proteases, experiments showed that these contribute to digestion of host tissues for parasite consumption.

      The discussion ties these elements together by suggesting that genes used for aiding in parasitism via different parts of the parasitism arsenal arise from gene duplication and shifts in tissue of expression (to venom glands or teratocytes).

      Strengths:

      The strength of this manuscript is that it describes the parasitism strategies used by Trichopria wasps at a molecular and behavioral level with broad strokes. It represents a large amount of work that in previous decades might have been published in several different papers. Including all of these data in a manuscript together makes for a comprehensive and interesting study.

      Weaknesses:

      The weakness is that the breadth of the study results in fairly shallow mechanistic or functional results for any given facet of Trichopria's biology. Although none of the findings are especially novel given results from other parasitoid species in previous publications, integrating results together provides significant information about Trichopria biology.

    2. Reviewer #2 (Public Review):

      Summary:

      Key findings of this research include the sequencing of the wasp's genome, identification of venom constituents and teratocytes, and examination of Trichopria drosophilae (Td)'s ecology and parasitic strategies. It was observed that Td doesn't distinguish between hosts based on age but can recognize previously parasitized hosts. The study also explored whether multiple parasitisms by Td improved outcomes, which indeed it did, possibly by increasing venom and teratocyte levels. Utilizing Drosophila ectopic expression tools, the authors functionally characterized venom components, specifically tissue inhibitors of metalloproteinases (Timps), which were found to cause delays in host development. Additionally, experiments revealed that teratocytes produce numerous proteases, aiding in the digestion of host tissues for parasite consumption. The discussion suggests that genes involved in different aspects of parasitism may arise from gene duplication and shifts in tissue expression to venom glands or teratocytes.

      Strengths:

      This manuscript provides an in-depth and detailed depiction of the parasitic strategies employed by Td wasps, spanning both molecular and behavioral aspects. It consolidates a significant amount of research that, in the past, might have been distributed across multiple papers. By presenting all this data in a single manuscript, it delivers a comprehensive and engaging study that could help future developments in the field of biological control against a major insect pest.

      Weaknesses:

      While none of the findings are particularly groundbreaking, as similar results have been reported for other parasitoid species in prior research, the integration of these results into one comprehensive overview offers valuable biological insights into an interesting new potential biocontrol species.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Major findings or outcomes include a genome for the wasp, characterization of the venom constituents and teratocyte and ovipositor expression profiles, as well as information about Trichopria ecology and parasitism strategies. It was found that Trichopria cannot discriminate among hosts by age, but can identify previously parasitized hosts. The authors also investigated whether superparasitism by Trichopria wasps improved parasitism outcomes (it did), presumably by increasing venom and teratocyte concentrations/densities. Elegant use of Drosophila ectopic expression tools allowed for functional characterization of venom components (Timps), and showed that these proteins are responsible for parasitoid-induced delays in host development. After finding that teratocytes produce a large number of proteases, experiments showed that these contribute to digestion of host tissues for parasite consumption.

      The discussion ties these elements together by suggesting that genes used for aiding in parasitism via different parts of the parasitism arsenal arise from gene duplication and shifts in tissue of expression (to venom glands or teratocytes).

      Strengths:

      The strength of this manuscript is that it describes the parasitism strategies used by Trichopria wasps at a molecular and behavioral level with broad strokes. It represents a large amount of work that in previous decades might have been published in several different papers. Including all of these data in a manuscript together makes for a comprehensive and interesting study.

      Weaknesses:

      The weakness is that the breadth of the study results in fairly shallow mechanistic or functional results for any given facet of Trichopria's biology. Although none of the findings are especially novel given results from other parasitoid species in previous publications, integrating results together provides significant information about Trichopria biology.

      We thank the reviewer for appreciating the importance of our study.

      Reviewer #2 (Public Review):

      Summary:

      Key findings of this research include the sequencing of the wasp's genome, identification of venom constituents and teratocytes, and examination of Trichopria drosophilae (Td)'s ecology and parasitic strategies. It was observed that Td doesn't distinguish between hosts based on age but can recognize previously parasitized hosts. The study also explored whether multiple parasitisms by Td improved outcomes, which indeed it did, possibly by increasing venom and teratocyte levels. Utilizing Drosophila ectopic expression tools, the authors functionally characterized venom components, specifically tissue inhibitors of metalloproteinases (Timps), which were found to cause delays in host development. Additionally, experiments revealed that teratocytes produce numerous proteases, aiding in the digestion of host tissues for parasite consumption. The discussion suggests that genes involved in different aspects of parasitism may arise from gene duplication and shifts in tissue expression to venom glands or teratocytes.

      Strengths:

      This manuscript provides an in-depth and detailed depiction of the parasitic strategies employed by Td wasps, spanning both molecular and behavioral aspects. It consolidates a significant amount of research that, in the past, might have been distributed across multiple papers. By presenting all this data in a single manuscript, it delivers a comprehensive and engaging study that could help future developments in the field of biological control against a major insect pest.

      Weaknesses:

      While none of the findings are particularly groundbreaking, as similar results have been reported for other parasitoid species in prior research, the integration of these results into one comprehensive overview offers valuable biological insights into an interesting new potential biocontrol species.

      We thank the reviewer for appreciating the importance of our study and for the suggestions on how to improve it.

      Reviewer #1 (Recommendations For The Authors):

      No additional comments

      Reviewer #2 (Recommendations For The Authors):

      Minor comments:

      Line 68 : would be better to spell out the name of the genus at first mention of the species

      It has been corrected as suggested.

      Lines 90-92 : This statement does not coincide with the figure. Could you please explain this better?

      We have carefully checked the statement and the corresponding figure panels, but could not find a disparity between them. Perhaps the similar, neighboring labels of Dsuz and Dsan caused confusion about the emergence rates. To avoid this potential confusion, we have modified Fig. 1b and 1c by highlighting the focal host Dsuz.

      Line 124: could you explain why the mention of these genes (Piwi) is important in this context, particularly for non-experts in this field?

      A previous study revealed a relationship between the expansion of piwi and large genome size; we meant to report a different pattern in our focal genome. We understand that your confusion might have been caused by the inserted statement regarding the repeats that separated them. Thus, we have moved the citation of the previous finding so that it immediately precedes the conclusion.

      Line 233: "...composition remains largely unknown.." for Td or in general? Not clear..

      Thank you. To make it clear, we have modified this sentence as “Although teratocytes have been reported in several other parasitoids, their molecular composition remains largely unknown in general”.

      Line 286: "at a certain time".. confusing, please rephrase.

      We have rephrased it as “After a certain time (2 or 4 hours for oviposition choice)”.

      Line 293-294: I find this sentence quite hard to follow. Could you please rephrase it and/or expand this concept to make it clearer?

      We have modified this sentence as “The parasitic success of Td largely relies on locating a young host; however, Td does not have the ability to discriminate between young and old hosts. Whether Td has evolved any adaptive strategies to compensate for this disadvantage?”

      Line 314: "it would be interesting".. this is too weak of an argument. Please corroborate your motivation more soundly.

      We have changed this statement to “Because Td allows conditional intraspecific competition, the next compelling question would be whether Td allows interspecific competition with larval parasitoids.”

      Line 391: Divergent evolution is too big a term in this context. I would tone it down to something like: "Studying ecological niche differentiation".

      Thank you. It has been corrected as suggested.

    1. Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Huang et al. have investigated the exercise mimetic role of Eugenol (a natural product) in skeletal muscle and whole-body fitness. The authors report that Eugenol facilitates skeletal muscle remodeling to a slower/oxidative phenotype typically associated with endurance. Eugenol also remodels fat, driving browning of the WAT. In both skeletal muscle and fat, Eugenol promotes oxidative capacity and mitochondrial biogenesis markers. Eugenol also improves exercise tolerance in a swimming test. Through a series of in vitro studies, the authors demonstrate that Eugenol may function through the TRPV1 channel, Ca mobilization, and activation of CaN/NFAT signaling in the skeletal muscle to regulate the slow-twitch phenotype. In addition, Eugenol also induces several myokines, mainly IL-15, through which it may exert its exercise mimetic effects. Overall, the manuscript is well-written, but there are several mechanistic gaps, physiological characterization is limited, and some data are mostly correlative without rigorous testing (e.g. the link between Eugenol, IL-15 induction, and endurance). Specific major concerns are listed below.

      Strengths:

      A natural product activator of the TRPV1 channel that could elicit exercise-like effects through skeletal muscle remodeling. Potential for discovering other mechanisms of action of Eugenol.

      Weaknesses:

      (1) Figure 1: Histomorphological analysis using immunostaining for type I, IIA, IIX, and IIB should be performed and quantified across different muscle groups and also in the soleus. Fiber type switch measured based on qPCR and Westerns does not sufficiently indicate the extent of fiber type switch. Better images for Fig. 1c should be provided.

      (2) Figure 2: Histomorphological analysis for SDH and NADH-TR should be performed and quantified in different muscle groups. Seahorse or Oroboros respirometry experiments should be performed to determine the actual increase in mitochondrial respiratory capacity either in isolated mitochondria or single fibers from vehicle and Eugenol-treated mice. EM for mitochondria should be added to determine the extent of mitochondrial remodeling. The current data are insufficient to indicate the extent of mitochondrial or oxidative remodeling.

      (3) Figure 2: Gene expression analysis is limited to a few transcriptional factors. A thorough analysis of gene expression through RNA-seq should be performed to get an unbiased effect of Eugenol on muscle transcriptome. This is especially important because eugenol is proposed to work through CaN/NFAT signaling, major transcriptional regulators of muscle phenotype.

      (4) I suggest the inclusion of additional exercise or performance testing including treadmill running, wheel running, and tensiometry. Quantification with a swimming test and measurement of the exact intensity of exercise, etc. is limited.

      (5) In addition to muscle performance, whole-body metabolic/energy homeostatic effects should also be measured to determine a potential increase in aerobic metabolism over anaerobic metabolism.

      (6) For the swimming test and other measurements, only 4 weeks of vehicle vs. Eugenol treatment was used. For this type of pharmacological study, a time course should be performed to determine the saturation point of the effect. Does exercise tolerance progressively increase with time?

      (7) The authors should also consider measuring adaptation to exercise training with or without Eugenol.

      (8) Histomorphological analysis of WAT is also lacking. EchoMRI would give a better picture of lean and fat mass.

      (9) The experiments performed to demonstrate that Eugenol functions through trpv1 are mostly correlational. Some experiments are needed with trpv1 KO or KD instead of inhibitor. Similarly, KD for other trpv channels should be tested (at least 1-4 that seem to be expressed in the muscle). Triple KO or trpv null cells should be considered to demonstrate that eugenol does not have another biological target.

      (10) Eugenol + TRPV1 inhibition studies are performed in C2C12 cells and only look at myofiber gene expression. This is incomplete. Some studies of mitochondrial and OXPHOS genes should be done.

      (11) The experiments linking Eugenol to ca handling, and calcineurin/nfat activation are all performed in c2c12 cells. There seems to be a link between Eugenol activation and CaN/NFAT activation and fiber type regulation in cells, however, this needs to be tested in mouse studies at the functional level using some of the parameters measured in aims 1 and 2.

      (12) The myokine studies are incomplete. The authors show a link between Eugenol treatment and myokines/IL-15 induction. However, this is purely correlational, without any experiments performed to show whether IL-15 mediates any of the effects of eugenol in mice.

      (13) An additional major concern is that it cannot be ruled out that Eugenol is uniquely mediating its effects through trpv1. Ideally, muscle-specific trpv1 mice should be used to perform some experiments with Eugenol to confirm that this ion channel is involved in the physiological effects of eugenol.

      Comments on revised version:

      Unfortunately, in the revision the authors have not addressed any of my comments with new experimental data. For example, some of the histological experiments I suggested are quite easy to do or standardize. Other in vitro experiments could also be conducted to show direct mechanistic link. The current revision does not further improve the manuscript from the 1st submission.

    2. Reviewer #2 (Public Review):

      Summary:

      The authors examined the hypothesis that eugenol promotes myokine release and skeletal muscle remodelling by activating the TRPV1-Ca2+-calcineurin-NFATc1 signalling pathway. They first showed that eugenol promotes skeletal muscle transformation and metabolic functions in adipose tissues by analysing changes in the mRNA and protein expression of relevant representative markers. With similar methodologies, they further found that eugenol increases the expression of mRNA and/or proteins of TRPV1, CaN, NFATC1 and IL-15 in muscle tissues. These processes were, however, prevented by inhibiting TRPV1 and CaN. Similar expression changes were also triggered by increasing intracellular Ca2+ with A23187, suggesting a Ca2+-dependent process.

      Strengths:

      Different protein markers were used as readouts of the functions of muscles, adipose tissues, and mitochondria, and were analysed at both mRNA and protein levels. The results were mostly consistent. Although the TRPV1-Ca2+-CaN-NFAT signaling pathway is not new and is well documented, they identified IL-15 as a new downstream target of this pathway through the combined use of TRPV1 and CaN inhibitors.

      Weaknesses:

      Most of the evidence is limited to the molecular level, lacking direct functional assays and system-level analysis. It will be interesting to examine the effect of eugenol on metabolic rate in animals and the role of TRPV1 in this process, as eugenol enhanced food intake without an effect on body weight. TRPV1 and CaN inhibition prevented IL-15 expression in C2C12 cells (Fig. 9). It remains unknown whether the effect is reproducible in native muscle tissues.

      It is also unknown how eugenol enhances TRPV1/CaN expression and alters the expression of many other protein markers in muscle and adipose tissues. Are these effects mediated by activated NFAT or by released IL-15 forming a positive feedback loop? It should at least be discussed.

      Many protein blots were presented but no molecular weight markers were shown. It is thus difficult to convince others that the protein bands are at the anticipated positions.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Weaknesses:

      (1) Figure 1: Histomorphological analysis using immunostaining for type I, IIA, IIX, and IIB should be performed and quantified across different muscle groups and also in the soleus. Fiber type switch measured based on qPCR and Westerns does not sufficiently indicate the extent of fiber type switch. Better images for Fig. 1c should be provided.

      Thanks for your suggestion. In fact, we attempted immunofluorescent staining for Slow MyHC and Fast MyHC in GAS muscle. However, for the majority of our results, we only observed positive expression of Slow MyHC in a small portion of the muscle sections (as shown in the figure below), so we did not present this result.

      In addition, due to the size limitations on uploading image files to bioRxiv, we had to compress the images, resulting in lower-resolution pictures. We have attempted to submit clearer images in Fig. 1C.

      Author response image 1.

      Green: Slow MyHC; Red: Fast MyHC

      (2) Figure 2: Histomorphological analysis for SDH and NADH-TR should be performed and quantified in different muscle groups. Seahorse or Oroboros respirometry experiments should be performed to determine the actual increase in mitochondrial respiratory capacity either in isolated mitochondria or single fibers from vehicle and Eugenol-treated mice. EM for mitochondria should be added to determine the extent of mitochondrial remodeling. The current data are insufficient to indicate the extent of mitochondrial or oxidative remodeling.

      That's a good suggestion. However, we regret to inform you that we are unable to present these results due to a lack of relevant experimental equipment and samples.

      (3) Figure 2: Gene expression analysis is limited to a few transcriptional factors. A thorough analysis of gene expression through RNA-seq should be performed to get an unbiased effect of Eugenol on muscle transcriptome. This is especially important because eugenol is proposed to work through CaN/NFAT signaling, major transcriptional regulators of muscle phenotype.

      Thanks for your suggestion. Indeed, we believe that in terms of reliability and accuracy, RNA-seq is not as good as RT-qPCR. The advantage of RNA-seq lies in its high throughput, making it suitable for screening unknown transcription factor regulatory mechanisms. In this study, the signaling pathways regulating myokines and muscle fiber type transformation are known and limited, namely the CaN/NFATc1 and AMPK pathways. Since eugenol mainly acts through the Ca2+ pathway, we primarily focus on the CaN/NFATc1 signaling pathway.

      (4) I suggest the inclusion of additional exercise or performance testing including treadmill running, wheel running, and tensiometry. Quantification with a swimming test and measurement of the exact intensity of exercise, etc. is limited.

      That's a good suggestion. We apologize for being unable to detect this indicator due to a lack of relevant experimental equipment.

      (5) In addition to muscle performance, whole-body metabolic/energy homeostatic effects should also be measured to determine a potential increase in aerobic metabolism over anaerobic metabolism.

      That's a good suggestion. We apologize for being unable to detect this indicator due to a lack of relevant experimental equipment.

      (6) For the swimming test and other measurements, only 4 weeks of vehicle vs. Eugenol treatment was used. For this type of pharmacological study, a time course should be performed to determine the saturation point of the effect. Does exercise tolerance progressively increase with time?

      Thanks for your suggestion. Due to the potential damage that exhaustive swimming tests inflict on mice, the tested mice are subsequently eliminated to avoid potential interference with the experiment. Therefore, this experiment is only suitable for conducting tests at individual time points.

      (7) The authors should also consider measuring adaptation to exercise training with or without Eugenol.

      Thanks for your suggestion. The purpose of this study is to investigate whether eugenol mimics exercise under standard dietary conditions. In our future research, we will consider exploring the effects of eugenol under HFD and exercise conditions.

      (8) Histomorphological analysis of WAT is also lacking. EchoMRI would give a better picture of lean and fat mass.

      That's a good suggestion. However, we did not collect sections of WAT tissue, so we regret that we are unable to supplement this result. In addition, we apologize for being unable to measure lean and fat mass due to a lack of EchoMRI equipment.

      (9) The experiments performed to demonstrate that Eugenol functions through trpv1 are mostly correlational. Some experiments are needed with trpv1 KO or KD instead of inhibitor. Similarly, KD for other trpv channels should be tested (at least 1-4 that seem to be expressed in the muscle). Triple KO or trpv null cells should be considered to demonstrate that eugenol does not have another biological target.

      Thanks for your professional suggestion. AMG-517 is a specific inhibitor of TRPV1, with a much greater inhibitory effect on TRPV1 compared to other TRP channels. AMG-517 inhibits capsaicin (500 nM), acid (pH 5.0), or heat (45°C) induced Ca2+ influx in cells expressing human TRPV1, with IC50 values of 0.76 nM, 0.62 nM, and 1.3 nM, respectively. However, the IC50 values of AMG-517 for recombinant TRPV2, TRPV3, TRPV4, TRPA1, and TRPM8 cells are >20 μM (Gavva, 2008). Therefore, we believe that using AMG-517 instead of TRPV1 KO cells is sufficient to demonstrate the involvement of TRPV1 in the function of eugenol.

      While this study did not exclude the possibility of other TRP channels' involvement, it was based on the fact that eugenol does not promote mRNA expression of other TRP channels, as shown in Fig. 4A-C. Indeed, as far as we know, besides TRPV1, the effects of other TRP channels on myofiber type transformation remain unknown. This is an aspect that we plan to investigate in the future.

      Reference

      Gavva NR, Treanor JJ, Garami A, et al. Pharmacological blockade of the vanilloid receptor TRPV1 elicits marked hyperthermia in humans. Pain. 2008;136(1-2):202-210.

      (10) Eugenol + TRPV1 inhibition studies are performed in C2C12 cells and only look at myofiber gene expression. This is incomplete. Some studies of mitochondrial and OXPHOS genes should be done.

      Thanks for your suggestion. In the inhibition experiment, we additionally examined the expression of mitochondrial complex proteins, as shown in Figure 5C. The relevant description has been added in lines 178-183 and 764-765.

      (11) The experiments linking Eugenol to ca handling, and calcineurin/nfat activation are all performed in c2c12 cells. There seems to be a link between Eugenol activation and CaN/NFAT activation and fiber type regulation in cells, however, this needs to be tested in mouse studies at the functional level using some of the parameters measured in aims 1 and 2.

      Thank you for your professional suggestion. We will attempt to continue these experiments in future studies.

      (12) The myokine studies are incomplete. The authors show a link between Eugenol treatment and myokines/IL-15 induction. However, this is purely correlational, without any experiments performed to show whether IL-15 mediates any of the effects of eugenol in mice.

      Indeed, previous studies have adequately demonstrated the regulation of skeletal muscle oxidative metabolism by IL-15. The initial aim of this experiment was to investigate the mechanism by which eugenol promotes IL-15 expression. Through inhibition assays, EMSA, and dual luciferase reporter gene experiments, we have thoroughly demonstrated that eugenol promotes IL-15 expression via the CaN/NFATc1 signaling pathway, thus establishing a novel link between CaN/NFATc1 signaling and the myokine IL-15 expression. In the subsequent experiments, we plan to knock out IL-15 in eugenol-treated C2C12 cells to explore whether IL-15 mediates the effects of eugenol. This will be another aspect of our investigation.

      (13) An additional major concern is that it cannot be ruled out that Eugenol is uniquely mediating its effects through trpv1. Ideally, muscle-specific trpv1 mice should be used to perform some experiments with Eugenol to confirm that this ion channel is involved in the physiological effects of eugenol.

      As you suggested, we agree that muscle-specific TRPV1 mice should be used to conduct some experiments with eugenol. In our mice experiments, due to the lack of validation of skeletal muscle-specific TRPV1 knockout, we indeed cannot rule out that eugenol is uniquely mediating its effects through TRPV1. We acknowledge this as a limitation of our study. However, due to limitations in research funding and time, we are currently unable to supplement these experiments. Nevertheless, we believe that our results from in vitro experiments using a TRPV1 inhibitor (which selectively inhibits TRPV1) provide evidence of eugenol's action through TRPV1.

      Reviewer #2 (Public Review):

      Weaknesses:

      (1) Apart from Fig.2A and 2B, they mostly utilised protein expression changes as an index of tissue functional changes. Most of the data supporting the conclusions are thus rather indirect. More direct functional evidence would be more compelling. For example, a lipolysis assay could be used to measure the metabolic function of adipocytes after eugenol treatment in Fig.3. Functional activation of NFAT can be demonstrated by examining the nuclear translocation of NFAT.

      Thank you for your professional suggestion. Indeed, as shown in Figure 4G-I, we detected the expression of NFATc1 in the nucleus to illustrate its nuclear translocation.

      (2) To further demonstrate the role of TRPV1 channels in the effects of eugenol, TRPV1-deficient mice and tissues could also be used. Will the improved swimming test in Fig. 2B and increased CaN, NFAT, and IL-15 triggered by eugenol be all prevented in TRPV1-lacking mice and tissues?

      Thank you for your professional suggestion. We agree that muscle-specific TRPV1 mice should be used to conduct some experiments with eugenol. However, due to limitations in research funding and time, we are currently unable to supplement these experiments.

      (3) Direct evidence of eugenol activation of TRPV1 channels in skeletal muscles is also lacking. The flow cytometry assay was used to measure Ca2+ changes in the C2C12 cell line in Fig. 5A. But this assay is rather indirect. It would be more convincing to monitor real-time activation of TRPV1 channels in skeletal muscles not in cell lines using Ca2+ imaging or electrophysiology.

      Thank you for your professional suggestion. As you suggested, we initially planned to use the patch-clamp technique to detect membrane potential changes in skeletal muscle cells under eugenol treatment. However, due to technical limitations, this experiment was not successfully conducted. Therefore, we had to rely solely on flow cytometry to detect Ca2+ levels.

      Reviewer #2 (Recommendations For The Authors):

      (1) Most of the mRNA and protein data are consistent with each other. However, some of them are not obvious. For example, PGC1a mRNA was increased by eugenol in Fig. 2C but not seen in protein in Fig. 2D. Similarly, Complex I and V mRNA was increased in Fig. 2C but not obvious at protein levels in Fig. 2D, even though they claimed that Complex I and V were both upregulated by eugenol (see: line 123). Another example: IL-15 mRNA was increased by EUG100 but not by EUG50 in the GAS muscle in Fig. 8A. However, EUG50 increased IL-15 protein expression in Fig. 8B. Similar conflict was also seen in IL-15 expression in the TA muscle in Fig. 8A and 8C.

      Thanks for your question. As shown in the table below, by standardizing with β-Actin, our statistical data indeed indicate that eugenol promotes the expression of Complex I and V proteins (although the upregulation is minimal). Additionally, protein and mRNA expression do not always correlate, which may be due to potential post-transcriptional and post-translational regulation.

      Author response table 1.

      (2) Line 115: Figure 2A should be Figure 2B; Line 119: Figure 2B should be Figure 2A. Alternatively, swap Fig2A with Fig. 2B.

      Thanks for your correction, we have revised the relevant content in lines 111-113 and 724-725.

      (3) Abbreviations of ADF and ADG in Fig. 3A should be defined.

      Thank you for your suggestion. We have defined these abbreviations in lines 123-125.

      (4) Line 154: TRPV1 mRNA expression was promoted by 25 and 50uM eugenol, not by 12.5uM.

      Thank you for your correction. We have revised it in line 150.

      (5) Line 173: Increased expression of NFAT suggests that NFAT is activated. This is a rather weak statement. It is more convincing to show the nuclear translocation of NFAT by eugenol treatment.

      Thank you for your correction. We have revised the description in line 166.

      (6) Line 185: The data showing EUG increased slow MyHC fluorescence intensity in Fig. 5D are not clear at all. Quantification is required.

      Thank you for your suggestion. We have attempted to submit clearer images in Figure 5E, and the quantification has been provided.

      (7) Line 235: IL-15 expression is positively correlated with MyHC IIa, suggesting IL-15 is a slow muscle myokine (See line 2398). However, MyHC IIa is a marker of fast muscle fibres (see line 50).

      Thank you for your correction. As you pointed out, MyHC IIa is a marker of fast-twitch oxidative muscle fibers. We have replaced ‘slow’ with ‘oxidative’ in line 235.

      (8) Fig.9C and 9D show that inhibition of TRPV1 and CaN attenuated the upregulation of IL-15 mRNA and protein by eugenol in C2C12 cell line. This result is important in demonstrating the link of TRPV1 and CaN to IL-15. It will be more interesting and physiologically relevant to perform this experiment in primary skeletal muscle cells isolated from mice.

      Thank you for your suggestion. This is indeed an interesting idea. We will attempt to continue our experiments in mice and primary porcine muscle cells in future studies.

      (9) It is concerning that 4-week-old male mice were used for the study. The 4-week-old mice are immature. Adult mice over 8 weeks should be used. It is thus unknown whether the findings are broadly applicable to adult age.

      Thanks for your professional question. Age indeed has an impact on the muscle fiber type in mammals. Based on previously observed patterns of muscle fiber changes with age in various mammals (Katsumata et al., 2017; Pandorf et al., 2012; Hill et al., 2020), we believe that changes in muscle fiber types occur more frequently in juvenile mammals, mainly manifesting as a sharp increase in fast muscle fibers. Therefore, interventions during the juvenile stage might be more effective in promoting the transformation of fast to slow muscle fibers. As a result, in most of our group's research using nutritional interventions to regulate muscle fiber types, we tend to start interventions from the age of 4 weeks in mice. If we began the intervention at 8 weeks, we speculate that it would not be as effective as starting at 4 weeks. Below are the patterns of muscle fiber changes with age in various mammalian models, provided for reference:

      (1) Changes in muscle fiber types with age in pigs:

      As shown in the following table, there is a dramatic change in the muscle fiber types 12 days after birth in pigs, especially a sharp increase in fast muscle fibers, which continues until day 45. After 45 days of age, the changes in muscle fiber types become relatively gradual.

      Author response table 2.

      Developmental change of proportions of muscle fiber types in longissimus dorsi muscle determined by histochemical analysis for myosin adenosine triphosphatase activity (%)

      Least squares means and pooled standard errors (n = 3). MHC, myosin heavy chain; ND, not detected. *P<0.10, **P<0.01. Least squares means followed by different letters on the same row are significantly different (P < 0.05).

      Reference:

      Katsumata, M., Yamaguchi, T., Ishida, A., & Ashihara, A. (2017). Changes in muscle fiber type and expression of mRNA of myosin heavy chain isoforms in porcine muscle during pre- and postnatal development. Animal science journal, 88(2), 364–371.

      (2) Changes in muscle fiber types with age in rats:

      As illustrated in the subsequent figure, the muscle fiber types in rats undergo significant changes before 20 days of age (3-week-old), notably with a pronounced increase in type IIb fast-twitch fibers. After reaching 20 days of age, the changes in type IIb muscle fibers tend to stabilize and become more gradual.

      Author response image 2.

      Reference:

      Pandorf, C. E., Jiang, W., Qin, A. X., Bodell, P. W., Baldwin, K. M., & Haddad, F. (2012). Regulation of an antisense RNA with the transition of neonatal to IIb myosin heavy chain during postnatal development and hypothyroidism in rat skeletal muscle. American journal of physiology. Regulatory, integrative and comparative physiology, 302(7), R854–R867.

      (3) Changes in muscle fiber types with age in mice:

      As depicted in the following figure, when comparing 10-week-old mice to 78-week-old aged mice, there are no significant changes in muscle fiber types.

      Author response image 3.

      Reference:

      Hill, C., James, R. S., Cox, V. M., Seebacher, F., & Tallis, J. (2020). Age-related changes in isolated mouse skeletal muscle function are dependent on sex, muscle, and contractility mode. American journal of physiology. Regulatory, integrative and comparative physiology, 319(3), R296–R314.

    1. eLife assessment

      This important manuscript follows up on previous findings from the same lab supporting the idea that deficits in learning due to enhanced synaptic plasticity are due to saturation effects. Compelling evidence is presented that behavioral learning deficits associated with enhanced synaptic plasticity in a transgenic mouse model can be rescued by manipulations designed to reverse the saturation of synaptic plasticity. In particular, the finding that a previously FDA-approved therapeutic can rescue learning could provide new insights for biologists, psychologists, and others studying learning and neurodevelopment.

    2. Reviewer #1 (Public Review):

      Summary:

      Shakhawat et al. investigated how enhancement and impairment of plasticity could result in the same behavioral phenotype. The authors tested the hypothesis that learning impairments result from saturation of plasticity mechanisms and had previously tested this hypothesis using mice lacking two class I major histocompatibility molecules. The current study extends this work by testing the saturation hypothesis in a Purkinje-cell (L7) specific Fmr1 knockout mouse, which has enhanced parallel fiber-Purkinje cell LTD. The authors found that L7-Fmr1 knockout mice are impaired on an oculomotor learning task, and that both pre-training, to reverse LTD, and diazepam, to suppress neural activity, eliminated the deficit when compared to controls.

      Strengths:

      This study tests the "saturation hypothesis" to understand plasticity in learning using a well-known behavior task, VOR, and an additional genetic mouse line with a cerebellar cell-specific target, L7-Fmr1 KO. This hypothesis is of interest to the community as it prompts novel inquiry into LTD that has not been pursued previously.

      Utilizing a cell-specific mouse line that has been previously used as a genetic model to study Fragile X syndrome is a unique way to study the role of Purkinje cells and the Fmr1 gene. This increases the understanding in the field with regard to Fragile X syndrome and LTD.

      The VOR task is a classic behavior task that is well understood, therefore using this metric is very reliable for testing new animal models and treatment strategies. The effects of pretraining are clearly robust and this analysis technique could be applied across different behavior data sets.

      The rescue shown using diazepam is very interesting as this is a therapeutic that could be used in clinical populations as it is already approved.

      All previous comments have been addressed with additional studies, explanations, or analyses. These additions strengthen a very impactful study.

      The authors achieved their study objectives and the results strongly support their conclusion and proposed hypothesis. This work will be impactful on the field as it uses a new Purkinje-cell specific mouse model to study a classic cerebellar task. The use of diazepam could be further analyzed in other genetic models of neurodevelopmental disorders to understand if effects on LTD can rescue other pathways and behavior outcomes.

    3. Reviewer #2 (Public Review):

      This manuscript explores the seemingly paradoxical observation that enhanced synaptic plasticity impairs (rather than enhances) certain forms of learning and memory. The central hypothesis is that such impairments arise due to saturation of synaptic plasticity, such that the synaptic plasticity required for learning can no longer be induced. A prior study provided evidence for this hypothesis using transgenic mice that lack major histocompatibility class 1 molecules and show enhanced long-term depression (LTD) at synapses between granule cells and Purkinje cells of the cerebellum. The study found that a form of LTD-dependent motor learning-increasing the gain of the vestibulo-ocular reflex (VOR)-is impaired in these mice and can be rescued by manipulations designed to "unsaturate" LTD. The present study extends this line of investigation to another transgenic mouse line with enhanced LTD, namely, mice with the Fragile X gene knocked out. The main findings are that VOR gain increase learning is selectively impaired in these mice but can be rescued by specific manipulations of visuomotor experience known to reverse cerebellar LTD. Additionally, the authors show that a transient global enhancement of neuronal inhibition also selectively rescues gain increase learning. This latter finding has potential clinical relevance since the drug used to boost inhibition, diazepam, is FDA-approved and commonly used in the clinic. The evidence provided for the saturation is somewhat indirect because directly measuring synaptic strength in vivo is technically difficult. Nevertheless, the experimental results are solid. In particular, the specificity of the effects to forms of plasticity previously shown to require LTD is remarkable.

    4. Author response:

      The following is the authors’ response to the original reviews. 

      eLife assessment

      This important manuscript follows up on previous findings from the same lab supporting the idea that deficits in learning due to enhanced synaptic plasticity are due to saturation effects. Compelling evidence is presented that behavioral learning deficits associated with enhanced synaptic plasticity in a transgenic mouse model can be rescued by manipulations designed to reverse the saturation of synaptic plasticity. In particular, the finding that a previously FDA-approved therapeutic can rescue learning could provide new insights for biologists, psychologists, and others studying learning and neurodevelopment.

      eLife assessment, Significance of findings

      This valuable manuscript follows up on previous findings from the same lab supporting the idea that deficits in learning due to enhanced synaptic plasticity are due to saturation effects. 

      According to the eLife criteria for assessing significance, the “valuable” assessment indicates “findings that have theoretical or practical implications for a subfield.” We have revised the manuscript to emphasize the “theoretical and practical implications beyond a single subfield” which “substantially advance our understanding of major research questions”, with “profound implications” and the potential for “widespread influence,” the eLife criteria for a designation of “landmark” significance.   

      The most immediate implications of our results are for the two major neuroscience subfields of cerebellar research and autism research. However, as recognized by Reviewer 2, the implications are much broader than that: “the finding that a previously FDA-approved therapeutic can rescue learning could provide important new insights for biologists, psychologists, and others studying learning and neurodevelopment.” We have substantially revised the Discussion section of the manuscript to lay out more explicitly how the central idea of our manuscript (that the capacity for learning at any given moment is powerfully influenced by dynamic, activity- and plasticity-dependent changes in the threshold for synaptic plasticity over short timescales of tens of minutes to hours) has implications for scientific thinking and experiments on plasticity and learning throughout the brain, as well as for clinical practice for a wide array of brain disorders associated with altered plasticity and learning impairment.

      To emphasize the broad conceptual implications of our research, we have reframed our conclusions in terms of metaplasticity rather than saturation of plasticity throughout the revised manuscript. In our previous submission, we had used the “saturation” terminology for continuity with our previous Nguyen-Vu et al 2017 eLife paper, and mentioned the related idea of threshold metaplasticity in a single sentence: “Similarly, the aberrant recruitment of LTD before training may lead, not to its saturation per se, but to some other kind of reduced availability, such as an increased threshold for its induction (Bienenstock, Cooper, and Munro, 1982; Leet, Bear, and Gaier, 2022).” However, we now appreciate that metaplasticity is a more general conceptual framework for our findings, and therefore emphasize this concept in the revised manuscript, while still making the conceptual link with the “saturation” idea presented in Nguyen-Vu et al 2017 (lines 236-238).

      The concept of a sliding threshold for synaptic plasticity (threshold metaplasticity) was proposed four decades ago by Bienenstock, Cooper and Munro (1982) as a mechanism for countering an instability inherent in Hebbian plasticity whereby correlated pre- and post-synaptic activity strengthens a synapse, which leads to an increase in correlated activity, which in turn leads to further strengthening. To counter this, BCM proposed a sliding threshold whereby increases in neural activity increase the threshold for LTP and decreases in activity decrease the threshold for LTP, thereby providing a mechanism for stabilizing firing rates and synaptic weights. This BCM sliding threshold model has been highly influential in theoretical and computational neuroscience, but experimental evidence for whether and how such a mechanism functions in vivo has been quite limited.  
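      As a rough illustration of the sliding-threshold idea summarized above, the following Python sketch implements a textbook single-synapse form of the BCM rule. It is not the model or analysis used in the manuscript: the linear neuron, the function name `bcm_step`, and the parameters `eta` (learning rate) and `tau` (threshold time constant) are assumptions chosen only for illustration.

```python
def bcm_step(w, x, theta, eta=0.01, tau=10.0):
    """One update of a textbook BCM rule for a single synapse (illustrative).

    w     : synaptic weight
    x     : presynaptic activity
    theta : sliding modification threshold
    eta   : learning rate (assumed value)
    tau   : time constant of the threshold (assumed value)
    """
    y = w * x                        # postsynaptic activity of a linear neuron
    dw = eta * x * y * (y - theta)   # potentiation if y > theta, depression if y < theta
    dtheta = (y * y - theta) / tau   # threshold relaxes toward recent y^2
    return w + dw, theta + dtheta


if __name__ == "__main__":
    w, theta = 1.0, 1.0
    # Sustained high presynaptic drive: the threshold slides upward,
    # which limits further potentiation.
    for step in range(5):
        w, theta = bcm_step(w, x=2.0, theta=theta)
        print(f"step {step}: w={w:.4f}, theta={theta:.4f}")
```

      In this sketch, activity above the threshold strengthens the synapse and raises the threshold, while activity below the threshold weakens the synapse and lowers it. This is the stabilizing feedback that the original Hebbian setting calls for; the anti-Hebbian cerebellar case discussed in this response would require a different sign convention.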

      Our work extends the previous, limited experimental evidence for a BCM-like sliding threshold in vivo in several significant ways, which we now discuss in the revised manuscript:

      First, we analyze threshold metaplasticity at synapses where the plasticity is not Hebbian and lacks the inherent instability that inspired the BCM model. The synapses onto cerebellar Purkinje cells have been described as “anti-Hebbian” because the associative form of plasticity is synaptic LTD of excitatory inputs. This anti-Hebbian associative plasticity lacks the instability inherent in Hebbian plasticity. Moreover, a BCM-like sliding threshold that increases the threshold for associative LTD with increased firing rates and decreases the threshold for LTD with decreased firing rates would tend to oppose rather than support the stability of firing rates; nevertheless, we find evidence for this in our experimental results. Thus, for cerebellar LTD, the central function of the sliding threshold may not be the stabilization of firing rates, but rather to limit plasticity in order to suppress the overwrite of new memories or to allocate different memories to the synapses of different Purkinje cells.

      Second, we analyze the influence of a BCM-like sliding threshold for plasticity on behavioral learning. Most previous evidence for the BCM model in vivo has derived from studies of the effects of sensory deprivation (e.g., monocular occlusion) on the functional connectivity of sensory circuits (Kirkwood et al., 1996; Desai et al. 2002; Fong et al., 2021) rather than on learning per se.  

      Third, our results provide evidence for major changes in the threshold for plasticity over short time scales and with more subtle manipulations of neural activity than used in previous studies, with practical implications for clinical application. Previously, metaplasticity has been demonstrated with sensory deprivation over multiple days (Kirkwood et al., 1996; Desai et al. 2002) or with drastic changes in neural activity, such as with TTX in the retina (Fong et al, 2021), TMS (Hamada et al 2008), or high frequency electrical stimulation in vitro (Holland & Wagner 1998; Montgomery & Madison 2002) or in vivo (Abraham et al 2001). In contrast, we provide evidence for metaplasticity induced by 30 min of behavioral manipulation (pre-training) and by the relatively subtle pharmacological manipulation of activity with systemic administration of diazepam, a drug approved for humans. Thus, our work contributes not only conceptually to understanding the function of threshold metaplasticity in vivo, but also offers practical observations that could pave the way for novel therapeutic interventions.  

      Fourth, whereas efforts to enhance plasticity and learning have largely focused on increasing the excitability of neurons during learning to help cross the threshold for plasticity (e.g., Albergaria et al., 2018; Yamaguchi et al., 2020; Le Friec et al., 2017), we take the opposite, somewhat counterintuitive approach of inhibiting the excitability of neurons during a period before learning to reset the threshold for plasticity to a state compatible with new learning. To our knowledge, the only other application of such an approach in an animal model of a brain disorder has been inhibiting peripheral (retinal) activity with TTX for treatment of amblyopia (Fong et al, 2021). Our findings from CNS inhibition with a single systemic dose of diazepam greatly expand the potential applications, which could readily be tested in other mouse models of human disorders and other learning deficits. Even in cases where the specific synaptic impairments and circuitry are less fully understood, the impact of suppressing neural activity during a period before training to reduce the threshold for plasticity could be empirically tested.

      Fifth, our work extends the consideration of a BCM-like sliding threshold for plasticity to the cerebellum, whereas previous work has focused on models and experimental studies of forebrain circuits. Currently there is a surge of interest in the contribution of the cerebellum to functions and brain disorders previously ascribed to forebrain, hence we anticipate broad interest in this work. 

      Sixth, our results suggest that the history of plasticity rather than the history of firing rates may be the homeostat controlling the threshold for plasticity, at least at the synapses under consideration. Diazepam pre-treatment only enhanced learning in the L7-Fmr1 KO mice with a low “baseline” threshold for plasticity, as measured in vitro, and not WT mice. This suggests it is not the neural activity per se that drives the change in threshold for plasticity, but the interaction of activity with the plasticity mechanism.

      In the revised Discussion, we make all of the above points, to make the implications more clear to readers.  

      The broad interest in this topic is illustrated by two concrete examples. First, an abstract of this work was honored with selection for oral presentation at the November 2023 Symposium of the Molecular and Cellular Cognition Society, a conceptually wide-ranging organization with thousands of members worldwide. Second, the most closely related published work on activity-dependent metaplasticity in vivo, the Fong et al 2021 eLife paper demonstrating reversal of amblyopia by suppression of activity in the retina by TTX, attracted such broad interest, not just of professional scientists, but also the general public, as to be reported on National Public Radio’s All Things Considered, with an audience of 11.9 million people worldwide.  

      In considering the potential of this work for widespread influence, it is important to note that activity-driven changes in the threshold for plasticity could very well be a general property of most if not all synapses, yet very little is known about its function in vivo, especially during learning. Therefore, the seminal conceptual and practical advances described above have the potential for profound implications throughout neuroscience, psychiatry, neurology and computer science/AI, the eLife criterion for designation as “landmark” in significance. We respectfully request that the reviewers and editor reassess the significance of our findings in light of our much-improved discussion of the broad significance of the work.

      eLife assessment, Strength of support

      Convincing evidence is presented that behavioral learning deficits associated with enhanced synaptic plasticity in a transgenic mouse model can be rescued by manipulations designed to reverse the saturation of synaptic plasticity. In particular, the finding that a previously FDA-approved therapeutic can rescue learning could provide important new insights for biologists, psychologists, and others studying learning and neurodevelopment.

      The designation of “Convincing” indicates “methodology in line with current state-of-the-art.” In the revised Discussion, we more clearly highlight that our evidence is “more rigorous than current state-of-the-art” in several respects, thereby meeting the eLife criterion for “Compelling”:

      (1) Comparison of learning deficits and effects of behavioral and pharmacological pretreatment across five closely related oculomotor learning tasks, which all depend on the same region of the cerebellum (the flocculus), but which previous work has found to vary in their dependence on LTD at the cerebellar parallel fiber-to-Purkinje cell synapses. 

      The “state-of-the-art” behavioral standard in the field of learning is assessment of a single learning task that depends on a given brain area, with the implicit or explicit assumption that the task chosen is representative of “cerebellum-dependent learning” or hippocampus-, amygdala-, basal ganglia-, cortex- dependent learning, etc. Sometimes there is a no-learning behavioral control. 

      Our study exceeds this standard by comparing across many different closely related learning tasks, which all depend on the cerebellar flocculus and other shared vestibular, visual, and oculomotor circuitry, but vary in their dependence on LTD at the cerebellar parallel fiber-to-Purkinje cell synapses. In the original submission, we reported results for high-frequency VOR-increase learning that were dramatically different from those for three other VOR learning tasks for which there is less evidence for a role of LTD. Reviewer 2 noted, “the specificity of the effects to forms of plasticity previously shown to require LTD is remarkable.” In the revised manuscript, we provide new data for a second oculomotor learning task in which LTD has been implicated, OKR adaptation, with results very similar to those for high-frequency VOR-increase learning. The remarkable specificity of both the learning deficits and the effects of pre-training manipulations, in two different lines of mice, for the two specific learning tasks in which LTD has been most strongly implicated, and not the other three oculomotor learning tasks, substantially strengthens the evidence for the conclusion that the learning deficits and effects of pre-training are related specifically to the lower threshold for LTD, rather than the result of some other effect of the gene KO or pre-treatment on the cerebellar or oculomotor circuitry (discussed on lines 270-290 of revised manuscript).

      (2) Replication of findings in more than one line of mice, targeting distinct signaling pathways, with a common impact of enhancing LTD at the cerebellar PF-Purkinje cell synapses.  

      State-of-the-art is to report the effects of one specific molecular signaling pathway on behavior. 

      In the first part of this Research Advance, we replicate the findings of Nguyen-Vu et al 2017 for a completely different line of mice with enhanced LTD at the parallel fiber-to-Purkinje cell synapses. Like the comparison across LTD-dependent and LTD-independent oculomotor learning tasks, the comparison across completely different lines of mice with enhanced LTD strengthens the evidence that the shared behavioral phenotypes are a reflection of the state of LTD rather than other “off-target” effects of each mutation (discussed on lines 291-309 of revised manuscript).

      (3) Reversal of learning impairments with more than one type of treatment. 

      State-of-the-art is to be able to reverse a learning deficit or other functional impairment in an animal model of a brain disorder with a single treatment; indeed, success in this respect is viewed as wildly exciting, as evidenced by the reception by the scientific and lay communities of the Fong et al, 2021 eLife report of reversal of amblyopia by TTX treatment of the retina. 

      In the current work, we demonstrate reversal of learning deficits with two different types of treatment during the period before training, one behavioral and one pharmacological. The current diazepam pretreatment results provide a fundamentally new type of evidence for the hypothesis that the threshold for LTD and LTD-dependent learning varies with the recent history of activity in the circuit, complementing the evidence from behavioral and optogenetic pre-training approaches used previously in Nguyen-Vu et al, 2017 (discussed on lines 151-158 and 246-255 of revised manuscript).

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Shakhawat et al. investigated how enhancement of plasticity and impairment could result in the same behavioral phenotype. The authors tested the hypothesis that learning impairments result from saturation of plasticity mechanisms and had previously tested this hypothesis using mice lacking two class I major histocompatibility molecules. The current study extends this work by testing the saturation hypothesis in Purkinje-cell (L7) specific Fmr1 knockout mice, which have enhanced parallel fiber-Purkinje cell LTD. The authors found that L7-Fmr1 knockout mice are impaired on an oculomotor learning task and both pre-training, to reverse LTD, and diazepam, to suppress neural activity, eliminated the deficit when compared to controls.

      Strengths:

      This study tests the "saturation hypothesis" to understand plasticity in learning using a well-known behavior task, VOR, and an additional genetic mouse line with a cerebellar cell-specific target, L7-Fmr1 KO. This hypothesis is of interest to the community as it evokes a novel inquisition into LTD that has not been examined previously.

      Utilizing a cell-specific mouse line that has been previously used as a genetic model to study Fragile X syndrome is a unique way to study the role of Purkinje cells and the Fmr1 gene. This increases the understanding in the field in regards to Fragile X syndrome and LTD.

      The VOR task is a classic behavior task that is well understood, therefore using this metric is very reliable for testing new animal models and treatment strategies. The effects of pretraining are clearly robust and this analysis technique could be applied across different behavior data sets.

      The rescue shown using diazepam is very interesting as this is a therapeutic that could be used in clinical populations as it is already approved.

      There was a proper use of controls and all animal information was described. The statistical analysis and figures are clear and well describe the results.

      We thank the reviewer for summarizing the main strengths of our original submission. We have further strengthened the revised submission by 

      (1) more fully discussing the broad conceptual implications, as outlined above; 

      (2) adding new data (Fig. 5) showing that another LTD-dependent oculomotor learning task, optokinetic reflex (OKR) adaptation, is impaired in the L7-Fmr1 KO mice and rescued by pre-treatment with diazepam, as we had already shown for high-frequency VOR-increase learning;

      (3) responding to the specific points raised by the reviewers, as detailed below.

      Weaknesses:

      While the proposed hypothesis is tested using genetic animal models and the VOR task, LTD itself is not measured. This study would have benefited from a direct analysis of LTD in the cerebellar cortex in the proposed circuits.

      Our current experiments were motivated by the direct analysis of cerebellar LTD in Fmr1 knock out mice that was already published (Koekkoek et al., 2005). In that previous work, LTD was analyzed in both Purkinje cell selective L7-Fmr1 KO mice (Koekkoek et al., 2005; Fig. 4D), as used in our study, and global Fmr1 knock out mice (Koekkoek et al., 2005; Fig. 4B). Both lines were found to have enhanced LTD, as cited in the Introduction of our manuscript (lines 48-51, 63-64). The goal of our current study was to build on this previous work by analyzing the behavioral correlates of the findings from this previous, direct analysis of LTD. 

      Diazepam was shown to rescue learning in L7-Fmr1 KO mice, but this drug is a benzodiazepine and can cause a physical dependence. While the concentrations used in this study were quite low and animals were dosed acutely, potential side-effects of the drug were not examined, including any possible withdrawal. 

      In humans, diazepam (Valium) is one of the most frequently prescribed drugs in the world, and the side effects and withdrawal symptoms have been extensively studied and documented. Withdrawal symptoms are generally not observed with treatments of less than 2 weeks (Brett and Murnion, 2015). After long-term treatments, tapering of the dosage is recommended to mitigate withdrawal (Brett and Murnion, 2015 and https://americanaddictioncenters.org/valium-treatment/withdrawal-duration). The extensive data on the safety of diazepam in humans lowers the barrier to potential clinical translation of our basic science findings, although we emphasize that our own expertise is scientific, and translation to Fragile X patients or other patient groups will require additional development of the research by clinicians.

      Given the extensive history of research on this drug, we focused on looking for side effects that would reflect an adverse effect of diazepam on the function of the same oculomotor neural circuitry whose ability to support certain oculomotor learning tasks was improved after diazepam. In other words, we assessed whether the pharmacological manipulation was enhancing certain functions of a given circuit at the expense of others. As we note (line 164), “The acute effect of diazepam administration [measured 2 hours after administration] was to impair learning” in both WT and L7-Fmr1 KO mice. One could consider this a side effect. More importantly, we also tested extensively for oculomotor side effects during the therapeutic period when learning impairments were eliminated in the L7-Fmr1 KOs, 18-24 hours post-administration, and have a full section of the Results describing our findings about this, titled “Specificity of pre-training effects on learning.” As described in the Results and Discussion (lines 184-195, 312-318, Figure 3, Figure 3-supplement 1; Figure 4B; Figure 5-supplement 1), we found no such adverse side effects, which is again encouraging with respect to the translational potential of our findings.

      This drug is not specific to Purkinje cells or cerebellar circuits, so the action of the drug on cerebellar circuitry is not well understood for the study presented.

      The effects of diazepam are indeed not specific to Purkinje cells, but rather are known to be widespread. Diazepam is a positive allosteric modulator of GABAA receptors, which are found throughout the brain, including the cerebellum. When delivered systemically, as we did in our experiments, diazepam will suppress neural activity throughout the brain by facilitating inhibition, as documented by decades of previous research with this and related benzodiazepines, including dozens of studies of the effects of diazepam in the cerebellum. 

      To our knowledge, there is currently no drug that can specifically inhibit Purkinje cells, especially one that can be given systemically to cross the blood-brain barrier. Moreover, if such a drug did exist, we would not predict it to have the same effect as diazepam in reversing the learning deficits of the L7-Fmr1 KO mice, because the latter presumably depends on suppression of activity in the cerebellar granule cells and neurons of the inferior olive, whose axons form the parallel fibers and climbing fibers, and whose correlated activity controls LTD at the parallel fiber-Purkinje cell synapses.  

      We have revised the text to clarify the key point that despite its widespread action on the brain, the effects of diazepam on cerebellum-dependent learning were remarkably specific (lines 184-195, 210-228, 312-318). During the period 18-24 hours after a single dose of diazepam, the learning deficits of L7-Fmr1 KO mice on two LTD-dependent oculomotor learning tasks were completely reversed, with no effects on the same tasks in WT mice, no effects (“side-effects”) in L7-Fmr1 KO mice or WT mice on other, LTD-independent oculomotor learning tasks that depend on the same region of the cerebellum, and no effects on baseline performance of visually or vestibularly driven eye movements.

      As described in the revised Discussion (lines 318-323), the non-specific mild suppression of neural activity throughout the brain by diazepam makes it a potentially generalizable approach for inducing BCM-like shifts in the threshold for associative plasticity to facilitate subsequent learning. More specifically, diazepam-mediated reduction of activity throughout the brain has the potential to lower any aberrantly high thresholds for associative plasticity at synapses throughout the brain, and thereby reverse any learning deficits associated with such aberrantly high plasticity thresholds. This approach might even be useful in cases where the neural circuitry supporting a given behavior is not well characterized and the specific synapses responsible for the learning deficit are unknown. On lines 323-327 we compare this generalizable approach with the challenges of designing task- and circuit-specific approaches to reset the threshold for plasticity, particularly in circuits that are less well characterized than the oculomotor circuit.

      It was not mentioned if L7-Fmr1 KO mice have behavior impairments that worsen with age or if Purkinje cells and the cerebellar microcircuit are intact throughout the lifespan. 

      At the adult ages used in our study (8-22 weeks), the oculomotor circuitry, including the Fmr1-deficient Purkinje cells, appears to be functionally intact because all of the oculomotor performance and learning tasks we tested were either normal, or could be restored to normal with brief behavioral and/or pharmacological pre-treatment.  

      Any degeneration of the Fmr1-deficient Purkinje cells or cerebellar microcircuit or additional behavioral impairments at older ages, if they should exist, would not alter our interpretation of the results from 8-22 week old adults regarding history- and activity-dependent changes in the capacity for LTD-dependent learning. Therefore, we leave the question of changes throughout the lifespan to investigators with an interest and expertise in development and/or aging. 

      Only a small handful of the scores of previous studies of the Fmr1 KO mouse model have investigated age-dependent effects; the reviewer may be interested in papers such as Tang et al., 2015 (doi: 10.1073/pnas.1502258112) or Martin et al., 2016 (doi: 10.1093/cercor/bhv031). 

      Connections between Purkinje cells and interneurons could also influence the behavior results found.

      This comment is repeated below in a more general form (Reviewer 1, second to last comment)—please see our response there and lines 270-309 of the revised manuscript for a discussion of how concerns about “off-target” effects are mitigated by the high degree of specificity of the learning deficits and effects of pre-training for the specific learning tasks in which LTD has been previously implicated, and the very similar findings in two different lines of mice with enhanced LTD.

      While males and females were both used for the current study, only 7 of each sex were analyzed, which could be underpowered. While it might be justified to combine sexes for this particular study, it would be worth understanding this model in more detail.

      We performed additional analyses to address the question of whether there might be sex differences that were not detected because of the sample size.

      (1) In a new figure, Fig. 1-figure supplement 1, we break out the results for male and female mice in separate plots, and show that all of the effects of both the KO of Fmr1 from the Purkinje cells and of pretreatment with diazepam that are observed in the full cohort are also statistically significant in just the subset of male mice, and just the subset of female mice (see Fig. 1-figure supplement 1 legend for statistics). In other words, qualitatively, there are no sex differences, and all of the conclusions of our manuscript are statistically valid in both male and female mice. This strengthens the justification for combining sexes for the specific scientific purposes of our study.  

      (2) We performed a power analysis to determine how many mice would be needed to determine whether the very small quantitative differences between male and female mice are significant. The analysis indicates that this would require upwards of 70 mice of each sex for WT mice (Cohen’s d, 0.6162; power 0.95) and upwards of 2500 mice of each sex for L7-Fmr1 KO mice (Cohen’s d, 0.0989; power 0.95). Since the very small quantitative sex differences observed in our cohorts would not alter our scientific conclusions or the possibility for clinical application to patients of both sexes, even if the small quantitative differences turned out to be significant, the very large number of animals needed did not seem warranted for the current scientific purposes. Researchers focused on sex differences may find a motivation to pursue this issue further.
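      For readers who want to check the order of magnitude of these sample sizes, the following Python sketch uses the standard normal-approximation formula for a two-sided, two-sample comparison. The response does not state the test or significance level used, so the two-sided alpha of 0.05 and the function name `n_per_group` are assumptions; an exact t-based calculation would give slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, power=0.95, alpha=0.05):
    """Approximate per-group n to detect effect size d (Cohen's d).

    Uses n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2, the normal
    approximation for a two-sided, two-sample comparison of means.
    alpha = 0.05 is an assumption, not a value stated in the response.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for two-sided test
    z_power = z.inv_cdf(power)          # quantile for the desired power
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)


if __name__ == "__main__":
    # Effect sizes quoted in the response for the male/female comparison.
    for label, d in [("WT", 0.6162), ("L7-Fmr1 KO", 0.0989)]:
        print(f"{label}: about {n_per_group(d)} mice of each sex")
```

      Under these assumptions the approximation lands close to the figures quoted above: on the order of 70 mice per sex for the WT effect size and well over 2500 per sex for the L7-Fmr1 KO effect size.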

      Training was only shown up to 30 minutes and learning did not seem to plateau in most cases. What would happen if training continued beyond the 30 minutes? Would L7-Fmr1 KO mice catch up to WT littermates?

      (1) For VOR learning, we used a 30 min training time because in our past (e.g., Boyden et al., 2003; Kimpo and Raymond, 2007; Nguyen-Vu et al., 2013; Nguyen-Vu et al., 2017) and current results, we find that VOR learning does plateau quite rapidly, with little or no additional adaptive change in the VOR observed between the tests of learning after 30 min vs 20 min of VOR-increase training, in WT or L7-Fmr1 KO mice (Fig. 1A; WT, p=0.917; L7-Fmr1 KO, p=0.861; 20 vs. 30 min; Tukey). In the L7-Fmr1 KO mice, there is no significant high-frequency VOR-increase learning after 30 min training, and the mean VOR gain is even slightly lower on average (not significant) than before training (Fig. 1A, red). Therefore, we have no reason to expect that the L7-Fmr1 KO mice would catch up to WT after additional VOR-increase training.

      (2) We have added new data on OKR adaptation, induced with 60 min of training (Fig. 5). The L7-Fmr1 KO mice exhibited impaired OKR adaptation, even with 60 min of training (p = 1.27 × 10⁻⁴, Tukey). In our experience, restraint for longer than 60 min produces a behavioral state that is not conducive to learning, as also reported by Katoh and Yamagiwa (2018); therefore, longer training times were not attempted.

      The pathway discussed as the main focus for VOR in this learning paradigm was connections between parallel fibers (PF) and Purkinje cells, but the possibility of other local or downstream circuitry being involved was not discussed. PF-Purkinje cell circuits were not directly analyzed, which makes this claim difficult to assess.

      In the revised manuscript (lines 299-309), we have expanded our discussion of the possibility that loss of expression of Fmr1 from Purkinje cells in the Purkinje cell-specific L7-Fmr1 KO mice might influence other synapses or intrinsic properties of the Purkinje cells (including synapses from interneurons, as raised in this reviewer’s comment above), in addition to enhancing associative LTD at the parallel fiber-Purkinje cell synapses.

      It is a very general limitation of all perturbation studies, even cell-type specific perturbation studies as in the current case, that it is never possible to completely rule out “off-target” effects of the manipulation. Because of this, causality cannot be definitively concluded from correlations (e.g., between the effects of a perturbation observed at the cellular and behavioral level), and therefore we make no such claim in our manuscript. Rather, we conclude that our results “provide evidence for,” “support,” “predict,” or “are consistent with” the hypothesis of a history- and activity-dependent change in the threshold for associative LTD at the parallel fiber-Purkinje cell synapses.

      That said, perturbation is still one of the major tools in the experimental toolbox, and there are approaches for mitigating concern about off-target effects. We highlight three aspects of our experimental design that accomplish this (lines 184-228, 256-309). First, we show nearly identical learning impairments and effects of behavioral pretreatment in lines of mice with two completely different molecular manipulations that have the common effect of enhancing PF-Purkinje cell LTD, but are likely to have different off-target cellular effects on the Purkinje cells and their synapses. Second, we show that the learning impairments were highly specific to oculomotor learning tasks in which PF-Purkinje cell LTD was previously implicated, with no such effects on three other oculomotor learning tasks that depend on the same region of the cerebellum and oculomotor circuitry. In the original submission, we provided data for one LTD-dependent oculomotor learning task, high-frequency VOR-increase learning; in the revised manuscript we provide new data for a second LTD-dependent oculomotor learning task, optokinetic reflex adaptation, with nearly identical results (Fig. 5). Third, we show that the effects of diazepam pre-treatment were highly specific to the same two LTD-dependent oculomotor learning tasks and also highly specific to the L7-Fmr1 KO mice with enhanced LTD and not WT mice. These three features of the experimental design are not common in studies of learning, especially in combination. On lines 256-309, we provide an expanded discussion of how, together, these three features of the design strengthen the evidence that the learning impairments and effects of diazepam pre-treatment on learning are related to LTD at the PF-Purkinje cell synapses, while acknowledging the possibility of other effects on the circuit.

      The authors mostly achieved their aim and the results support their conclusion and proposed hypothesis. This work will be impactful on the field as it uses a new Purkinje-cell specific mouse model to study a classic cerebellar task. The use of diazepam could be further analyzed in other genetic models of neurodevelopmental disorders to understand if effects on LTD can rescue other pathways and behavior outcomes.

      We agree that the present findings are potentially relevant for a very wide array of behavioral tasks, disease models, and brain areas beyond the specific ones in our study, and we make this point on lines 310-338 of the revised manuscript. 

      Reviewer #2 (Public Review):

      This manuscript explores the seemingly paradoxical observation that enhanced synaptic plasticity impairs (rather than enhances) certain forms of learning and memory. The central hypothesis is that such impairments arise due to saturation of synaptic plasticity, such that the synaptic plasticity required for learning can no longer be induced. A prior study provided evidence for this hypothesis using transgenic mice that lack major histocompatibility class 1 molecules and show enhanced long-term depression (LTD) at synapses between granule cells and Purkinje cells of the cerebellum. The study found that a form of LTD-dependent motor learning, increasing the gain of the vestibulo-ocular reflex (VOR), is impaired in these mice and can be rescued by manipulations designed to "unsaturate" LTD. The present study extends this line of investigation to another transgenic mouse line with enhanced LTD, namely, mice with the Fragile X gene knocked out. The main findings are that VOR gain-increase learning is selectively impaired in these mice but can be rescued by specific manipulations of visuomotor experience known to reverse cerebellar LTD. Additionally, the authors show that a transient global enhancement of neuronal inhibition also selectively rescues gain-increase learning. This latter finding has potential clinical relevance since the drug used to boost inhibition, diazepam, is FDA-approved and commonly used in the clinic. The evidence provided for the saturation is somewhat indirect because directly measuring synaptic strength in vivo is technically difficult. Nevertheless, the experimental results are solid. In particular, the specificity of the effects to forms of plasticity previously shown to require LTD is remarkable.
      The authors should consider including a brief discussion of some of the important untested assumptions of the saturation hypothesis, including the requirement that cerebellar LTD depends not only on pre- and postsynaptic activity (as is typically assumed) but also on the prior history of synaptic activation.

      We thank the reviewer for this exceptionally clear and concise assessment of the findings and strengths of the manuscript.

      We agree that one of the most “remarkable” aspects of our findings is the specificity of the effects for oculomotor learning tasks for which there is the strongest previous evidence for a role of PF-Purkinje cell LTD. In the original manuscript, we tested just one LTD-dependent oculomotor learning task, high-frequency VOR-increase learning; in the revised manuscript, we strengthen the case for LTD-dependent task specificity by adding new data (Fig. 5) showing the same effects for OKR adaptation, an additional LTD-dependent oculomotor learning task.

      The reviewer’s suggestion to include discussion of “untested assumptions”, “including the requirement that cerebellar LTD depends not only on pre- and postsynaptic activity (as is typically assumed) but also on the prior history of synaptic activation” prompted us to more deeply consider the broader implications of our results, and extensively revise the Discussion accordingly. We clarify that we consider history-dependent changes in the threshold for LTD to be a prediction of the behavioral and pharmacological findings (lines 339-347, 356) rather than an assumption. In addition, we highlight the broader implications of the results by putting them in the context of work in other brain areas on history-dependent changes in the threshold for plasticity, i.e., metaplasticity, going back to the seminal Bienenstock-Cooper-Munro (BCM; year) theory (lines 348-378).

      Reviewer #1 (Recommendations for The Authors):

      The text and figures are very clear to read, but there are a couple of questions that remain:

      The concentrations chosen for diazepam are not well described and it is unclear why the concentrations jump from 2.5 mg/kg to 0.5 mg/kg. Please add an explanation for these concentrations and if any additional behavior outcomes were observed.

      Our choice of diazepam concentrations was guided by the concentrations reported in the literature to be effective in mice, which suggest that a higher dose (2 mg/kg) can have additional effects not observed with a lower effective dose (0.5 mg/kg) (Pádua-Reis et al, 2021). Since we did not know how much enhancement of inhibition/suppression of activity might be necessary to substantially reduce the induction of PF-Purkinje cell LTD, we did pilot experiments to test concentrations at the low and high ends of the doses typically used in mice. These pilot experiments revealed that a lower dose of 0.4 or 0.5 mg/kg was comparable to the higher dose of 2.5 mg/kg in suppressing VOR-increase learning 2 hours after administration (Fig. 3 – figure supplement 2). Anecdotally, we observed higher levels of locomotor activity and other abnormal cage behavior during the period immediately after administration of the higher compared to the lower dose. To limit these side effects and any possibility of dependence, we used only the lower dose in all subsequent experiments. We clarify this rationale for using a lower dose in the legend of Fig. 3 – figure supplement 2.   

      Figure 4 describes low-frequency VOR, but the paragraph discussing these results (line 191) mentions high-frequency VOR-increase learning. It is unclear where the results are for the high-frequency data. Please include or rephrase for clearer understanding.

      In the revised manuscript, we clarify that the 1 Hz frequency of the vestibular and visual stimuli used in Figs. 1-3 is the “high” frequency, which yields different results than the “low” frequency of 0.5 Hz (Fig. 4), as also observed in Boyden et al, 2006, and Nguyen-Vu et al, 2017.

      Reviewer #2 (Recommendations For The Authors):

      The authors should consider including a brief discussion of some of the important untested assumptions of the saturation hypothesis, including the requirement that cerebellar LTD depends not only on pre- and postsynaptic activity (as is typically assumed) but also on the prior history of synaptic activation.

      We thank the reviewer for this comment, which, along with your public comments, inspired us to thoroughly reconsider and revise our Discussion. We think this has greatly improved the manuscript, and will substantially increase its appeal to a broad segment of the neuroscience research community, including computational neuroscientists as well as those interested in synaptic physiology, learning and memory, or plasticity-related brain disorders including autism. 

      Note that we consider the idea that “LTD depends not only on pre- and post-synaptic activity but also on the prior history of synaptic activation” to be the central prediction of the threshold metaplasticity hypothesis rather than an assumption, and in the revised manuscript we explicitly refer to this as a prediction (lines 339, 356). We also added a discussion of multiple known cellular phenomena in the Purkinje cells and their synapses that can regulate LTD and thus represent candidate mechanisms for LTD threshold metaplasticity (lines 339-347). Again, sincere thanks for prompting us to write a vastly improved Discussion section.

      Editor's note:

      Should you choose to revise your manuscript, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and 95% confidence intervals. These should be reported in the main text for all key questions and not only when the p-value is less than 0.05.

      We have added exact p-values throughout the manuscript.  

      References

      Albergaria C, Silva NT, Pritchett DL, Carey MR. (2018). Locomotor activity modulates associative learning in mouse cerebellum. Nat Neurosci.21:725-735. doi: 10.1038/s41593-018-0129-x.

      Abraham WC, Mason-Parker SE, Bear MF, Tate WT. (2001). Heterosynaptic metaplasticity in the hippocampus in vivo: A BCM-like modifiable threshold for LTP. Proc Natl Acad Sci USA. 98:10924-10929.

      Bienenstock E, Cooper L, Munro P. (1982). Theory for the development of neuron selectivity: orientation specificity and binocular interaction in visual cortex. J Neurosci. 2:32-48. https://doi.org/10.1523/JNEUROSCI.02-01-00032.1982

      Brett J, Murnion B. (2015). Management of benzodiazepine misuse and dependence. Aust Prescr. 38:152-155. doi: 10.18773/austprescr.055.

      Boyden ES, Raymond JL. (2003). Active Reversal of Motor Memories Reveals Rules Governing Memory Encoding. Neuron.39:1031-1042. https://doi.org/10.1016/S0896-6273(03)00562-2

      Boyden ES, Katoh A, Pyle JL, Chatila TA, Tsien RW, Raymond JL. (2006). Selective engagement of plasticity mechanisms for motor memory storage. Neuron. 51:823-834. https://doi.org/10.1016/j.neuron.2006.08.026

      Desai NS, Cudmore RH, Nelson SB, Turrigiano GG. (2002). Critical periods for experience-dependent synaptic scaling in visual cortex. Nat Neurosci. 5:783-789. doi: 10.1038/nn878.

      Fong M, Duffy KR, Leet MP, Candler CT, Bear MF. (2021). Correction of amblyopia in cats and mice after the critical period. ELife.10:e70023. https://doi.org/10.7554/eLife.70023

      Hamada M, Terao Y, Hanajima R, Shirota Y, Nakatani-Enomoto S, Furubayashi T, Matsumoto H, Ugawa Y. (2008). Bidirectional long-term motor cortical plasticity and metaplasticity induced by quadripulse transcranial magnetic stimulation. J Physiol. 586:3927-3947. doi: 10.1113/jphysiol.2008.152793.

      Katoh A, Yamagiwa A. (2018). Inhibition of PVN neurons influences stress-induced changes of motor learning in the VOR. Society for Neuroscience. Online Program No. 067.14.

      Kimpo RR, Raymond JL. (2007). Impaired motor learning in the vestibulo-ocular reflex in mice with multiple climbing fiber input to cerebellar Purkinje cells. J Neurosci. 27:5672-5682. doi: 10.1523/JNEUROSCI.0801-07.2007.

      Kirkwood A, Rioult MG, Bear MF. (1996). Experience-dependent modification of synaptic plasticity in visual cortex. Nature. 381:526–528. https://doi.org/10.1038/381526a0

      Koekkoek SK, Yamaguchi K, Milojkovic BA, Dortland BR, Ruigrok TJ, Maex R, De Graaf W, Smit AE, VanderWerf F, Bakker CE, Willemsen R, Ikeda T, Kakizawa S, Onodera K, Nelson DL, Mientjes E, Joosten M, De Schutter E, Oostra BA, Ito M, De Zeeuw CI. (2005). Deletion of FMR1 in Purkinje Cells Enhances Parallel Fiber LTD, Enlarges Spines, and Attenuates Cerebellar Eyelid Conditioning in Fragile X Syndrome. Neuron. 47:339–352. https://doi.org/10.1016/j.neuron.2005.07.005

      Le Friec A, Salabert AS, Davoust C, Demain B, Vieu C, Vaysse L, Payoux P, Loubinoux I. (2017). Enhancing Plasticity of the Central Nervous System: Drugs, Stem Cell Therapy, and Neuro-Implants. Neural Plast. 2017:2545736. doi: 10.1155/2017/2545736.

      Leet MP, Bear MF, Gaier ED. (2022). Metaplasticity: a key to visual recovery from amblyopia in adulthood? Curr Opin Ophthalmol. 33:512–518. https://doi.org/10.1097/ICU.0000000000000901

      Martin HGS, Lassalle O, Brown JT, Manzoni OJ. (2016). Age-Dependent Long-Term Potentiation Deficits in the Prefrontal Cortex of the Fmr1 Knockout Mouse Model of Fragile X Syndrome. Cereb Cortex. 26:2084–2092. doi: 10.1093/cercor/bhv031.

      Montgomery JM, Madison DV. (2002). State-dependent heterogeneity in synaptic depression between pyramidal cell pairs. Neuron. 33:765-777. doi: 10.1016/s0896-6273(02)00606-2.

      Nguyen-Vu TDB, Kimpo RR, Rinaldi JM, Kohli A, Zeng H, Deisseroth K, Raymond JL. (2013). Cerebellar Purkinje cell activity drives motor learning. Nat Neurosci. 16:1734-1736. doi: 10.1038/nn.3576.

      Nguyen-Vu TB, Zhao GQ, Lahiri S, Kimpo RR, Lee H, Ganguli S, Shatz CJ, Raymond JL. (2017). A saturation hypothesis to explain both enhanced and impaired learning with enhanced plasticity. ELife. 6:e20147. https://doi.org/10.7554/eLife.20147

      Pádua-Reis M, Nôga DA, Tort ABL, Blunder M. (2021). Diazepam causes sedative rather than anxiolytic effects in C57BL/6J mice. Sci Rep. 11:9335.

      Singh A, Nagpal R, Mittal SK, Bahuguna C, Kumar P. (2017). Pharmacological therapy for amblyopia. Taiwan J Ophthalmol. 7:62-69. doi: 10.4103/tjo.tjo_8_17.

      Tang B, Wang T, Wan H, Han L, Qin X, Zhang Y, Wang J, Yu C, Berton F, Francesconi W, Yates JR 3rd, Vanderklish PW, Liao L. (2015). Fmr1 deficiency promotes age-dependent alterations in the cortical synaptic proteome. Proc Natl Acad Sci USA. 112:E4697-E4706. doi: 10.1073/pnas.1502258112.

      Yamaguchi T, Moriya K, Tanabe S, Kondo K, Otaka Y, Tanaka S. (2020). Transcranial direct-current stimulation combined with attention increases cortical excitability and improves motor learning in healthy volunteers. J Neuroeng Rehabil. 17:23. doi: 10.1186/s12984-020-00665-7.

    1. eLife assessment

      This study conducted fMRI experiments in an inbred rat model of absence seizures. The results provide new information suggesting reduced brain responsiveness during this type of seizure. The reviewers had divergent opinions but on average thought the study was valuable and the conclusions were solid.

    2. Reviewer #1 (Public Review):

      In this paper, the effects of two sensory stimuli (visual and somatosensory) on fMRI responsiveness during absence seizures were investigated in GAERS rats with concurrent EEG recordings. SPM analysis of fMRI showed a significant reduction in whole-brain responsiveness during the ictal period compared to the interictal period under both stimuli, and this phenomenon was replicated in a structurally constrained whole-brain computational model of rat brains.

      The conclusion of this paper is that whole-brain responsiveness to both sensory stimuli is inhibited and spatially impeded during seizures.

      The authors have revised this paper with a lot of detail.

    3. Reviewer #2 (Public Review):

      Summary:

      This study examined the possible effect of spike-wave discharges (SWDs) on the response to visual or somatosensory stimulation using fMRI and EEG. This is a significant topic because SWDs often are called seizures and because there is non-responsiveness at this time, it would be logical that responses to sensory stimulation are reduced. On the other hand, in rodents with SWDs, sensory stimulation (a noise, for example) often terminates the SWD/seizure.

      In humans, these periods of SWDs are due to thalamocortical oscillations. A certain percentage of the normal population can have SWDs in response to photic stimulation at specific frequencies. Other individuals develop SWDs without stimulation. They disrupt consciousness. Individuals have an absent look, or "absence", which is called absence epilepsy.

      The authors use a rat model to study the responses to stimulation of the visual or somatosensory systems during and in between SWDs. They report that the response to stimulation is reduced during the SWDs. While some data show this nicely, there is also difficulty knowing the effect of the stimulus, SWD and stimulus + SWD.

      The authors also study the hemodynamic response function (HRF) and it is not clear what conclusions can be made from the data. The authors acknowledge this, but it does lessen its significance.

      Finally, the authors use a model to analyze the data. This model is novel and while that is a strength, its validation is with a second model rather than empirical data.

      Strengths:

      Use of fMRI and EEG to study SWDs in rats.

      Weaknesses:

      The paper has been improved by revisions but there are still parts that are unclear, as described below.

    4. Reviewer #3 (Public Review):

      Summary:

      This is an interesting paper investigating fMRI changes during sensory (visual, tactile) stimulation and absence seizures in the GAERS model. The results are potentially important for the field and do suggest that sensory stimulation may not activate brain regions normally during absence seizures. However the findings are limited by substantial methodological issues that do not enable fMRI signals related to absence seizures to be fully disentangled from fMRI signals related to the sensory stimuli.

      Strengths:

      Investigating fMRI brain responses to sensory stimuli during absence seizures in an animal model is a novel approach with potential to yield important insights.

      Use of an awake, habituated model is a valid and potentially powerful approach.

      The major difficulty with interpreting the results of this study is that the duration of the visual and tactile stimuli were 6 seconds, which is very close to the mean seizure duration per Table 1. Therefore the HRF model looking at fMRI responses to visual or auditory stimuli occurring during seizures was simultaneously weighting both seizure activity and the sensory (visual or auditory) stimuli over the same time intervals on average. The resulting maps and time courses claiming to show fMRI changes from visual or auditory stimulation during seizures will therefore in reality contain some mix of both sensory stimulation-related signals and seizure-related signals. The main claim that the sensory stimuli do not elicit the same activations during seizures as they do in the interictal period may still be true. However the attempts to localize these differences in space or time will be contaminated by the seizure related signals.

      In their repeated responses to this comment the authors have stated that some seizures had longer than average duration, and that they have attempted to model the effects of both seizures and sensory stimulation. However these factors do not mitigate the concern because the mean duration of seizures and sensory stimulation remain nearly identical, and the models used therefore will not be able to effectively separate signals related to seizures and related to sensory stimulation. Hemodynamic models can never in reality represent underlying signals in an orthogonal manner, and are only indirectly related to neural activity.

      The only way to truly address the important weakness of this study would be to repeat the experiments using stimulus durations that do not match mean seizure duration, e.g. with much shorter duration stimuli.

      The authors have clarified and improved the figure images and their description in the text based on previous specific comments. However, the main weakness in the results remains as summarized above.

      Minor comments:

      Aside from the concerns listed as weaknesses above which were not addressed, most of the more minor comments were addressed by the authors in the resubmissions. However, the comment made twice previously regarding Figure 6-figure supplement 1 was not addressed. It remains impossible to see any firing rate changes elicited by sensory stimuli during the ictal period in parts E and F of the figure vs. parts B and C (interictal), due to the very different scales used. The seizure signals should be removed or accounted for by the model so that any possible sensory stimulus-related signals could be seen, and/or displayed on the same scale as firing rates without seizures. The authors have simply restated their opinion that it is better to include the SWD dynamics in these figures parts, however this makes the figure wholly unconvincing. It is also concerning that part D (ictal), which is in fact shown on the same scale as part A (interictal), actually shows larger firing rates for both excitatory and inhibitory neurons in visual cortex for sensory stimulation during seizures. This contradicts the claims in the rest of the paper that neural activity and fMRI signals are smaller or are even decreased in visual cortex with sensory stimulation during seizures compared to the interictal period.

    5. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      This valuable work performed fMRI experiments in a rodent model of absence seizures. The results provide new information regarding the brain's responsiveness to environmental stimuli during absence seizures. The authors suggest reduced responsiveness occurs during this type of seizure, and the evidence leading to the conclusion is solid, although reviewers had divergent opinions.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this paper, the effects of two sensory stimuli (visual and somatosensory) on fMRI responsiveness during absence seizures were investigated in GAERS rats with concurrent EEG recordings. SPM analysis of fMRI showed a significant reduction in whole-brain responsiveness during the ictal period compared to the interictal period under both stimuli, and this phenomenon was replicated in a structurally constrained whole-brain computational model of rat brains.

      The conclusion of this paper is that whole-brain responsiveness to both sensory stimuli is inhibited and spatially impeded during seizures.

      Reviewer #2 (Public Review):

      Summary:

      This study examined the possible effect of spike-wave discharges (SWDs) on the response to visual or somatosensory stimulation using fMRI and EEG. This is a significant topic because SWDs often are called seizures and because there is non-responsiveness at this time, it would be logical that responses to sensory stimulation are reduced. On the other hand, in rodents with SWDs, sensory stimulation (a noise, for example) often terminates the SWD/seizure.

      In humans, these periods of SWDs are due to thalamocortical oscillations. A certain percentage of the normal population can have SWDs in response to photic stimulation at specific frequencies. Other individuals develop SWDs without stimulation. They disrupt consciousness. Individuals have an absent look, or "absence", which is called absence epilepsy.

      The authors use a rat model to study the responses to stimulation of the visual or somatosensory systems during and in between SWDs. They report that the response to stimulation is reduced during the SWDs. While some data show this nicely, the authors also report on lines 396-8 "When comparing statistical responses between both states, significant changes (p<0.05, cluster-) were noticed in somatosensory auditory frontal..., with these regions being less activated in interictal state (see also Figure 4)." That statement is at odds with their conclusion. I do not see that this issue was addressed.

      See comments below starting with “We acknowledge the reviewer…”.

      They also conclude that stimulation slows the pathways activated by the stimulus. I do not see any data proving this. It would require repeated assessments of the pathways in time. This issue was not addressed.

      See comments below starting with “We acknowledge the reviewer…”.

      The authors also study the hemodynamic response function (HRF) and it is not clear what conclusions can be made from the data. This is still an issue. No conclusions appear to be possible to make.

      See comments below starting with “We acknowledge the reviewer…”.

      Finally, the authors use a model to analyze the data. This model is novel and while that is a strength, its validation is unclear. The authors did not add any validation of their model.

      See comments below starting with “We acknowledge the reviewer…”.

      Strengths:

      Use of fMRI and EEG to study SWDs in rats.

      Weaknesses:

      Several aspects of the Methods and Results were improved but some are still unclear.

      We acknowledge the reviewer’s concern that we did not address the comments above. However, we emphasize that most of the comments were addressed in the previously submitted “Response to Review Comments” and in the updated manuscript. Here we repeat those responses and also provide additional clarifications for some of the comments.

      We thank the reviewer for noting the discrepancy in the statement “less activated in interictal state”. The statement should have been written the other way around. We also note that the direction of the activation change between groups can be misinterpreted from the statistical maps alone (Figure 3), where only statistical changes are visible and not the polarity of the response (which can be seen in Figure 4). Therefore, we have made the following changes in section 3.3: “There were more voxels with significant changes of activity during interictal state compared to ictal state (136% more). Comparing the statistical responses between interictal and ictal states revealed significant changes (p<0.05, cluster-level corrected) in the visual, somatosensory, and medial frontal cortices. In the ictal state, these regions showed significant hemodynamic decreases when comparing to interictal state, and these polarity changes can be seen the hemodynamic response functions (Figure 4).”

      We agree with the reviewer that there are no data showing slowing of the pathways in response to the stimulus. However, we are unsure which part of the conclusion section this comment refers to. We did not intend to claim in the manuscript that stimulation slows the activated pathways.

      The reviewer is right that strong claims cannot be made from the HRF by itself. Therefore, we have avoided such phrasing throughout the manuscript. In the conclusion section, we speculate that HRF decreases “could play a role in decreased sensory perception” but also state that “further studies are required”. The observed HRF decreases (rather than increases) in the cortex when stimulation was applied during SWD were discussed in section 4.4, where we speculated that neuronal suppression (possibly apparent in negative HRFs) caused by SWD can prevent responsiveness. The conclusion now states the following: “Moreover, the detected decreases in the cortical HRF when sensory stimulation was applied during spike-and-wave discharges, could play a role in decreased sensory perception. Further studies are required to evaluate whether this HRF change is a cause or a consequence of the reduced neuronal response.”

      We point out that the main validation of the model and its details were provided in the previous answer to the reviewer and added to the manuscript. The model presented in the paper is based on a mean-field formalism that captures neuronal activity at the mesoscale level. This mean-field formalism is derived via a detailed statistical description of the activity of a spiking neuronal population of excitatory and inhibitory neurons with conductance-based synaptic interactions. Thus, the validation of the mean-field model is performed via direct comparison between the dynamics obtained from the mean-field model and the dynamics obtained from the underlying spiking neural network model. This comparison is shown in the supplementary material of the manuscript, where the transition studied in the paper between interictal (asynchronous irregular activity) and ictal (SWD dynamics) activity, which is predicted by the mean-field model, is indeed observed in the underlying spiking neuronal model. The existence of these two types of dynamics and the transition between them is the main component of the model used to build the analysis of the responsiveness performed in the paper (which has been properly validated).

      Reviewer #3 (Public Review):

      Summary:

      This is an interesting paper investigating fMRI changes during sensory (visual, tactile) stimulation and absence seizures in the GAERS model. The results are potentially important for the field and do suggest that sensory stimulation may not activate brain regions normally during absence seizures. But the findings are limited by substantial methodological issues that do not enable fMRI signals related to absence seizures to be fully disentangled from fMRI signals related to the sensory stimuli.

      Strengths:

      Investigating fMRI brain responses to sensory stimuli during absence seizures in an animal model is a novel approach with potential to yield important insights.

      Use of an awake, habituated model is a valid and potentially powerful approach.

      Weaknesses:

      The major difficulty with interpreting the results of this study is that the duration of the visual and tactile stimuli were 6 seconds, which is very close to the mean seizure duration per Table 1. Therefore the HRF model looking at fMRI responses to visual or auditory stimuli occurring during seizures was simultaneously weighting both seizure activity and the sensory (visual or auditory) stimuli over the same time intervals on average. The resulting maps and time courses claiming to show fMRI changes from visual or auditory stimulation during seizures will therefore in reality contain some mix of both sensory stimulation-related signals and seizure-related signals. The main claim that the sensory stimuli do not elicit the same activations during seizures as they do in the interictal period may still be true. But the attempts to localize these differences in space or time will be contaminated by the seizure related signals.

      In their response to this comment the authors state that some seizures had longer than average duration, and that they attempted to model the effects of both seizures and sensory stimulation. However these factors do not mitigate the concern because the mean duration of seizures and sensory stimulation remain nearly identical, and the models used therefore will not be able to effectively separate signals related to seizures and related to sensory stimulation.

      Regressors for seizures were formed by including periods of seizures without any stimulation present. In theory, if seizures were perfectly modeled by the regressor, the remaining variance would be completely orthogonal to the main effect of the stimulus. Furthermore, only the cases where the seizures are longer than the stimulus are used to calculate the responsiveness to the stimulus (while the cases where the seizures are shorter than the stimulus are used as nuisance regressors to account for error variance). However, we agree with the reviewer that in practice the effects of the seizure cannot be removed completely from the effect of the stimulus. We have addressed this concern in the “physiologic and methodology consideration” section: “We note a caution that presented maps and time courses showing fMRI changes from visual or whisker stimulation during seizures may contain a mixture of both sensory stimulation-related signals and seizure-related signals. To minimize this contamination in the linear model used, we considered both stimulation and seizure-only states as regressors of interest and used seizure-only responses as nuisance regressors to account for error variance. Thereby, the effects caused by the stimulation should be separated as much as possible from the effects caused by the seizure itself.”
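
      The regressor logic described above can be illustrated with a minimal sketch. This is not the SPM pipeline used in the study: the 1 s sampling interval, the toy gamma-shaped HRF, the event onsets, the 10 s seizure duration, and all function names are hypothetical choices made only for illustration. The sketch builds a stimulus regressor and a seizure regressor, fits both with ordinary least squares on synthetic data, and reports how correlated the two regressors are, which is the quantity at the heart of the reviewer's concern.

```python
import numpy as np

def gamma_hrf(t, shape=6.0, scale=1.0):
    """Toy single-gamma HRF (an assumption, not the HRF estimated in the study)."""
    h = t ** (shape - 1) * np.exp(-t / scale)
    return h / h.sum()

def boxcar(n_scans, onsets, duration):
    """Return a 0/1 time course that is 1 during each event."""
    x = np.zeros(n_scans)
    for onset in onsets:
        x[onset:onset + duration] = 1.0
    return x

def build_design(n_scans, stim_onsets, seizure_onsets, stim_dur=6, seizure_dur=10):
    """Design matrix with HRF-convolved stimulus and seizure regressors plus an intercept."""
    hrf = gamma_hrf(np.arange(0.0, 20.0, 1.0))  # hypothetical 1 s sampling
    stim = np.convolve(boxcar(n_scans, stim_onsets, stim_dur), hrf)[:n_scans]
    seiz = np.convolve(boxcar(n_scans, seizure_onsets, seizure_dur), hrf)[:n_scans]
    return np.column_stack([stim, seiz, np.ones(n_scans)])

def fit_glm(X, y):
    """Ordinary least-squares estimate of the regression weights."""
    betas, *_ = np.linalg.lstsq(X, y, rcond=None)
    return betas

rng = np.random.default_rng(0)
n_scans = 300
stim_onsets = [20, 80, 140, 200, 260]  # hypothetical 6 s stimulation blocks
seizure_onsets = [50, 138, 230]        # two seizure-only events, one overlapping a stimulus
X = build_design(n_scans, stim_onsets, seizure_onsets)

true_betas = np.array([1.0, -0.5, 100.0])  # stimulus, seizure, baseline (synthetic)
y = X @ true_betas + rng.normal(0.0, 0.05, n_scans)

betas = fit_glm(X, y)
regressor_corr = np.corrcoef(X[:, 0], X[:, 1])[0, 1]
print("estimated betas:", betas)
print("stimulus/seizure regressor correlation:", regressor_corr)
```

      In this toy setting the seizure-only events keep the two regressors far from collinear, so both weights can be estimated. If every seizure coincided with a stimulus of nearly the same duration, the correlation would approach 1 and the two effects could no longer be separated reliably.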

      The claims that differences were observed for example between visual cortex and superior colliculus signals with visual stim during seizures vs interictal remain unconvincing due to above.

      Maps shown in Figure 3 do not show clear changes in the areas claimed to be involved.

      In their response the authors enlarged the cross sections. However there are still discrepancies between the images and the way they are described in the text. For example, in the Results text the authors say that comparing the interictal and ictal states revealed less activation in the somatosensory cortex during the ictal than during the interictal state, yet Figure 3 bottom row left shows greater activation in somatosensory cortex in this contrast.

      We note that the direction of activation change between groups can be misinterpreted when relying on the statistical maps alone (Figure 3), where only statistical changes are visible and not the polarity of the response (which can be seen in Figure 4). Therefore, we have made the following changes to section 3.3: “There were more voxels with significant changes of activity during interictal state compared to ictal state (136% more). Comparing the statistical responses between interictal and ictal states revealed significant changes (p<0.05, cluster-level corrected) in the visual, somatosensory, and medial frontal cortices. In the ictal state, these regions showed significant hemodynamic decreases compared with the interictal state, and these polarity changes can be seen in the hemodynamic response functions (Figure 4).”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Authors have revised this paper with a lot of detail. The paper can be accepted for publication in this version.

      Reviewer #2 (Recommendations For The Authors):

      Reviewer #1

      (1) The analysis in this paper does not directly answer the scientific question posed by the authors, which is to explore the mechanisms of the reduced brain responsiveness to external stimuli during absence seizures (in terms of altered information processing), but merely characterizes the spatial involvement of such reduced responsiveness. The same holds for the use of mean-field modeling, which merely reproduces experimental results without explaining them mechanistically as what the authors have claimed at the head of the paper.

      We agree with the reviewer that the manuscript does not specifically address the mechanisms of reduced brain responsiveness. The main scientific question addressed in the manuscript was to compare whole-brain responsiveness to stimuli between ictal and interictal states. The sentence in the manuscript abstract that could lead to misinterpretation, "The mechanism underlying the reduced responsiveness to external stimulus remains unknown.", was therefore modified to the following: "The whole-brain spatial and temporal characteristics of reduced responsiveness to external stimulus remain unknown".

      This change did not address the issue. The problem is that there is no experimentation to address the underlying mechanisms of the results. I also think the changed language in the abstract is less clear than the original.

      We fully agree that this manuscript does not address, or claim to address, the mechanisms of reduced brain responsiveness. The main scientific question addressed in the manuscript was to compare whole-brain responsiveness to stimuli between ictal and interictal states, by means of hemodynamics and mean-field simulation.

      We have changed the language of the abstract to the following:

      “In patients suffering absence epilepsy, recurring seizures can significantly decrease their quality of life and lead to yet untreatable comorbidities. Absence seizures are characterized by spike-and-wave discharges on the electroencephalogram associated with a transient alteration of consciousness. However, it is still unknown how the brain responds to external stimuli during and outside of seizures.

      This study aimed to investigate responsiveness to visual and somatosensory stimulation in GAERS, a well-established rat model for absence epilepsy. Animals were maintained in a non-curarized awake state allowing for naturally occurring seizures to be produced inside the magnet. They were imaged continuously using a quiet zero-echo-time functional magnetic resonance imaging (fMRI) sequence. Sensory stimulations were applied during interictal and ictal periods. Whole brain responsiveness and hemodynamic responses were compared between these two states. Additionally, a mean-field simulation model was used to mechanistically explain the changes of neural responsiveness to visual stimulation between interictal and ictal states.

      Results showed that, during a seizure, whole-brain responses to both sensory stimulations were suppressed and spatially hindered. In several cortical regions, hemodynamic responses were negatively polarized during seizures, despite the application of a stimulus. The simulation experiments also showed restricted propagation of spontaneous activity due to stimulation and so agreed well with fMRI findings. These results suggest that sensory processing observed during an interictal state is hindered or even suppressed by the occurrence of an absence seizure, potentially contributing to decreased responsiveness during this absence epileptic process.”

      The authors also study the hemodynamic response function (HRF) and it is not clear what conclusions can be made from the data.

      The response of the authors did not clarify this issue. Instead, they explained why they examined HRF and that they can only speculate what the data means.

      The reviewer is right that strong claims cannot be made from the HRF by itself. Therefore, we have avoided such phrasing throughout the manuscript. In the conclusion section, we speculate that HRF decreases “could play a role in decreased sensory perception” but also state that “further studies are required”.

      Finally, the authors use a model to analyze the data. This model is novel and while that is a strength, its validation is unclear. The conclusion is that the modeling supports the conclusions of the study, which is useful.

      Details about the model were added.

      This is not entirely satisfactory because there is still no validation of the model.

      We point out that the main validation of the model and its details were provided in the previous answer to the reviewer and added to the manuscript. The model presented in the paper is based on a mean-field formalism that captures neuronal activity at the mesoscale level. This mean-field formalism is derived via a detailed statistical description of the activity of a spiking population of excitatory and inhibitory neurons with conductance-based synaptic interactions. Thus, the mean-field model is validated by direct comparison between the dynamics obtained from the mean-field model and the dynamics obtained from the underlying spiking neural network model. This comparison is shown in the supplementary material of the manuscript, where the transition studied in the paper between interictal (asynchronous irregular activity) and ictal (SWD dynamics) activity, which is predicted by the mean-field model, is indeed observed in the underlying spiking neuronal model. The existence of these two types of dynamics and the transition between them is the main component of the model used to build the analysis of the responsiveness performed in the paper (which has been properly validated).

      How is ROI defined in this paper? What type of atlas is used?

      Anatomical ROIs were drawn based on the Paxinos and Watson rat brain atlas, 7th edition. A region was selected if statistically significant activations were detected inside that region, based on activation maps. We clarified the definition of ROI as follows: "Anatomical ROIs, based on Paxinos atlas (Paxinos and Watson rat brain atlas 7th edition), were drawn on the brain areas where statistical differences were seen in activation maps."

      This is helpful, but the unstained brain does not show the borders of the areas. Therefore just saying an atlas was used is not enough. How in an unstained brain can the areas be accurately outlined?

      Areas of the brain were differentiated by co-registering the functional MRI images with a T1-weighted anatomical reference brain that was created on site from the same data set that was used for the manuscript. Potential co-registration inaccuracies that arise from using a reference brain acquired at a different site, with a different sequence, or in a different rat strain can thus be avoided. T1-weighted images provide sufficient contrast to differentiate the main brain areas, but for more accurate border definition (e.g., to differentiate individual thalamic nuclei), the coordinate system of the atlas and known coordinates in the anatomical reference brain were used to pinpoint the exact borders of the brain areas.

      Reviewer #2

      The following also is not precise:

      "Although seizures are initially triggered by hyperactive somatosensory cortical neurons, the majority of neuronal populations are deactivated rather than activated during the seizure, resulting in an overall decrease in neuronal activity during SWD (McCafferty et al. 2023)."

      What neuronal populations? Cortex? Which neurons in the cortex? Those projecting to the thalamus? What about thalamocortical relay cells? Thalamic gabaergic neurons?

      Please check that these issues were corrected.

      The issues were addressed as follows:

      “Although SWDs are initially triggered by hyperactive somatosensory cortical neurons, neuronal firing rates, especially in the majority of frontoparietal cortical and thalamocortical relay neurons, are decreased rather than increased during SWD, resulting in an overall decrease in activity in these neuronal populations (McCafferty et al., 2023). Previous fMRI studies have demonstrated blood volume or BOLD signal decreases in several cortical regions including parietal and occipital cortex, but also, quite surprisingly, increases in subcortical regions such as thalamus, medulla and pons (David et al., 2008; McCafferty et al., 2023).”

      Results

      After removing problematic animals and sessions, was there sufficient power? There probably wasn't enough to determine sex differences.

      After removing problematic sessions, we found statistically significant (multiple-comparison corrected) results in both activation maps and hemodynamic responses. There were not enough animals to detect sex differences statistically (p>0.05).

      This is not the question. The question is whether there was sufficient power.

      A simple power calculation was performed as follows: considering a t-test, a risk alpha of 0.05, a power of 0.8, matched pairs (seizure/control), we can detect an effect size of 0.37 with our 4 animals, considering repeated measurements (4 sessions/animal x 11 seizure/control pairs per session). This is now mentioned in the manuscript.
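The kind of calculation described above can be sketched as follows. This is a hedged illustration rather than the authors' computation: it uses the normal approximation to a two-sided paired t-test and treats every seizure/control pair as independent, whereas repeated sessions within four animals are correlated. How the authors accounted for repeated measures is not stated here, so this sketch should not be expected to reproduce the reported effect size of 0.37. The function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def detectable_effect_size(n_pairs, alpha=0.05, power=0.8):
    """Smallest standardized paired difference detectable with n_pairs.

    Normal approximation to the two-sided paired t-test; assumes
    independent pairs, which is optimistic for repeated measurements.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    return (z_alpha + z_power) / sqrt(n_pairs)


def pairs_needed(effect_size, alpha=0.05, power=0.8):
    """Number of independent pairs needed to detect a given effect size."""
    std_normal = NormalDist()
    z_sum = std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(power)
    return ceil((z_sum / effect_size) ** 2)


# 4 animals x 4 sessions x 11 seizure/control pairs, as stated in the response.
total_pairs = 4 * 4 * 11
naive_effect = detectable_effect_size(total_pairs)
```

Because within-animal correlation reduces the effective number of independent pairs, a calculation that models the repeated-measures structure yields a larger minimum detectable effect than this naive bound.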

      Table 1 has no statistical comparisons.

      Table 1 is purely an illustration of stimulation and seizure occurrence. There is no specific interest to compare stimulation types (in what state of seizure it occurred) as it does not provide any meaningful inferences to the study.

      Table 1 could be improved by statistics. More could be said and there would be justification to include it.

      We thank the reviewer for the suggestion, but as it is still unclear which statistical comparison would be feasible, we opt to leave it out.

      Statistical activation maps - it is not clear how this was done.

      The creation of statistical maps is explained in section 2.5.3.

      This section is not clear.

      We have added a reference (https://doi.org/10.1002/hbm.460020402) for readers to familiarize themselves with the concept of statistical parametric mapping.

      Fig 3 "F-contrast maps." Please explain.

      The creation of statistical maps is explained in section 2.5.3.

      This section is unclear.

      We have added a reference (https://doi.org/10.1002/hbm.460020402) for readers to familiarize themselves with the concept of statistical parametric mapping.

      Reviewer #3 (Recommendations For The Authors):

      Aside from the concerns listed as weaknesses above which were not addressed, most of the more minor comments were addressed by the authors in the resubmission. However, the comment below was not addressed because it is impossible to see any firing rate changes elicited by sensory stimuli (if they are present) due to the scale during seizures. The seizure signals should be removed or accounted for by the model so that any possible sensory stimulus-related signals could be seen, and displayed on the same scale as firing rates without seizures. Prior comment (unaddressed) is repeated below:

      Figure 6-figure supplement 1, the scales are very different for many of the plots so they are hard to compare. Especially in the ictal periods (D, E, F) it is hard to see if any changes are happening during ictal stimulation similar to interictal stimulation due to very different scales. The activity related to SWD is so large that it overshadows the rest, and perhaps should be subtracted out.

      These two comments were addressed and replied to in the previous round of reviews. Regarding the different scales of the plots in Figure 6-figure supplement 1, we point out that all the plots are already presented on the same scale in Figure 6 of the main text. Regarding the activity related to SWD and sensory stimulation, we remark that the effect of the stimulation should be (and was) evaluated with respect to the ongoing activity. All the results concerning the neuronal responsiveness presented in the paper evaluate the statistical significance of the changes in activity produced by the stimulation with respect to the ongoing activity (during ictal and interictal states respectively). For this reason, all the plots containing the time series of neuronal activity in the simulations include the ongoing activity (with SWD dynamics when present) for proper comparison and relevant analysis.

      Additional changes:

      In section 3.2, the sentence “In addition, responses were observed in the somatosensory cortex during a seizure state.” was removed for clarity, as deactivation rather than activation was observed in this brain area during a seizure state.

    1. eLife assessment

      In this manuscript, the authors tested the hypothesis that Aβ42 toxicity arises from its proven affinity for γ-secretases. The authors provide useful findings, showing convincingly that human Abeta42 inhibits gamma-secretase activity. The data will be of interest to all scientists working on neurodegenerative diseases.

    2. Reviewer #1 (Public Review):

      Summary:

      Human Abeta42 inhibits gamma-secretase activity in biochemical assays.

      Strengths:

      Determination of the inhibitory concentration of human Abeta42 on gamma-secretase activity in biochemical assays.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Major concerns:

      (1) The biological significance of the inhibitory effects of human Abeta42 on gamma-secretase activity is not clear. As the authors mentioned in the Discussion, it is plausible that Abeta42 may concentrate up to microM level in endosomes. However, subsets of FAD mutations in APP and presenilin 1 and 2 increase Abeta42/Abeta40 ratio and lead to Abeta42 deposition in brain. APP knock-in mice NLF and NLGF also develop Abeta42 deposition in age-dependent manner, although they produce more human Abeta42 than human Abeta40.

      If the production of Abeta42 were attenuated, this would result in less Abeta42 deposition in brain. So, it is unlikely that human Abeta42 interferes with gamma-secretase activity in physiological conditions. This reviewer has an impression that inhibition of gamma-secretase by human Abeta42 is an interesting artifact in high Abeta42 concentration. If the authors disagree with this reviewer's comment, this manuscript needs more discussion in this point of view.

      We thank the Reviewer for raising this key conceptual point; we acknowledge that it was insufficiently discussed in the original manuscript. In response to this point, we introduced the following paragraph in the discussion section of the revised manuscript:

      “From a mechanistic standpoint, the competitive nature of the Aβ42-mediated inhibition implies that it is partial, reversible, and regulated by the relative concentrations of the Aβ42 peptide (inhibitor) and the endogenous substrates (Figure 10C and 10D). The model that we put forward is that cellular uptake, as well as endosomal production of Aβ, result in increased intracellular concentration of Aβ42, facilitating γ-secretase inhibition and leading to the buildup of APP-CTFs (and γ-secretase substrates in general). As Aβ42 levels fall, the augmented concentration of substrates shifts the equilibrium towards their processing and subsequent Aβ production. As Aβ42 levels rise again, the equilibrium is shifted back towards inhibition. This cyclic inhibitory mechanism will translate into pulses of (partial) γ-secretase inhibition, which will alter γ-secretase-mediated signaling (arising from increased CTF levels at the membrane or decreased release of soluble intracellular domains from substrates). These alterations may affect the dynamics of systems oscillating in the brain, such as NOTCH signaling, implicated in memory formation, and potentially others (related to e.g. cadherins, p75 or neuregulins). It is worth noting that oscillations in γ-secretase activity induced by treatment with a γ-secretase inhibitor semagacestat have been proposed to have contributed to the cognitive alterations observed in semagacestat treated patients in the failed Phase-3 IDENTITY clinical trial (7) and that semagacestat, like Aβ42, acts as a high affinity competitor of substrates (85).

      The convergence of Aβ42 and tau at the synapse has been proposed to underlie synaptic dysfunction in AD (86-89), and recent assessment of APP-CTF levels in synaptosome-enriched fractions from healthy control, SAD and FAD brains (temporal cortices) has shown that APP fragments concentrate at higher levels in the synapse in AD-affected than in control individuals (90). Our analysis adds that endogenous Aβ42 concentrates in synaptosomes derived from end-stage AD brains to reach ~10 nM, a concentration that in CM from human neurons inhibits γ-secretase in PC12 cells (Figure 7). Furthermore, the restricted localization of Aβ in endolysosomal vesicles, within synaptosomes, likely increases the local peptide concentration to the levels that inhibit γ-secretase-mediated processing of substrates in this compartment. In addition, we argue that the deposition of Aβ42 in plaques may be preceded by a critical increase in the levels of Aβ present in endosomes and the cyclical inhibition of γ-secretase activity that we propose. Under this view, reductions in γ-secretase activity may be a (transient) downstream consequence of increases in Aβ due to failed clearance, as represented by plaque deposition, contributing to AD pathogenesis.”
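The dependence of a competitive inhibitor on the relative concentrations of inhibitor and substrate can be made concrete with the textbook Michaelis-Menten rate law for competitive inhibition. The sketch below is a generic illustration only: the kinetic constants and concentrations are arbitrary, unitless assumptions, not values fitted or reported in the manuscript, and the function names are hypothetical.

```python
def competitive_rate(substrate, inhibitor, vmax=1.0, km=1.0, ki=1.0):
    """Michaelis-Menten rate under competitive inhibition.

    v = Vmax * [S] / (Km * (1 + [I] / Ki) + [S])
    """
    return vmax * substrate / (km * (1.0 + inhibitor / ki) + substrate)


def fractional_inhibition(substrate, inhibitor, km=1.0, ki=1.0):
    """Fraction of the uninhibited rate lost at a given inhibitor level."""
    uninhibited = competitive_rate(substrate, 0.0, km=km, ki=ki)
    inhibited = competitive_rate(substrate, inhibitor, km=km, ki=ki)
    return 1.0 - inhibited / uninhibited


# Same inhibitor level, low versus high substrate (arbitrary units).
low_substrate = fractional_inhibition(substrate=1.0, inhibitor=1.0)
high_substrate = fractional_inhibition(substrate=10.0, inhibitor=1.0)
```

At a fixed inhibitor level, the fractional inhibition shrinks as substrate accumulates, which is the property that makes this type of inhibition partial and reversible in the way the quoted paragraph describes.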

      We have also added figures 10C and 10D, presented here for convenience.

      Author response image 1.

      (2) It is not clear whether the FRET-based assay in living cells really reflects gamma-secretase activity.

      This reviewer thinks that the authors need at least biochemical data, such as levels of Abeta. 

      We have established a novel, HiBiT tag based assay reporting on the global γ-secretase activity in cells, using as a proxy the total levels of secreted HiBiT-tagged Aβ peptides. The assay and findings are presented in the revised manuscript as follows:

      In the result section, in the “Aβ42 treatment leads to the accumulation of APP C-terminal fragments in neuronal cell lines and human neuron” subsection:

      “The increments in the APP-CTF/FL ratio suggested that Aβ42 (partially) inhibits the global γ-secretase activity. To further investigate this, we measured the direct products of the γ-secretase mediated proteolysis of APP. Since the detection of the endogenous Aβ products via standard ELISA methods was precluded by the presence of exogenous human Aβ42 (treatment), we used an N-terminally tagged version of APPC99 and quantified the amount of total secreted Aβ, which is a proxy for the global γ-secretase activity. Briefly, we overexpressed human APPC99 N-terminally tagged with a short 11 amino acid long HiBiT tag in human embryonic kidney (HEK) cells, treated these cultures with human Aβ42 or p3 17-42 peptides at 1 μM or DAPT (GSI) at 10 µM, and determined total HiBiT-Aβ levels in conditioned media (CM). DAPT was considered to result in full γ-secretase inhibition, and hence the values recorded in DAPT treated conditions were used for the background subtraction. We found a ~50% reduction in luminescence signal, directly linked to HiBiT-Aβ levels, in CM of cells treated with human Aβ42 and no effect of p3 peptide treatment, relative to the DMSO control (Figure 3D). The observed reduction in the total Aβ products is consistent with the partial inhibition of γ-secretase by Aβ42.”
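The background subtraction described above can be sketched as a simple normalization, with the DAPT condition defining zero activity and the DMSO vehicle defining full activity. The luminescence counts below are invented for illustration and the function name `relative_activity` is hypothetical; this is not the authors' analysis script.

```python
def relative_activity(sample, vehicle, full_inhibition):
    """Activity of a sample relative to vehicle after background subtraction.

    `full_inhibition` is the signal under complete inhibition (background);
    `vehicle` is the signal of the untreated control (full activity).
    """
    span = vehicle - full_inhibition
    if span <= 0:
        raise ValueError("vehicle signal must exceed the background signal")
    return (sample - full_inhibition) / span


# Hypothetical luminescence counts (arbitrary units).
dmso_signal = 10000.0    # vehicle control
dapt_signal = 2000.0     # full inhibition, used as background
abeta42_signal = 6000.0  # peptide-treated cells

activity = relative_activity(abeta42_signal, dmso_signal, dapt_signal)
reduction_percent = (1.0 - activity) * 100.0
```

With these made-up counts, the treated sample retains half of the background-corrected signal, the same form of result as the ~50% reduction reported in the quoted passage.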

      In Methods:

      “Analysis of γ-secretase substrate proteolysis in cultured cells using secreted HiBiT-Aβ or -Aβ-like peptide levels as a proxy for the global γ-secretase endopeptidase activity

      HEK293 stably expressing APP-CTF (C99) or a NOTCH1-based substrate (similar in size as APP-C99) both N-terminally tagged with the HiBiT tag were plated at the density of 10000 cells per 96-well, and 24h after plating treated with Aβ or p3 peptides diluted in OPTIMEM (Thermo Fisher Scientific) supplemented with 5% FBS (Gibco). Conditioned media was collected and subjected to analysis using Nano-Glo® HiBiT Extracellular Detection System (Promega). Briefly, 50 µl of the medium was mixed with 50 µl of the reaction mixture containing LgBiT Protein (1:100) and Nano-Glo HiBiT Extracellular Substrate (1:50) in Nano-Glo HiBiT Extracellular Buffer, and the reaction was incubated for 10 minutes at room temperature. Luminescence signal corresponding to the amount of the extracellular HiBiT-Aβ or -Aβ-like peptides was measured using victor plate reader with default luminescence measurement settings.”

      As the direct substrate of γ-secretase was used in this analysis, the observed reduction (~50%) in the levels of N-terminally-tagged (HiBiT) Aβ peptides in the presence of 1 µM Aβ42, relative to control conditions, demonstrates a selective inhibition of γ-secretase by Aβ42 (not by the p3). These data complement the FRET-based findings presented in Figure 5.

      (3) Processing of APP-CTF in living cells is not only the cleavage by gamma-secretase. This reviewer thinks that the authors need at least biochemical data, such as levels of Abeta in Figures 4, 5 and 7.

      We tried to measure the levels of Aβ peptides secreted by cells into the culture medium directly by ELISA (using different protocols) or MS (using established methods, as reported in Koch et al, 2023), but exogenous Aβ42 (treatment) present at relatively high levels interfered with the readout and rendered the analysis inconclusive. 

      However, we were successful in the determination of total secreted (HiBiT-tagged) Aβ peptides from the HiBiT tagged APP-C99 substrate, as indicated in the previous point. The quantification of the levels of these peptides showed that Aβ42 treatment resulted in ~50% reduction in the γ-secretase mediated processing of the tagged substrate.

      In addition, we would like to highlight that our analysis of the contribution of other APP-CTF degradation pathways, using cycloheximide-based assays in the constant presence of γ-secretase inhibitor, failed to reveal significant differences between Aβ42 treated cells and controls (Figure 6B & C). The lack of a significant impact of Aβ42 on the half-life of APP-CTFs under the conditions of γ-secretase inhibition maintained by inhibitor treatment is consistent with the proposed Aβ42-mediated inhibitory mechanism.

      (4) Similar to comment #3. Processing of Pancad-CTF and p75 in living cells may be not only the cleavage by gamma-secretase. This reviewer thinks that the authors need at least biochemical data, such as levels of ICDs in Figures 6C and E. 

      To address this comment we have now performed additional experiments where we measured N-terminal Aβ-like peptides derived from NOTCH1-based substrate using the HiBiT-based assay. These experiments showed a reduction in the aforementioned peptides in the cells treated with Aβ42 relative to the vehicle control, and hence further confirmed the inhibitory action of Aβ42. These new data have been included as Figure 8D in the revised manuscript and described as follows:

      Finally, we measured the direct N-terminal products generated by γ-secretase proteolysis from a HiBiT-tagged NOTCH1-based substrate, an estimate of the global γ-secretase activity. We quantified the Aβ-like peptides secreted by HEK 293 cells stably expressing this HiBiT-tagged substrate upon treatment with 1 µM Aβ1-42, p3 17-42 peptide or DAPT (GSI) (Figure 8D). DAPT treatment was considered to result in a complete γ-secretase inhibition, and hence the values recorded in the DAPT condition were used for background subtraction. A ~20% significant reduction in the amount of secreted N-terminal HiBiT-tagged peptides derived from the NOTCH1-based substrates in cells treated with Aβ1-42 supports the inhibitory action of Aβ1-42 on γ-secretase mediated proteolysis.

      Minor concerns:

      (1) Murine Abeta42 may be converted to murine Abeta38 easily, compared to human Abeta42. This may be a reason why murine Abeta42 exhibits no inhibitory effect on gamma-secretase activity. 

      In order to address this question, we performed additional experiments where we assessed the processing of murine Aβ42 into Aβ38. Analogous to human Aβ42, the murine Aβ42 peptide was not processed to Aβ38 in the assay conditions. These new data have been integrated in the manuscript and added as a Supplementary figure 1B.

      (2) It is curious to know the levels of C99 and C83 in cells in supplementary figure 3.  

      The conditions used in these assays were analogous to the conditions used in the figure 3 (i.e. treatment with Aβ peptides at 1 µM concentrations). Such conditions were associated with profound and consistent APP-CTF accumulation in this model system.

      Reviewer #2 (Recommendations For The Authors):

      In the current study, the authors show that Aβs with low affinity for γ-secretase, but when present at relatively high concentrations, can compete with the longer, higher affinity APPC99 substrate for binding and processing. They also performed kinetic analyses and demonstrate that human Aβ1-42 inhibits γ-secretase-mediated processing of APP C99 and other substrates. Interestingly, neither murine Aβ1-42 nor human p3 (17-42 amino acids in Aβ) peptides exerted inhibition under similar conditions. The authors also show that human Aβ1-42-mediated inhibition of γ-secretase activity results in the accumulation of unprocessed substrates, which leads to p75-dependent activation of caspase 3 in basal forebrain cholinergic neurons (BFCNs) and PC12 cells.

      These analyses demonstrate that, as seen for γ-secretase inhibitors, Aβ1-42 potentiates this marker of apoptosis. However, there are no in vivo data to support the physiological significance of the current finding. The authors should show in APP KO mice whether gamma-secretase enzymatic activity is elevated or not, and whether putting back Aβ42 peptide will abolish these in vivo effects.

      The findings presented in this manuscript form the basis for further in vitro and in vivo research to investigate the mechanisms of inhibition and its contribution to brain pathophysiology. Here, we used well-controlled model systems to investigate a novel mechanism of Aβ42 toxicity. Multiple mechanisms regulate the local concentration of Aβ42 in vivo, making the dissection of the biochemical mechanisms of the inhibition more complex. Nevertheless, beyond the scope of this report, we consider these very reasonable comments as a motivation for further research activities. 

      The experimental concentrations for Aβ42 peptide in the assay are too high, which are far beyond the physiological concentrations or pathological levels. The artificial observations are not supported by any in vivo experimental evidence.

      It is correct that in the majority of the experiments we used low μM concentrations of Aβ42. However, we would like to note that we have also performed experiments where conditioned medium collected from human APP.Swe expressing neurons was used as a source of Aβ. In these experiments total Aβ concentration was in low nM range (0.5-1 nM) (Figure 7). Treatment with this conditioned medium led to increased APP-CTF levels, supporting the view that low nM concentrations of Aβ are sufficient for partial inhibition of γ-secretase.

      In addition, we highlight that analyses of the brains of AD-affected individuals have shown that APP-CTFs accumulate in both sporadic and genetic forms of the disease (Pera et al. 2013, Vaillant-Beuchot et al. 2021), and that a correlation between APP-CTFs and Aβ levels at the synapse was recently revealed (Ferrer-Raventós et al. 2023). We therefore assessed the concentration of Aβ42 in synaptosomes derived from frontal cortices of post-mortem AD and age-matched non-demented (ND) control individuals. Our findings and conclusions are included in the revised version as follows:

      In the results section:

      “We next investigated the levels of Aβ42 in synaptosomes derived from frontal cortices of post-mortem AD and age-matched non-demented (ND) control individuals (Figure 10B). Towards this, we prepared synaptosomes from frozen brain tissues using a Percoll gradient procedure (62, 63). Intact synaptosomes were spun to obtain a pellet which was resuspended in a minimum amount of PBS, allowing us to estimate the volume containing the resuspended synaptosome sample. This is likely an overestimate of the actual synaptosome volume. Finally, synaptosomes were lysed in RIPA buffer and Aβ peptide concentrations measured using ELISA (MSD). We observed that the concentration of Aβ42 in the synaptosomes from (end-stage) AD tissues was significantly higher (10.7 nM) than those isolated from non-demented tissues (0.7 nM), p<0.0005***. These data provide evidence for accumulation at nM concentrations of endogenous Aβ42 in synaptosomes in end-stage AD brains. Given that we measured Aβ42 concentration in synaptosomes, we speculate that even higher concentrations of this peptide may be present in the endolysosome vesicle system, and therein inhibit the endogenous processing of APP-CTF at the synapse. Of note, treatment of PC12 cells with conditioned medium containing even lower amounts of Aβ (low nanomolar range (0.5-1 nM)) resulted in the accumulation of APP-CTFs.”
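The arithmetic behind a concentration estimate of this kind can be sketched as follows. The molecular weight of human Aβ1-42 (about 4514 g/mol) is an approximate value assumed here, and the peptide amounts and volumes are invented; none of these numbers are taken from the manuscript's raw data, and the function names are hypothetical.

```python
ABETA42_MW_G_PER_MOL = 4514.0  # approximate molecular weight, assumed


def pg_per_ml_to_nm(pg_per_ml, mw=ABETA42_MW_G_PER_MOL):
    """Convert a mass concentration in pg/mL to a molar one in nM.

    1 pg/mL equals 1e-9 g/L, so dividing by the molecular weight
    (g/mol) gives a value already scaled to nanomolar.
    """
    return pg_per_ml / mw


def concentration_nm(amount_fmol, volume_ul):
    """Molar concentration in nM from an amount (fmol) and a volume (µL)."""
    if volume_ul <= 0:
        raise ValueError("volume must be positive")
    return amount_fmol / volume_ul  # fmol/µL is numerically equal to nM


# Same hypothetical peptide amount, two assumed compartment volumes.
in_estimated_volume = concentration_nm(amount_fmol=10.0, volume_ul=1.0)
in_smaller_volume = concentration_nm(amount_fmol=10.0, volume_ul=0.25)
```

Because concentration scales inversely with volume, an overestimated synaptosome volume gives an underestimated concentration, which is why the quoted passage treats the measured values as a lower bound for the concentration within vesicles.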

      In the discussion: 

      “The convergence of Aβ42 and tau at the synapse has been proposed to underlie synaptic dysfunction in AD (86-89), and recent assessment of APP-CTF levels in synaptosome-enriched fractions from healthy control, SAD and FAD brains (temporal cortices) has shown that APP fragments concentrate at higher levels in the synapse in AD-affected than in control individuals (90).  Our analysis adds that endogenous Aβ42 concentrates in synaptosomes derived from end-stage AD brains to reach ~10 nM, a concentration that in CM from human neurons inhibits γ-secretase in PC12 cells (Figure 7). Furthermore, the restricted localization of Aβ in endolysosomal vesicles, within synaptosomes, likely increases the local peptide concentration to the levels that inhibit γ-secretase-mediated processing of substrates in this compartment. In addition, we argue that the deposition of Aβ42 in plaques may be preceded by a critical increase in the levels of Aβ present in endosomes and the cyclical inhibition of γ-secretase activity that we propose. Under this view, reductions in γ-secretase activity may be a (transient) downstream consequence of increases in Aβ due to failed clearance, as represented by plaque deposition, contributing to AD pathogenesis. ”

    1. eLife assessment

      This important study explores infants' attention patterns in real-world settings using advanced protocols and cutting-edge methods. The presented evidence for the role of EEG theta power in infants' attention is solid. The study will be of interest to researchers working on the development and control of attention.

    2. Reviewer #1 (Public Review):

      Summary:

      The paper investigates the physiological and neural processes that relate to infants' attention allocation in a naturalistic setting. Contrary to experimental paradigms that are usually employed in developmental research, this study investigates attention processes while letting the infants free to play with three toys in the vicinity of their caregiver, which is closer to a common, everyday life context. The paper focuses on infants at 5 and 10 months of age and finds differences in what predicts attention allocation. At 5 months, attention episodes are shorter and their duration is predicted by autonomic arousal. At 10 months, attention episodes are longer, and their duration can be predicted by theta power. Moreover, theta power predicted the proportion of looking at the toys, as well as a decrease in arousal (heart rate). Overall, the authors conclude that attentional systems change across development, becoming more driven by cortical processes.

      Strengths:

      I enjoyed reading the paper, I am impressed with the level of detail of the analyses, and I am strongly in favour of the overall approach, which tries to move beyond in-lab settings. The collection of multiple sources of data (EEG, heart rate, looking behaviour) at two different ages (5 and 10 months) is a key strength of this paper. The original analyses, which build onto robust EEG preprocessing, are an additional feat that improves the overall value of the paper. The careful consideration of how theta power might change before, during, and in the prediction of attention episodes is especially remarkable.

      Weaknesses:

      The levels of EEG noise across age groups and periods of attention allocation are not controlled for. I appreciate the analysis of noise reported in supplementary materials. The analysis focuses on a broad level (average noise in 5-month-olds vs 10-month-olds) but variations might be more fine-grained (for example, noise in 5mos might be due to fussiness and crying, while at 10 months it might be due to increased movements). More importantly, noise might even be the same across age groups, but correlated to other aspects of their behaviour (head or eye movements) that are directly related to the measures of interest. Is it possible that noise might co-vary with some of the behaviours of interest, thus leading to either spurious effects or false negatives? One way to address this issue would be for example to check if noise in the signal can predict attention episodes. If this is the case, noise should be added as a covariate in many of the analyses of this paper.

      Concerning cross-correlation analyses, the authors state that "Interpreting the exact time intervals over which a cross-correlation is significant is challenging". Then, they say that asymmetry is enough to conclude that attention forward predicted theta power more than vice versa. I think it could be useful to add a bit more explanation before reaching this conclusion, explaining why such a statement is correct, and how it is supported by previous work in statistics.

      Finally, the cognitive process under investigation (e.g., attention) and its operationalization (e.g., duration of consecutive looking toward a toy) are not fully distinguished, but conflated instead (e.g., "attention durations"). This does not impact the quality of the work or analyses, but it slightly reduces clarity.

      General Remarks

      In general, the authors achieved their aim in that they successfully showed the relationship between looking behaviour (as a proxy of attention), autonomic arousal, and electrophysiology. Two aspects are especially interesting. First, the fact that at 5 months, autonomic arousal predicts the duration of subsequent attention episodes, but at 10 months this effect is not present. Conversely, at 10 months, theta power predicts the duration of looking episodes, but this effect is not present in 5-month-old infants. This pattern of results suggests that younger infants have less control over their attention, which mostly depends on their current state of arousal, but older infants have gained cortical control of their attention, which in turn impacts their looking behaviour and arousal.

    3. Reviewer #2 (Public Review):

      Summary:

      This manuscript explores infants' attention patterns in real-world settings and their relationship with autonomic arousal and EEG oscillations in the theta frequency band. The study included 5- and 10-month-old infants during free play. The results showed that the 5-month-old group exhibited a decline in HR forward-predicted attentional behaviors, while the 10-month-old group exhibited increased theta power following shifts in gaze, indicating the start of a new attention episode. Additionally, this increase in theta power predicted the duration of infants' looking behavior.

      Strengths:

      The study's strengths lie in its utilization of advanced protocols and cutting-edge techniques to assess infants' neural activity and autonomic arousal associated with their attention patterns, as well as the extensive data coding and processing. Overall, I think this article's findings have important theoretical implications for the development of infant attention.

      Weaknesses:

      The authors have effectively tackled the majority of my concerns within their revised manuscript, resulting in a substantial improvement. While the revised paper notably addresses many points, one question regarding the potential contamination of saccades on EEG power remains partially unresolved. However, I appreciate the authors' explanation that resolving this issue was challenging due to the absence of eye-tracking data in the current study. Additionally, I acknowledge their inclusion of this concern in the limitations section.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      This important study explores infants' attention patterns in real-world settings using advanced protocols and cutting-edge methods. The presented evidence for the role of EEG theta power in infants' attention is currently incomplete. The study will be of interest to researchers working on the development and control of attention.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The paper investigates the physiological and neural processes that relate to infants' attention allocation in a naturalistic setting. Contrary to experimental paradigms that are usually employed in developmental research, this study investigates attention processes while letting the infants be free to play with three toys in the vicinity of their caregiver, which is closer to a common, everyday life context. The paper focuses on infants at 5 and 10 months of age and finds differences in what predicts attention allocation. At 5 months, attention episodes are shorter and their duration is predicted by autonomic arousal. At 10 months, attention episodes are longer, and their duration can be predicted by theta power. Moreover, theta power predicted the proportion of looking at the toys, as well as a decrease in arousal (heart rate). Overall, the authors conclude that attentional systems change across development, becoming more driven by cortical processes.

      Strengths:

      I enjoyed reading the paper, I am impressed with the level of detail of the analyses, and I am strongly in favour of the overall approach, which tries to move beyond in-lab settings. The collection of multiple sources of data (EEG, heart rate, looking behaviour) at two different ages (5 and 10 months) is a key strength of this paper. The original analyses, which build onto robust EEG preprocessing, are an additional feat that improves the overall value of the paper. The careful consideration of how theta power might change before, during, and in the prediction of attention episodes is especially remarkable. However, I have a few major concerns that I would like the authors to address, especially on the methodological side.

      Points of improvement

      (1) Noise

      The first concern is the level of noise across age groups, periods of attention allocation, and metrics. Starting with EEG, I appreciate the analysis of noise reported in supplementary materials. The analysis focuses on a broad level (average noise in 5-month-olds vs 10-month-olds) but variations might be more fine-grained (for example, noise in 5mos might be due to fussiness and crying, while at 10 months it might be due to increased movements). More importantly, noise might even be the same across age groups, but correlated to other aspects of their behaviour (head or eye movements) that are directly related to the measures of interest. Is it possible that noise might co-vary with some of the behaviours of interest, thus leading to either spurious effects or false negatives? One way to address this issue would be for example to check if noise in the signal can predict attention episodes. If this is the case, noise should be added as a covariate in many of the analyses of this paper. 

      We thank the reviewer for this comment. We certainly have evidence that even the most state-of-the-art cleaning procedures (such as machine-learning trained ICA decompositions, as we applied here) are unable to remove eye movement artifact entirely from EEG data (Haresign et al., 2021; Phillips et al., 2023). (This applies to our data but also to others’ where confounding effects of eye movements are generally not considered.) Importantly, however, our analyses have been designed very carefully with this explicit challenge in mind. All of our analyses compare changes in the relationship between brain activity and attention as a function of age, and there is no evidence to suggest that different sources of noise (e.g. crying vs. movement) would associate differently with attention durations nor change their interactions with attention over developmental time. And figures 5 and 7, for example, both look at the relationship of EEG data at one moment in time to a child’s attention patterns hundreds or thousands of milliseconds before and after that moment, for which there is no possibility that head or eye movement artifact can have systematically influenced the results.

      Moving onto the video coding, I see that inter-rater reliability was not very high. Is this due to the fine-grained nature of the coding (20ms)? Is it driven by differences in expertise among the two coders? Or because coding this fine-grained behaviour from video data is simply too difficult? The main dependent variable (looking duration) is extracted from the video coding, and I think the authors should be confident they are maximising measurement accuracy.

      We appreciate the concern. To calculate IRR we used this function (Cardillo G. (2007) Cohen's kappa: compute the Cohen's kappa ratio on a square matrix. http://www.mathworks.com/matlabcentral/fileexchange/15365). Our “Observed agreement” was 0.7 (SD = 0.15). However, we decided to report the Cohen's kappa coefficient, which is generally thought to be a more robust measure as it takes into account the agreement occurring by chance. We conducted the training meticulously (refer to response to Q6, R3), and we have confidence that our coders performed to the best of their abilities.
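
The difference between observed agreement and Cohen's kappa mentioned above can be made concrete with a minimal sketch. It assumes a square confusion matrix of coder-by-coder counts, which is the input the cited function describes; the function name `cohens_kappa` and the counts are hypothetical, chosen only so that the observed agreement comes out at 0.7, and the resulting kappa is illustrative rather than the study's value.

```python
def cohens_kappa(confusion):
    """Return (observed agreement, chance agreement, kappa) for a square matrix.

    confusion[i][j] is the number of samples that coder A labelled i
    and coder B labelled j.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    # Proportion of samples on which the two coders agree (the diagonal).
    observed = sum(confusion[i][i] for i in range(k)) / n
    # Marginal label proportions for each coder.
    row_prop = [sum(row) / n for row in confusion]
    col_prop = [sum(confusion[i][j] for i in range(k)) / n for j in range(k)]
    # Agreement expected if the coders labelled independently.
    expected = sum(r * c for r, c in zip(row_prop, col_prop))
    kappa = (observed - expected) / (1 - expected)
    return observed, expected, kappa


# Hypothetical counts (not the study's data): rows = coder A, columns = coder B.
example = [[40, 10],
           [20, 30]]
po, pe, kappa = cohens_kappa(example)
```

Because kappa subtracts the agreement expected by chance, it is lower than the raw observed agreement whenever the coders' marginal label frequencies make chance agreement likely.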

      (2) Cross-correlation analyses

      I would like to raise two issues here. The first is the potential problem of using auto-correlated variables as input for cross-correlations. I am not sure whether theta power was significantly autocorrelated. If it is, could it explain the cross-correlation result? The fact that the cross-correlation plots in Figure 6 peak at zero, and are significant (but lower) around zero, makes me think that it could be a consequence of periods around zero being autocorrelated. Relatedly: how does the fact that the significant lag includes zero, and a bit before, affect the interpretation of this effect? 

      Just to clarify this analysis, we did include a plot showing autocorrelation of theta activity in the original submission (Figs 7A and 7B in the revised paper). These indicate that theta shows little to no autocorrelation. And we can see no way in which this might have influenced our results. From their comments, the reviewer seems rather to be thinking of phasic changes in the autocorrelation, and whether the possibility that greater stability in theta during the time period around looks might have caused the cross-correlation result shown in 7E. Again though we can see no way in which this might be true, as the cross-correlation indicates that greater theta power is associated with a greater likelihood of looking, and this would not have been affected by changes in the autocorrelation.

      A second issue with the cross-correlation analyses is the coding of the looking behaviour. If I understand correctly, if an infant looked for a full second at the same object, they would get a maximum score (e.g., 1) while if they looked at 500ms at the object and 500ms away from the object, they would receive a score of e.g., 0.5. However, if they looked at one object for 500ms and another object for 500ms, they would receive a maximum score (e.g., 1). The reason seems unclear to me because these are different attention episodes, but they would be treated as one. In addition, the authors also show that within an attentional episode theta power changes (for 10mos). What is the reason behind this scoring system? Wouldn't it be better to adjust by the number of attention switches, e.g., with the formula: looking-time/(1+N_switches), so that if infants looked for a full second, but made 1 switch from one object to the other, the score would be .5, thus reflecting that attention was terminated within that episode? 

      We appreciate this suggestion. This is something we did not consider, and we thank the reviewer for raising it. In response to their comment, we have now rerun the analyses using the new measure (looking-time/(1+N_switches)), and we are reassured to find that the results remain highly consistent. Please see Author response image 1 below, where the original results are shown in orange and the new measure in blue at 5 and 10 months.

      Author response image 1.
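
The reviewer's proposed measure, looking-time/(1+N_switches), can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: each window is a list of 20 ms video-coding samples, each sample holds a toy label or `None` when the infant is not looking at a toy, and a switch is counted whenever the toy label changes between successive on-toy samples. The function name `switch_adjusted_score` and the labels are hypothetical.

```python
def switch_adjusted_score(samples, frame_s=0.02):
    """Looking time in a window divided by (1 + number of toy switches).

    samples: one label per coded frame; None means not looking at any toy.
    frame_s: duration of one coded frame in seconds (20 ms assumed here).
    """
    looked = [s for s in samples if s is not None]
    looking_time = len(looked) * frame_s
    # Assumption: a switch is any change of toy between successive on-toy frames.
    n_switches = sum(1 for a, b in zip(looked, looked[1:]) if a != b)
    return looking_time / (1 + n_switches)


# Three hypothetical 1-second windows (50 frames of 20 ms each).
full_look = switch_adjusted_score(["A"] * 50)
one_switch = switch_adjusted_score(["A"] * 25 + ["B"] * 25)
half_away = switch_adjusted_score(["A"] * 25 + [None] * 25)
```

Under the original scoring the first two windows would both receive the maximum score; with the adjustment, the window containing a switch is scored like one in which the infant looked away for half the time.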

      (3) Clearer definitions of variables, constructs, and visualisations

      The second issue is the overall clarity and systematicity of the paper. The concept of attention appears with many different names. Only in the abstract, it is described as attention control, attentional behaviours, attentiveness, attention durations, attention shifts and attention episode. More names are used elsewhere in the paper. Although some of them are indeed meant to describe different aspects, others are overlapping. As a consequence, the main results also become more difficult to grasp. For example, it is stated that autonomic arousal predicts attention, but it's harder to understand what specific aspect (duration of looking, disengagement, etc.) it is predictive of. Relatedly, the cognitive process under investigation (e.g., attention) and its operationalization (e.g., duration of consecutive looking toward a toy) are used interchangeably. I would want to see more demarcation between different concepts and between concepts and measurements.

      We appreciate the comment and we have clarified the concepts and their operationalisation throughout the revised manuscript.

      General Remarks

      In general, the authors achieved their aim in that they successfully showed the relationship between looking behaviour (as a proxy of attention), autonomic arousal, and electrophysiology. Two aspects are especially interesting. First, the fact that at 5 months, autonomic arousal predicts the duration of subsequent attention episodes, but at 10 months this effect is not present. Conversely, at 10 months, theta power predicts the duration of looking episodes, but this effect is not present in 5-month-old infants. This pattern of results suggests that younger infants have less control over their attention, which mostly depends on their current state of arousal, but older infants have gained cortical control of their attention, which in turn impacts their looking behaviour and arousal.

      We thank the reviewer for the close attention that they have paid to our manuscript, and for their insightful comments.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript explores infants' attention patterns in real-world settings and their relationship with autonomic arousal and EEG oscillations in the theta frequency band. The study included 5- and 10-month-old infants during free play. The results showed that the 5-month-old group exhibited a decline in HR forward-predicted attentional behaviors, while the 10-month-old group exhibited increased theta power following shifts in gaze, indicating the start of a new attention episode. Additionally, this increase in theta power predicted the duration of infants' looking behavior.

      Strengths:

      The study's strengths lie in its utilization of advanced protocols and cutting-edge techniques to assess infants' neural activity and autonomic arousal associated with their attention patterns, as well as the extensive data coding and processing. Overall, the findings have important theoretical implications for the development of infant attention.

      Weaknesses:

      Certain methodological procedures require further clarification, e.g., details on EEG data processing. Additionally, it would be beneficial to eliminate possible confounding factors and consider alternative interpretations, e.g., whether the differences observed between the two age groups were partly due to varying levels of general arousal and engagement during the free play.

      We thank the reviewer for their suggestions and have addressed them in our point-by-point responses below.

      Reviewer #3 (Public Review):

      Summary:

      Much of the literature on attention has focused on static, non-contingent stimuli that can be easily controlled and replicated--a mismatch with the actual day-to-day deployment of attention. The same limitation is evident in the developmental literature, which is further hampered by infants' limited behavioral repertoires and the general difficulty in collecting robust and reliable data in the first year of life. The current study engages young infants as they play with age-appropriate toys, capturing visual attention, cardiac measures of arousal, and EEG-based metrics of cognitive processing. The authors find that the temporal relations between measures are different at age 5 months vs. age 10 months. In particular, at 5 months of age, cardiac arousal appears to precede attention, while at 10 months of age attention processes lead to shifts in neural markers of engagement, as captured in theta activity.

      Strengths:

      The study brings to the forefront sophisticated analytical and methodological techniques to bring greater validity to the work typically done in the research lab. By using measures in the moment, they can more closely link biological measures to actual behaviors and cognitive stages. Often, we are forced to capture these measures in separate contexts and then infer in-the-moment relations. The data and techniques provide insights for future research work.

      Weaknesses:

      The sample is relatively modest, although this is somewhat balanced by the sheer number of data points generated by the moment-to-moment analyses. In addition, the study is cross-sectional, so the data cannot capture true change over time. Larger samples, followed over time, will provide a stronger test for the robustness and reliability of the preliminary data noted here. Finally, while the method certainly provides for a more active and interactive infant in testing, we are a few steps removed from the complexity of daily life and social interactions.

      We thank the reviewer for their suggestions and have addressed them in our point-by-point responses below.

      Reviewer #1 (Recommendations For The Authors):

      Here are some specific ways in which clarity can be improved:

      A. Regarding the distinction between constructs, or measures and constructs:

      i. In the results section, I would prefer to mention looking duration and heart rate as metrics that have been measured, while in the introduction and discussion, a clear 1-to-1 link between construct/cognitive process and behavioural or (neuro)psychophysical measure can be made (e.g., sustained attention is measured via looking durations; autonomic arousal is measured via heart-rate).

      The way attention and arousal were operationalised are now clarified throughout the text, especially in the results.

      ii. Relatedly, the "attention" variable is not really measuring attention directly. It is rather measuring looking time (proportion of looking time to the toys?), which is the operationalisation, which is hypothesised to be related to attention (the construct/cognitive process). I would make the distinction between the two stronger.

      This distinction between looking and paying attention is clearer now in the revised manuscript as per R1 and R3’s suggestions. We have also added a paragraph in the Introduction to clarify it and point out its limitations (see pg. 5).

      B. Each analysis should be set out to address a specific hypothesis. I would rather see hypotheses in the introduction (without direct reference to the details of the models that were used), and how a specific relation between variables should follow from such hypotheses. This would also solve the issue that some analyses did not seem directly necessary to the main goal of the paper. For example:

      i. Are ACF and survival probability analyses aimed at proving different points, or are they different analyses to prove the same point? Consider either making clearer how they differ or moving one to supplementary materials.

      We clarified this in pg. 4 of the revised manuscript.

      ii. The autocorrelation results are not mentioned in the introduction. Are they aiming to show that the variables can be used for cross-correlation? Please clarify their role or remove them.

      We clarified this in pg. 4 of the revised manuscript.

      C. Clarity of cross-correlation figures. To ensure clarity when presenting a cross-correlation plot, it's important to provide information on the lead-lag relationships and which variable is considered X and which is Y. This could be done by labelling the axes more clearly (e.g., the left-hand side of the x-axis specifies x leads y, right hand specifies y leads x) or adding a legend (e.g., dashed line indicates x leading y, solid line indicates y leading x). Finally, the limits of the x-axis are consistent across plots, but the limits of the y-axis differ, which makes it harder to visually compare the different plots. More broadly, the plots could have clearer labels, and their resolution could also be improved.

      This information on what variable precedes/ follows was in the caption of the figures. However, we have edited the figures as per the reviewer’s suggestion and added this information in the figures themselves. We have also uploaded all the figures in higher resolution.
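
The lead-lag convention discussed in this point can be illustrated with a small synthetic sketch. It is not the authors' analysis: the sign convention (a positive lag pairs x at time t with y at time t + lag, so a peak at a positive lag means x leads y), the helper names `pearson` and `cross_correlation`, and the toy series are all assumptions made for illustration.

```python
import random


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    var_a = sum((p - ma) ** 2 for p in a)
    var_b = sum((q - mb) ** 2 for q in b)
    return cov / (var_a * var_b) ** 0.5


def cross_correlation(x, y, max_lag):
    """Map each lag to corr(x[t], y[t + lag]); positive lag = x leads y."""
    out = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[:len(x) - lag], y[lag:]
        else:
            a, b = x[-lag:], y[:len(y) + lag]
        out[lag] = pearson(a, b)
    return out


# Synthetic example: y is a copy of x delayed by two samples, so x leads y.
random.seed(1)
x = [random.gauss(0, 1) for _ in range(300)]
y = [0.0, 0.0] + x[:-2]
lags = cross_correlation(x, y, max_lag=5)
peak_lag = max(lags, key=lags.get)
```

Stating the convention on the axis itself matters because the opposite convention (positive lag = y leads x) is equally common, and the same plot would then support the reverse reading.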

      D. Figure 7 was extremely helpful for understanding the paper, and I would rather have it as Figure 1 in the introduction. 

      We have moved figure 7 to figure 1 as per this request.

      E. Statistics should always be reported, and effects should always be described. For example, results of autocorrelation are not reported, and from the plot, it is also not clear if the effects are significant (the caption states that red dots indicate significance, but there are no red dots. Does this mean there is no autocorrelation?).

      We apologise – this was hard to read in the original. We have clarified that there is no autocorrelation present in Fig 7A and 7D.

      And if so, given that theta is a wave, how is it possible that there is no autocorrelation (connected to point 1)? 

      We thank the reviewer for raising this point. Theta power reflects oscillatory activity in the EEG within the 3-6Hz window (i.e. 3 to 6 oscillations per second), whereas our autocorrelation analysis examined changes in theta power between consecutive 1-second-long windows. To say that there is no autocorrelation in the data means that, if there is more 3-6Hz activity within one particular 1-second window, there tends not to be significantly more 3-6Hz activity within the 1-second windows immediately before and after.
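
The distinction made above, between a signal that oscillates and a power time series that is not autocorrelated, can be shown with a synthetic sketch. All specifics are assumptions for illustration and not the authors' pipeline: a 200 Hz sampling rate, a pure 4 Hz sinusoid whose amplitude is redrawn independently for every 1-second window, a plain discrete Fourier sum over the 3-6 Hz bins, and the helper names `theta_power` and `autocorr`.

```python
import cmath
import math
import random

FS = 200  # assumed sampling rate in samples per second


def theta_power(window, fs=FS, band=(3, 6)):
    """Sum of squared DFT magnitudes over the integer-Hz bins in `band`.

    Assumes the window is exactly 1 second long, so bin k corresponds to k Hz.
    """
    n = len(window)
    power = 0.0
    for f in range(band[0], band[1] + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * f * t / fs)
                    for t, x in enumerate(window))
        power += abs(coeff / n) ** 2
    return power


def autocorr(series, lag):
    """Normalised autocorrelation of a series at the given lag."""
    m = sum(series) / len(series)
    dev = [v - m for v in series]
    num = sum(a * b for a, b in zip(dev, dev[lag:]))
    return num / sum(d * d for d in dev)


# Synthetic 4 Hz oscillation; amplitude is independent from window to window.
random.seed(0)
amps = [random.uniform(0.5, 1.5) for _ in range(200)]
signal = [a * math.sin(2 * math.pi * 4 * t / FS) for a in amps for t in range(FS)]

powers = [theta_power(signal[i * FS:(i + 1) * FS]) for i in range(len(amps))]
raw_r = autocorr(signal, FS // 4)   # raw signal, lagged by one 4 Hz cycle
power_r = autocorr(powers, 1)       # power series, lagged by one window
```

In this toy case the raw signal is strongly autocorrelated at the lag of one oscillation cycle, while the per-window power values are close to uncorrelated from one window to the next, which is the sense of "no autocorrelation" used in the response.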

      F. Alpha power is introduced later on, and in the discussion, it is mentioned that the effects that were found go against the authors' expectations. However, alpha power and the authors' expectations about it are not mentioned in the introduction. 

      We thank the reviewer for this comment. We have added a paragraph on alpha in the introduction (pg.4).

      Minor points:

      1. At the end of 1st page of introduction, the authors state that: 

      “How children allocate their attention in experimenter-controlled, screen-based lab tasks differs, however, from actual real-world attention in several ways (32-34). For example, the real-world is interactive and manipulable, and so how we interact with the world determines what information we, in turn, receive from it: experiences generate behaviours (35).”

      I think there's more to this though - Lab-based studies can be made interactive too (e.g., Meyer et al., 2023, Stahl & Feigenson, 2015). What remains unexplored is how infants actively and freely initiate and self-structure their attention, rather than how they respond to experimental manipulations.

      Meyer, M., van Schaik, J. E., Poli, F., & Hunnius, S. (2023). How infant‐directed actions enhance infants' attention, learning, and exploration: Evidence from EEG and computational modeling. Developmental Science, 26(1), e13259.

      Stahl, A. E., & Feigenson, L. (2015). Observing the unexpected enhances infants' learning and exploration. Science, 348(6230), 91-94.

      We thank the reviewer for this suggestion and added their point in pg. 4.

      (2) Regarding analysis 4:

      a. In analysis 1 you showed that the duration of attentional episodes changes with age. Is it fair to keep the same start, middle, and termination ranges across age groups? Is 3-4 seconds "middle" for 5-month-olds? 

      We appreciate the comment. There are many ways we could have run these analyses and, in fact, in other papers we have done it differently, for example by splitting each look in 3, irrespective of its duration (Phillips et al., 2023).

      However, one aspect we took into account was the observation that 5-month-old infants exhibited more short looks than older infants. We recognized that dividing each look into 3 parts, regardless of its duration, might have impacted the results. Presumably, the activity during the middle and termination phases of a 1.5-second look differs from that of a look lasting over 7 seconds.

      Two additional factors gave us confidence in our approach: 1) while the definition of "middle" was somewhat arbitrary, it allowed us to maintain consistency in our analyses across the two ages; and 2) we obtained a comparable number of observations at the two time points (e.g. for “middle” we had 172 events at 5 months and 194 events at 10 months).

      b. It is recommended not to interpret lower-level interactions if more complex interactions are not significant. How are the interaction effects in a simpler model in which the 3-way interaction is removed? 

      We appreciate the comment. We tried to follow the same steps as in (Xie et al., 2018). However, we have re-analysed the data removing the 3-way interaction and the significance of the results stayed the same. Please see Author response image 2 below (first: new analyses without the 3-way interactions, second: original analyses that included the 3-way interaction).

      Author response image 2.

      (3) Figure S1: there seems to be an outlier in the bottom-right panel. Do results hold excluding it? 

      We re-ran these analyses as per this suggestion and the results stayed the same (refer to SM pg. 2).

      (4) Figure S2 should refer to 10 months instead of 12.

      We thank the reviewer for noticing this typo; we have changed it in the revised manuscript (see SM pg. 3).

      (5) In the 2nd paragraph of the discussion, I found this sentence unclear: "From Analysis 1 we found that infants at both ages showed a preferred modal reorientation rate". 

      We clarified this in the revised manuscript on pg. 10.

      (6) Discussion: many (infant) studies have used theta in anticipation of receiving information (Begus et al., 2016), surprising events (Meyer et al., 2023), and especially exploration (Begus et al., 2015). Can you make a broader point on how these findings inform our interpretation of theta in the infant population (go more from description to underlying mechanisms)?

      We have expanded on this point about interpreting frequency bands on pg. 13 of the revised manuscript and thank the reviewer for bringing it up.

      Begus, K., Gliga, T., & Southgate, V. (2016). Infants' preferences for native speakers are associated with an expectation of information. Proceedings of the National Academy of Sciences, 113(44), 12397-12402.

      Meyer, M., van Schaik, J. E., Poli, F., & Hunnius, S. (2023). How infant‐directed actions enhance infants' attention, learning, and exploration: Evidence from EEG and computational modeling. Developmental Science, 26(1), e13259.

      Begus, K., Southgate, V., & Gliga, T. (2015). Neural mechanisms of infant learning: differences in frontal theta activity during object exploration modulate subsequent object recognition. Biology letters, 11(5), 20150041.

      (7) 2nd page of discussion, last paragraph: "preferred modal reorientation timer" is not a neural/cognitive mechanism, just a resulting behaviour. 

      We agree with this comment and thank the reviewer for bringing it to our attention. We clarified this on pg. 12 and pg. 13 of the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      I have a few comments and questions that I think the authors should consider addressing in a revised version. Please see below:

      (1) During preprocessing (steps 5 and 6), it seems like the "noisy channels" were rejected using the pop_rejchan.m function and then interpolated. This procedure is common in infant EEG analysis, but a concern arises: was there no upper limit for channel interpolation? Did the authors still perform bad channel interpolation even when more than 30% or 40% of the channels were identified as "bad" at the beginning with the continuous data? 

      We did state in the original manuscript that “participants with fewer than 30% channels interpolated at 5 months and 25% at 10 months made it to the final step (ICA) and final analyses”. In the revised version we have rewritten this section to make this clearer (pg. 17).

      (2) I am also perplexed about the sequencing of the ICA pruning step. If the intention of ICA pruning is to eliminate artificial components, would it be more logical to perform this procedure before the conventional artifacts' rejection (i.e., step 7), rather than after? In addition, what was the methodology employed by the authors to identify the artificial ICA components? Was it done through manual visual inspection or utilizing specific toolboxes? 

      We agree that ICA is often run first. However, we chose to reject continuous data prior to ICA in order to remove the very worst sections of data (where almost all channels were affected), which can arise when infants fuss or pull at the caps. This step was applied at this point in the pipeline so that these sections of very poor data were not passed to the ICA. This is fairly widespread practice in cleaning infant data.

      Concerning the reviewer’s second question, of how ICA components were removed – the answer to this is described in considerable detail in the paper that we refer to in that section of the manuscript. This was done by training a classifier specially designed to clean naturalistic infant EEG data (Haresign et al., 2021) and has since been employed in similar studies (e.g. Georgieva et al., 2020; Phillips et al., 2023).

      (3) Please clarify how the relative power was calculated for the theta (3-6Hz) and alpha (6-9Hz) bands. Were they calculated by dividing the ratio of theta or alpha power to the power between 3 and 9Hz, or the total power between 1 (or 3) and 20 Hz? In other words, what does the term "all frequency bands" refer to in section 4.3.7? 

      We thank the reviewer for this comment. We have now clarified this in pg. 22.
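
      To make the reviewer's distinction concrete, the sketch below computes relative theta power against the two candidate denominators named in the question. It does not indicate which one the manuscript uses. The toy spectrum, the function names, and the half-open band edges (so that 6 Hz is counted once, in alpha) are assumptions made for illustration.

```python
def band_power(freqs, psd, low, high):
    """Sum spectral power over bins with low <= frequency < high."""
    return sum(p for f, p in zip(freqs, psd) if low <= f < high)


def relative_power(freqs, psd, band, reference):
    """Power in `band` divided by power in the `reference` range."""
    return band_power(freqs, psd, *band) / band_power(freqs, psd, *reference)


# Toy spectrum: 1 Hz bins from 1 to 20 Hz with made-up 1/f-like power.
freqs = list(range(1, 21))
psd = [10.0 / f for f in freqs]

theta = (3, 6)    # theta band, 3-6 Hz
narrow = (3, 9)   # denominator option 1: theta plus alpha only
broad = (1, 21)   # denominator option 2: the whole 1-20 Hz range

theta_vs_narrow = relative_power(freqs, psd, theta, narrow)
theta_vs_broad = relative_power(freqs, psd, theta, broad)
```

      With the same absolute theta power, the narrow denominator gives a much larger relative value than the broad one, which is why the definition of "all frequency bands" matters when comparing results across studies.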

      (4) One of the key discoveries presented in this paper is the observation that attention shifts are accompanied by a subsequent enhancement in theta band power shortly after the shifts occur. Is it possible that this effect or alteration might be linked to infants' saccades, which are used as indicators of attention shifts? Would it be feasible to analyze the disparities in amplitude between the left and right frontal electrodes (e.g., Fp1 and Fp2, which could be viewed as virtual horizontal EOG channels) in relation to theta band power, in order to eliminate the possibility that the augmentation of theta power was attributable to the intensity of the saccades? 

      We appreciate the concern. Average saccade duration in infants is about 40ms (Garbutt et al., 2007). Our finding that the positive cross-correlation between theta and look duration is present not only when we examine zero-lag data but also when we examine how theta forwards-predicts attention 1-2 seconds afterwards seems therefore unlikely to be directly attributable to saccade-related artifact. Concerning the reviewer’s suggestion – this is something that we have tried in the past. Unfortunately, however, our experience is that identifying saccades based on the disparity between Fp1 and Fp2 is much too unreliable to be of any use in analysing data. Even if specially positioned HEOG electrodes are used, we still find the saccade detection to be insufficiently reliable. In ongoing work we are tracking eye movements separately, in order to be able to address this point more satisfactorily.
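
      The difference between a zero-lag association and a forwards-predictive one can be illustrated with a generic lagged Pearson correlation. This is a sketch of the general idea only, not the authors' analysis pipeline. The series are synthetic, and the second one is constructed to trail the first by two samples.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def lagged_correlation(x, y, lag):
    """Correlate x[t] with y[t + lag]; a positive lag means x leads y."""
    if lag > 0:
        return pearson(x[:-lag], y[lag:])
    if lag < 0:
        return pearson(x[-lag:], y[:lag])
    return pearson(x, y)


# Synthetic example: "attention" copies "theta" two samples later.
theta = [0, 1, 0, 2, 1, 3, 0, 1, 2, 0, 3, 1]
attention = [0, 0] + theta[:-2]

r_zero_lag = lagged_correlation(theta, attention, 0)
r_theta_leads = lagged_correlation(theta, attention, 2)
```

      In this constructed case the correlation peaks at the positive lag rather than at zero. A brief artifact that is time-locked to the shift itself would instead be expected to load mainly on lags near zero.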

      (5) The following question is related to my previous comment. Why is the duration of the relationship between theta power and moment-to-moment changes in attention so short? If theta is indeed associated with attention and information processing, shouldn't the relationship between the two variables strengthen as the attention episode progresses? Given that the authors themselves suggest that "One possible interpretation of this is that neural activity associates with the maintenance more than the initiation of attentional behaviors," it raises the question of (is in contradiction to) why the duration of the relationship is not longer but declines drastically (Figure 6). 

      We thank the reviewer for raising this excellent point. Certainly, we argue that this, together with the low autocorrelation values for theta documented in Fig 7A and 7D, challenges many conventional ways of interpreting theta. We are continuing to investigate this question in ongoing work.

      (6) Have the authors conducted a comparison of alpha relative power and HR deceleration durations between 5 and 10-month-old infants? This analysis could provide insights into whether the differences observed between the two age groups were partly due to varying levels of general arousal and engagement during free play.

      We thank the reviewer for this suggestion. Indeed, this is an aspect we investigated but, given that our primary emphasis was on the theta frequency, and considering the length of the manuscript, we ultimately decided not to incorporate it. However, we have attached Author response image 3 below, which shows that there was no significant interaction between HR and the alpha band.

      Author response image 3.

      Reviewer #3 (Recommendations For The Authors):

      (1) In reading the manuscript, the language used seems to imply longitudinal data or at the very least the ability to detect change or maturation. Given the cross-sectional nature of the data, the language should be tempered throughout. The data are illustrative but not definitive. 

      We thank the reviewer for this comment. We have now clarified that “Data was analysed in a cross-sectional manner” in pg15.

      (2) The sample size is quite modest, particularly in the specific age groups. This is likely tempered by the sheer number of data points available. This latter argument is implied in the text, but not as explicitly noted. (However, I may have missed this as the text is quite dense). I think more notice is needed on the reliability and stability of the findings given the sample. 

      We have clarified this in pg16.

      (3) On a related note, how was the sample size determined? Was there a power analysis to help guide decision-making for both recruitment and choosing which analyses to proceed with? Again, the analytic approach is quite sophisticated and the questions are of central interest to researchers, but I was left feeling maybe these two aspects of the study were out-sprinting the available data. The general impression is that the sample is small, but it is not until looking at table s7, that it is in full relief. I think this should be more prominent in the main body of the study.

      We have clarified this in pg16.

      (4) The manuscript devotes a few sentences to the relation between looking and attention. However, this distinction is central to the design of the study, and any philosophical differences regarding what take-away points can be generated. In my reading, I think this point needs to be more heavily interrogated. 

      This distinction between looking and paying attention is clearer now in the reviewed manuscript as per R1 and R3’s suggestions. We have also added a paragraph in the Introduction to clarify it and pointed out its limitations (see pg.5).

      (5) I would temper the real-world attention language. This study is certainly a great step forward, relative to static faces on a computer screen. However, there are still a great number of artificial constraints that have been added. That is not to say that the constraints are bad--they are necessary to carry out the work. However, it should be acknowledged that it constrains the external validity. 

      We have added a paragraph to acknowledge the limitations of the setup in pg. 14.

      (6) The kappa on the coding is not strong. The authors chose to proceed nonetheless. Given that, I think more information is needed on how coders were trained, how they were standardized, and what parameters were used to decide they were ready to code independently. Again, with the sample size and the kappa presented, I think more discussion is needed regarding the robustness of the findings. 

      We appreciate the concern. As per our answer to R1, we chose to report the most stringent measure of inter-rater reliability, but other calculation methods (i.e., percent agreement) return higher scores (see response to R1).
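
      The gap between percent agreement and a chance-corrected statistic can be shown with a small worked example. The sketch assumes Cohen's kappa as the chance-corrected measure. The response does not name the statistic, and the look labels and codes below are invented for illustration.

```python
from collections import Counter


def percent_agreement(a, b):
    """Proportion of items on which two coders gave the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    p_observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    labels = set(a) | set(b)
    p_chance = sum(counts_a[k] * counts_b[k] for k in labels) / (n * n)
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical frame-by-frame look codes from two coders.
coder_1 = ["toy"] * 8 + ["partner"] * 2
coder_2 = ["toy"] * 7 + ["partner", "toy", "partner"]

agreement = percent_agreement(coder_1, coder_2)
kappa = cohens_kappa(coder_1, coder_2)
```

      In this toy case the coders agree on 8 of 10 frames (80%), yet kappa is only 0.375, because one label dominates and much of the raw agreement is expected by chance. This is why a modest kappa and a high percent agreement can describe the same coding.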

      As for the training, we wrote an extensively detailed coding scheme, describing exactly how to code each look, that was handed to our coders. Throughout the initial months of training, we met with the coders on a weekly basis to discuss questions and individual frames that looked ambiguous. After each session, we would revise the coding scheme to incorporate additional details, aiming to make the coding process progressively less subjective. During this period, every coder analysed the same interactions, and inter-rater reliability (IRR) was assessed weekly, comparing their evaluations with mine (Marta). With time, the coders had fewer questions and IRR increased. At that point, we deemed them sufficiently trained, and began assigning them different interactions from each other. Periodically, though, we all assessed the same interaction and met to review and discuss our coding outputs.

    1. eLife assessment

      This valuable manuscript reveals sex differences in bi-conditioning Pavlovian learning and conditional behavior. Males learn hierarchical context-cue-outcome associations more quickly, but females show more stable and robust task performance. These sex differences are related to cellular activation in the orbitofrontal cortex. Although the evidence for the claims is convincing, the claim of sex differences in context-dependent discrimination behaviour is overstated in places. Nevertheless, the results will be of interest to many behavioural neuroscientists, particularly those who investigate sex-specific behaviours.

    2. Reviewer #1 (Public Review):

      Summary:

      Peterson et al., present a series of experiments in which the Pavlovian performance (i.e. time spent at a food cup/port) of male and female rats is assessed in various tasks in which context/cue/outcome relationships are altered. The authors find no sex differences in context-irrelevant tasks, and no such differences in tasks in which the context signals that different cues will earn different outcomes. They do find sex differences, however, when a single outcome is given and context cues must be used to ascertain which cue will be rewarded with that outcome (Ctx-dep O1 task). Specifically, they find that males acquired the task faster, but that once acquired, performance of the task was more resilient in female rats against exposures to a stressor. Finally, they show that these sex differences are reflected in differential rates of c-fos expression in all three subregions of rat OFC, medial, lateral and ventral, in the sense that it is higher in females than males, and only in the animals subject to the Ctx-dep O1 task in which sex differences were observed.

      Strengths:

      • Well written
      • Experiments elegantly designed
      • Robust statistics
      • Behaviour is the main feature of this manuscript, rather than any flashy techniques or fashionable lab methodologies, and luckily the behaviour is done really well.
      • For the most part I think the conclusions were well supported, although I do have some slightly different interpretations to the authors in places.

      Weaknesses:

      The authors have done an excellent job of addressing all previous weaknesses. I have no further comments.

    3. Reviewer #2 (Public Review):

      Summary:

      A bidirectional occasion-setting design is used to examine sex differences in the contextual modulation of reward-related behaviour. It is shown that females are slower to acquire contextual control over cue-evoked reward seeking. However, once established, the contextual control over behaviour was more robust in female rats (i.e., less within-session variability and greater resistance to stress) and this was also associated with increased OFC activation.

      Strengths:

      The authors use sophisticated behavioural paradigms to study the hierarchical contextual modulation of behaviour. The behavioural controls are particularly impressive and do, to some extent, support the specificity of the conclusions. The analyses of the behavioural data are also elegant, thoughtful, and rigorous.

      Comments on revised version:

      In this revised version the authors have addressed the major weaknesses that I identified in my previous review.

    4. Reviewer #3 (Public Review):

      Summary:

      This manuscript reports an experiment that compared groups of rats on acquisition and performance of a Pavlovian bi-conditional discrimination, in which the presence of one cue, A, signals that the presentation of one CS, X, will be followed by a reinforcer and a second CS, Y, will be nonreinforced. Periods of cue A alternated with periods of cue B, which signaled the opposite relationship: cue X is nonreinforced and cue Y is reinforced. This is a conditional discrimination problem in which the rats learned to approach the food cup in the presence of each CS conditional on the presence of the third background cue. The comparison groups consisted of the same conditional discrimination with the exception that each CS was paired with a different reinforcer. This makes the problem easier to solve as the background is now priming a differential outcome. A third group received simple discrimination training of X reinforced and Y nonreinforced in cues A and B, and the final group were trained with X and Y reinforced on half the trials (no discrimination). The results were clear that the latter two discrimination learning procedures resulted in rapid learning in comparison to the first. Rats required about 3 times as many 4-session blocks to acquire the bi-conditional discrimination as the other two discrimination groups. Within the biconditional discrimination group, female and male rats spent the same amount of time in the food cup during the rewarded CS, but females spent more time in the food cup during CS- than males. The authors interpret this as a deficit in discrimination performance in females on this task and use a measure that exaggerates the difference in CS+ and CS- responding (a discrimination ratio) to support their point. When tested after acute restraint stress, the male rats spent less time in the food cup during the reinforced CS in comparison to the female rats, but did not lose discrimination performance entirely. There was also some evidence of more fos positive cells in the orbitofrontal cortex in females. Overall, I think the authors were successful in documenting performance on the biconditional discrimination task; showing that it is more difficult to perform than other discriminations is valuable and consistent with the proposal that accurate performance requires encoding of conditional information (which the authors refer to as "context"). There is evidence that female rats spend more time in the food cup during CS-, but I hesitate to agree that this is an important sex difference. There is no cost to spending more time in the food cup during CS- and they spend much less time there than during CS+. Males and females also did not differ in their CS+ responding, suggesting similar levels of learning. A number of factors could contribute to more food cup time in CS-, such as smaller body size and more locomotor activity. The number of food cup entries during CS+ and CS- was not reported here. Nevertheless, I think the manuscript will make a useful contribution to the field and hopefully lead readers to follow up on these types of tasks. One area for development would be to test the associative properties of the cues controlling the conditional discrimination: can they be shown to have the properties of Pavlovian occasion setting stimuli? Such work would strengthen the justification/rationale for using the term "context" and "occasion setter" to refer to these stimuli in this task in the way the authors do in this paper.

      Strengths:

      Nicely designed and conducted experiment.
      Documents performance difference by sex.

      Weaknesses:

      Overstatement of sex differences.
      Inconsistent, confusing, and possibly misleading use of terms to describe/imply the underlying processes contributing to performance.