  1. Feb 2025
    1. eLife Assessment

      This study reveals that PRMT1 overexpression drives tumorigenesis of acute megakaryocytic leukemia (AMKL) and that targeting PRMT1 is a viable approach for treating AMKL. While the evidence, based largely on one cell line, is convincing, further validation in additional experimental settings will solidify the conclusion. These findings have important implications for the future treatment of AMKL with PRMT1 overexpression.

    2. Reviewer #1 (Public review):

      Summary:

      PRMT1 overexpression is linked to poor survival in cancers, including acute megakaryocytic leukemia (AMKL). This manuscript describes the important role of PRMT1 in metabolic reprogramming in AMKL. In a PRMT1-driven AMKL model, only cells with high PRMT1 expression induced leukemia, which was effectively treated with the PRMT1 inhibitor MS023. PRMT1 increased glycolysis, leading to elevated glucose consumption, lactic acid accumulation, and lipid buildup while downregulating CPT1A, a key regulator of fatty acid oxidation. Treatment with 2-deoxy-glucose (2-DG) delayed leukemia progression and induced cell differentiation, while CPT1A overexpression rescued cell proliferation under glucose deprivation. Thus, PRMT1 enhances AMKL cell proliferation by promoting glycolysis and suppressing fatty acid oxidation.

      Strengths:

      This study highlights the clinical relevance of PRMT1 overexpression with AMKL, identifying it as a promising therapeutic target. A key novel finding is the discovery that only AMKL cells with high PRMT1 expression drive leukemogenesis, and this PRMT1-driven leukemia can be effectively treated with the PRMT1 inhibitor MS023. The work provides significant metabolic insights, showing that PRMT1 enhances glycolysis, suppresses fatty acid oxidation, downregulates CPT1A, and promotes lipid accumulation, which collectively drive leukemia cell proliferation. The successful use of the glucose analogue 2-deoxy-glucose (2-DG) to delay AMKL progression and induce cell differentiation underscores the therapeutic potential of targeting PRMT1-related metabolic pathways. Furthermore, the rescue experiment with ectopic Cpt1a expression strengthens the mechanistic link between PRMT1 and metabolic reprogramming. The study employs robust methodologies, including Seahorse analysis, metabolomics, FACS analysis, and in vivo transplantation models, providing comprehensive and well-supported findings. Overall, this work not only deepens our understanding of PRMT1's role in leukemia progression but also opens new avenues for targeting metabolic pathways in cancer therapy.

      Weaknesses:

      This study, while significant, has some limitations.

      (1) The findings rely heavily on a single AMKL cell line, with no validation in patient-derived samples, or even in another type of leukemia line, to confirm clinical relevance. Adding a discussion of PRMT1's role in other leukemia types would increase the impact of this work.

      (2) The observed heterogeneity in Prmt1 expression is noted but not further investigated, leaving gaps in understanding its broader implications.

      (3) Some figures and figure legends omit important details or contain mismatched information. For example:
      • Figure 2D, E, F, I (wrongly labeled as D): p-values are not shown. The legend for panel I is missing.
      • Figure 6E, F: p-values are not shown.
      • Lines 272-278: the figures cited should be Figures 7D-F.

      (4) Some wording is not accurate, such as line 80, "the elevated level of PRMT1 maintains the leukemic stem cells": the study uses a cell line, not leukemic stem cells.

      (5) In the disease model, histopathology of blood, spleen, and BM should be shown.

      (6) Can MS023 treatment reverse the metabolic changes in PRMT1-overexpressing AMKL cells?

      (7) It would be helpful if a summary graph were provided at the end of the manuscript.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript explores the role of PRMT1 in AMKL, highlighting its overexpression as a driver of metabolic reprogramming. PRMT1 overexpression enhances the glycolytic phenotype and extracellular acidification by increasing lactate production in AMKL cells. Treatment with the PRMT1 inhibitor MS023 significantly reduces AMKL cell viability and improves survival in tumor-bearing mice. Intriguingly, PRMT1 overexpression also increases mitochondrial number and mtDNA content. High PRMT1-expressing cells demonstrate the ability to utilize alternative energy sources dependent on mitochondrial energetics, in contrast to parental cells with lower PRMT1 levels.

      Strengths:

      This is a conceptually novel and important finding as PRMT1 has never been shown to enhance glycolysis in AMKL, and provides a novel point of therapeutic intervention for AMKL.

      Weaknesses:

      (1) The manuscript lacks detailed molecular mechanisms underlying PRMT1 overexpression, particularly its role in enhancing survival and metabolic reprogramming via upregulated glycolysis and diminished oxidative phosphorylation (OxPhos). The findings primarily report phenomena without exploring the reasons behind these changes.

      (2) The article shows that PRMT1 overexpression leads to augmented glycolysis and low reliance on OxPhos. However, the manuscript also shows that PRMT1 overexpression leads to increased mitochondrial number and mitochondrial DNA content and an elevated NADPH/NAD+ ratio. Further, these overexpressing cells survive better on alternative energy sources in the absence of glucose than the low PRMT1-expressing parental cells. Surprisingly, the Seahorse assay in PRMT1-overexpressing cells showed no further enhancement in the ECAR after adding the mitochondrial uncoupler FCCP, indicating truncated mitochondrial energetics. These results are contradictory and need a more detailed explanation in the discussion.

      (3) How was disease penetrance established following the 6133/PRMT1 transplant before MS023 treatment?

      (4) The 6133/PRMT1 cells show elevated glycolysis compared to parental 6133 cells; why did the authors choose the 6133 cells for MS023 treatment and the ECAR assay (Fig. 3b)? The OCR data after inhibitor treatment in 6133 cells are similarly confusing; the figure legend and the description in the results section are inconsistent.

      (5) The discussion is too brief and incoherent and does not adequately address key findings. A comprehensive rewrite is necessary to improve coherence and depth.

      (6) The materials and methods section lacks a description of statistical analysis, and significance is not indicated in several figures (e.g., Figures 1C, D, F; Figures 2D, E, F, I). Statistical significance must be consistently indicated. The methods section requires more detailed descriptions to enable replication of the study's findings.

      (7) Figures are hazy and unclear. They should be replaced with high-resolution images, ensuring legible text and data.

      (8) Correct the labeling in Figure 2I by removing the redundant "D."

    1. eLife Assessment

      This useful study reports analyses of Neuropixels recordings in the medial prefrontal cortex and hippocampus of rats in a spatial navigation trial, focusing on classifying prefrontal neurons based on SWR modulation and anatomical location. However, the evidence for claims of a clear link between SWR modulation and neuronal encoding, and the evidence for anatomical organization, is currently incomplete. Further analyses might strengthen the evidence for some conclusions, and some of the strong claims of the paper should likely be moderated.

    2. Reviewer #1 (Public review):

      Summary:

      The authors used high-density probe recordings in the medial prefrontal cortex (PFC) and hippocampus during a rodent spatial memory task to examine functional sub-populations of PFC neurons that are modulated vs. unmodulated by hippocampal sharp-wave ripples (SWRs), an important physiological biomarker that is thought to have a role in mediating information transfer across hippocampal-cortical networks for memory processes. SWRs are associated with the reactivation of representations of previous experiences, and associated reactivation in hippocampal and cortical regions has been proposed to have a role in memory formation, retrieval, planning, and memory-guided behavior. This study focuses on awake SWRs that are prevalent during immobility periods during pauses in behavior. Previous studies have reported strong modulation of a subset of prefrontal neurons during hippocampal SWRs, with some studies reporting prefrontal reactivation during SWRs that have a role in spatial memory processes. The study seeks to extend these findings by examining the activity of SWR-modulated vs. unmodulated neurons across PFC sub-regions, and whether there is a functional distinction between these two kinds of neuronal populations with respect to representing spatial information and supporting memory-guided decision-making.

      Strengths:

      The major strength of the study is the use of Neuropixels 1.0 probes to monitor activity throughout the dorsal-ventral extent of the rodent medial prefrontal cortex, permitting an investigation of functional distinction in neuronal populations across PFC sub-regions. They are able to show that SWR-unmodulated neurons, in addition to having stronger spatial tuning than SWR-modulated neurons as previously reported, also show stronger directional selectivity and theta-cycle skipping properties.

      Weaknesses:

      (1) While the study is able to extend previous findings that SWR-modulated PFC neurons have significantly lower spatial tuning than SWR-unmodulated neurons, the evidence presented does not support the main conclusion of the paper that only the unmodulated neurons are involved in spatial tuning and signaling upcoming choice, implying that SWR-modulated neurons are not involved in predicting upcoming choice, as stated in the abstract. This conclusion makes a categorical distinction between two neuronal populations, that SWR-unmodulated neurons are involved and SWR-modulated neurons are not involved in predicting upcoming choice, which requires evidence that clearly shows this absolute distinction. However, in the analyses showing non-local population decoding in PFC for predicting upcoming choice, the results show that SWR-unmodulated neurons have higher firing rates than SWR-modulated neurons, which is not a categorical distinction. Higher firing rates do not imply that only SWR-unmodulated neurons are contributing to the non-local decoding. They may contribute more than SWR-modulated neurons, but there are no follow-up analyses to assess the contribution of the two sub-populations to non-local decoding.

      (2) Further, the results show that during non-local representations of the hippocampus of the upcoming options, SWR-excited PFC neurons were more active during hippocampal representations of the upcoming choice, and SWR-inhibited PFC neurons were less active during hippocampal representations of the alternative choice. This clearly suggests that SWR-modulated neurons are involved in signaling upcoming choice, at least during hippocampal non-local representations, which contradicts the main conclusion of the paper.

      (3) Similarly, one of the analyses shows that PFC non-local representations show no preference for hippocampal SWRs or hippocampal theta phase. However, the examples shown for non-local representations clearly show that these decodes occur prior to the start of the trajectory, or during running on the central zone or start arm. The period of immobility prior to running on the start arm will have a higher prevalence of SWRs, and the period of running will have a higher prevalence of theta oscillations and theta sequences, so non-local decoded representations have to be sub-divided according to these known local field potential phenomena for this analysis, which was not done.

      (4) The primary phenomenon that the manuscript relies on is the modulation of PFC neurons by hippocampal SWRs, so it is necessary to perform the PFC population decoding analyses during SWRs (or examine non-local decoding that occurs specifically during SWRs), as reported in previous studies of PFC reactivation during SWRs, to see if there is any distinction between modulated and unmodulated neurons in this reactivation. Even in the case of independent PFC reactivation as reported by one study, this PFC reactivation was still reported to occur during hippocampal SWRs, therefore decoding during SWRs has to be examined. Similarly, the phenomenon of theta cycle skipping is related to theta sequence representations, so decoding during PFC and hippocampal theta sequences has to be examined before coming to any conclusions.

    3. Reviewer #2 (Public review):

      Summary:

      This work by den Bakker and Kloosterman contributes to the vast body of research exploring the dynamics governing the communication between the hippocampus (HPC) and the medial prefrontal cortex (mPFC) during spatial learning and navigation. Previous research showed that population activity of mPFC neurons is replayed during HPC sharp-wave ripple events (SWRs), which may therefore correspond to privileged windows for the transfer of learned navigation information from the HPC, where initial learning occurs, to the mPFC, which is thought to store this information long term. Indeed, it was also previously shown that the activity of mPFC neurons contains task-related information that can inform about the location of an animal in a maze, which can predict the animals' navigational choices. Here, the authors aim to show that the mPFC neurons that are modulated by HPC activity (SWRs and theta rhythms) are distinct from those "encoding" spatial information. This result could suggest that the integration of spatial information originating from the HPC within the mPFC may require the cooperation of separate sets of neurons.

      This observation may be useful to further extend our understanding of the dynamics regulating the exchange of information between the HPC and mPFC during learning. However, my understanding is that this finding is mainly based upon a negative result, which cannot be statistically proven by the failure to reject the null hypothesis. Moreover, in my reading, the rest of the paper mainly replicates phenomena that have already been described, with the original reports not correctly cited. My opinion is that the novel elements should be precisely identified and discussed, while the current phrasing in the manuscript, in most cases, leads readers to think that these results are new. Detailed comments are provided below.

      Major concerns:

      (1) The main claim of the manuscript is that the neurons involved in predicting upcoming choices are not the neurons modulated by the HPC. This is based upon the evidence provided in Figure 5, which is a negative result that the authors employ to claim that predictive non-local representations in the mPFC are not linked to hippocampal SWRs and theta phase. However, it is important to remember that in a statistical test, the failure to reject the null hypothesis does not prove that the null hypothesis is true. Since this claim is so central in this work, the authors should use appropriate statistics to demonstrate that the null hypothesis is true. This can be accomplished by showing that there is no effect above some size that is so small that it would make the effect meaningless (see https://doi.org/10.1177/070674370304801108).
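      The reviewer's suggestion of showing that any effect is smaller than a meaningful bound is commonly implemented as an equivalence test (two one-sided tests, TOST). The following is a minimal sketch, not the analysis of the authors or the reviewer: the function name `tost_p_value`, the per-session effect values, and the equivalence margin are hypothetical, and a large-sample normal approximation is assumed in place of the t distribution.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def tost_p_value(effects, margin):
    """Large-sample TOST p-value for the claim |mean(effects)| < margin.

    `effects` is a hypothetical list of per-session effect sizes, e.g. the
    change in non-local decoding around SWR onset. `margin` is the smallest
    effect considered meaningful and must be chosen before looking at the data.
    """
    n = len(effects)
    if n < 2 or margin <= 0:
        raise ValueError("need at least two values and a positive margin")
    m = mean(effects)
    se = stdev(effects) / sqrt(n)
    if se == 0:
        return 0.0 if abs(m) < margin else 1.0
    z = NormalDist()
    # One-sided test against H0: true mean <= -margin
    p_lower = 1.0 - z.cdf((m + margin) / se)
    # One-sided test against H0: true mean >= +margin
    p_upper = z.cdf((m - margin) / se)
    # Equivalence is supported only if both one-sided tests reject
    return max(p_lower, p_upper)


if __name__ == "__main__":
    # Synthetic values centered on zero, for illustration only
    near_zero = [0.01 * ((i % 11) - 5) for i in range(110)]
    print(tost_p_value(near_zero, margin=0.1))
```

      A small p-value here supports the statement that the effect lies within the chosen margin, which is a stronger claim than a non-significant ordinary test. The conclusion depends entirely on how the margin is justified.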

      (2) The main claim of the work is also based on Figure 3, where the authors show that SWRs-unmodulated mPFC neurons have higher spatial tuning, and higher directional selectivity scores, and a higher percentage of these neurons show theta skipping. This is used to support the claim that SWRs-unmodulated cells encode spatial information. However, it must be noted that in this kind of task, it is not possible to disentangle space and specific task variables involving separate cognitive processes from processing spatial information such as decision-making, attention, motor control, etc., which always happen at specific locations of the maze. Therefore, the results shown in Figure 3 may relate to other specific processes rather than encoding of space and it cannot be unequivocally claimed that mPFC neurons "encode spatial information". This limitation is presented by Mashoori et al (2018), an article that appears to be a major inspiration for this work. Can the authors provide a control analysis/experiment that supports their claim? Otherwise, this claim should be tempered. Also, the authors say that Jadhav et al. (2016) showed that mPFC neurons unmodulated by SWRs are less tuned to space. How do they reconcile it with their results?

      (3) My reading is that the rest of the paper mainly consists of replications or incremental observations of already known phenomena with some not necessarily surprising new observations:

      a) Figure 2 shows that a subset of mPFC neurons is modulated by HPC SWRs and theta (already known), that vmPFC neurons are more strongly modulated by SWRs (not surprising given anatomy), and that theta phase preference is different between vmPFC and dmPFC (not surprising given the fact that theta is a travelling wave).

      b) Figure 4 shows that non-local representations in mPFC are predictive of the animal's choice. This is mostly an increment to the work of Mashoori et al. (2018). My understanding is that, in addition to what had already been shown by Mashoori et al., here it is shown how the upcoming choice can be predicted. The authors may want to emphasize this novel aspect.

      c) Figure 6 shows that prospective activity in the HPC is linked to SWRs and theta oscillations. This has been described in various forms since at least the works of Johnson and Redish in 2007, Pastalkova et al. 2008, and Dragoi and Tonegawa (2011 and 2013), as well as in earlier literature on splitter cells. These foundational papers on this topic are not even cited in the current manuscript.

      Although some previous work is cited, the current narrative of the results section may lead the reader to think that these results are new, which I think is unfair. Previous evidence of the same phenomena should be cited throughout the results, and what is new and/or different from previous results should be clearly stated and discussed. Pure replications of previous works may actually just be supplementary figures. It is not fair that the titles of paragraphs and main figures correspond to notions that are well established in the literature (e.g., Figure 2, 2nd paragraph of results, etc.).

      d) My opinion is that, overall, the paper gives the impression of being somewhat rushed and lacking attention to detail. Many figure panels are difficult to understand due to incomplete legends and visualizations with tiny, indistinguishable details. Moreover, some previous works are not correctly cited. I tried to make a list of everything I spotted below.

    4. Author response:

      We thank the reviewers for their thoughtful feedback. Below we provide an initial response to the central concerns that they have raised. In general, as part of our revisions, we plan to perform additional analyses to strengthen our conclusions, tone down more speculative interpretations, and clarify the novel contributions of our work. A full, point-by-point reply will follow alongside the revised manuscript.

      Briefly, the reviewers’ central concerns are that some of the conclusions are not sufficiently supported by the experimental evidence, specifically (1) the involvement of sharp-wave ripple (SWR)-unmodulated PFC neurons in signaling upcoming choice and (2) the absence of SWR time-locking of PFC non-local representations. They further suggest that (3) the spatial tuning in the PFC may reflect other cognitive processes rather than encoding spatial information; and (4) the manuscript is ambiguous as to which results are novel or corroborating previous work.

      (1) SWR-unmodulated PFC neurons signaling upcoming choice

      Reviewer 1 suggests that our finding that SWR-modulated neurons relate to hippocampal non-local representations contradicts the manuscript’s main conclusion. However, in our view, there is no contradiction and the finding highlights the distinction between the two sub-populations, namely the SWR-modulated neurons linked to hippocampal non-local representations, and the SWR-unmodulated neurons that are more active during prefrontal non-local representations.

      We do agree with the reviewer that the observation of higher firing rates of SWR-unmodulated neurons in the expression of non-local representations does not mean that these neurons are the sole or even main contributors to the non-local decoding. To address both comments, we will perform additional analyses to further disentangle the contributions of SWR-modulated and SWR-unmodulated PFC neurons to the non-local representations of upcoming choice.

      (2) Time-locking of PFC non-local representations to hippocampal SWRs

      Reviewer 1 comments that in the analysis of time-locking to hippocampal SWRs and theta phase, the behavior of the animals needs to be taken into account (i.e., immobility or running). We confirm that this was indeed done in our analysis and we will clarify this point in the revised manuscript.

      The reviewer further requested that PFC decoding during SWRs be performed at shorter timescales as in previous studies. We would like to point out that (1) we found no increase in non-local decoding in the PFC around SWR onset (see Fig 5a), and (2) most of the non-local representations in the PFC occurred during the expression of local representations in the hippocampus (see Fig 4d). These data suggest that the non-local representations in both brain regions are expressed independently. To further strengthen this idea, we plan to (1) include the result of decoding PFC activity during SWRs at fine timescales as the reviewer suggested, and (2) look at the firing rates of PFC neurons during non-local representations exclusively when the hippocampus is encoding the actual (local) position.

      Following a suggestion by reviewer 2, we will also add a statistical assessment of how strongly the data supports the absence of time-locking.

      (3) Spatial tuning in the mPFC

      Reviewer 2 points out that the spatial tuning in the prefrontal cortex may be related to cognitive processes (e.g., attention or decision-making) rather than spatial encoding. However, our results show that decoded mPFC activity reliably differentiates between the two start and goal arms (Fig 4a), rate maps show little evidence of mirroring (Fig 3a), and the activity predicts turns in the cue-based task during which goal arms switch pseudo-randomly (meaning that the non-local representations encode the North and South arm alternatingly and correctly, rather than encoding a general rewarded goal arm; Fig. 4b). While it is likely that mPFC encodes several task-related variables, our data suggest that it also encodes distinct locations.

      The reviewer further claims that the results of Jadhav et al. (2016) contradict our findings because they supposedly showed that mPFC neurons unmodulated by SWRs are less tuned to space. However, this is incorrect, as Jadhav et al. (2016) showed that SWR-unmodulated PFC neurons have lower spatial coverage and consequently are more spatially selective, which is consistent with our observations. We will rephrase this in the text to improve clarity.

      (4) Novelty

      We thank reviewer 2 for pointing out the significance of several novel findings in our work that deserve to be highlighted. This includes the dorsal-ventral profile of SWR-modulation and theta phase locking in the PFC and our observation that the neural representations in the PFC precede the behavioral switch in reversal learning. In our revised manuscript, we will rewrite the text to better emphasize our novel contributions, clearly distinguish new findings from confirmatory observations, and add missing citations where appropriate.

    1. eLife Assessment

      Using electrophysiological recordings in freely moving rats, this valuable study investigates the role of different gamma frequency bands in the development of spatial representations in the hippocampus. Solid evidence supports the idea that gamma-modulated neurons are crucial for generating specific neuronal sequences. These findings will be of interest to neuroscientists studying spatial navigation and neuronal dynamics.

    2. Reviewer #1 (Public review):

      This study presents evidence that a special group of place cells, those tuned to fast-gamma oscillations, play a key role in theta sequence development. How theta sequences are formed and developed during experience is an important question, because these sequences have been implicated in several cognitive functions of place cells, including memory-guided spatial navigation. The revised version of this paper has been significantly improved. Major concerns in the previous round of review on technical and conceptual aspects of the relationship between gamma oscillations and theta sequences are addressed. The main conclusion is supported by the data presented.

    3. Reviewer #2 (Public review):

      This manuscript addresses an important question which has not yet been solved in the field: what is the contribution of different gamma oscillatory inputs to the development of "theta sequences" in the hippocampal CA1 region? Theta sequences have received much attention due to their proposed roles in encoding short-term behavioral predictions, mediating synaptic plasticity, and guiding flexible decision making. Gamma oscillations in CA1 offer a readout of different inputs to this region and have been proposed to synchronize neuronal assemblies and modulate spike timing and temporal coding. However, the interactions between these two important phenomena have not been sufficiently investigated. The authors conducted place cell and local field potential (LFP) recordings in the CA1 region of rats running on a circular track. They then analyzed the phase locking of place cell spikes to slow and fast gamma rhythms, the evolution of theta sequences during behavior, and the interaction between these two phenomena. They found that place cells with the strongest modulation by fast gamma oscillations were the most important contributors to the early development of theta sequences and that they also displayed a faster form of phase precession within slow gamma cycles nested within theta. The results reported are interesting and support the main conclusions of the authors. However, the manuscript needs significant improvement in several aspects regarding data analysis, description of both experimental and analytical methods, and alternative interpretations, as I detail below.

      • The experimental paradigm and recordings should be explained at the beginning of the Results section. Right now, there is no description whatsoever, which makes it harder to understand the design of the study.
      • An important issue that needs to be addressed is the very small fraction of CA1 cells phase-locked to slow gamma rhythms (3.7%). This fraction is much lower than in many previous studies, which typically report it in the range of 20-50%. However, this discrepancy is not discussed by the authors. This needs to be explained and additional analysis considered. One analysis that I would suggest, although there are also other valid approaches, is to compute the phase locking with all LFP frequencies from 25-100 Hz, instead of analyzing phase locking only in two discrete frequency bands. This will offer a more comprehensive and unbiased view of the gamma modulation of place cell firing. Alternative metrics to mean vector length that are less sensitive to firing rates, such as the pairwise phase consistency index (Vinck et al., Neuroimage, 2010), could be implemented. This may reveal whether the low fraction of phase-locked cells could be due to a low number of spikes entering the analysis.
      • From the methods, it is not clear to me whether the reference LFP channel was consistently selected to be different from the one where the analyzed spikes were recorded. This is the better practice to reduce the contribution of spike leakage, which could substantially inflate the coupling with faster gamma frequencies. These analyses need to be described in more detail.
      • The authors' initial framework of classifying cells into fast-gamma-modulated and not-fast-gamma-modulated implies a bimodality that may be artificial. The authors should discuss the nuances and limitations of this framework. For example, several previous studies have shown that the same place cell can couple to different gamma oscillations (e.g., Lasztóczi et al., Neuron, 2016; Fernandez-Ruiz et al., Neuron, 2017; Sharif et al., Neuron, 2021).
      • It would be useful to provide a more thorough characterization of the physiological properties of FG and NFG cells, as this distinction is the basis of the paper. Only very little characterization of some place cell properties is provided in Figure 5. Important characteristics that should be very feasible to compare include average firing rate, burstiness, estimated location within the layer (i.e., deep vs superficial sublayers) and along the transverse axis (i.e., proximal vs distal), theta oscillation frequency, phase precession metrics (given their fundamental relationship with theta sequences), etc.
      • It is not clear to me how the analysis in Figure 6 was performed. In Fig. 6B I would think that the grey line should connect with the bottom white dot in the third panel, which would change the interpretation of the results.
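      To illustrate why the pairwise phase consistency (PPC) suggested above is less sensitive to spike count than the mean vector length, the two metrics can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, and `phases` is assumed to be a list of gamma phases (in radians) at which a cell's spikes occurred. PPC is computed here through the closed form (N·R² − 1)/(N − 1), which is algebraically equal to the average cosine of all pairwise phase differences.

```python
import cmath
import math


def mean_vector_length(phases):
    """Length R of the mean resultant vector of spike phases (radians).

    R is positively biased when few spikes are available.
    """
    n = len(phases)
    if n == 0:
        raise ValueError("no spike phases given")
    return abs(sum(cmath.exp(1j * p) for p in phases)) / n


def pairwise_phase_consistency(phases):
    """Average cosine of the phase difference over all pairs of spikes.

    Uses PPC = (N * R**2 - 1) / (N - 1), the closed form of
    (2 / (N * (N - 1))) * sum_{j<k} cos(theta_j - theta_k).
    """
    n = len(phases)
    if n < 2:
        raise ValueError("PPC needs at least two spikes")
    r = mean_vector_length(phases)
    return (n * r * r - 1.0) / (n - 1.0)


if __name__ == "__main__":
    # Four spikes spread evenly over the cycle: no phase locking at all
    uniform = [0.0, math.pi / 2, math.pi, 3 * math.pi / 2]
    print(mean_vector_length(uniform), pairwise_phase_consistency(uniform))
```

      Sweeping these metrics over narrow frequency bands between 25 and 100 Hz, as the reviewer proposes, would additionally require band-pass filtering and phase extraction of the LFP, which is not shown here.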

      Comments on revisions:

      The authors have conducted new analyses to address the issues that the other reviewers and I raised in our original review. As a result, the revised manuscript has been substantially improved.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Hippocampal place cells display a sequence of firing activities when the animal travels through a spatial trajectory at a behavioral time scale of seconds to tens of seconds. Interestingly, parts of the firing sequence also occur at a much shorter time scale: ~120 ms within individual cycles of theta oscillation. These so-called theta sequences were originally thought to result naturally from the phenomenon of theta phase precession. However, there is evidence that theta sequences do not always occur even when theta phase precession is present, for example, during the early experience of a novel maze. The question is then how they emerge with experience (theta sequence development). This study presents evidence that a special group of place cells, those tuned to fast-gamma oscillations, may play a key role in theta sequence development.

      The authors analyzed place cells, LFPs, and theta sequences as rats traveled a circular maze in repeated laps. They found that a group of place cells were significantly tuned to a particular phase of fast-gamma (FG-cells), in contrast to others that did not show such tuning (NFG-cells). The authors then omitted FG-cells, or the same number of NFG-cells, in their algorithm of theta sequence detection and found that the quality of theta sequences, quantified by a weighted correlation, was worse with the FG-cell omission, compared to that with the NFG-cell omission, during later laps, but not during early laps. What made the FG-cells special for theta sequences? The authors found that FG-cells, but not NFG-cells, displayed phase precession to slow-gamma (25 - 45 Hz) oscillations (within theta cycles) during early laps (both FG- and NFG-cells showed slow-gamma phase precession during later laps). Overall, the authors conclude that FG-cells contribute to theta sequence development through slow-gamma phase precession during early laps.
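      For readers unfamiliar with the weighted correlation used to score theta sequences, the following sketch shows the definition that is commonly used for decoded posteriors: a Pearson correlation between time bin and position bin, weighted by the decoded probability. The review does not describe the authors' exact implementation, so the function name, the bin indexing, and the toy posteriors are assumptions for illustration only.

```python
import math


def weighted_correlation(posterior):
    """Weighted Pearson correlation between time bin and position bin.

    posterior[t][x] is the decoded probability of position bin x in time
    bin t within one theta cycle (hypothetical layout).
    """
    cells = [(t, x, p) for t, row in enumerate(posterior) for x, p in enumerate(row)]
    total = sum(p for _, _, p in cells)
    if total <= 0:
        raise ValueError("posterior must contain positive probability mass")
    mean_t = sum(t * p for t, _, p in cells) / total
    mean_x = sum(x * p for _, x, p in cells) / total
    cov = sum(p * (t - mean_t) * (x - mean_x) for t, x, p in cells) / total
    var_t = sum(p * (t - mean_t) ** 2 for t, _, p in cells) / total
    var_x = sum(p * (x - mean_x) ** 2 for _, x, p in cells) / total
    if var_t == 0 or var_x == 0:
        return 0.0  # no spread along one axis, correlation undefined
    return cov / math.sqrt(var_t * var_x)


if __name__ == "__main__":
    # Toy posterior sweeping forward in position across three time bins
    forward = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    print(weighted_correlation(forward))
```

      In an omission analysis like the one summarized above, the posterior would be decoded once without FG-cells and once without an equal number of NFG-cells, and the resulting scores compared across laps.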

      How theta sequences are formed and developed during experience is an important question, because these sequences have been implicated in several cognitive functions of place cells, including memory-guided spatial navigation. The identification of FG-cells in this study is straightforward. Evidence is also presented for the role of these cells in theta sequence development. However, given several concerns elaborated below, whether the evidence is sufficiently strong for the conclusion needs further clarification, perhaps, in future studies.

      We thank the reviewer for these positive comments.

      (1) The results in Figure 3 and Figure 8 seem contradictory. In Figure 8, all theta sequences displayed a seemingly significant weighted correlation (above 0) even in early laps, which was mostly due to FG-cell sequences but not NFG-cell sequences (correlation for NFG-sequences appeared below 0). However, in Figure 3H, omitting FG-cells and omitting NFG-cells did not produce significant differences in the correlation. Conversely, FG-cell and NFG-cell sequences were similar in later laps in Figure 8 (NFG-cell sequences appeared even better than FG-cell sequences), yet omitting NFG-cells produced a better correlation than omitting FG-cells. This confusion may be related to how "FG-cell-dominant sequences" were defined, which is unclear in the manuscript. Nevertheless, the different results are not easy to understand.

      We thank the reviewer for pointing out this important problem. The apparent contradiction arises because Fig. 3 and Fig. 8 include different sets of sequences, as described below.

      (1) In Fig. 3, all sequences decoded without either FG-cells or NFG-cells were included, defined as exFG-sequences and exNFG-sequences, respectively. Sequence development could therefore not be observed in the early phase, and the weighted correlation was low. (2) In Fig. 8, however, only sequences in which either FG-cells or NFG-cells fired across at least 3 slow gamma cycles were included, defined as FG-cell sequences and NFG-cell sequences. This criterion allowed us to investigate the relationship between sequence development and slow gamma phase precession, because these sequences were contributed by cells likely to show slow gamma phase precession. These definitions have been added to the “Theta sequences detection” section of the Methods (Lines 606-619).

      In the early phase, there was still no difference in weighted correlation between FG-cell sequences and NFG-cell sequences (Author response image 1A, Student’s t test, t(65)=0.2, p=0.8, Cohen's D=0.1), but the FG-cell sequences contained a high proportion of slow gamma phase precession (Fig. 8F). In the late phase, both FG-cell sequences and NFG-cell sequences exhibited slow gamma phase precession, so their weighted correlations were both high, with no difference between them (Author response image 1B, Student’s t test, t(62)=-1.1, p=0.3, Cohen's D=0.3). This result further indicates that theta sequence development requires slow gamma phase precession, especially of FG-cells during the early phase.

      Author response image 1.
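
      For readers unfamiliar with the weighted correlation used to score theta sequences, the following is a minimal sketch of the commonly used definition, in which each (position bin, time bin) pair is weighted by its decoded probability. It is an illustration only: the function name and the nested-list layout of the posterior are assumptions, and the authors' actual implementation may differ in binning and normalization.

```python
import math


def weighted_correlation(posterior):
    """Weighted correlation between position and time.

    posterior[i][j] is the decoded probability at position bin i and
    time bin j (hypothetical layout, chosen for this sketch).
    """
    total = sum(sum(row) for row in posterior)
    if total <= 0:
        raise ValueError("posterior must contain positive probability mass")

    # Probability-weighted means of the position and time indices.
    mean_x = sum(p * i for i, row in enumerate(posterior) for p in row) / total
    mean_t = sum(p * j for row in posterior for j, p in enumerate(row)) / total

    # Probability-weighted covariance and variances.
    cov_xt = var_x = var_t = 0.0
    for i, row in enumerate(posterior):
        for j, p in enumerate(row):
            dx, dt = i - mean_x, j - mean_t
            cov_xt += p * dx * dt
            var_x += p * dx * dx
            var_t += p * dt * dt

    if var_x == 0 or var_t == 0:
        return float("nan")  # correlation is undefined without spread
    return cov_xt / math.sqrt(var_x * var_t)


# A posterior that sweeps forward along the track over time scores close to +1.
forward_sweep = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(weighted_correlation(forward_sweep))
```

      Under this definition, a posterior concentrated along a rising diagonal gives a value near +1, a falling diagonal gives a value near -1, and an unstructured posterior gives a value near 0.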

      (2) The different contributions between FG-cells and NFG-cells to theta sequences are supposed not to be caused by their different firing properties (Figure 5). However, Figure 5D and E showed a large effect size (Cohen's D = 0.7, 0.8), although not significant (P = 0.09, 0.06). But the seemingly non-significant P values could be simply due to smaller N's (~20). In other parts of the manuscript, the effect sizes were comparable or even smaller (e.g. D = 0.5 in Figure 7B), but interpreted as positive results: P values were significant with large N's (~480 in Fig. 7B). Drawing a conclusion purely based on a P value while N is large often renders the conclusion only statistical, with unclear physical meaning. Although this is common in neuroscience publications, it makes more sense to at least make multiple inferences using similar sample sizes in the same study.

      We thank the reviewer for this kind suggestion. We made multiple inferences using similar sample sizes wherever possible. In Fig. 7B, we repeated the statistical analysis with sessions as samples and found that the result remained significant. These results have been added to the revised manuscript (Lines 269-270), and Fig. 7B has been replaced accordingly.
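
      The reviewer's point about sample size can be made concrete with a small calculation. For an independent two-sample t test with pooled standard deviation, the t statistic equals Cohen's d multiplied by sqrt(n1*n2/(n1+n2)), so a fixed effect size yields a larger t as N grows. The sketch below uses hypothetical group sizes (10 per group for N of about 20, and 240 per group for N of about 480); these splits are assumptions for illustration and are not taken from the manuscript.

```python
import math


def t_from_cohens_d(d, n1, n2):
    """t statistic implied by Cohen's d for an independent two-sample
    t test with pooled standard deviation."""
    return d * math.sqrt(n1 * n2 / (n1 + n2))


# Hypothetical group sizes, chosen only to mirror N ~ 20 and N ~ 480.
t_small_n = t_from_cohens_d(0.7, 10, 10)
t_large_n = t_from_cohens_d(0.5, 240, 240)

print(round(t_small_n, 2), round(t_large_n, 2))
```

      With these assumed group sizes, the larger effect (d = 0.7) gives a t statistic well below that of the smaller effect (d = 0.5), which is why comparing P values across analyses with very different N can mislead.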

      (3) In supplementary Figure 2 - S2, FG-cells displayed stronger theta phase precession than NFG-cells, which could be a major reason why FG-cells impacted theta sequences more than NFG cells. Although factors other than theta phase precession may contribute to or interfere with theta sequences, stronger theta phase precession itself (without the interference of other factors), by definition, can lead to stronger theta sequences.

      This is a very good point. The finding that FG-cells displayed stronger theta phase precession than NFG-cells is consistent with the finding of Guardamagna et al., 2023 Cell Rep, that the theta phase precession pattern emerged with strong fast gamma. Since slow gamma phase precession occurs within theta cycles, it is hard to assess the contribution of these factors to theta sequence development without taking theta phase precession into account. It should be noted, however, that theta sequences may fail to develop even when theta phase precession is present from the very beginning of exploration (Feng et al., 2025 J Neurosci). These findings suggest that theta phase precession, together with other factors, impacts theta sequence development. However, the weight of each factor and their interactions still need further investigation. We have discussed this possibility in the Discussion section (Lines 361-373).

      (4) The slow-gamma phase precession of FG-cells during early laps is supposed to mediate or contribute to the emergence of theta sequences during late laps (Figure 1). The logic of this model is unclear. The slow-gamma phase precession was present in both early and late laps for FG-cells, but only present in late laps for NFG-cells. It seems more straightforward to hypothesize that the difference in theta sequences between early and later laps is due to the difference in slow-gamma phase precession of NFG cells between early and late laps. Although this is not necessarily the case, the argument presented in the manuscript is not easy to follow.

      We thank the reviewer for pointing this out. Slow gamma phase precession was first reported in my previous publication (Zheng et al., 2016 Neuron), which indicated a temporally compressed manner of coding spatial information related to memory retrieval. In this case, we would expect slow gamma phase precession to occur in all cells during late laps, because spatial information is retrieved once rats have become familiar with the environment. However, during early laps, when novel information is just being encoded, there would be a balance between fast gamma and slow gamma modulation of cells for the upcoming encoding-retrieval transition. One possibility is that FG-cells support this balance by receiving both fast gamma and slow gamma modulation, but with distinct phase-coding modes (fast gamma phase locking and slow gamma phase precession) in a temporally coordinated manner. We have discussed this possibility in the Discussion section (Lines 415-428).

      (5) There are several questions on the description of methods, which could be addressed to clarify or strengthen the conclusions.

      (i) Were the identified fast- and slow-gamma episodes mutually exclusive?

      Yes, the fast- and slow-gamma episodes are mutually exclusive. We have added descriptions in the “Detection of gamma episodes” section in the Methods part (Lines 538-550).

      (ii) Was the task novel when the data were acquired? How many days (from the 1st day of the task) were included in the analysis? When the development of the theta sequence was mentioned, did it mean the development in a novel environment, in a novel task, or purely in a sense of early laps (Lap 1, 2) on each day?

      We thank the reviewer for pointing this out. The task was not novel to the rats in this dataset, because only days with recording quality good enough for sequence decoding were included in this paper, which were approximately days 2-10 for each rat. However, we still observed the process of sequence formation because of the rats' exploratory interest during early laps. Thus, the development of theta sequences refers to development across the early laps on each day.

      (iii) How were the animals' behavioral parameters equalized between early and later laps? For example, speed or head direction could potentially produce the differences in theta sequences.

      This is a very good point. Regarding the effect of running speed on theta sequences, we quantified the running speeds during theta sequences across trials 1-5. We found that the rats ran at a stable speed, as reported in Fig. 3F. Regarding the effect of head direction on theta sequences, we measured the angle difference between head direction and running direction. We found that the angle difference for each lap was distributed around 0, with no significant difference across laps (Fig. S3, Watson-Williams multi-sample test, F(4,55)=0.2, p=0.9, partial η² = 0.01). These results indicate that the differences in theta sequences across trials cannot be explained by variability in behavioral parameters. We have added these results and the corresponding methods to the revised manuscript (Lines 172-175, Lines 507-511, with a new Fig. S3).
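
      As an illustration of the head-direction measure described above, the sketch below computes the difference between head direction and running direction wrapped to the interval from -π to π, together with a circular mean. The function names and the use of radians are assumptions made for this example; the response does not specify how the angles were computed.

```python
import math


def wrapped_difference(head_dir, run_dir):
    """Signed angular difference (radians), wrapped to [-pi, pi]."""
    diff = head_dir - run_dir
    return math.atan2(math.sin(diff), math.cos(diff))


def circular_mean(angles):
    """Circular mean direction (radians) of a list of angles."""
    sin_sum = sum(math.sin(a) for a in angles)
    cos_sum = sum(math.cos(a) for a in angles)
    return math.atan2(sin_sum, cos_sum)


# Head at 350 degrees and running direction at 10 degrees differ by
# -20 degrees, not 340 degrees, once the difference is wrapped.
diff_deg = math.degrees(wrapped_difference(math.radians(350), math.radians(10)))
print(round(diff_deg, 6))
```

      Wrapping matters because a naive subtraction treats directions on either side of 0° as far apart, which would inflate the apparent spread of the per-lap distributions.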

      Reviewer #2 (Public Review):

      This manuscript addresses an important question that has not yet been solved in the field, what is the contribution of different gamma oscillatory inputs to the development of "theta sequences" in the hippocampal CA1 region? Theta sequences have received much attention due to their proposed roles in encoding short-term behavioral predictions, mediating synaptic plasticity, and guiding flexible decision-making. Gamma oscillations in CA1 offer a readout of different inputs to this region and have been proposed to synchronize neuronal assemblies and modulate spike timing and temporal coding. However, the interactions between these two important phenomena have not been sufficiently investigated. The authors conducted place cell and local field potential (LFP) recordings in the CA1 region of rats running on a circular track. They then analyzed the phase locking of place cell spikes to slow and fast gamma rhythms, the evolution of theta sequences during behavior, and the interaction between these two phenomena. They found that place cells with the strongest modulation by fast gamma oscillations were the most important contributors to the early development of theta sequences and that they also displayed a faster form of phase precession within slow gamma cycles nested with theta. The results reported are interesting and support the main conclusions of the authors. However, the manuscript needs significant improvement in several aspects regarding data analysis, description of both experimental and analytical methods, and alternative interpretations, as I detail below.

      • The experimental paradigm and recordings should be explained at the beginning of the Results section. Right now, there is no description whatsoever which makes it harder to understand the design of the study.

      We thank the reviewer for this kind suggestion.  The description of experimental paradigm and recordings has been added to the beginning of the results section (Lines 114-119).

      • An important issue that needs to be addressed is the very small fraction of CA1 cells phase-locked to slow gamma rhythms (3.7%). This fraction is much lower than in many previous studies, that typically report it in the range of 20-50%. However, this discrepancy is not discussed by the authors. This needs to be explained and additional analysis considered. One analysis that I would suggest, although there are also other valid approaches, is to, instead of just analyzing the phase locking in two discrete frequency bands, compute the phase locking with all LFP frequencies from 25-100 Hz. This will offer a more comprehensive and unbiased view of the gamma modulation of place cell firing. Alternative metrics to mean vector length that are less sensitive to firing rates, such as pairwise phase consistency index (Vinck et al., Neuroimage, 2010), could be implemented. This may reveal whether the low fraction of phase-locked cells could be due to a low number of spikes entering the analysis.

      We thank the reviewer for this constructive suggestion. A previous study, also in Long-Evans rats, showed that the proportion of slow gamma phase-locked cells was ~20% during novelty exploration but dropped to ~10% during familiar exploration (Fig.4E, Kitanishi et al., 2015 Neuron). This suggests that the proportion of slow gamma phase-locked cells may decrease with familiarity of the environment, which supports our data. In addition, we calculated the pairwise phase consistency (PPC) index to assess the effect of spike counts on MVL. The trends of PPC (Author response image 2A) and MVL (Author response image 2B) across frequency bands were consistent across different subsets of cells, suggesting that the determination of cell subsets by the MVL metric was not biased by a low number of spikes. These results further shed light on the contribution of slow gamma phase precession of place cells to theta sequence development.

      Author response image 2.

      • From the methods, it is not clear to me whether the reference LFP channel was consistently selected to be a different one than that from which the analyzed spikes were taken. This is the better practice to reduce the contribution of spike leakage that could substantially inflate the coupling with faster gamma frequencies. These analyses need to be described in more detail.

      We thank the reviewer for pointing this out. In the main manuscript, we used local LFPs, i.e., LFPs recorded from the same tetrode as the cells. In addition, for each rat we selected a single tetrode located in stratum pyramidale at the center of the drive bundle. Using LFPs from this tetrode, we detected a proportion of FG-cells similar to that obtained with local LFPs (Author response image 3A-B, Chi-squared test, χ² = 0.9, p=0.4, Cramér's V=0.03). We further found that the PPC values of FG- and NFG-cells differed in the fast gamma band when using central LFPs (Author response image 3D), consistent with the results obtained using local LFPs (Author response image 3C). Therefore, these results suggest that the findings related to fast gamma were not due to the contribution of spike leakage in the local LFPs. We have updated the description in the manuscript (Lines 553-557, 566-568).

      Author response image 3.

      • The initial framework of the authors of classifying cells into fast gamma and not fast gamma modulated implies a bimodality that may be artificial. The authors should discuss the nuances and limitations of this framework. For example, several previous works have shown that the same place cell can couple to different gamma oscillations (e.g., Lasztóczi et al., Neuron, 2016; Fernandez-Ruiz et al., Neuron, 2017; Sharif et al., Neuron, 2021).

      We thank the reviewer for this kind suggestion.  We have cited these references and discussed the possibility of bimodal phase-locking in the manuscript (Lines 430-433).

      • It would be useful to provide a more thorough characterization of the physiological properties of FG and NFG cells, as this distinction is the basis of the paper. Only very little characterization of some place cell properties is provided in Figure 5. Important characteristics that should be very feasible to compare include average firing rate, burstiness, estimated location within the layer (i.e., deep vs superficial sublayers) and along the transverse axis (i.e., proximal vs distal), theta oscillation frequency, phase precession metrics (given their fundamental relationship with theta sequences), etc.

      We thank the reviewer for this constructive suggestion.  In addition to the characterizations shown in Fig5, we also analyzed firing rate, anatomical location and theta modulation to compare the physiological properties of FG- and NFG-cells.

      In terms of the firing properties of both types of cells, we found that the mean firing rate of FG-cells was higher than that of NFG-cells (Fig. 5A, Student's t-test, t(22) = 2.1, p = 0.04, Cohen's D = 0.9), which is consistent with a previous study showing that the firing rate was higher during fast gamma than during slow gamma (Zheng et al., 2015 Hippocampus). However, the spike counts of the FG- and NFG-cells excluded from decoding were similar (Fig. 5B, Student's t-test, t(22) = 1.2, p = 0.3, Cohen's D = 0.5), suggesting that the differences found in theta sequences cannot be accounted for by differences in decoding quality related to spike counts. In addition, we measured burstiness based on the distribution of inter-spike intervals and found that the bursting probability of spikes was not significantly different between FG-cells and NFG-cells (Author response image 4A, Student's t-test, t(22) = 0.6, p=0.5, Cohen's d=0.3).

      In terms of theta modulation of cells, we first compared the theta frequency related to the firing of FG and NFG cells. We detected the instantaneous theta frequency at each spike time of FG and NFG cells, and found that it was not significantly different between cell types (Author response image 4B, Student's t-test, t(22) = -0.5, p=0.6, Cohen's d=0.2). In addition, we found that the proportion of cells with significant theta phase precession was greater in FG-cells than in NFG-cells (Fig. S2E). However, the slope and starting phase of theta phase precession were not significantly different between FG and NFG cells (Author response image 4C, Student's t-test, t(21) = 0.3, p=0.8, Cohen's d=0.1; Author response image 4D, Watson-Williams test, F(1,21)=0.5, p=0.5, partial η²=0.02).

      In terms of the anatomical location of FG and NFG cells, we identified tetrode traces in slices for each cell. We found that both FG and NFG cells were recorded from the deep layer of dorsal CA1, with no difference in proportions between cell types (Author response image 4E, Chi-squared test, χ²=0.5, p=0.5, Cramér's V=0.05). The distribution of FG-cells and NFG-cells along the transverse axis was also similar between cell types (Author response image 4F, χ²=0.08, p=0.8, Cramér's V=0.02).

      Author response image 4.

      • It is not clear to me how the analysis in Figure 6 was performed. In Figure 6B I would think that the grey line should connect with the bottom white dot in the third panel, which would be the interpretation of the results.

      We thank the reviewer for raising this good point. The grey line was intended only as a visual guide, not as a quantitative analysis. We have removed the grey lines from all heat maps in Fig. 6.

      Reviewer #3 (Public Review):

      [Editors' note: This review contains many criticisms that apply to the whole sub-field of slow/fast gamma oscillations in the hippocampus, as opposed to this particular paper. In the editors' view, these comments are beyond the scope of any single paper. However, they represent a view that, if true, should contextualise the interpretation of this paper and all papers in the sub-field. In doing so, they highlight an ongoing debate within the broader field.]

      Summary:

      The authors aimed to elucidate the role of dynamic gamma modulation in the development of hippocampal theta sequences, utilizing the traditional framework of "two gammas," a slow and a fast rhythm. This framework is currently being challenged, necessitating further analyses to establish and secure the assumed premises before substantiating the claims made in the present article.

      The results are too preliminary and need to integrate contemporary literature. New analyses are required to address these concerns. However, by addressing these issues, it may be possible to produce an impactful manuscript.

      We thank the reviewer for raising these important questions in the hippocampal gamma field.  We have done a lot of new analyses according to the comments to strengthen our manuscript.

      I. Introduction

      Within the introduction, multiple broad assertions are conveyed that serve as the premise for the research. However, equally important citations that are not mentioned potentially contradict the ideas that serve as the foundation. Instances of these are described below:

      (1) Are there multiple gammas? The authors launched the study on the premise that two different gamma bands are communicated from CA3 and the entorhinal cortex. However, recent literature suggests otherwise, offering that the slow gamma component may be related to theta harmonics:

      From a review by Etter, Carmichael and Williams (2023)

      "Gamma-based coherence has been a prominent model for communication across the hippocampal-entorhinal circuit and has classically focused on slow and fast gamma oscillations originating in CA3 and medial entorhinal cortex, respectively. These two distinct gammas are then hypothesized to be integrated into hippocampal CA1 with theta oscillations on a cycle-to-cycle basis (Colgin et al., 2009; Schomburg et al., 2014). This would suggest that theta oscillations in CA1 could serve to partition temporal windows that enable the integration of inputs from these upstream regions using alternating gamma waves (Vinck et al., 2023). However, these models have largely been based on correlations between shifting CA3 and medial entorhinal cortex to CA1 coherence in theta and gamma bands. In vivo, excitatory inputs from the entorhinal cortex to the dentate gyrus are most coherent in the theta band, while gamma oscillations would be generated locally from presumed local inhibitory inputs (Pernía-Andrade and Jonas, 2014). This predominance of theta over gamma coherence has also been reported between hippocampal CA1 and the medial entorhinal cortex (Zhou et al., 2022). Another potential pitfall in the communication-through-coherence hypothesis is that theta oscillations harmonics could overlap with higher frequency bands (Czurkó et al., 1999; Terrazas et al., 2005), including slow gamma (Petersen and Buzsáki, 2020). The asymmetry of theta oscillations (Belluscio et al., 2012) can lead to harmonics that extend into the slow gamma range (Scheffer-Teixeira and Tort, 2016), which may lead to a misattribution as to the origin of slow-gamma coherence and the degree of spike modulation in the gamma range during movement (Zhou et al., 2019)."

      And from Benjamin Griffiths and Ole Jensen (2023)

      "That said, in both rodent and human studies, measurements of 'slow' gamma oscillations may be susceptible to distortion by theta harmonics [53], meaning open questions remain about what can be attributed to 'slow' gamma oscillations and what is attributable to theta."

      This second statement should be heavily considered as it is from one of the original authors who reported the existence of slow gamma.

      Yet another instance from Schomburg, Fernández-Ruiz, Mizuseki, Berényi, Anastassiou, Christof Koch, and Buzsáki (2014):

      "Note that modulation from 20-30 Hz may not be related to gamma activity but, instead, reflect timing relationships with non-sinusoidal features of theta waves (Belluscio et al., 2012) and/or the 3rd theta harmonic."

      One of this manuscript's authors is Fernández-Ruiz, a contemporary proponent of the multiple gamma theory. Thus, the modulation to slow gamma offered in the present manuscript may actually be related to theta harmonics.

      With the above emphasis from proponents of the slow/fast gamma theory on disambiguating harmonics from slow gamma, our first suggestion to the authors is that they A) address these statements (citing the work of these authors in their manuscript) and B) demonstrably quantify theta harmonics in relation to slow gamma prior to making assertions of phase relationships (methodological suggestions below). As the frequency of theta harmonics can extend as high as 56 Hz (PMID: 32297752), overlapping with the slow gamma range defined here (25-45 Hz), it will be important to establish an approach that decouples the two phenomena using an approach other than an arbitrary frequency boundary.

      We agree with the reviewer that theta harmonics could overlap with higher frequency bands including slow gamma, as discussed in the reviews quoted above. To rule out the possibility of theta harmonic effects in this study, we added new analyses in this letter (see below).
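
      The overlap at issue can be checked with simple arithmetic. The sketch below lists which integer harmonics of a theta band fall at least partly inside the slow gamma band of 25-45 Hz used in the manuscript. The theta range of 6-10 Hz is an assumption made for this illustration and is not stated in the text above.

```python
def harmonics_overlapping_band(theta_lo, theta_hi, band_lo, band_hi):
    """Return the harmonic numbers k for which the range
    [k * theta_lo, k * theta_hi] overlaps [band_lo, band_hi]."""
    overlapping = []
    k = 1
    # Once k * theta_lo exceeds the band, no higher harmonic can overlap.
    while k * theta_lo <= band_hi:
        if k * theta_hi >= band_lo:
            overlapping.append(k)
        k += 1
    return overlapping


# Assumed theta range of 6-10 Hz against the 25-45 Hz slow gamma band.
print(harmonics_overlapping_band(6, 10, 25, 45))
```

      Under this assumed theta range, several consecutive harmonics land inside the slow gamma band, which illustrates why a fixed frequency boundary alone cannot separate the two phenomena.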

      (2) Can gammas be segregated into different lamina of the hippocampus? This idea appears to be foundational in the premise of the research but is also undergoing revision.

      As discussed by Etter et al. above, the initial theory of gamma routing was launched on coherence values. However, the values reported by Colgin et al. (2009) lean more towards incoherence (a value of 0) rather than coherence (1), suggesting a weak to negligible interaction. Nevertheless, this theory is coupled with the idea that the different gamma frequencies are exclusive to the specific lamina of the hippocampus.

      Recently, Douchamps et al. (2024) suggested a broader, more nuanced understanding of gamma oscillations than previously thought, emphasizing their wide range and variability across hippocampal layers. This perspective challenges the traditional dichotomy of gamma sub-bands (e.g., slow vs. medium gamma) and their associated cognitive functions based on a more rigid classification according to frequency and phase relative to the theta rhythm. Moreover, they observed all frequencies across all layers.

      Similarly, the current source density plots from Belluscio et al. (2012) suggest that SG and FG can be observed in both the radiatum and lacunosum-moleculare.

      Therefore, if the initial coherence values are weak to negligible and both slow and fast gamma are observed in all layers of the hippocampus, can the different gammas be exclusively related to either anatomical inputs or psychological functions (as done in the present manuscript)? Do these observations challenge the authors' premise of their research? At the least, please discuss.

      We thank the reviewer for raising this point, which I believe remains controversial in this field. We also thank the reviewer for providing detailed evidence on the forms in which gamma rhythms exist. The reviewer considered two aspects of gamma: 1) whether it is reasonable to divide slow and fast gamma by specific frequency bands; 2) the existence of gamma across all hippocampal layers, which challenges the functional significance of different types of gamma rhythms. Although the results of Douchamps et al., 2024 challenged the idea of rigid gamma sub-bands, separate slow and fast gamma components could still be seen occurring exclusively of each other over time, with the central frequency of slow gamma below ~60Hz and the central frequency of fast gamma above ~60Hz (Fig.1b of Douchamps et al., 2024). This was also seen in the rat dataset of this reference (Fig. S3). Since their behavioral test required both memory encoding and retrieval processes, it was hard to distinguish the roles of different gamma components, as they may coordinate dynamically during complex memory processes. Thus, although behavioral performance can be decoded from a broad range of gamma, we still cannot deny the existence of different gamma rhythms and their functional significance during different memory phases.

      (3) Do place cells, phase precession, and theta sequences require input from afferent regions? It is offered in the introduction that "Fast gamma (~65-100Hz), associated with the input from the medial entorhinal cortex, is thought to rapidly encode ongoing novel information in the context (Fernandez-Ruiz et al., 2021; Kemere, Carr, Karlsson, & Frank, 2013; Zheng et al., 2016)".

      Publications reporting that CA1 place fields remain fairly intact following MEC inactivation include Ipshita Zutshi, Manuel Valero, Antonio Fernández-Ruiz, and György Buzsáki (2022), "CA1 place cells and assemblies persist despite combined mEC and CA3 silencing". In addition, from Hadas E Sloin, Lidor Spivak, Amir Levi, Roni Gattegno, Shirly Someck, Eran Stark (2024): "These findings are incompatible with precession models based on inheritance, dual-input, spreading activation, inhibition-excitation summation, or somato-dendritic competition. Thus, a precession generator resides locally within CA1."

      These publications, at the least, challenge the inheritance model by which the afferent input controls CA1 place field spike timing. The research premise offered by the authors is couched in the logic of inheritance, when the effect that the authors are observing could be governed by local intrinsic activity (e.g., phase precession and gamma are locally generated, and the attribution to routed input is perhaps erroneous). Certainly, it is worth discussing these manuscripts in the context of the present manuscript.

      We thank the reviewer for this discussion. The main purpose of our current study is to investigate the mechanism of theta sequence development with learning, which may or may not depend on theta phase precession of single place cells, as this remains controversial in the field. In addition, a limitation of this study is that all gamma components were recorded from stratum pyramidale, so we cannot draw any conclusion about the origin of the gamma that modulates sequence development.

      II. Results

      (1) Figure 2-

      a. There is a bit of a puzzle here that should be discussed. If slow and fast frequencies modulate 25% of neurons, how can these rhythms serve as mechanisms of communication/support psychological functions? For instance, if fast gamma is engaged in rapid encoding (line 72) and slow gamma is related to the integration processing of learned information (line 84), and these are functions of the hippocampus, then why do these rhythms modulate so few cells? Is this to say 75% of CA1 neurons do not listen to CA3 or MEC input?

      The ~25% proportion refers to the place cells that were phase-locked to either slow or fast gamma. However, one of the main findings of this study was that most cells were modulated by slow gamma, as they fired at precessing slow gamma phases within a theta cycle (Figs 6-8), which would promote information compression for theta sequence development. Therefore, we did not mean that only a small proportion of cells were modulated by gamma rhythms and contributed to this process.

      b. Figure 2. It is hard to know if the mean vector lengths presented are large or small. Moreover, one can expect to find significance due to chance. For instance, it is challenging to find a frequency in which modulation strength is zero (please see Figure 4 of PMID: 30428340 or Figure 7 of PMID: 31324673).

      i. Please construct the histograms of Mean Vector Length as in the above papers, using 1 Hz filter steps from 1-120Hz and include it as part of Figure 2 (i.e., calculate the mean vector length for the filtered LFP in steps of 1-2 Hz, 2-3 Hz, 3-4 Hz,... etc). This should help the authors portray the amount of modulation these neurons have relative to the theta rhythm and other frequencies. If the theta mean vector length is higher, should it be considered the primary modulatory influence of these neurons (with slow and fast gammas as a minor influence)?

      We thank the reviewer for this suggestion. We measured the mean vector length in 5Hz steps (equivalent to 1Hz steps), and we found that the FG-cells were phase-locked to fast gamma rhythms even more strongly than to theta (Author response image 2B, mean MVL of theta=0.126±0.007, mean MVL of fast gamma=0.175±0.006, paired t-test, t(112)=-5.9, p=0.01, Cohen's d=0.7). In addition, in several previous studies reporting significant fast gamma phase locking, the MVL values were around 0.15 when a broad gamma band was used (Kitanishi et al., 2015 Neuron, Lasztóczi et al., 2016 Neuron, Tomar et al., 2021 Front Behav Neurosci, and Asiminas et al., 2022 Molecular Autism), which is consistent with the value in this study. Therefore, we do not believe that fast gamma was only a minor influence on these neurons.

      ii. It is possible to infer a neuron's degree of oscillatory modulation without using the LFP. For instance, one can create an ISI histogram as done in Figure 1 here (https://www.biorxiv.org/content/10.1101/2021.09.20.461152v3.full.pdf+html; "Distinct ground state and activated state modes of firing in forebrain neurons"). The reciprocal of the ISI values would be "instantaneous spike frequency". In favor of the Douchamps et al. (2024) results, the figure of the BioRXiV paper implies that there is a single gamma frequency modulate as there is only a single bump in the ISIs in the 10^-1.5 to 10^-2 range. Therefore, to vet the slow gamma results and the premise of two gammas offered in the introduction, it would be worth including this analysis as part of Figure 2.

      Using the suggested method, we calculated the ISI distribution on a log scale for FG-cells and NFG-cells during behavior (Author response image 5). The ISI distribution of FG-cells had a bump in the 10<sup>-1.5</sup> to 10<sup>-2</sup> range (black bar), in particular in the fast gamma range (10<sup>-2</sup> to 10<sup>-1.8</sup>).

      Author response image 5.
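
      To relate the log-scale ISI ranges above to frequencies, the reciprocal of the ISI gives the instantaneous spike frequency. The sketch below assumes that ISIs are expressed in seconds (an assumption, since the axis units are not stated here) and uses a hypothetical, perfectly regular spike train; it is not the authors' analysis code.

```python
import math


def isi_to_frequency_hz(isi_seconds):
    """Instantaneous spike frequency is the reciprocal of the ISI."""
    return 1.0 / isi_seconds


def log10_isi_histogram(spike_times, bin_edges_log10):
    """Count ISIs falling in half-open log10(seconds) bins [edge_i, edge_i+1)."""
    isis = [b - a for a, b in zip(spike_times, spike_times[1:]) if b > a]
    counts = [0] * (len(bin_edges_log10) - 1)
    for isi in isis:
        x = math.log10(isi)
        for i in range(len(counts)):
            if bin_edges_log10[i] <= x < bin_edges_log10[i + 1]:
                counts[i] += 1
                break
    return counts


# Frequencies corresponding to the exponents quoted in the response.
band_edges_hz = {
    exponent: isi_to_frequency_hz(10 ** exponent)
    for exponent in (-1.5, -1.8, -2.0)
}

# Hypothetical spike train firing every 12 ms (about 83 Hz).
spikes = [0.012 * k for k in range(11)]
counts = log10_isi_histogram(spikes, [-2.0, -1.8, -1.5])
```

      Under the seconds assumption, the 10<sup>-1.5</sup> to 10<sup>-2</sup> range spans roughly 32 to 100 Hz, and the 10<sup>-2</sup> to 10<sup>-1.8</sup> sub-range spans roughly 63 to 100 Hz.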

      c. There are some things generally concerning about Figure 2.

      i. First, the raw trace does not seem to have clear theta epochs (it is challenging to ascertain the start and end of a theta cycle). Certainly, it would be worth highlighting the relationship between theta and the gammas and picking a nice theta epoch.

      We thank the reviewer for this suggestion. We have updated this figure with a clearer theta epoch in the revised manuscript.

      ii. Also, in panel A, there looks to be a declining amplitude relationship between the raw, fast, and slow gamma traces, assuming that the scale bars represent 100uV in all three traces. The raw trace is significantly larger than the fast gamma. However, this relationship does not seem to be the case in panel B (in which both the raw and unfiltered examples of slow and fast gamma appear to be equal; the right panels of B suggest that fast gamma is larger than slow, appearing to contradict the A= 1/f organization of the power spectral density). Please explain as to why this occurs. Including the power spectral density (see below) should resolve some of this.

      We thank the reviewer for pointing this out. The y-axis scales of the LFP traces in Fig. 2B were not consistent, which made the amplitude comparison between slow and fast gamma misleading. We have unified the y-axis scales across gamma types in the revised manuscript. Moreover, we have also replaced these examples with more typical ones (see also the response below).

      iii. Within the example of spiking to phase in the left side of Panel B (fast gamma example)- the neuron appears to fire near the trough twice, near the peak twice, and somewhere in between once. A similar relationship is observed for the slow gamma epoch. One would conclude from these plots that the interaction of the neuron with the two rhythms is the same. However, the mean vector lengths and histograms below these plots suggest a different story in which the neuron is modulated by FG but not SG. Please reconcile this.

      We thank the reviewer for pointing this out. We found that fast gamma phase locking was robust across FG-cells, with the fast gamma peak as the preferred phase. Therefore, we have replaced these examples with more typical ones, so that the examples are consistent with the group effect.

      iv. For calculating the MVL, it seems that the number of spikes that the neuron fires would play a significant role. Working towards our next point, there may be a bias of finding a relationship if there are too few spikes (spurious clustering due to sparse data) and/or higher coupling values for higher firing rate cells (cells with higher firing rates will clearly show a relationship), forming a sort of inverse Yerkes-Dodson curve. Also, without understanding the magnitude of the MVL relative to other frequencies, it may be that these values are indeed larger than zero, but not biologically significant.

      - Please provide a scatter plot of Neuron MVL versus the Neuron's Firing Rate for 1) theta (7-9 Hz), 2) slow gamma, and 3) fast gamma, along with their line of best fit.

      - Please run a shuffle control where the LFP trace is shifted by random values between 125-1000ms and recalculate the MVL for theta, slow, and fast gamma. Often, these shuffle controls are done between 100-1000 times (see cross-correlation analyses of Fujisawa, Buzsaki et al.).

      - To establish that firing rate does not play a role in uncovering modulation, it would be worth conducting a spike number control, reducing the number of spikes per cell so that they are all equal before calculating the phase plots/MVL.
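
      The shuffle control requested in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not a prescribed implementation: the phase trace is synthetic (an 8 Hz rhythm whose frequency is redrawn every cycle), the spike train is hypothetical (one spike at the start of each cycle), and the function names are invented for this example. The shift range of 125-1000 ms follows the request.

```python
import cmath
import math
import random


def mvl(phases):
    """Mean vector length of a list of phases in radians."""
    return abs(sum(cmath.exp(1j * p) for p in phases) / len(phases))


def jittered_phase_trace(duration_s, fs_hz, seed=1):
    """Synthetic phase trace; frequency is redrawn (6-10 Hz) on every cycle."""
    rng = random.Random(seed)
    phase, freq_hz, trace = 0.0, 8.0, []
    for _ in range(int(duration_s * fs_hz)):
        trace.append(phase)
        phase += 2 * math.pi * freq_hz / fs_hz
        if phase >= 2 * math.pi:
            phase -= 2 * math.pi
            freq_hz = rng.uniform(6.0, 10.0)
    return trace


def shift_shuffle(phase_trace, spike_idx, fs_hz, n_shuffles=200,
                  min_shift_s=0.125, max_shift_s=1.0, seed=0):
    """Circularly shift the phase trace relative to the spikes and recompute MVL."""
    rng = random.Random(seed)
    n = len(phase_trace)
    observed = mvl([phase_trace[i] for i in spike_idx])
    null = []
    for _ in range(n_shuffles):
        shift = int(round(rng.uniform(min_shift_s, max_shift_s) * fs_hz))
        null.append(mvl([phase_trace[(i + shift) % n] for i in spike_idx]))
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (1 + sum(v >= observed for v in null)) / (n_shuffles + 1)
    return observed, null, p_value


fs_hz = 1000
trace = jittered_phase_trace(20.0, fs_hz)
# Hypothetical spike train: one spike at the first sample of each cycle.
spike_idx = [i for i in range(1, len(trace)) if trace[i] < trace[i - 1]]
observed, null, p_value = shift_shuffle(trace, spike_idx, fs_hz)
```

      The control relies on cycle-to-cycle frequency variability: for a perfectly periodic rhythm, a time shift would only rotate the preferred phase and leave the MVL unchanged.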

      We thank the reviewer for raising this point. Besides the MVL, we also calculated the pairwise phase consistency (PPC), as suggested by Reviewer 2, which is not sensitive to spike counts. We found that the phase locking strength to each rhythm (theta or gamma) was comparable between the MVL and PPC measurements (Author response image 2). Moreover, we quantified the relationship between MVL and mean firing rate, as suggested. We found that the MVL for theta, slow gamma and fast gamma was negatively correlated with mean firing rate (Author response image 6, Pearson correlation, theta: R<sup>2</sup>=0.06, Pearson’s r=-0.3, p=1.3×10<sup>-8</sup>; slow gamma: R<sup>2</sup>=0.1, Pearson’s r=-0.4, p=2.4×10<sup>-17</sup>; fast gamma: R<sup>2</sup>=0.03, Pearson’s r=-0.2, p=4.3×10<sup>-5</sup>). These results help us rule out the concern that spike counts affected the phase modulation measurements.

      Author response image 6.
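
      The spike-count argument can be illustrated with a small simulation. The sketch below uses the standard pairwise-cosine definition of PPC and its closed-form relation to the MVL; the spike counts, repeat counts and seed are arbitrary choices for illustration, and this is not the authors' code. For spikes drawn with no phase preference, the MVL of a small sample is biased above zero, whereas the PPC averages to approximately zero.

```python
import cmath
import math
import random


def mvl(phases):
    """Mean vector length of a list of phases in radians."""
    return abs(sum(cmath.exp(1j * p) for p in phases) / len(phases))


def ppc(phases):
    """Pairwise phase consistency via the identity PPC = (N * MVL^2 - 1) / (N - 1)."""
    n = len(phases)
    if n < 2:
        raise ValueError("PPC needs at least two spikes")
    r = mvl(phases)
    return (n * r * r - 1.0) / (n - 1.0)


def ppc_pairwise(phases):
    """Same quantity computed directly as the mean cosine of all pairwise differences."""
    n = len(phases)
    total = sum(math.cos(phases[i] - phases[j])
                for i in range(n) for j in range(i + 1, n))
    return total / (n * (n - 1) / 2)


rng = random.Random(0)
n_spikes, n_repeats = 10, 2000
mvl_values, ppc_values = [], []
for _ in range(n_repeats):
    # Spikes with no phase preference at all.
    phases = [rng.uniform(0.0, 2 * math.pi) for _ in range(n_spikes)]
    mvl_values.append(mvl(phases))
    ppc_values.append(ppc(phases))

mean_mvl = sum(mvl_values) / n_repeats   # clearly above zero despite no locking
mean_ppc = sum(ppc_values) / n_repeats   # close to zero
```

      This is why a spike-number control matters more for MVL than for PPC when cells with very different firing rates are compared.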

      (2) Something that I anticipated to see addressed in the manuscript was the study from Grosmark and Buzsaki (2016): "Cell assembly sequences during learning are "replayed" during hippocampal ripples and contribute to the consolidation of episodic memories. However, neuronal sequences may also reflect preexisting dynamics. We report that sequences of place-cell firing in a novel environment are formed from a combination of the contributions of a rigid, predominantly fast-firing subset of pyramidal neurons with low spatial specificity and limited change across sleep-experience-sleep and a slow-firing plastic subset. Slow-firing cells, rather than fast-firing cells, gained high place specificity during exploration, elevated their association with ripples, and showed increased bursting and temporal coactivation during postexperience sleep. Thus, slow- and fast-firing neurons, although forming a continuous distribution, have different coding and plastic properties."

      My concern is that much of the reported results in the present manuscript appear to recapitulate the observations of Grosmark and Buzsaki, but without accounting for differences in firing rate. A parsimonious alternative explanation for what is observed in the present manuscript is that high firing rate neurons, more integrated into the local network and orchestrating local gamma activity (PING), exhibit more coupling to theta and gamma. In this alternative perspective, it's not something special about how the neurons are entrained to the routed fast gamma, but that the higher firing rate neurons are better able to engage and entrain their local interneurons and, thus modulate local gamma. However, this interpretation challenges the discussion around the importance of fast gamma routed from the MEC.

      a. Please integrate the Grosmark & Buzsaki paper into the discussion.

      b. Also, please provide data that refutes or supports the alternative hypothesis in which the high firing rate cells are just more gamma modulated as they orchestrate local gamma activity through monosynaptic connections with local interneurons (e.g., Marshall et al., 2002, Hippocampal pyramidal cell-interneuron spike transmission is frequency dependent and responsible for place modulation of interneuron discharge). Otherwise, the attribution to a MEC routed fast gamma routing seems tenuous.

      c. It is mentioned that fast-spiking interneurons were removed from the analysis. It would be worth including these cells, calculating the MVL in 1 Hz increments as well as the reciprocal of their ISIs (described above).

      We thank the reviewer for this suggestion. Because we found that the mean firing rate of FG-cells was higher than that of NFG-cells, it is possible that the FG-cells largely overlap with the fast-firing (rigid) cells in Grosmark et al., 2016 Science. However, in this study we aimed to investigate how fast and slow gamma rhythms dynamically modulated neurons during learning, rather than to define new cell types. Thus, we do not think this work is merely a replication of the previous publication. We have added this description to the Discussion (Lines 439-441). In addition, we did not record enough interneurons to support an analysis of the interactions between interneurons and place cells. Therefore, we cannot make any statement in this study about where the fast gamma originated (locally in CA1 or routed from the MEC).

      (3) Methods - Spectral decomposition and Theta Harmonics.

      a. It is challenging to interpret the exact parameters that the authors used for their multi-taper analysis in the methods (lines 516-526). Tallon-Baudry et al., (1997; Oscillatory γ-Band (30-70 Hz) Activity Induced by a Visual Search Task in Humans) discuss a time-frequency trade-off where frequency resolution changes with different temporal windows of analysis. This trade-off between time and frequency resolution is well known as the uncertainty principle of signal analysis, transcending all decomposition methods. It is not only a function of wavelet or FFT, and multi-tapers do not directly address this. (The multitaper method, by using multiple specially designed tapers -like the Slepian sequences- smooths the spectrum. This smoothing doesn't eliminate leakage but distributes its impact across multiple estimates). Given the brevity of methods and the issues of theta harmonics as offered above, it is worth including some benchmark trace testing for the multi-taper as part of the supplemental figures.

      i. Please spectrally decompose an asymmetric 8 Hz sawtooth wave showing the trace and the related power spectral density using the multiple taper method discussed in the methods.

      ii. Please also do the same for an elliptical oscillation (perfectly symmetrical waves, but also capable of casting harmonics). Matlab code on how to generate this time series is provided below:

      A = 1;                       % Amplitude
      T = 1/8;                     % Period corresponding to 8 Hz frequency
      omega = 2*pi/T;              % Angular frequency
      C = 1;                       % Wave speed
      m = 0.9;                     % Modulus for the elliptic function (0<m<1 for cnoidal waves)
      x = linspace(0, 2*pi, 1000); % Temporal domain
      t = 0;                       % Time instant

      % Calculate B based on frequency and speed
      B = sqrt(omega/C);

      % Cnoidal wave equation using the Jacobi elliptic function cn
      % (ellipj returns [sn, cn, dn]; the cnoidal profile uses cn^2)
      [~, cn] = ellipj(B.*(x - C*t), m);
      u = A .* cn.^2;

      % Plotting the cnoidal wave
      figure;
      plot(x./max(x), u);
      title('8 Hz Cnoidal Wave');
      xlabel('time (x)');
      ylabel('Wave amplitude (u)');
      grid on;

      The Symbolic Math Toolbox needs to be installed and accessible in your MATLAB environment to use ellipj. Otherwise, I trust that, rather than plotting a periodic orbit around a circle (sin wave) the authors can trace the movement around an ellipse with significant eccentricity (the distance between the two foci should be twice the distance between the co-vertices).

      We thank the reviewer for this suggestion. In the main text of the manuscript, we only applied the Morlet wavelet method to calculate the time-varying power of rhythms. The multitaper method was used to estimate power spectra across running speeds, which were not shown in the manuscript. Therefore, we removed the description of the multitaper method and updated the description of the Morlet wavelet power spectral analysis in the Methods (Lines 541-544).

      As suggested, we estimated the power spectral densities of the 8 Hz sawtooth and elliptical oscillations using these methods, and compared them with the results from the FFT. We found that both the multitaper and Morlet wavelet methods captured the 8 Hz oscillatory component well (Author response image 7). However, harmonic components were visible in the FFT spectrum.

      Author response image 7.

      iii. Line 522: "The power spectra across running speeds and absolute power spectrum (both results were not shown).". Given the potential complications of multi-taper discussed above, and as each convolution further removes one from the raw data, it would be the most transparent, simple, and straightforward to provide power spectra using the simple fft.m code in Matlab (We imagine that the authors will agree that the results should be robust against different spectral decomposition methods. Otherwise, it is concerning that the results depend on the algorithm implemented and should be discussed. If gamma transience is a concern, the authors should trigger to 2-second epochs in which slow/fast gamma exceeds 3-7 std. dev. above the mean, comparing those resulting power spectra to 2-second epochs with ripples - also a transient event). The time series should be at least 2 seconds in length (to avoid spectral leakage issues and the issues discussed in Tallon-Baudry et al., 1997 above).

      Please show the unmolested power spectra (Y-axis units in mV2/Hz, X-axis units as Hz) as a function of running speed (increments of 5 cm/s) for each animal. I imagine three of these PSDs for 3 of the animals will appear in supplemental methods while one will serve as a nice manuscript figure. With this plot, please highlight the regions that the authors are describing as theta, slow, and fast gamma. Also, any issues should be addressed should there be notable differences in power across animals or tetrodes (issues with locations along proximal-distal CA1 in terms of MEC/LEC input and using a local reference electrode are discussed below).

      As suggested, we first estimated the power spectra as a function of running speed in each running lap, shown separately for each rat, using multitaper spectral analysis (Author response image 8). In addition, to obtain unmolested power spectra, the short-time Fourier transform (STFT) was applied at the same frequency resolution (Author response image 9). The power spectra were consistent between the two methods. Notably, there appears to be no significant theta harmonic component in the slow gamma band.

      The multitaper spectral analysis was performed as follows. The power spectra were measured across different running speeds as described previously (Ahmed et al., 2012 J Neurosci; Zheng et al., 2015 Hippocampus; Zheng et al., 2016 eNeuro). Briefly, the absolute power spectrum was calculated for each lap of the LFP recordings with a 0.5 s moving window and a 0.2 s step size, using the multitaper spectral analysis in the Chronux toolbox (Mitra and Bokil, 2008, http://chronux.org/) and the STFT spectral analysis in the Matlab script stft.m. In the multitaper method, the time-bandwidth product (TW) was set to 3, and the number of tapers (K) was set to 5. In the STFT method, the FFT length was set to 2048, which was equivalent to the parameters used in the multitaper method. Running speed was calculated (see the “Estimation of running speed and head direction” section in the manuscript) and averaged within each 0.5 s time window corresponding to the LFP segments. Then, the absolute power at each frequency was smoothed with a Gaussian kernel centered on a given speed bin. The power spectra as a function of running speed and frequency were plotted on a log scale. The colormap was also on a log scale, allowing comparisons across frequencies that would otherwise be difficult due to the 1/f decay of power in physiological signals.
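
      For readers weighing the time-frequency trade-off raised by the reviewer, the stated settings translate into the following spectral smoothing. This is a worked calculation from the parameters given above (0.5 s window, 0.2 s step, TW=3, K=5) using the standard multitaper relations W = TW/T and K ≤ 2TW − 1; the variable names are chosen for this illustration only.

```python
window_s = 0.5        # moving window length T (from the text)
step_s = 0.2          # step size (from the text)
time_bandwidth = 3    # TW (from the text)
n_tapers = 5          # K (from the text)

# Half-bandwidth of multitaper smoothing: W = TW / T.
half_bandwidth_hz = time_bandwidth / window_s

# Number of well-concentrated Slepian tapers available: 2*TW - 1.
max_well_concentrated_tapers = 2 * time_bandwidth - 1

# Raw frequency spacing set by the window length alone: 1 / T.
rayleigh_resolution_hz = 1.0 / window_s

# Fraction of each window shared with the next one.
overlap_fraction = (window_s - step_s) / window_s
```

      With these values, each spectral estimate is smoothed over about ±6 Hz, so spectral features closer together than roughly 12 Hz are averaged together within a 0.5 s window.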

      Author response image 8.

      Author response image 9.

      iv. Schomberg and colleagues (2014) suggested that the modulation of neurons in the slow gamma range could be related to theta harmonics (see above). Harmonics can often extend in a near infinite as they regress into the 1/f background (contributing to power, but without a peak above the power spectral density slope), making arbitrary frequency limits inappropriate. Therefore, in order to support the analyses and assertions regarding slow gamma, it seems necessary to calculate a "theta harmonic/slow gamma ratio". Aru et al. (2015; Untangling cross-frequency coupling in neuroscience) offer that: " The presence of harmonics in the signal should be tested by a bicoherence analysis and its contribution to CFC should be discussed." Please test both the synthetic signals above and the raw LFP, using temporal windows of greater than 4 seconds (again, the large window optimizes for frequency resolution in the time-frequency trade-off) to calculate the bicoherence. As harmonics are integers of theta coupled to itself and slow gamma is also coupled to theta, a nice illustration and contribution to the field would be a method that uses the bispectrum to isolate and create a "slow gamma/harmonic" ratio.

      We thank the reviewer for providing the method for assessing theta harmonics. We first measured the theta harmonics in the synthesized signals using the bicoherence method, and we could clearly observe the nonlinear coupling between the theta rhythm and its harmonics (Author response image 10).

      Author response image 10.

      In addition, we measured the bicoherence of raw traces during slow gamma episodes. We did not see nonlinear coupling between the slow gamma and theta bands in the real data (mean bicoherence=0.1±0.0002), in contrast to the synthesized signals (mean bicoherence=0.7 for elliptical waves and 0.5 for sawtooth waves), suggesting that the slow gamma detected in this study was not purely a theta harmonic (Author response image 11C, F, I, in red boxes). Therefore, we believe that the contribution of theta harmonics to slow gamma is not significant.

      Author response image 11.
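
      To make the logic of the bicoherence test concrete, the sketch below estimates bicoherence at a single frequency pair (8 Hz, 8 Hz) for two synthetic signals: an 8 Hz sawtooth, whose 16 Hz harmonic is phase-locked to the fundamental, and a control in which 8 Hz and 16 Hz components have independent random phases in every segment. It is a simplified illustration rather than the authors' implementation; the segment length, sampling rate, Hann window, seed and function names are all assumptions made for this example.

```python
import cmath
import math
import random


def dft_at(segment, freq_hz, fs_hz):
    """Single-frequency DFT coefficient of a Hann-windowed segment."""
    n = len(segment)
    total = 0j
    for k, x in enumerate(segment):
        w = 0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1))
        total += w * x * cmath.exp(-2j * math.pi * freq_hz * k / fs_hz)
    return total


def bicoherence(segments, f1_hz, f2_hz, fs_hz):
    """Normalized bispectrum magnitude at (f1, f2), averaged over segments."""
    num, den_a, den_b = 0j, 0.0, 0.0
    for seg in segments:
        x1 = dft_at(seg, f1_hz, fs_hz)
        x2 = dft_at(seg, f2_hz, fs_hz)
        x12 = dft_at(seg, f1_hz + f2_hz, fs_hz)
        num += x1 * x2 * x12.conjugate()
        den_a += abs(x1 * x2) ** 2
        den_b += abs(x12) ** 2
    return abs(num) / math.sqrt(den_a * den_b)


def sawtooth_segment(n, fs_hz, f0_hz, phase_cycles):
    """Asymmetric sawtooth in [-1, 1) starting at an arbitrary phase."""
    return [2.0 * ((f0_hz * k / fs_hz + phase_cycles) % 1.0) - 1.0
            for k in range(n)]


def independent_segment(n, fs_hz, f0_hz, a, b):
    """Fundamental plus a component at twice the frequency with unrelated phase."""
    return [math.sin(2 * math.pi * f0_hz * k / fs_hz + a)
            + 0.5 * math.sin(2 * math.pi * 2 * f0_hz * k / fs_hz + b)
            for k in range(n)]


rng = random.Random(0)
fs_hz, n, f0_hz, n_segments = 256, 256, 8.0, 40

saw = [sawtooth_segment(n, fs_hz, f0_hz, rng.random())
       for _ in range(n_segments)]
ind = [independent_segment(n, fs_hz, f0_hz,
                           rng.uniform(0, 2 * math.pi),
                           rng.uniform(0, 2 * math.pi))
       for _ in range(n_segments)]

b_saw = bicoherence(saw, f0_hz, f0_hz, fs_hz)   # harmonic locked to theta
b_ind = bicoherence(ind, f0_hz, f0_hz, fs_hz)   # same frequencies, no locking
```

      In this toy case the sawtooth yields a bicoherence near 1, while the control stays low even though both signals have power at 8 and 16 Hz, which is the distinction the comparison between synthesized and real data relies on.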

      (4) I appreciate the inclusion of the histology for the 4 animals. Knierim and colleagues describe a difference in MEC projection along the proximal-distal axis of the CA1 region (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3866456/): "There are also differences in their direct projections along the transverse axis of CA1, as the LEC innervates the region of CA1 closer to the subiculum (distal CA1), whereas the MEC innervates the region of CA1 closer to CA2 and CA3 (proximal CA1)". From the histology, it looks like some of the electrodes are in the part of CA1 that would be dominated by LEC input while a few are closer to where the MEC would project.

      a. How do the authors control for these differences in projections? Wouldn't this change whether or not fast gamma is observed in CA1?

      b. I am only aware of one manuscript that describes slow gamma in the LEC which appeared in contrast to fast gamma from the MEC (https://www.science.org/doi/10.1126/science.abf3119). One would surmise that the authors in the present manuscript would have varying levels of fast gamma in their CA1 recordings depending on the location of the electrodes in the Proximal-distal axis, to the extent that some of the more medial tetrodes may need to be excluded (as they should not have fast gamma, rather they should be exclusively dominated by slow gamma). Alternatively, the authors may find that there is equal fast gamma power across the entire proximal-distal axis. However, this would pose a significant challenge to the LEC/slow gamma and MEC/fast gamma routing story of Fernandez-Ruiz et al. and require reconciliation/discussion.

      c. Is there a difference in neuron modulation to these frequencies based on electrode location in CA1?

      We thank the reviewer for raising this concern, which was also raised by Reviewer 2. We aligned the physical locations of the LFP channels along the proximal-distal axis based on histology. In our dataset, only 2 rats were recorded from both distal and proximal hippocampus, so we calculated the gamma power from both sites in these rats. We found that slow gamma power was higher on proximal tetrodes than on distal tetrodes (Author response image 12, repeated measures ANOVA, F(1,7)=10.2, p=0.02, partial η<sup>2</sup>=0.8). However, fast gamma power was similar between recording sites (F(1,7)=0.008, p=0.9, partial η<sup>2</sup>=0.001). These results are partially consistent with the LEC/slow gamma and MEC/fast gamma routing story of Fernandez-Ruiz et al. The main reason may be that all LFPs were recorded from tetrodes in stratum pyramidale, the deep layer in particular (Author response image 4E), so that it was hard to precisely identify their distance to the distal/proximal apical dendrites.

      Author response image 12.

      In terms of the anatomical location of FG and NFG cells, we identified tetrode traces in slices for each cell. We found that both FG and NFG cells were recorded from the deep layer of dorsal CA1, with no difference in proportions between cell types (Author response image 4E, Chi-squared test, χ<sup>2</sup>=0.5, p=0.5, Cramér's V=0.05). The distribution of FG-cells and NFG-cells along the transverse axis was also similar between cell types (Author response image 4F, χ<sup>2</sup>=0.08, p=0.8, Cramér's V=0.02).

      (5) Given a comment in the discussion (see below), it will be worth exploring changes in theta, theta harmonic, slow gamma, and fast gamma power with running speed as no changes were observed with theta sequences or lap number versus. Notably, Czurko et al., report an increase in theta and harmonic power with running speed (1999) while Ahmed and Mehta (2012) report a similar effect for gamma.

      a. Please determine whether the power and frequency of the rhythms discussed above change with running speed, using the same parameters applied in the present manuscript. The specific concern is that how the authors calculate running speed is not sensitive enough to evaluate changes.

      We thank the reviewer for this suggestion. The description of running speed quantification has been updated in the Methods (see “Estimation of running speed and head direction” section, Lines 501-511). Overall, the sampling frequency of running speed was 25 Hz, which should be sensitive enough to evaluate the behavioral changes.

      By measuring rhythmic power as a function of running speed (Author response image 8 and Author response image 9), we observed that theta power increased as running speed increased. Consistent with the results of Ahmed and Mehta (2012) and our previous study (Zheng et al., 2015), fast gamma power increased and slow gamma power decreased as running speed increased.

      In addition, we estimated the rhythmic frequency as a function of running speed in the slow and fast gamma episodes, respectively. We found that fast gamma frequency increased with running speed (Author response image 13, linear regression, R<sup>2</sup>=0.4, corr=0.6, p=9.9×10<sup>-15</sup>), whereas slow gamma frequency decreased with running speed (R<sup>2</sup>=0.2, corr=-0.4, p=8.8×10<sup>-6</sup>). Although a significant correlation was found between gamma frequency and running speed, consistent with previous studies, the frequency change (~70-75 Hz for fast gamma and ~30-28 Hz for slow gamma) was not large enough to affect the sequence findings in this study. In addition, theta frequency was stable across running speeds in both slow gamma episodes (R<sup>2</sup>=0.02, corr=-0.1, p=0.1) and fast gamma episodes (R<sup>2</sup>=0.004, corr=0.06, p=0.5), consistent with the results in Fig. 1G of Kropff et al., 2021 Neuron.

      Author response image 13.

      b. It is astounding that animals ran as fast as they did in what appears to be the first lap (Figure 3F), especially as rats' natural proclivity is thigmotaxis and inquisitive exploration in novel environments. Can the authors expand on why they believe their rats ran so quickly on the first lap in a novel environment and how to replicate this? Also, please include the individual values for each animal on the same plot.

      We thank the reviewer for pointing this out. The task was not brand new to the rats in this dataset, because only days with recording quality good enough for sequence decoding were included in this paper, which were about day 2 to day 10 for each rat. However, we still observed the process of sequence formation because of the rats' interest in exploration during early laps. Thus, in terms of exploratory behavior, the rats ran at relatively high speeds across laps (Author response image 14, each gray line represents the running speed within an individual session).

      Author response image 14.

      c. Can the authors explain how the statistics on line 169 (F(4,44)) work? Specifically, it is challenging to determine how the degrees of freedom were calculated in this case and throughout if there were only 4 animals (reported in methods) over 5 laps (depicted in Figure 3F. Given line 439, it looks like trials and laps are used synonymously). Four animals over 5 laps should have a DOF of 16.

      This statistical test was performed with each session/day as a sample (n=12 sessions/days). The statistics were generated by repeated measures ANOVA on 5 trials in 12 sessions, giving an error DOF of 44.
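
      The degrees of freedom can be checked with a short calculation, assuming a standard one-way repeated-measures layout in which each sample contributes one value per trial (an assumption about the design; the function name is invented for this example).

```python
def rm_anova_dof(n_samples, n_levels):
    """Degrees of freedom (effect, error) for a one-way repeated-measures ANOVA."""
    df_effect = n_levels - 1
    df_error = (n_samples - 1) * (n_levels - 1)
    return df_effect, df_error


# 12 sessions as samples, 5 trials as the repeated factor.
sessions_as_samples = rm_anova_dof(12, 5)

# For comparison: the same design if the 4 animals were the samples.
animals_as_samples = rm_anova_dof(4, 5)
```

      With sessions as samples this gives F(4,44), matching the reported statistic; with animals as samples the same formula would give F(4,12).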

      (6) Throughout the manuscript, I am concerned about an inflation of statistical power. For example on line 162, F(2,4844). The large degrees of freedom indicate that the sample size was theta sequences or a number of cells. Since multiple observations were obtained from the same animal, the statistical assumption of independence is violated. Therefore, the stats need to be conducted using a nested model as described in Aarts et al. (2014; https://pubmed.ncbi.nlm.nih.gov/24671065/). A statistical consult may be warranted.

      We thank the reviewer for this suggestion. We have replaced this statistical result with one obtained from a generalized linear mixed model with ratID as a covariate. These results have been updated in the revised manuscript (Lines 164-167).

      (7) It is stated that one tetrode served as a quiet recording reference. The "quiet" part is an assumption when often, theta and gamma can be volume conducted to the cortex (e.g., Sirota et al., 2008; This is often why laboratories that study hippocampal rhythms use the cerebellum for the differential recording electrode and not an electrode in the corpus callosum). Generally, high frequencies propagate as well as low frequencies in the extracellular milieu (https://www.eneuro.org/content/4/1/ENEURO.0291-16.2016). For transparency, the authors should include a limitation paragraph in their discussion that describes how their local tetrode reference may be inadvertently diminishing and/or distorting the signal that they are trying to isolate. Otherwise, it would be worth hearing an explanation as to how the author's approach avoids this issue.

      In terms of the locations of the references, we had 2 screws in the skull above the cerebellum connected to the recording drive ground, and 1 tetrode in a quiet area of the cortex serving as the recording reference. We agree that theta and gamma can be volume conducted to the cortex, which may affect the power of these rhythms in the stratum pyramidale. However, we did not intend to measure or compare the absolute theta or gamma power in this study, as we were only concerned with the gamma phase modulation of place cells. Therefore, we believe the location of the recording reference would not significantly affect our conclusions.

      Apologetically, this review is already getting long. Moreover, I have substantial concerns that should be resolved prior to delving into the remainder of the analyses. e.g., the analyses related to Figure 3-5 assert that FG cells are important for sequences. However, the relationship to gamma may be secondary to either their relationship to theta or, based on the Grosmark and Buzsaki paper, it may just be a phenomenon coupled to the fast-firing cells (fast-firing cells showing higher gamma modulation due to a local PING dynamic). Moreover, the observation of slow gamma is being challenged as theta harmonics, even by the major proponents of the slow/fast gamma theory. Therefore, the report of slow gamma precession would come as an unsurprising extension should they be revealed to be theta harmonics (however, no control for harmonics was implemented; suggestions were made above). Following these amendments, I would be grateful for the opportunity to provide further feedback.

      III. Discussion.

      a. Line 330- it was offered that fast gamma encodes information while slow gamma integrates in the introduction. However, in a task such as circular track running (from the methods, it appears that there is no new information to be acquired within a trial), one would guess that after the first few laps, slow gamma would be the dominant rhythm. Therefore, one must wonder why there are so few neurons modulated by slow gamma (~3.7%).

      The proportion of ~3.7% refers to the place cells phase-locked to slow gamma. However, our aim was to show that the slow gamma phase precession of place cells promoted theta sequence development. We would not expect cells to be phase-locked to slow gamma if phase precession occurred.

      b. Line 375: The authors contend that: "...slow gamma, related to information compression, was also required to modulate fast gamma phase-locked cells during sequence development. We replicated the results of slow gamma phase precession at the ensemble level (Zheng et al., 2016), and furthermore observed it at late development, but not early development, of theta sequences." In relation to the idea that slow gamma may be coupled to - if not a distorted representation of - theta harmonics, it has been observed that there are changes in theta relative to novelty.

      i. A. Jeewajee, C. Lever, S. Burton, J. O'Keefe, and N. Burgess (2008) report a decrease in theta frequency in novel circumstances that disappears with increasing familiarity.

      ii. One could surmise that this change in frequency is associated with alterations in theta harmonics (observed here as slow gamma), challenging the author's interpretation.

      iii. Therefore, the authors have a compelling opportunity to replicate the results of Jeewajee et al., characterizing changes of theta along with the development of slow gamma precession, as the environment becomes familiar. It will become important to demonstrate, using bicoherence as offered by Aru et al., how slow gamma can be disambiguated from theta harmonics. Specifically, we anticipate that the authors will be able to quantify A) theta harmonics (the number, and their respective frequencies and amplitudes), B) the frequency and amplitude of slow gamma, and C) how they can be quantitatively decoupled. Through this, their discussion of oscillatory changes with novelty-familiarity will garner a significant impact.

      We think we have demonstrated that the slow gamma observed in this study was not purely theta harmonics.  We didn’t focus on the frequency change of slow gamma or theta rhythms in this study.  Further investigation will be carried out on this topic in the future.

      c. Broadly, it is interesting that the authors emphasize the gamma frequency throughout the discussion. Given that the power spectral density of the Local Field Potential (LFP) exhibits a log-log relationship between amplitude and frequency, as described by Buzsáki (2005) in "Rhythms of the Brain," and considering that the LFP is primarily generated through synaptic transmembrane currents (Buzsáki et al., 2012), it seems parsimonious to consider that the bulk of synaptic activity occurs at lower frequencies (e.g., theta). Since synaptic transmission represents the most direct form of inter-regional communication, one might wonder why gamma (characterized by lower amplitude rhythms) is esteemed so highly compared to the higher amplitude theta rhythm. Why isn't the theta rhythm, instead, regarded as the primary mode of communication across brain regions? A discussion exploring this question would be beneficial.

      We thank the reviewer for this thoughtful comment. In stating our conclusions on gamma rhythms, we did not mean to downplay the role of the theta rhythm. On the contrary, the fast or slow gamma episodes were detected riding on theta rhythms, and we believe that information compression should occur at a finer scale within a theta cycle. More investigation will be carried out on this topic in the future.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) It is helpful to clearly define "FG-cell sequences" before the relevant results are described in the Results section. More importantly, the seemingly conflicting results between Figure 3 and Figure 8 may need to be clarified.

      The “exFG-sequences and exNFG sequences”, “FG-cell sequences and NFG-cell sequences” have been defined clearly in the revised manuscript.  Moreover, the seemingly conflicting results between Figure 3 and Figure 8 have been interpreted properly.

      (2) It is helpful to clearly state the N and what defines a sample whenever a result is described.

      For each statistical result, the N and what defines a sample have been clarified in the revised manuscript.

      (3) Addressing the questions regarding the methods (#5) would clarify some of the results.

      The questions regarding the Methods have been addressed in the revised manuscript.

      (4) Line #244: "successful" should be "successive"?

      Fixed.

      Reviewer #2 (Recommendations For The Authors):

      - The writing of the manuscript can be substantially improved.

      The manuscript has been substantially revised and updated.

      - I noticed that the last author of the manuscript is not the lead or corresponding and has only provided a limited contribution to this work (according to the detailed author contributions). The second to last author seems to be the main senior intellectual contributor and supervisor, together with the third to last author. This speaks of potential bad academic practices where a senior person whose intellectual contribution to the study is relatively minor takes the last author position, against the standard conventions on authorship worldwide. I strongly suggest that this is corrected.

      We thank the reviewer for raising this issue. The last author, Dr. Ming, was also a senior author and supervised this project, making a large contribution. We have corrected his role to co-corresponding author in the revised manuscript.

    1. eLife Assessment

      The twin-domain model proposed by Liu and Wang, in which twin supercoiling domains of DNA emerge during transcription, was first described decades ago, but direct experimental evidence has been challenging to obtain. Here, the authors make a fundamental contribution by directly measuring DNA torsion in cells using a photoactivatable intrastrand cross-linker compared to controls. They gather compelling data using this clever method, which provides direct evidence in support of the twin-supercoiled domain model, for torsional effects at transcription start and end sites, and thereby uncover novel features of higher order structure of chromatin in yeast. These data are exciting, and the tools will be of interest to anyone studying chromosome structure and gene regulation.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript by Hall et al reports a genome-wide map of supercoiling in yeast using psoralen as a probe that intercalates more effectively into underwound DNA and can then be fixed in place by UV-cross-linking. Sites of cross-linking are revealed by exonuclease digestion and sequencing. Cross-linking is compared with samples that are first fixed with formaldehyde, permeabilized, digested with Dpn II to release unrestrained torsion, and then crosslinked. The authors promote this "zero-torsion" approach as an improvement that corrects for nucleosomes (or binding by other macromolecules) that mask psoralen binding. The investigators then examine patterns of psoralen binding (and hence supercoiling) that are associated with promoter strength, promoter type (sequence-specific transcription factor dependent, insulator associated, or general TFs only) and gene length.

      Strengths:

      This is an interesting paper that reports an approach that reveals some new information about the relationship between torsional stress and gene activity in the yeast genome. The method is logical and interesting and provides evidence that spread of torsional stress through the genome is regulated.

      Weaknesses:

      The analysis is not entirely novel, and I believe that more valuable information can be culled from these datasets than is reported here.

    3. Reviewer #2 (Public review):

      Summary:

      This study describes a novel method for mapping torsional stress in the genome of Saccharomyces cerevisiae using trimethylpsoralen (TMP). It introduces a procedure to establish a zero-torsion baseline while preserving the chromatin state by treating cells with formaldehyde before releasing torsion with restriction enzyme digestion.

      This approach allows for more accurate differentiation between torsional stress effects and accessibility effects in the psoralen signal. The results confirm that psoralen crosslinking is strongly affected by accessibility of the DNA and to a much more limited extent by the torsional stress of the DNA. Subtracting the baseline signal (no torsion) from the total signal allows detecting torsional stress, although TMP accessibility is still affecting the readout. The authors confirm the validity of the method by studying torsional stress in dependence of transcription levels, gene length and relative gene orientation. They propose that torsional stress may play a role in recruiting topoisomerases and regulating 3D genome architecture via cohesin. They also suggest that transcription factor binding might insulate negative supercoiling originating from transcription of neighboring divergent genes.

      Strengths:

      This paper offers a potentially interesting tool for future work.

      Weaknesses:

      The signal-to-background ratio, which represents the torsional fraction, appears to be quite limited relative to the overall signal (roughly 20x less, according to the scales in figs 2a and 2b), raising concerns about the robustness of the conclusions. It is clear from these figures, for instance, that a non-negligible fraction of the remaining signal is still dependent on DNA accessibility, revealing the nucleosome footprints in spite of the fact that subtracting the zero-torsion signal should theoretically remove the accessibility component. Because of this, some of the conclusions might be flawed, in that what is attributed to torsional stress might in reality be due, partially or fully, to accessibility issues.

      Specific points:

      Lines 226-227: "rotation may be more restricted with a lengthening in the RNA transcript, which is known to be associated with large machinery, such as spliceosomes". This argument is not appropriate to correlate torsional stress with gene length. Spliced genes are rare and generally short in yeast, and are found mostly among ribosomal protein genes.

      Lines 256-257: In discussing that torsional stress must hinder Pol II progression, the authors write: "Pol II has a minimal presence in the intergenic region between divergent genes and is enriched in the intergenic region between convergent genes, consistent with a previous finding that after termination, Pol II tends to remain on the DNA downstream of the terminator". The connection between Pol II distribution and torsional stress is unclear. Pol II is depleted at promoters and is enriched at the 3'-end of convergent genes most likely because this ChIP signal is the sum of signals from the two convergent genes. The fact that positive torsional stress is observed in these regions does not mean that polymerases accumulate because the torsional stress hinders Pol II progression. To claim elongation defects, the authors should repeat the same analysis with stranded data (e.g. NET-seq or CRAC) and assess whether polymerases transcribing these regions accumulate more when facing convergent genes compared to tandem genes. The claim that after termination the Pol II tends to remain on the DNA appears to be meaningless - the authors probably mean after RNA processing.

      Lines 275-277: "These data provide evidence that the (+) supercoiling generated by transcription may facilitate genome folding in coordination with other participating proteins". This is an overstatement. It is known that cohesins accumulate between convergent genes. The fact that there is torsional stress in the same position does not imply that supercoiling participates in genome folding. These could be independent events, or supercoiling might even depend on cohesins.

      Lines 289-290: "torsion generated from one gene can impact the expression of its neighboring gene, consistent with previous findings that the expression of these genes is coupled". The existence of negative torsional stress in a common intergenic region for two genes does not imply that torsion is causally associated with gene expression coupling.

      Lines 291-292: "Another large class of S. cerevisiae promoters (termed "TFO") are regulated by insulator ssTFs, such as Reb1 and Abf1, which decouple interactions between neighbouring genes". In these cases, and in others that depend on an activator binding, the authors detect a region of accessibility interrupted by a valley, which they interpret as a topological insulator. However, the valley might be generated by decreased TMP accessibility due to TF binding.

    4. Reviewer #3 (Public review):

      Summary:

      The authors describe a new method for measuring DNA torsion in cells using the photoactivatable intrastrand cross-linker trimethyl psoralen (TMP). Their method differs from previous TMP-based torsion mapping methods by comparing formaldehyde cross-linked and torsionally trapped chromatin to torsion-relieved (zero-torsion) chromatin in parallel. Comparison between the two datasets reveals a very slight difference, but enough to provide extremely high resolution genome-wide maps of torsion in the yeast genome. This direct comparison of the two maps confirms that blockage of TMP intercalation by nucleosomes and some DNA-binding proteins is a major complication of previous methods, and analysis of the data provides a glimpse of chromatin-based processes from within the DNA gyre.

      Strengths:

      In addition to providing direct evidence for the twin-supercoiled domain model and for torsional effects at transcription start (TSS) and end (TES) sites, the authors' analyses reveal some novel features of yeast higher-order structure. These include the cohesin-dependent anchoring of DNA loops at sites of positive supercoiling and the insulation of torsion between closely spaced divergent genes by general transcription factors, which implies that these factors resist free rotation. The fact that the method should be generalizable to complex eukaryotic cells with large genomes, and the implications for understanding how torsion impacts transcription and gene regulation, will be of substantial interest to a broad community.

      Weaknesses:

      No serious weaknesses.

    1. eLife Assessment

      This useful paper uses a quantitative modeling approach to explore a putative mechanism underlying a well-studied behavioral transition in the nematode C. elegans. The premise, that what has been considered a two-state behavior can instead be described as a process whose parameters are smoothly modulated within a single state, is intriguing. However, in the paper's current state, concerns about the model and its fit to empirical data make the support for this idea inadequate.

    2. Reviewer #1 (Public review):

      Summary:

      This paper concerns mechanisms of foraging behavior in C. elegans. Upon removal from food, C. elegans first executes a stereotypical local search behavior in which it explores a small area by executing many random, undirected reversals and turns called "reorientations." If the worm fails to find food, it transitions to a global search in which it explores larger areas by suppressing reorientations and executing long forward runs (Hills et al., 2004). At the population level, the reorientation rate declines gradually. Nevertheless, about 50% of individual worms appear to exhibit an abrupt transition between local and global search, which is evident as a discrete transition from high to low reorientation rate (Lopez-Cruz et al., 2019). This observation has given rise to the hypothesis that local and global search correspond to separate internal states with the possibility of sudden transitions between them (Calhoun et al., 2014). The main conclusion of the paper is that it is not necessary to posit distinct internal states to account for discrete transitions from high to low reorientation rates. On the contrary, discrete transitions can occur simply because of the stochastic nature of the reorientation behavior itself.

      Strengths:

      The strength of the paper is the demonstration that a more parsimonious model explains abrupt transitions in the reorientation rate.

      Weaknesses:

      (1) Use of the Gillespie algorithm is not well justified. A conventional model with a fixed dt and an exponentially decaying reorientation rate would be adequate and far easier to explain. It would also be sufficiently accurate - given the appropriate choice of dt - to support the main claims of the paper, which are merely qualitative. In some respects, the whole point of the paper - that discrete transitions are an epiphenomenon of stochastic behavior - can be made with the authors' version of the model having a constant reorientation rate (Figure 2f).

      (2) In the manuscript, the Gillespie algorithm is very poorly explained, even for readers who already understand the algorithm; for those who do not it will be essentially impossible to comprehend. To take just a few examples: in Equation (1), omega is defined as reorientations instead of cumulative reorientations; it is unclear how (4) follows from (2) and (3); notation in (5), line 133, and (7) is idiosyncratic. Figure 1a does not help, partly because the notation is unexplained. For example, what do the arrows mean, what does "*" mean?

      (3) In the model, the reorientation rate dΩ⁄dt declines to zero but the empirical rate clearly does not. This is a major flaw. It would have been easy to fix by adding a constant to the exponentially declining rate in (1). Perhaps fixing this obvious problem would mitigate the discrepancies between the data and the model in Figure 2d.

      (4) Evidence that the model fits the data (Figure 2d) is unconvincing. I would like to have seen the proportion of runs in which the model generated one as opposed to multiple or no transitions in reorientation rate; in the real data, the proportion is 50% (Lopez). It is claimed that the "model demonstrated a continuum of switching to non-switching behavior" as seen in the experimental data but no evidence is provided.

      (5) The explanation for the poor fit between the model and data (lines 166-174) is unclear. Why would externally triggered collisions cause a shift in the transition distribution?

      (6) The discussion of Levy walks and the accompanying figure are off-topic and should be deleted.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, the authors build a statistical model that stochastically samples from a time-interval distribution of reorientation rates. The form of the distribution is extracted from a large array of behavioral data, and is then used to describe not only the dynamics of individual worms (including the inter-individual variability in behavior), but also the aggregate population behavior. The authors note that the model does not require assumptions about behavioral state transitions, or evidence accumulation, as has been done previously, but rather that the stochastic nature of behavior is "simply the product of stochastic sampling from an exponential function".

      Strengths:

      This model provides a strong juxtaposition to other foraging models in the worm. Rather than evoking a behavioral transition function (that might arise from a change in internal state or the activity of a cell type in the network), or evidence accumulation (which again maps onto a cell type, or the activity of a network) - this model explains behavior via the stochastic sampling of a function of an exponential decay. The underlying model and the dynamics being simulated, as well as the process of stochastic sampling, are well described and the model fits the exponential function (Equation 1) to data on a large array of worms exhibiting diverse behaviors (1600+ worms from Lopez-Cruz et al). The work of this study is able to explain or describe the inter-individual diversity of worm behavior across a large population. The model is also able to capture two aspects of the reorientations, including the dynamics (to switch or not to switch) and the kinetics (slow vs fast reorientations). The authors also work to compare their model to a few others including the Levy walk (whose construction arises from a Markov process) to a simple exponential distribution, all of which have been used to study foraging and search behaviors.

      Weaknesses:

      This manuscript has two weaknesses that dampen the enthusiasm for the results. First, in all of the examples the authors cite where a Gillespie algorithm is used to sample from a distribution, be it the kinetics associated with chemical dynamics, or a Lotka-Volterra Competition Model, there are underlying processes that govern the evolution of the dynamics, and thus the sampling from distributions. In one of their references, for instance, the stochasticity arises from the birth and death rates, thereby influencing the genetic drift in the model. In these examples, the process governing the dynamics (and thus generating the distributions from which one samples) is distinct from the behavior being studied. In this manuscript, the distribution being sampled is the exponential decay function of the reorientation rate (lines 100-102). This appears to be tautological - a decay function fitted to the reorientation data is then sampled to generate the distributions of the reorientation data. That the model performs well and matches the data is commendable, but it is unclear how that could not be the case if the underlying function generating the distribution was fit to the data.

      The second weakness is somewhat related to the first, in that absent an underlying mechanism or framework, one is left wondering what insight the model provides. Stochastically sampling a function generated by fitting the data to produce stochastic behavior is where one ends up in this framework, and the authors indeed point this out: "simple stochastic models should be sufficient to explain observably stochastic behaviors." (Line 233-234). But if that is the case, what do we learn about how the foraging is happening? The authors suggest that the decay parameter M can be considered a memory timescale, which offers some suggestion, but then go on to say that the "physical basis of M can come from multiple sources". Here is where one is left wanting: the mechanisms suggested include loss of sensory stimuli, alterations in motor integration, ionotropic glutamate signaling, dopamine, and neuropeptides; these are basically all of the possible biological sources that can govern behavior, and one is left not knowing what insight the model provides. The array of biological processes listed is so variable in dynamics and meaning, that their explanation of what governs M is at best unsatisfying. Molecular dynamics models that generate distributions can point to certain properties of the model, such as the binding kinetics (on and off rates, etc.) as explanations for the mechanisms generating the distributions, and therefore point to how a change in the biology affects the stochasticity of the process. It is unclear how this model provides such a connection, especially taken in aggregate with the previous weakness.

      Providing a roadmap of how to think about the processes generating M, the meaning of those processes in search, and potential frameworks that are more constrained and with more precise biological underpinning (beyond the array of possibilities described) would go a long way to assuaging the weaknesses.

    4. Reviewer #3 (Public review):

      Summary:

      This intriguing paper addresses a special case of a fundamental statistical question: how to distinguish between stochastic point processes that derive from a single "state" (or single process) and more than one state/process. In the language of the paper, a "state" (perhaps more intuitively called a strategy/process) refers to a set of rules that determine the temporal statistics of the system. The rules give rise to probability distributions (here, the probability for turning events). The difficulty arises when the sampling time is finite, and hence, the empirical data is finite, and affected by the sampling of the underlying distribution(s). The specific problem being tackled is the foraging behavior of C. elegans nematodes, removed from food. Such foraging has been studied for decades, and described by a transition over time from 'local'/'area-restricted' search (roughly in the initial 10-30 minutes of the experiments, in which animals execute frequent turns) to 'dispersion', or 'global search' (characterized by a low frequency of turns). The authors propose an alternative to this two-state description - a potentially more parsimonious single 'state' with time-changing parameters, which they claim can account for the full time course of these observations.

      Figure 1a shows the mean rate of turning events as a function of time (averaged across the population). Here, we see a rapid transient, followed by a gradual 4-5 fold decay in the rate, which then levels off. This picture seems consistent with the two-state description. However, the authors demonstrate that individual animals exhibit different "transition" statistics (Figure 1e) and wish to explain this. They do so by fitting this mean with a single function (Equations 1-3).

      Strengths:

      As a qualitative exercise, the paper might have some merit. It demonstrates that apparently discrete states can sometimes be artifacts of sampling from smoothly time-changing dynamics. However, as a generic point, this is not novel, and so without the grounding in C. elegans data, is less interesting.

      Weaknesses:

      (1) The authors claim that only about half the animals tested exhibit discontinuity in turning rates. Can they automatically separate the empirical and model population into these two subpopulations (with the same method), and compare the results?

      (2) The equations consider an exponentially decaying rate of turning events. If so, Figure 2b should be shown on a semi-logarithmic scale.

      (3) The variables in Equations 1-3 and the methods for simulating them are not well defined, making the method difficult to follow. Assuming my reading is correct, Omega should be defined as the cumulative number of turning events over time (Omega(t)), not as a "turn" or "reorientation", which has no derivative. The relevant entity in Figure 1a is apparently <Omega(t)>, i.e. the mean number of events across a population which can be modelled by an expectation value. The time derivative would then give the expected rate of turning events as a function of time.

      (4) Equations 1-3 are cryptic. The authors need to spell out up front that they are using a pair of coupled stochastic processes, sampling a hidden state M (to model the dynamic turning rate) and the actual turn events, Omega(t), separately, as described in Figure 2a. In this case, the model no longer appears more parsimonious than the original 2-state model. What then is its benefit or explanatory power (especially since the process involving M is not observable experimentally)?

      (5) Further, as currently stated in the paper, Equations 1-3 are only for the mean rate of events. However, the expectation value is not a complete description of a stochastic system. Instead, the authors need to formulate the equations for the probability of events, from which they can extract any moment (they write something in Figure 2a, but the notation there is unclear, and this needs to be incorporated here).

      (6) Equations 1-3 have three constants (alpha and gamma which were fit to the data, and M0 which was presumably set to 1000). How does the choice of M0 affect the results?

      (7) M decays to near 0 over 40 minutes, abolishing omega turns by the end of the simulations. Are omega turns entirely abolished in worms after 30-40 minutes off food? How do the authors reconcile this decay with the leveling of the turning rate in Figure 1a?

      (8) The fit given in Figure 2b does not look convincing. No statistical test was used to compare the two functions (empirical and fit). No error bars were given (to either). These should be added. In the discussion, the authors explain the discrepancy away as experimental limitations. This is not unreasonable, but on the flip side, makes the argument inconclusive. If the authors could model and simulate these limitations, and show that they account for the discrepancies with the data, the model would be much more compelling. To do this, I would imagine that the authors would need to take the output of their model (lists of turning times) and convert them into simulated trajectories over time. These trajectories could be used to detect boundary events (for a given size of arena), collisions between individuals, etc. in their simulations and to see their effects on the turn statistics.

      (9) The other figures similarly lack any statistical tests and by eye, they do not look convincing. The exception is the 6 anecdotal examples in Figure 2e. Those anecdotal examples match remarkably closely, almost suspiciously so. I'm not sure I understood this though - the caption refers to "different" models of M decay (and at least one of the 6 examples clearly shows a much shallower exponential). If different M models are allowed for each animal, this is no longer parsimonious. Are the results in Figure 2d for a single M model? Can Figure 2e explain the data with a single (stochastic) M model?

      (10) The left axes of Figure 2e should be reverted to cumulative counts (without the normalization).

      (11) The authors give an alternative model of a Levy flight, but do not give the obvious alternative models:
      a) the 1-state model in which P(t) = alpha exp (-gamma t) dt (i.e. a single stochastic process, without a hidden M, collapsing equations 1-3 into a single equation).
      b) the originally proposed 2-state model (with 3 parameters, a high turn rate, a low turn rate, and the local-to-global search transition time, which can be taken from the data, or sampled from the empirical probability distributions).
      Why not? The former seems necessary to justify the more complicated 2-process model, and the latter seems necessary since it's the model they are trying to replace. Including these two controls would allow them to compare the number of free parameters as well as the model results. I am also surprised by the Levy model since Levy is a family of models. How were the parameters of the Levy walk chosen?

      (12) One point that is entirely missing in the discussion is the individuality of worms. It is by now well known that individual animals have individual behaviors. Some are slow/fast, and similarly, their turn rates vary. This makes this problem even harder. Combined with the tiny number of events concerned (typically 20-40 per experiment), it seems daunting to determine the underlying model from behavioral statistics alone.

      (13) That said, it's well-known which neurons underpin the suppression of turning events (starting already with Gray et al 2005, which, strangely, was not cited here). Some discussion of the neuronal predictions for each of the two (or more) models would be appropriate.

      (14) An additional point is the reliance entirely on simulations. A rigorous formulation (of the probability distribution rather than just the mean) should be analytically tractable (at least for the first moment, and possibly higher moments). If higher moments are not obtainable analytically, then the equations should be numerically integrable. It seems strange not to do this.
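      For the single-process alternative in point (11a), the moments requested in point (14) have a closed form if one assumes that turning events form an inhomogeneous Poisson process. The following is a worked sketch under that assumption only; it is not the authors' coupled model with a hidden M. Here alpha and gamma are the constants named in point (6), and beta is a hypothetical constant baseline rate, added to illustrate a turning rate that levels off rather than decaying to zero (set beta = 0 to recover the pure exponential).

```latex
% Assumed rate of turning events (beta is a hypothetical baseline):
\lambda(t) = \alpha e^{-\gamma t} + \beta

% Integrated rate up to time t:
\Lambda(t) = \int_0^t \lambda(s)\,ds
           = \frac{\alpha}{\gamma}\left(1 - e^{-\gamma t}\right) + \beta t

% Full distribution of the cumulative count \Omega(t):
P\big(\Omega(t) = n\big) = \frac{\Lambda(t)^n}{n!}\, e^{-\Lambda(t)}

% First two moments (equal for a Poisson count):
\mathbb{E}[\Omega(t)] = \operatorname{Var}[\Omega(t)] = \Lambda(t)
```

      Under this assumption the expected rate is the time derivative of the mean count, d E[Omega(t)] / dt = lambda(t), and the spread between individual runs follows directly from the Poisson variance rather than from simulation.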

      In summary, while sample simulations do nicely match the examples in the data (of discontinuous vs continuous turning rates), this is not sufficient to demonstrate that the transition from ARS to dispersion in C. elegans is, in fact, likely to be a single 'state', or this (eq 1-3) single state. Of course, the model can be made more complicated to better match the data, but the approach of the authors, seeking an elegant and parsimonious model, is in principle valid, i.e. avoiding a many-parameter model-fitting exercise.

      As a qualitative exercise, the paper might have some merit. It demonstrates that apparently discrete states can sometimes be artifacts of sampling from smoothly time-changing dynamics. However, as a generic point, this is not novel, and so without the grounding in C. elegans data, is less interesting.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper concerns mechanisms of foraging behavior in C. elegans. Upon removal from food, C. elegans first executes a stereotypical local search behavior in which it explores a small area by executing many random, undirected reversals and turns called "reorientations." If the worm fails to find food, it transitions to a global search in which it explores larger areas by suppressing reorientations and executing long forward runs (Hills et al., 2004). At the population level, the reorientation rate declines gradually. Nevertheless, about 50% of individual worms appear to exhibit an abrupt transition between local and global search, which is evident as a discrete transition from high to low reorientation rate (Lopez-Cruz et al., 2019). This observation has given rise to the hypothesis that local and global search correspond to separate internal states with the possibility of sudden transitions between them (Calhoun et al., 2014). The main conclusion of the paper is that it is not necessary to posit distinct internal states to account for discrete transitions from high to low reorientation rates. On the contrary, discrete transitions can occur simply because of the stochastic nature of the reorientation behavior itself.

      Strengths:

      The strength of the paper is the demonstration that a more parsimonious model explains abrupt transitions in the reorientation rate.

      Weaknesses:

      (1) Use of the Gillespie algorithm is not well justified. A conventional model with a fixed dt and an exponentially decaying reorientation rate would be adequate and far easier to explain. It would also be sufficiently accurate - given the appropriate choice of dt - to support the main claims of the paper, which are merely qualitative. In some respects, the whole point of the paper - that discrete transitions are an epiphenomenon of stochastic behavior - can be made with the authors' version of the model having a constant reorientation rate (Figure 2f).

      We apologize, but we are not sure what the reviewer means by “fixed dt”. If the reviewer means taking discrete steps in time (dt), and modeling whether a reorientation occurs, we would argue that the Gillespie algorithm is a better way to do this because it provides floating-point precision time resolution, rather than a time resolution limited by dt, which we hopefully explain in the comments below.

      The reviewer is correct that discrete transitions are an epiphenomenon of stochastic behavior, as we show in Figure 2f. However, abrupt stochastic jumps that occur with a constant rate do not produce persistent changes in the observed rate because the rate is, by definition, constant. The theory that there are local and global searches is based on the observation that individual worms often abruptly change their rates. But this observation is only true for a fraction of worms. We are trying to argue that it is not observed for all, or even most, worms because these apparent transitions are the result of stochastic sampling, not a sudden change in search strategy.

      (2) In the manuscript, the Gillespie algorithm is very poorly explained, even for readers who already understand the algorithm; for those who do not it will be essentially impossible to comprehend. To take just a few examples: in Equation (1), omega is defined as reorientations instead of cumulative reorientations; it is unclear how (4) follows from (2) and (3); notation in (5), line 133, and (7) is idiosyncratic. Figure 1a does not help, partly because the notation is unexplained. For example, what do the arrows mean, what does "*" mean?

      We apologize for this. You are correct that Ω is cumulative reorientations, and we will edit the text as follows:

      Experimentally, reorientation rate is measured as the number of reorientation events that occurred in an observational window. However, these are discrete stochastic events, so we should describe them in terms of propensity, i.e. the probability of observing a transitional event (in this case, a reorientation) is:

      Here, P(W+1,t) is the probability of observing a reorientation event at time t, and a<sub>1</sub> is the propensity for this event to occur. Observationally, the frequency of reorientations observed decays over time, so we can define the propensity as:

      Where α is the initial propensity at t=0.

      We can model this decay as the reorientation propensity coupled to a decaying factor (M):

      Where the propensity of this event (a<sub>2</sub>) is:

      Since M is a first-order decay process, when integrated, the cumulative M observed is:

      We can couple the probability of observing a reorientation to this decay by redefining a<sub>1</sub> as:

      So that now:

      A critical detail should be noted. While reorientations are modeled as discrete events, the amount of M at time t=0 is chosen to be large (M<sub>0</sub>←1,000), so that over the timescale of 40 minutes, the decay in M is practically continuous. This ensures that sudden changes in reorientations are not due to sudden changes in M, but due to the inherent stochasticity of reorientations.

      To model both processes, we can create the master equation:

      Since these are both Poisson processes, the probability density function for a state change i occurring in time t is:

      The probability that an event will not occur in time interval t is:

      The probability that no events will occur for ALL transitions in this time interval is:

      We can draw a random number (r<sub>1</sub> ∈[0,1]) that represents the probability of no events in time interval t, so that this time interval can be assigned by rearranging equation 11:

      where:

      This is the time interval, written here as τ, after which any event (W+1 or M-1) happens, i.e. at time t + τ. The probability of which event occurs is proportional to its propensity:

      We can draw a second number (r<sub>2</sub> ∈[0,1]) that represents this probability so that which event occurs at time t + τ is determined by the smallest n that satisfies:

      so that:

      The elegant efficiency of the Gillespie algorithm is two-fold. First, it models all transitions simultaneously, not separately. Second, it provides floating-point time resolution. Rather than drawing a random number, and using a cumulative probability distribution of interval-times to decide whether an event occurs at discrete steps in time, the Gillespie algorithm uses this distribution to draw the interval-time itself. The time resolution of the prior approach is limited by step size, whereas the Gillespie algorithm’s time resolution is limited by the floating-point precision of the random number that is drawn.
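      The displayed equations referred to in the passage above are not reproduced in this copy. As a reading aid, the steps can be written in the standard form of Gillespie's direct method. The coupling a_1 = αM/M_0, the decay propensity a_2 = γM, and the symbol τ for the waiting time are assumptions inferred from the surrounding prose, not the authors' confirmed notation:

```latex
\begin{align*}
  % assumed propensities for the two events W \to W+1 and M \to M-1
  a_1 &= \alpha \frac{M}{M_0}, & a_2 &= \gamma M, & a_0 &= a_1 + a_2 \\
  % waiting-time density for event i, and probability that it does not occur within \tau
  p_i(\tau) &= a_i e^{-a_i \tau}, & P_i(\text{none in } \tau) &= e^{-a_i \tau} \\
  % probability that no event of any kind occurs within \tau
  P(\text{none in } \tau) &= \prod_i e^{-a_i \tau} = e^{-a_0 \tau} \\
  % setting r_1 = e^{-a_0 \tau} and solving for \tau
  \tau &= \frac{1}{a_0} \ln\frac{1}{r_1} \\
  % event selection: smallest n such that
  \sum_{i=1}^{n} a_i &\ge r_2 \, a_0
\end{align*}
```

      Under these assumptions, the mean of M follows M_0 e^{-γt}, so the expected reorientation rate is α e^{-γt}, matching the exponential decay described in the text.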

      We are happy to add this text to improve clarity.
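      The procedure described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the propensities `a1 = alpha * M / m0` and `a2 = gamma * M`, the function name, and the parameter values are assumptions inferred from the text (rates per minute, M<sub>0</sub> = 1,000, a 40-minute window).

```python
import math
import random


def gillespie(alpha, gamma, m0, t_end, rng):
    """Direct-method Gillespie simulation of two coupled events:
    W -> W + 1 (reorientation) with propensity a1 = alpha * M / m0 (assumed),
    M -> M - 1 (decay)         with propensity a2 = gamma * M      (assumed)."""
    t, m = 0.0, m0
    reorientation_times = []
    while m > 0:
        a1 = alpha * m / m0
        a2 = gamma * m
        a0 = a1 + a2
        r1 = 1.0 - rng.random()       # uniform in (0, 1], avoids log(0)
        tau = -math.log(r1) / a0      # waiting time until the next event
        if t + tau > t_end:
            break
        t += tau
        r2 = rng.random()
        if r2 * a0 < a1:              # choose the event in proportion to its propensity
            reorientation_times.append(t)
        else:
            m -= 1
    return reorientation_times, m


# Hypothetical parameters: alpha = 2 per minute, gamma = 0.1 per minute.
rng = random.Random(0)
times, m_left = gillespie(alpha=2.0, gamma=0.1, m0=1000, t_end=40.0, rng=rng)
```

      Each iteration draws the waiting time itself rather than testing for an event at fixed steps, so event times are continuous-valued. Because M starts large, its decay is nearly smooth, and irregularity in `times` comes from the stochastic reorientation events.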

      We apologize for the arrow notation confusion. Arrow notation is commonly used in pseudocode to indicate variable assignment, and so we used it to indicate variable assignment updates in the algorithm.

      We added Figure 2a to help explain the Gillespie algorithm for people who are unfamiliar with it, but you are correct that some notation, such as the probabilities, was left unexplained. We will address this to improve clarity.

      (3) In the model, the reorientation rate dΩ⁄dt declines to zero but the empirical rate clearly does not. This is a major flaw. It would have been easy to fix by adding a constant to the exponentially declining rate in (1). Perhaps fixing this obvious problem would mitigate the discrepancies between the data and the model in Figure 2d.

      You are correct that the model deviates slightly at longer times, but this result is consistent with Klein et al., who show a continuous decline of reorientations. However, we could add a constant to the model, since an infinite run length is likely not physiological.

      (4) Evidence that the model fits the data (Figure 2d) is unconvincing. I would like to have seen the proportion of runs in which the model generated one as opposed to multiple or no transitions in reorientation rate; in the real data, the proportion is 50% (Lopez). It is claimed that the "model demonstrated a continuum of switching to non-switching behavior" as seen in the experimental data but no evidence is provided.

      We should clarify that the 50% proportion cited by López-Cruz was based on an arbitrary difference in slopes, and by assessing the data visually. We sought to avoid this subjective assessment by plotting the distribution of slopes and transition times produced by the method used in López-Cruz. We should also clarify what we meant by “a continuum of switching and non-switching” behavior. Neither the transition time distributions nor the slope-difference distributions appear to be the result of two distributions. This is unlike roaming and dwelling on food, where two distinct distributions of behavioral metrics can be identified based on speed and angular speed (Flavell et al, 2009, Fig S2a). We will add a permutation test to verify that the mean differences in slopes and transition times between the experiment and model are not significant.
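      As an illustration of the kind of permutation test mentioned above, the sketch below compares the means of two samples by repeatedly shuffling the pooled values. The function name and the example data are hypothetical placeholders, not measurements from the experiments or the model.

```python
import random


def permutation_test_mean(x, y, n_perm, rng):
    """Two-sided permutation test for a difference in means.
    Returns a p-value with the usual +1 correction."""
    observed = abs(sum(x) / len(x) - sum(y) / len(y))
    pooled = list(x) + list(y)
    n_x = len(x)
    n_y = len(pooled) - n_x
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabeling of the pooled values
        diff = abs(sum(pooled[:n_x]) / n_x - sum(pooled[n_x:]) / n_y)
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Placeholder values standing in for, e.g., transition times (minutes).
rng = random.Random(0)
experiment = [8.0, 12.5, 9.1, 15.2, 11.0, 7.4, 13.3, 10.8]
model = [9.2, 11.7, 8.5, 14.1, 12.2, 6.9, 12.8, 10.1]
p_value = permutation_test_mean(experiment, model, n_perm=2000, rng=rng)
```

      The same routine could be applied to variances by replacing the mean with a variance statistic. A large p-value only indicates that the test found no detectable difference, not that the distributions are identical.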

      (5) The explanation for the poor fit between the model and data (lines 166-174) is unclear. Why would externally triggered collisions cause a shift in the transition distribution?

      Thank you, we should rewrite the text to clarify this better. There were no externally triggered collisions; 10 animals were used per experiment. They would occasionally collide during the experiment, but these collisions were excluded from the data that were provided. However, worms are also known to increase reorientations when they encounter a pheromone trail, and it is unknown (from this dataset) which reorientations may have been a result of this phenomenon.

      (6) The discussion of Levy walks and the accompanying figure are off-topic and should be deleted.

      Thank you, we agree that this topic is tangential, and we will remove it.

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors build a statistical model that stochastically samples from a time-interval distribution of reorientation rates. The form of the distribution is extracted from a large array of behavioral data, and is then used to describe not only the dynamics of individual worms (including the inter-individual variability in behavior), but also the aggregate population behavior. The authors note that the model does not require assumptions about behavioral state transitions, or evidence accumulation, as has been done previously, but rather that the stochastic nature of behavior is "simply the product of stochastic sampling from an exponential function".

      Strengths:

      This model provides a strong juxtaposition to other foraging models in the worm. Rather than evoking a behavioral transition function (that might arise from a change in internal state or the activity of a cell type in the network), or evidence accumulation (which again maps onto a cell type, or the activity of a network) - this model explains behavior via the stochastic sampling of a function of an exponential decay. The underlying model and the dynamics being simulated, as well as the process of stochastic sampling, are well described and the model fits the exponential function (Equation 1) to data on a large array of worms exhibiting diverse behaviors (1600+ worms from Lopez-Cruz et al). The work of this study is able to explain or describe the inter-individual diversity of worm behavior across a large population. The model is also able to capture two aspects of the reorientations, including the dynamics (to switch or not to switch) and the kinetics (slow vs fast reorientations). The authors also work to compare their model to a few others including the Levy walk (whose construction arises from a Markov process) to a simple exponential distribution, all of which have been used to study foraging and search behaviors.

      Weaknesses:

      This manuscript has two weaknesses that dampen the enthusiasm for the results. First, in all of the examples the authors cite where a Gillespie algorithm is used to sample from a distribution, be it the kinetics associated with chemical dynamics, or a Lotka-Volterra Competition Model, there are underlying processes that govern the evolution of the dynamics, and thus the sampling from distributions. In one of their references, for instance, the stochasticity arises from the birth and death rates, thereby influencing the genetic drift in the model. In these examples, the process governing the dynamics (and thus generating the distributions from which one samples) is distinct from the behavior being studied. In this manuscript, the distribution being sampled is the exponential decay function of the reorientation rate (lines 100-102). This appears to be tautological - a decay function fitted to the reorientation data is then sampled to generate the distributions of the reorientation data. That the model performs well and matches the data is commendable, but it is unclear how that could not be the case if the underlying function generating the distribution was fit to the data.

      Thank you, we apologize that this was not clearer. In the Lotka-Volterra model, the densities of predators and prey are being modeled, with the underlying assumption that rates of birth and death are inherently stochastic. In our model, the number of reorientations is being modeled, with the assumption (based on the experiments) that the occurrence of reorientations is stochastic, just like the occurrence (birth) of a prey animal is stochastic. However, the decay in M is phenomenological, and we speculate about the nature of M later in the manuscript.

      You are absolutely right that the decay function for M was fitted to the population average of reorientations and then sampled to generate the distributions of the reorientation data. This was intentional to show that the parameters chosen to match the population average would produce individual trajectories with comparable stochastic “switching” as the experimental data. All we’re trying to show really is that observed sudden changes in reorientation that appear persistent can be produced by a stochastic process without resorting to binary state assignments. In Calhoun, et al 2014 it is reported all animals produced switch-like behavior, but in Klein et al, 2017 it is reported that no animals showed abrupt transitions. López-Cruz et al seem to show a mix of these results, which can be easily explained by an underlying stochastic process.

      The second weakness is somewhat related to the first, in that absent an underlying mechanism or framework, one is left wondering what insight the model provides. Stochastic sampling a function generated by fitting the data to produce stochastic behavior is where one ends up in this framework, and the authors indeed point this out: "simple stochastic models should be sufficient to explain observably stochastic behaviors." (Line 233-234). But if that is the case, what do we learn about how the foraging is happening? The authors suggest that the decay parameter M can be considered a memory timescale; which offers some suggestion, but then go on to say that the "physical basis of M can come from multiple sources". Here is where one is left for want: The mechanisms suggested, including loss of sensory stimuli, alternations in motor integration, ionotropic glutamate signaling, dopamine, and neuropeptides are all suggested: these are basically all of the possible biological sources that can govern behavior, and one is left not knowing what insight the model provides. The array of biological processes listed is so variable in dynamics and meaning, that their explanation of what governs M is at best unsatisfying. Molecular dynamics models that generate distributions can point to certain properties of the model, such as the binding kinetics (on and off rates, etc.) as explanations for the mechanisms generating the distributions, and therefore point to how a change in the biology affects the stochasticity of the process. It is unclear how this model provides such a connection, especially taken in aggregate with the previous weakness.

      Providing a roadmap of how to think about the processes generating M, the meaning of those processes in search, and potential frameworks that are more constrained and with more precise biological underpinning (beyond the array of possibilities described) would go a long way to assuaging the weaknesses.

      Thank you, these are all excellent points. We should clarify that in López-Cruz et al, they claim that only 50% of the animals fit a local/global search paradigm. We are simply proposing there is no need for designating local and global searches if the data don’t really support it. The underlying behavior is stochastic, so the sudden switches sometimes observed can be explained by a stochastic process where the underlying rate is slowing down, thus producing the persistently slow reorientation rate when an apparent “switch” occurs. What we hope to convey is that foraging doesn’t appear to follow a decision paradigm, but instead a gradual change in reorientations which for individual worms, can occasionally produce reorientation trajectories that appear switch-like.

      As for M, you are correct, we should be more explicit. A decay in reorientation rate, rather than a sudden change, is consistent with observations made by López-Cruz et al. They found that the neurons AIA and ADE redundantly suppress reorientations, and that silencing either one was sufficient to restore the large number of reorientations during early foraging. The synaptic output of AIA and ADE was inhibited over long timescales (tens of minutes) by presynaptic glutamate binding to MGL-1, a slow G-protein-coupled receptor expressed in AIA and ADE. Their results support a model where sensory neurons suppress the synaptic output of AIA and ADE, which in turn leads to a large number of reorientations early in foraging. As time passes, glutamatergic input from the sensory neurons decreases, which leads to disinhibition of AIA and ADE, and a subsequent suppression of reorientations.

      The sensory inputs into AIA and ADE are sequestered into two separate circuits, with AIA receiving chemosensory input and ADE receiving mechanosensory input. Since the suppression of either AIA or ADE is sufficient to increase reorientations, the decay in reorientations is likely due to the synaptic output of both of these neurons decaying in time. This correlates with an observed decrease in sensory neuron activity as well, so the timescale of reorientation decay could be tied to the timescale of sensory neuron activity, which in turn is influencing the timescale of AIA/ADE reorientation suppression. This implies that our factor “M” is likely the sum of several different sensory inputs decaying in time.

      The molecular basis of which sensory neuron signaling factors contribute to decreased AIA and ADE activity is made more complicated by the observation that the glutamatergic input provided by the sensory neurons was not essential, and that additional factors besides glutamate contribute to the signaling to AIA and ADE. In addition to this, it is not simply the sensory neuron activity that decays in time, but also the sensitivity of AIA and ADE to sensory neuron input. Simply depolarizing sensory neurons after the animals had starved for 30 minutes was insufficient to rescue the reorientation rates observed earlier in the foraging assay. This observation could be due to decreased presynaptic vesicle release, and/or decreased receptor localization on the postsynaptic side.

      In summary, there are two neuronal properties that appear to be decaying in time. One is sensory neuron activity, and the other is the potentiation of presynaptic input onto AIA and ADE. Our factor “M” is a phenomenological manifestation of these numerous decaying factors.

      Reviewer #3 (Public review):

      Summary:

      This intriguing paper addresses a special case of a fundamental statistical question: how to distinguish between stochastic point processes that derive from a single "state" (or single process) and more than one state/process. In the language of the paper, a "state" (perhaps more intuitively called a strategy/process) refers to a set of rules that determine the temporal statistics of the system. The rules give rise to probability distributions (here, the probability for turning events). The difficulty arises when the sampling time is finite, and hence, the empirical data is finite, and affected by the sampling of the underlying distribution(s). The specific problem being tackled is the foraging behavior of C. elegans nematodes, removed from food. Such foraging has been studied for decades, and described by a transition over time from 'local' or 'area-restricted' search (roughly in the initial 10-30 minutes of the experiments, in which animals execute frequent turns) to 'dispersion', or 'global search' (characterized by a low frequency of turns). The authors propose an alternative to this two-state description - a potentially more parsimonious single 'state' with time-changing parameters, which they claim can account for the full-time course of these observations.

      Figure 1a shows the mean rate of turning events as a function of time (averaged across the population). Here, we see a rapid transient, followed by a gradual 4-5 fold decay in the rate, which then levels off. This picture seems consistent with the two-state description. However, the authors demonstrate that individual animals exhibit different "transition" statistics (Figure 1e) and wish to explain this. They do so by fitting this mean with a single function (Equations 1-3).

      Strengths:

      As a qualitative exercise, the paper might have some merit. It demonstrates that apparently discrete states can sometimes be artifacts of sampling from smoothly time-changing dynamics. However, as a generic point, this is not novel, and so without the grounding in C. elegans data, is less interesting.

      Weaknesses:

      (1) The authors claim that only about half the animals tested exhibit discontinuity in turning rates. Can they automatically separate the empirical and model population into these two subpopulations (with the same method), and compare the results?

      Thank you, we should clarify that the observation that about half the animals exhibit discontinuity was not made by us, but by López-Cruz et al. The observed fraction of 50% was based on a visual assessment of the dual regression method we described. To make the process more objective, we decided to simply plot the distributions of the metrics they used for this assessment to see if two distinct populations could be observed. However, the distributions of slope differences and transition times do not produce two distinct populations. Our stochastic approach, which does not assume abrupt state-transitions, also produces comparable distributions. To quantify this, we will perform permutation tests on the mean and variance differences between experimental and model data.

      (2) The equations consider an exponentially decaying rate of turning events. If so, Figure 2b should be shown on a semi-logarithmic scale.

      We are happy to add this panel as well.

      (3) The variables in Equations 1-3 and the methods for simulating them are not well defined, making the method difficult to follow. Assuming my reading is correct, Omega should be defined as the cumulative number of turning events over time (Omega(t)), not as a "turn" or "reorientation", which has no derivative. The relevant entity in Figure 1a is apparently <Omega (t)>, i.e. the mean number of events across a population which can be modelled by an expectation value. The time derivative would then give the expected rate of turning events as a function of time.

      Thank you, you are correct. Please see response to Reviewer #1.

      (4) Equations 1-3 are cryptic. The authors need to spell out up front that they are using a pair of coupled stochastic processes, sampling a hidden state M (to model the dynamic turning rate) and the actual turn events, Omega(t), separately, as described in Figure 2a. In this case, the model no longer appears more parsimonious than the original 2-state model. What then is its benefit or explanatory power (especially since the process involving M is not observable experimentally)?

      Thank you, yes we see how as written this was confusing. In our response to Reviewer #1, we added an important detail:

      While reorientations are modeled as discrete events, which is observationally true, the amount of M at time t=0 is chosen to be large (M<sub>0</sub>←1,000), so that over the timescale of 40 minutes, the decay in M is practically continuous. This ensures that sudden changes in reorientations are not due to sudden changes in M, but due to the inherent stochasticity of reorientations.

      However, you are correct that if M were chosen to have a binary value of 0 or 1, then this would indeed be the two-state model. Adding this as an additional model would be a good way to compare how it matches the experimental data, and we are happy to add it.

      (5) Further, as currently stated in the paper, Equations 1-3 are only for the mean rate of events. However, the expectation value is not a complete description of a stochastic system. Instead, the authors need to formulate the equations for the probability of events, from which they can extract any moment (they write something in Figure 2a, but the notation there is unclear, and this needs to be incorporated here).

      Thank you, yes please see our response to Reviewer #1.

      (6) Equations 1-3 have three constants (alpha and gamma which were fit to the data, and M0 which was presumably set to 1000). How does the choice of M0 affect the results?

      Thank you, this is a good question. We will test this down to a binary state of M as mentioned in comment #4.

      (7) M decays to near 0 over 40 minutes, abolishing omega turns by the end of the simulations. Are omega turns entirely abolished in worms after 30-40 minutes off food? How do the authors reconcile this decay with the leveling of the turning rate in Figure 1a?

      Yes, reviewer #1 recommended adding a baseline reorientation rate which is likely more biologically plausible. However, we should also note that in Klein et al they observed a continuous decay over 50 minutes.

      (8) The fit given in Figure 2b does not look convincing. No statistical test was used to compare the two functions (empirical and fit). No error bars were given (to either). These should be added. In the discussion, the authors explain the discrepancy away as experimental limitations. This is not unreasonable, but on the flip side, makes the argument inconclusive. If the authors could model and simulate these limitations, and show that they account for the discrepancies with the data, the model would be much more compelling. To do this, I would imagine that the authors would need to take the output of their model (lists of turning times) and convert them into simulated trajectories over time. These trajectories could be used to detect boundary events (for a given size of arena), collisions between individuals, etc. in their simulations and to see their effects on the turn statistics.

      Thank you, we will add error bars and perform a permutation test on the mean and variance differences between experiment and model over the 40 minute window.

      (9) The other figures similarly lack any statistical tests and by eye, they do not look convincing. The exception is the 6 anecdotal examples in Figure 2e. Those anecdotal examples match remarkably closely, almost suspiciously so. I'm not sure I understood this though - the caption refers to "different" models of M decay (and at least one of the 6 examples clearly shows a much shallower exponential). If different M models are allowed for each animal, this is no longer parsimonious. Are the results in Figure 2d for a single M model? Can Figure 2e explain the data with a single (stochastic) M model?

      Thank you, yes, we will perform permutation tests on the mean and variance differences in the observed distributions in Figure 2d. We certainly don’t want the panels in Figure 2e to be suspicious! These comparisons were drawn from calculating the correlations between all model traces and all experimental traces, and then choosing the top hits. Every time we run the simulation, we arrive at a different set of examples. Since it was recommended we add a baseline rate, these examples will be a completely different set when we run the simulation again.

      We apologize for the confusion regarding M. Since the worms do not all start out with identical reorientation rates, we drew the initial M value from a distribution centered on M0 and a variance to match the initial distribution of observed experimental rates.

      (10) The left axes of Figure 2e should be reverted to cumulative counts (without the normalization).

      Thank you, we will add this. We want to clarify that we normalized it because we chose these examples based on correlation to show that the same types of sudden changes in search strategy can occur with a model that doesn’t rely on sudden rate changes.

      (11) The authors give an alternative model of a Levy flight, but do not give the obvious alternative models:

      a) the 1-state model in which P(t) = alpha exp (-gamma t) dt (i.e. a single stochastic process, without a hidden M, collapsing equations 1-3 into a single equation).

      b) the originally proposed 2-state model (with 3 parameters, a high turn rate, a low turn rate, and the local-to-global search transition time, which can be taken from the data, or sampled from the empirical probability distributions). Why not? The former seems necessary to justify the more complicated 2-process model, and the latter seems necessary since it's the model they are trying to replace. Including these two controls would allow them to compare the number of free parameters as well as the model results. I am also surprised by the Levy model since Levy is a family of models. How were the parameters of the Levy walk chosen?
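      For reference, alternative (a) can be simulated exactly without a hidden variable. The sketch below treats turning as an inhomogeneous Poisson process with rate `alpha * exp(-gamma * t)` and samples event times by inverting the cumulative intensity. The function name and parameter values are hypothetical, and this is one possible reading of the reviewer's one-state model rather than anything taken from the manuscript.

```python
import math
import random


def one_state_event_times(alpha, gamma, t_end, rng):
    """Event times of an inhomogeneous Poisson process with rate
    alpha * exp(-gamma * t), sampled by inverting the cumulative
    intensity Lambda(t) = (alpha / gamma) * (1 - exp(-gamma * t))."""
    times = []
    s = 0.0                 # running sum of unit-rate exponential waits
    total = alpha / gamma   # Lambda(infinity): expected total number of events
    while True:
        s += -math.log(1.0 - rng.random())
        if s >= total:
            break           # the decaying rate never produces another event
        t = -math.log(1.0 - gamma * s / alpha) / gamma  # solve Lambda(t) = s
        if t > t_end:
            break
        times.append(t)
    return times


# Hypothetical parameters: alpha = 2 per minute, gamma = 0.1 per minute.
rng = random.Random(0)
turns = one_state_event_times(alpha=2.0, gamma=0.1, t_end=40.0, rng=rng)
```

      Alternative (b) could be sketched in the same style with a piecewise-constant rate that drops from a high to a low value at a transition time, which would give three free parameters to compare against the two used here.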

      Thank you, we will remove this section completely, as it is tangential to the main point of the paper.

      (12) One point that is entirely missing in the discussion is the individuality of worms. It is by now well known that individual animals have individual behaviors. Some are slow/fast, and similarly, their turn rates vary. This makes this problem even harder. Combined with the tiny number of events concerned (typically 20-40 per experiment), it seems daunting to determine the underlying model from behavioral statistics alone.

      Thank you, yes we should have been more explicit in the reasoning behind drawing the initial M from a distribution (response to comment #9). We assume that not every worm starts out with the same reorientation rate, but that some start out fast (high M) and some start out slow (low M). However, we do assume M decays with the same kinetics, which seems sufficient to produce the observed phenomena.

      (13) That said, it's well-known which neurons underpin the suppression of turning events (starting already with Gray et al 2005, which, strangely, was not cited here). Some discussion of the neuronal predictions for each of the two (or more) models would be appropriate.

      Thank you, yes we will add Gray et al, but also the more detailed response to Reviewer #2.

      (14) An additional point is the reliance entirely on simulations. A rigorous formulation (of the probability distribution rather than just the mean) should be analytically tractable (at least for the first moment, and possibly higher moments). If higher moments are not obtainable analytically, then the equations should be numerically integrable. It seems strange not to do this.

      Thank you for suggesting this, we will add these analyses.
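      As an indication of what the first-moment calculation might look like, the sketch below assumes the propensities a_1 = αM/M_0 for a reorientation and a_2 = γM for the decay of M. These forms are inferred from the response to Reviewer #1 and are not confirmed by the manuscript. Because both propensities are linear in M, the equations for the means close exactly:

```latex
\begin{align*}
  \frac{d\langle M \rangle}{dt} &= -\gamma \langle M \rangle
    &&\Rightarrow& \langle M(t) \rangle &= M_0 e^{-\gamma t} \\
  \frac{d\langle \Omega \rangle}{dt} &= \alpha \frac{\langle M \rangle}{M_0}
    = \alpha e^{-\gamma t}
    &&\Rightarrow& \langle \Omega(t) \rangle &= \frac{\alpha}{\gamma}\left(1 - e^{-\gamma t}\right)
\end{align*}
```

      Under these assumptions the expected cumulative count saturates at α/γ. If a constant baseline rate β were added, as suggested by Reviewer #1, the mean would gain a term βt and would no longer saturate.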

      In summary, while sample simulations do nicely match the examples in the data (of discontinuous vs continuous turning rates), this is not sufficient to demonstrate that the transition from ARS to dispersion in C. elegans is, in fact, likely to be a single 'state', or this (eq 1-3) single state. Of course, the model can be made more complicated to better match the data, but the approach of the authors, seeking an elegant and parsimonious model, is in principle valid, i.e. avoiding a many-parameter model-fitting exercise.

      As a qualitative exercise, the paper might have some merit. It demonstrates that apparently discrete states can sometimes be artifacts of sampling from smoothly time-changing dynamics. However, as a generic point, this is not novel, and so without the grounding in C. elegans data, is less interesting.

      Thank you, we agree that this is a generic phenomenon, which is partly why we did this. The data from López-Cruz seem to agree in part with Calhoun et al, who claim abrupt transitions occur, and with Klein et al, who claim they do not occur. Since the underlying phenomenon is stochastic, we propose the mixed observations of sudden and gradual changes in search strategy are simply the result of a stochastic process, which can produce both phenomena for individual observations.

    1. eLife Assessment

      The study presents valuable findings on the molecular mechanisms of glucose-stimulated insulin secretion from pancreatic islets, focusing on the main regulatory elements of the signaling pathway in physiological conditions. While the evidence supporting the conclusions is solid, the study can be strengthened by the use of a beta cell line or knockout mice. The work will be of interest to cell biologists and biochemists working on diabetes.

    2. Reviewer #2 (Public review):

      The authors identified new target elements for prostaglandin E2 (PGE2) through which insulin release can be regulated in pancreatic beta cells under physiological conditions. In vitro extracellular exposure to PGE2 could directly and dose-dependently inhibit the potassium channel Kv2.2. In vitro pharmacology revealed that this inhibition occurs through the EP2/4 receptors, which activate protein kinase A (PKA). By screening specific sites of the Kv2.2 channel, the target phosphorylation site (S448) for PKA regulation was found. The physiological relevance of the described signaling cascade was investigated and confirmed in vivo, using a Kv2.2 knockdown mouse model.

      The strength of this manuscript is the novelty of the (EP2/4-PKA-Kv2.2 channel) molecular pathway described and the comprehensive methodological toolkit the authors have relied upon.

      The introduction is detailed and contains all the information necessary to place the claims in context. Although the dataset is comprehensive and a logical lead is consistently built, there is one important point to consider: to clarify that the described signaling pathway is characteristic of normal physiological conditions and thus differs from pathological changes. It would be useful to carry out basic experiments in a diabetes model (regardless of whether this is in mice or rats).

      Comments on revisions:

      The authors addressed my comments sufficiently. I have no additional questions to clarify.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study investigated the mechanism by which PGE2 inhibits the release of insulin from pancreatic beta cells in response to glucose. The researchers used a combination of cell line experiments and studies in mice with genetic ablation of the Kv2.2 channel. Their findings suggest a novel pathway where PGE2 acts through EP2/EP4 receptors to activate PKA, which directly phosphorylates a specific site (S448) on the Kv2.2 channel, inhibiting its activity and reducing GSIS.

      Strengths:

      - The study elegantly demonstrates a potential pathway connecting PGE2, EP2/EP4 receptors, PKA, and Kv2.2 channel activity, using an embryonic cell line.

      - Additional experiments in INS1 and primary mouse beta cells with altered Kv2.2 function partially support the inhibitory role of PGE2 on GSIS through Kv2.2 inhibition.

      Weaknesses:

      - A critical limitation is the use of HEK293T cells, which are not pancreatic beta cells. Functional aspects can differ significantly between these cell types.

      - The study needs to address the apparent contradiction of PKA activating insulin secretion in beta cells, while also inhibiting GSIS through the proposed mechanism.

      - A more thorough explanation is needed for the discrepancies observed between the effects of PGE2 versus Kv2.2 knockdown/mutation on the electrical activity of beta cells and GSIS.

      Thank you for your positive evaluation and constructive feedback on our study. We appreciate the concern regarding the use of HEK293T cells, which are not pancreatic beta cells and may exhibit functional differences. In response, we have repeated our key experiments using INS1 cells and primary mouse beta cells, which are more representative of the native beta cell environment. These additional experiments confirm our hypothesis and further support the role of Kv2.2 in PGE2-induced inhibition of GSIS. In beta cells, glucose-induced PKA activation is highly localized. As a result, while some PKA pathways promote insulin secretion, others may inhibit it. To directly demonstrate that PGE2-induced PKA phosphorylation of Kv2.2 is involved in the inhibitory effect on GSIS, we overexpressed the S448A mutant Kv2.2 channel in INS-1(832/13) cells. Our results show that Kv2.2-S448A channels significantly attenuate the inhibitory effect of PGE2 on GSIS, further supporting the critical role of Kv2.2 phosphorylation at S448. These data have been added to the revised Figure 7C.

      Reviewer #2 (Public Review):

      The authors identified new target elements for prostaglandin E2 (PGE2) through which insulin release can be regulated in pancreatic beta cells under physiological conditions. In vitro extracellular exposure to PGE2 could directly and dose-dependently inhibit the potassium channel Kv2.2. In vitro pharmacology revealed that this inhibition occurs through the EP2/4 receptors, which activate protein kinase A (PKA). By screening specific sites of the Kv2.2 channel, the target phosphorylation site (S448) for PKA regulation was found. The physiological relevance of the described signaling cascade was investigated and confirmed in vivo, using a Kv2.2 knockdown mouse model.

      The strength of this manuscript is the novelty of the (EP2/4-PKA-Kv2.2 channel) molecular pathway described and the comprehensive methodological toolkit the authors have relied upon.

      The introduction is detailed and contains all the information necessary to place the claims in context. Although the dataset is comprehensive and a logical lead is consistently built, there is one important point to consider: to clarify that the described signaling pathway is characteristic of normal physiological conditions and thus differs from pathological changes. It would be useful to carry out basic experiments in a diabetes model (regardless of whether this is in mice or rats).

      Thank you for your positive evaluation and insightful comment. We have clarified in the Discussion section that our findings pertain specifically to physiological conditions. We acknowledge the importance of investigating the signaling pathway in a pathological context and plan to conduct experiments using a diabetes model in future studies to explore how this pathway may differ under such conditions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Figure 3A-C: PKA activation regulates different functional aspects in beta cells and HEK293T cells. It is well known that PKA activation enhances insulin secretion in beta cells, therefore the mechanisms that allow the same pathway at the same time to inhibit GSIS are not clear and should be addressed by experiments in beta cells.

      Thank you for your insightful comment. Specificity and versatility in cAMP-PKA signaling are governed by the spatial localization and temporal dynamics of the signal. In beta cells, glucose-induced PKA activation is highly localized (Tengholm and Gylfe, 2017). As a result, while some PKA pathways promote insulin secretion, others may inhibit it. For example, a global increase in cAMP, such as through treatment with Db-cAMP, can simultaneously activate both stimulatory and inhibitory PKA pathways, reflecting a more integrated, complex response. In previous studies, 1 mM Db-cAMP was shown to enhance GSIS in INS-1 cells (Dezaki et al., 2011). We observed that 1 mM Db-cAMP increased GSIS, but lower concentrations (10 µM) decreased GSIS (as shown in Author response image 1). These findings suggest that not all PKA signaling events increase GSIS. To further investigate the role of PGE2-induced PKA phosphorylation of Kv2.2 in the inhibition of GSIS, we overexpressed the S448A mutant of Kv2.2 in INS-1 (832/13) cells. Our results showed that the Kv2.2-S448A mutant significantly attenuated the inhibitory effect of PGE2 on GSIS. These new data have been incorporated into the revised Figure 7C.

      Author response image 1.

      Effect of Db-cAMP on GSIS in INS-1 cells. Statistics for the effect of different concentrations of Db-cAMP on GSIS in INS-1(832/13) cells. One-way ANOVA with Bonferroni post hoc test. *p < 0.05; ***p < 0.001; ****p < 0.0001; n.s., not significant.

      (2) Figure 3G: One would expect that the phospho-mimetic mutation, S448D, will have an opposite effect to S448A and a similar effect as PGE2 or PKA activator in Figure 3B. There is no explanation by the authors for having the same effect in S448A and S448D.

      Thank you for your thoughtful comment. Indeed, the S448D mutation exhibited a similar effect to PGE2 on Kv2.2 channels, as we observed significantly smaller currents compared to wild-type Kv2.2 (Figure 3F). The S448D mutation mimics the phosphorylated state of S448, and since PGE2 regulates Kv2.2 channels by phosphorylating this residue, it has no further effect on the S448D mutant (Figure 3G). In contrast, the S448A mutation prevents phosphorylation at this site, which explains why PGE2 has no effect on the currents of S448A mutant Kv2.2 channels (Figure 3H). These results confirm that PGE2 modulates Kv2.2 channels specifically through phosphorylation of S448, as evidenced by the lack of effect on both the S448A and S448D mutants.

      (3) Figure 4E: Since both PGE2 and Kv2.2 KD inhibit the activity of the channel, it doesn't definitively prove whether PGE2 acts through Kv2.2 in INS-1 cells. A complementary experiment should be done in which overactivation of Kv2.2 rescues the effect of PGE2. For example, with the S448A form of the channel.

      We appreciate your comment and valuable suggestion. Knockdown of Kv2.2 abrogated the inhibitory effect of PGE2 on I_K currents in INS-1 cells (Figure 4E and F), which strongly indicates that PGE2 acts through Kv2.2. While we agree that the suggested complementary experiment with Kv2.2 overactivation (e.g., using the S448A mutant) could provide additional insights, we believe the current data sufficiently support our conclusion, as the knockdown of Kv2.2 eliminates the observed PGE2 effect, providing direct evidence of the channel's involvement.

      (4) Figure 5C: This result requires further explanation. If PGE2 downregulates Kv2.2 activity and has an inhibitory effect on GSIS, why does Kv2.2 KD have the opposite effect?

      The knockdown of Kv2.2 (Fig. 5C) reduced action potential (AP) firing rates compared to the scramble control (Fig. 5B), which is expected because Kv2.2 is critical for maintaining AP firing. When Kv2.2 is knocked down, the reduced AP firing diminishes the system’s responsiveness to further modulation by PGE2. This is because PGE2 exerts its effects primarily through Kv2.2 channels. Therefore, in the Kv2.2 knockdown condition, PGE2 does not exert an additional inhibitory effect on AP firing rates, as the channels critical for its action are already impaired.

      (5) Figure 5D - The EP1-EP4 receptor antibodies should be validated at least in INS-1(832/13) cells using knockdowns.

      Thank you for your suggestion. We have validated the EP1-EP4 receptor antibodies in INS-1(832/13) cells using knockdown experiments. The validation results, including confirmation of specificity and knockdown efficiency, are provided in Supplemental Figure S2.

      (6) Figure 7B - These experiments don't necessarily prove that PGE2 acts directly through Kv2.2 inhibition. Using the S448A mutation in these experiments could prove this point.

      Thank you for this valuable suggestion. We have now overexpressed the S448A mutant Kv2.2 channels in INS-1(832/13) cells, and the results demonstrate that Kv2.2-S448A channels significantly reduce the inhibitory effect of PGE2 on GSIS. These new data have been incorporated into the revised Figure 7C.

      Reviewer #2 (Recommendations For The Authors):

      (1) Deficiencies and inaccuracies in the description of the methods (animal numbers, name of vendors, abbreviations) and the typos in the figures (axis label) require correction.

      Thank you for pointing this out. We have carefully reviewed the manuscript and the figures, making the necessary corrections to address the deficiencies in the methods section and the typos in the figure axis labels.

      (2) Reducing the number of figures (Figures 7/C-E: knockout mouse line test and Figure1/HEK cell experiments could be part of supplementary) and paragraphs would make the manuscript more compact and powerful. It would also ease its reading for non-experts.

      Thank you for your suggestion. We have moved Figures 7C-E to the supplementary data (Supplemental Figure S1) to streamline the main manuscript.

      (3) Multiple immunostainings for EP receptors in insulinoma cells or pancreatic islets would be representative.

      Due to the rabbit-derived nature of the antibodies (EP1, EP2, EP4), performing multiple immunostainings on the same samples is not feasible due to potential cross-reactivity. However, the immunohistochemistry images demonstrate that each antibody labels more than 90% of the cells, indicating that β-cells express different subtypes of EP receptors simultaneously.

      (4) The antagonists chosen (AH6809, AH23848) are non-specific. Experiments should be re-run (at least some) under more stringent conditions.

      Thank you for your suggestion. AH6809 and AH23848 are well-documented, widely used antagonists in the literature. To further strengthen our findings, we have included additional, widely-used antagonists: the EP2-specific antagonist TG4155 and the EP4-specific antagonist GW627368. The results obtained with these new antagonists were consistent with those observed using AH6809 and AH23848. These updated data are now included in the revised Figure 4I and 4J.

      (5) It would be very helpful to indeed emphasise that this work is for physiological conditions and that it is (or is not) modified in diabetes. Maybe even irrelevant for diabetes (?). This needs to be clarified and supported by data even if one could assume the authors intend to have a follow-up entirely dedicated to pathological changes, perhaps.

      Thank you for this insightful comment. We have clarified in the Discussion that our findings are specific to physiological conditions. To address this point, we have added the following statement:

      "Importantly, our findings pertain to physiological conditions. While we demonstrate the inhibitory effects of PGE2 on Kv2.2 channels in normal b-cells, the role of this pathway under diabetic conditions remains to be investigated and will be the focus of future studies."

      Dezaki K, Damdindorj B, Sone H, Dyachok O, Tengholm A, Gylfe E, Kurashina T, Yoshida M, Kakei M, Yada T (2011) Ghrelin attenuates cAMP-PKA signaling to evoke insulinostatic cascade in islet beta-cells. Diabetes 60:2315-2324.

      Tengholm A, Gylfe E (2017) cAMP signalling in insulin and glucagon secretion. Diabetes Obes Metab 19 Suppl 1:42-53.

    1. eLife Assessment

      This paper presents miniML, an AI-based framework for the detection of synaptic events. Benchmark results presented in the paper are compelling, demonstrating the superiority of miniML over current state-of-the-art alternatives. The performance of miniML is demonstrated across various experimental paradigms, showing that miniML has the potential to become a valuable tool for the analysis of synaptic signals.

    2. Reviewer #1 (Public review):

      O'Neill et al. have developed a software analysis application, miniML, that enables the quantification of electrophysiological events. They utilize a supervised deep learning-based method to optimize the software. miniML is able to quantify and standardize the analyses of miniature events, using both voltage and current clamp electrophysiology, as well as optically driven events using iGluSnFR3, in a variety of preparations, including in the cerebellum, calyx of Held, Golgi cell, human iPSC cultures, zebrafish, and Drosophila. The software appears to be flexible, in that users are able to hone and adapt the software to new preparations and events. Importantly, miniML is an open source software free for researchers to use and enables users to adapt new features using Python.

      Overall this new software has the potential to become widely used in the field and an asset to researchers. Importantly, a new graphical user interface has been generated that enables more user control and a more user-friendly experience. Further, the authors demonstrate how miniML performs relative to other platforms that have been developed, and highlight areas where miniML works optimally. With these revisions, miniML should now be of considerable benefit and utility to a variety of researchers.

    3. Reviewer #2 (Public review):

      Summary:

      This paper presents miniML as a supervised method for detection of spontaneous synaptic events. Recordings of such events are typically of low SNR, where state-of-the-art methods are prone to high false positive rates. Unlike current methods, training miniML requires neither prior knowledge of the kinetics of events nor the tuning of parameters/thresholds.

      The proposed method comprises four convolutional networks, followed by a bi-directional LSTM and a final fully connected layer, which outputs a decision event/no event per time window. A sliding window is used when applying miniML to a temporal signal, followed by an additional estimation of events' time stamps. miniML outperforms current methods for simulated events superimposed on real data (with no events) and presents compelling results for real data across experimental paradigms and species.
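
      The sliding-window step described above can be sketched as follows. This is an illustration only: `toy_score` is a hypothetical stand-in for the trained network, and the window length, stride and example trace are invented for the example rather than taken from miniML.

```python
def sliding_windows(trace, window, stride):
    """Yield (start_index, segment) pairs covering the trace."""
    for start in range(0, len(trace) - window + 1, stride):
        yield start, trace[start:start + window]


def toy_score(segment):
    """Hypothetical stand-in for the classifier; returns a value in [0, 1].

    It scores a window by its largest downward deflection relative to the
    first sample, so it responds to windows in which an event begins.
    """
    baseline = segment[0]
    depth = max(baseline - x for x in segment)
    return min(1.0, depth / 10.0)


def score_trace(trace, window=4, stride=1, score_fn=toy_score):
    """Apply the per-window classifier along the trace."""
    return [score_fn(seg) for _, seg in sliding_windows(trace, window, stride)]


# Flat baseline with one downward event starting at sample 4.
trace = [0, 0, 0, 0, -10, -6, -3, -1, 0, 0, 0, 0]
scores = score_trace(trace)
print(scores)
```

      The output is one score per window position, so event times still have to be estimated from this score trace in a separate step, as the review notes.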

      Strengths:

      The authors present a pipeline for benchmarking based on simulated events superimposed on real data (with no events). Compared to five other state-of-the-art methods, miniML leads to the highest detection rates and is most robust to specific choices of threshold values for fast or slow kinetics. A major strength of miniML is the ability to use it for different datasets. For this purpose, the CNN part of the model is held fixed and the subsequent networks are trained to adapt to the new data. This Transfer Learning (TL) strategy reduces computation time significantly and more importantly, it allows for using a substantially smaller data set (compared to training a full model) which is crucial as training is supervised (i.e. uses labeled examples).

      Weaknesses:

      The authors do not indicate how the specific configuration of miniML was set, i.e. number of CNNs, units, LSTM, etc. Please provide further information regarding these design choices, whether they were based on similar models or if chosen based on performance.

      The data for the benchmark system was augmented with equal amounts of segments with/without events. Data augmentation was undoubtedly crucial for successful training.

      (1) Does a balanced dataset reflect the natural occurrence of events in real data? Could the authors provide more information regarding this matter?

      (2) Please provide a more detailed description of this process as it would serve users aiming to use this method for other sub-fields.

      The benchmarking pipeline is indeed valuable and the results are compelling. However, the authors do not provide comparative results for miniML for real data (figures 4-8). TL does not apply to the other methods. In my opinion, presenting the performance of other methods, trained using the smaller dataset would be convincing of the modularity and applicability of the proposed approach.

      Impact:

      Accurate detection of synaptic events is crucial for the study of neural function. miniML has a great potential to become a valuable tool for this purpose as it yields highly accurate detection rates, it is robust, and is relatively easily adaptable to different experimental setups.

      Comments on revisions:

      The revised manuscript presents a compelling framework. The performance of miniML is thoroughly explored and compared to several benchmarks. The training process along with other technical issues are now described in a satisfactory level of detail.

      I think the authors did a great job. They answered all claims and concerns raised by me and the other reviewers.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer 1 (Public Review):

      O’Neill et al. have developed a software analysis application, miniML, that enables the quantification of electrophysiological events. They utilize a supervised deep learning-based method to optimize the software. miniML is able to quantify and standardize the analyses of miniature events, using both voltage and current clamp electrophysiology, as well as optically driven events using iGluSnFR3, in a variety of preparations, including in the cerebellum, calyx of Held, Golgi cell, human iPSC cultures, zebrafish, and Drosophila. The software appears to be flexible, in that users are able to hone and adapt the software to new preparations and events. Importantly, miniML is an open-source software free for researchers to use and enables users to adapt new features using Python.

      Overall this new software has the potential to become widely used in the field and an asset to researchers. However, the authors fail to discuss or even cite a similar analysis tool recently developed (SimplyFire), and determine how miniML performs relative to this platform. There are a handful of additional suggestions to make miniML more user-friendly, and of broad utility to a variety of researchers, as well as some suggestions to further validate and strengthen areas of the manuscript:

      (1) miniML relative to existing analysis methods: There is a major omission in this study, in that a similar open source, Python-based software package for event detection of synaptic events appears to be completely ignored. Earlier this year, another group published SimplyFire in eNeuro (Mori et al., 2024; doi: 10.1523/eneuro.0326-23.2023). Obviously, this previous study needs to be discussed and ideally compared to miniML to determine if SimplyFire is superior or similar in utility, and to underscore differences in approach and accuracy.

      We thank the reviewer for bringing this interesting publication to our attention. We have included SimplyFire in our benchmarking for comprehensive comparison with miniML. The approach taken by SimplyFire differs from miniML in a number of ways. Our results show that miniML provides higher recall and precision than SimplyFire (revised Figure 3). We appreciate that SimplyFire provides a user-interface similar to the commonly used MiniAnalysis software. In addition, the peak-finding-based approach of SimplyFire makes it relatively robust to event shape, which facilitates analysis of diverse data. However, we noted a strong threshold-dependence and long run time of SimplyFire (revised Figure 3 and Figure 3—figure supplement 1). In addition, SimplyFire is not robust against various types of noise typically encountered in electrophysiological recordings. Our extended benchmark analysis thus indicates that AI-based event detection is superior to existing algorithmic approaches, including SimplyFire.

      (2) The manuscript should comment on whether miniML works equally well to quantify current clamp events (voltage; e.g. EPSP/mEPSPs) compared to voltage clamp (currents, EPSC/mEPSCs), which the manuscript highlights. Are rise and decay time constants calculated for each event similarly?

      miniML works equally well for current and voltage events (Figure 5, Figure 9). In general, events of opposite polarity can be analyzed by simply inverting the data. Transfer learning models may further improve the detection.

      For each detected event, independent of data/recording type, rise times are calculated as 10–90% times (baseline–peak), and decay times are calculated as time to 50% of the peak. In addition, event decay time constants are calculated from a fit to the event average. With miniML being open-source, researchers can adapt the calculations of event statistics to their needs, if desired. In the revised manuscript, we have expanded the Methods section that describes the quantification of event statistics (Methods, Quantification).
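
      The two per-event measures named above can be sketched in a few lines. This is an illustration under stated assumptions, not miniML's code: the function names are hypothetical, threshold crossings are linearly interpolated between samples, and the time to 50% of the peak is assumed to be measured from the peak itself.

```python
def first_crossing(values, level, i_start, rising, dt):
    """Linearly interpolated time of the first crossing of `level`
    at or after index i_start. Returns None if there is no crossing."""
    for i in range(i_start + 1, len(values)):
        prev, cur = values[i - 1], values[i]
        crossed = (prev < level <= cur) if rising else (prev > level >= cur)
        if crossed:
            frac = (level - prev) / (cur - prev)
            return (i - 1 + frac) * dt
    return None


def event_kinetics(event, baseline, dt):
    """Return peak amplitude, 10-90% rise time and half-decay time."""
    # Baseline-subtract and flip inward (negative) events so the peak is positive.
    amp = [x - baseline for x in event]
    if abs(min(amp)) > abs(max(amp)):
        amp = [-a for a in amp]
    i_peak = max(range(len(amp)), key=amp.__getitem__)
    peak = amp[i_peak]
    t10 = first_crossing(amp, 0.1 * peak, 0, True, dt)
    t90 = first_crossing(amp, 0.9 * peak, 0, True, dt)
    t_half = first_crossing(amp, 0.5 * peak, i_peak, False, dt)
    rise = t90 - t10 if t10 is not None and t90 is not None else None
    half_decay = t_half - i_peak * dt if t_half is not None else None
    return {"peak": peak, "rise_10_90": rise, "half_decay": half_decay}


# Toy inward event sampled every 0.1 ms (values in pA, baseline at 0).
example = event_kinetics([0, 0, -5, -10, -8, -6, -4, -2, 0], 0.0, 0.1)
print(example)
```

      Flipping the sign before measuring mirrors the point made above that events of opposite polarity can be handled by inverting the data.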

      (3) The interface and capabilities of miniML appear quite similar to Mini Analysis, the free software that many in the field currently use. While the ability and flexibility for users to adapt and adjust miniML for their own uses/needs using Python programming is a clear potential advantage, can the authors comment, or better yet, demonstrate, whether there is any advantage for researchers to use miniML over Mini Analysis or SimplyFire if they just need the standard analyses?

      Following the reviewer’s suggestion, we developed a graphical user interface (GUI) for miniML to enhance its usability (Figure 2—figure supplement 2), which is provided on the GitHub repository. Our comprehensive benchmark analysis demonstrated that miniML outperforms existing tools such as MiniAnalysis and SimplyFire. The main advantages are (i) increased reliability of results, which eliminates the need for visual inspection; (ii) fast runtime and easy automation; (iii) superior detection performance as demonstrated by higher recall in both synthetic and real data; (iv) open-source Python-based design. We believe that these advantages make miniML a valuable tool for researchers recording various types of synaptic events, offering a more efficient and reliable solution compared to existing methods.

      (4) Additional utilities for miniML: The authors show miniML can quantify miniature electrophysiological events both current and voltage clamp, as well as optical glutamate transients using iGluSnFR. As the authors mention in the discussion, the same approach could, in principle, be used to quantify evoked (EPSC/EPSP) events using electrophysiology, Ca2+ events (using GCaMP), and AP waveforms using voltage indicators like ASAP4. While I don’t think it is reasonable to ask the authors to generate any new experimental data, it would be great to see how miniML performs when analysing data from these approaches, particularly to quantify evoked synaptic events and/or Ca2+ (ideally postsynaptic Ca2+ signals from miniature events, as the Drosophila NMJ have developed nice approaches).

      In the revised manuscript, we have extended the application examples of miniML. We applied miniML to detect mEPSPs recorded with the novel voltage-sensitive indicator ASAP5 (Figure 9 and Figure 9—figure supplement 1). We performed simultaneous recordings of membrane voltage through electrophysiology and ASAP5 voltage imaging in rat cultured neurons at physiological temperature. Data were analyzed using miniML, with electrophysiology data being used as ground-truth for assessing detection performance in imaging data. Our results demonstrate that miniML robustly detects mEPSPs in current-clamp, and can localize corresponding transients in imaging data. Furthermore, we observed that miniML performs better than template matching and deconvolution on ASAP5 imaging data (Figure 9 and Figure 9—figure supplement 2).

      Reviewer 2 (Public Review):

      This paper presents miniML as a supervised method for the detection of spontaneous synaptic events. Recordings of such events are typically of low SNR, where state-of-the-art methods are prone to high false positive rates. Unlike current methods, training miniML requires neither prior knowledge of the kinetics of events nor the tuning of parameters/thresholds.

      The proposed method comprises four convolutional networks, followed by a bi-directional LSTM and a final fully connected layer which outputs a decision event/no event per time window. A sliding window is used when applying miniML to a temporal signal, followed by an additional estimation of events’ time stamps. miniML outperforms current methods for simulated events superimposed on real data (with no events) and presents compelling results for real data across experimental paradigms and species.

      Strengths:

      The authors present a pipeline for benchmarking based on simulated events superimposed on real data (with no events). Compared to five other state-of-the-art methods, miniML leads to the highest detection rates and is most robust to specific choices of threshold values for fast or slow kinetics. A major strength of miniML is the ability to use it for different datasets. For this purpose, the CNN part of the model is held fixed and the subsequent networks are trained to adapt to the new data. This Transfer Learning (TL) strategy reduces computation time significantly and more importantly, it allows for using a substantially smaller data set (compared to training a full model) which is crucial as training is supervised (i.e. uses labeled examples).

      Weaknesses:

      The authors do not indicate how the specific configuration of miniML was set, i.e. number of CNNs, units, LSTM, etc. Please provide further information regarding these design choices, whether they were based on similar models or if chosen based on performance.

      The data for the benchmark system was augmented with equal amounts of segments with/without events. Data augmentation was undoubtedly crucial for successful training.

      (1) Does a balanced dataset reflect the natural occurrence of events in real data? Could the authors provide more information regarding this matter?

      In a given recording, the event frequency determines the ratio of event-containing vs. nonevent-containing data segments. Whereas many synapses have a skew towards non-events, high event frequencies as observed, e.g., in pyramidal cells or Purkinje neurons, can shift the ratio towards event-containing data.

      For model training, we extracted data segments from mEPSC recordings in cerebellar granule cells, which have a low mEPSC frequency (about 0.2 Hz, Delvendahl et al. 2019). Unbalanced training data may complicate model training (Drummond and Holte 2003; Prati et al. 2009; Tyagi and Mittal 2020). We therefore decided to balance the training dataset for miniML by down-sampling the majority class (i.e., non-event segments), so that the final datasets for model training contained roughly equal amounts of events and non-events.
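
      Down-sampling the majority class, as described above, can be sketched as follows. The function name, the label coding (1 for event, 0 for non-event) and the fixed seed are assumptions made for this illustration and are not taken from the miniML code.

```python
import random


def balance_by_downsampling(segments, labels, seed=0):
    """Randomly drop segments from larger classes until all classes
    have as many examples as the smallest one."""
    rng = random.Random(seed)
    by_class = {}
    for seg, lab in zip(segments, labels):
        by_class.setdefault(lab, []).append(seg)
    n_min = min(len(segs) for segs in by_class.values())
    balanced = []
    for lab, segs in by_class.items():
        # The minority class is kept whole; larger classes are subsampled.
        kept = segs if len(segs) == n_min else rng.sample(segs, n_min)
        balanced.extend((seg, lab) for seg in kept)
    rng.shuffle(balanced)
    return balanced


# 10 event segments and 90 non-event segments, identified by integer ids.
segments = list(range(100))
labels = [1 if i < 10 else 0 for i in segments]
balanced = balance_by_downsampling(segments, labels)
print(len(balanced), sum(lab for _, lab in balanced))
```

      Because only the majority class is thinned, every annotated event is retained, which matters when events are as rare as in the low-frequency recordings mentioned above.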

      (2) Please provide a more detailed description of this process as it would serve users aiming to use this method for other sub-fields.

      We thank the reviewer for raising this point. In the revised manuscript, we present a systematic analysis of the impact of imbalanced training data on model training (Figure 1—figure supplement 2). In addition, we have revised the description of model training and data augmentation in the Methods section (Methods, Training data and annotation).

      The benchmarking pipeline is indeed valuable and the results are compelling. However, the authors do not provide comparative results for miniML for real data (Figures 4-8). TL does not apply to the other methods. In my opinion, presenting the performance of other methods, trained using the smaller dataset would be convincing of the modularity and applicability of the proposed approach.

      Quantitative comparison of synaptic detection methods on real-world data is challenging because the lack of ground-truth data prevents robust, quantitative analyses. Nevertheless, we compared miniML to common template-based and finite-threshold based methods on four different types of synapses. We noted that miniML generally detects more events, whereas other methods are susceptible to false-positives (Figure 4—figure supplement 1). In addition, we analyzed the performance of miniML on voltage imaging data (Figure 9). Simultaneous recordings of electrophysiological and imaging data allowed a quantitative comparison of detection methods in this dataset. Our results demonstrate that miniML provides higher recall for optical minis recorded using ASAP5 (Figure 9 and Figure 9—figure supplement 2; F1 score, Cohen’s d 1.35 vs. template matching and 5.1 vs. deconvolution).
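
      When ground-truth event times are available, as with the simultaneous recordings mentioned above, the reported quantities can be computed along the following lines. This is a generic sketch: the matching tolerance, the greedy one-to-one matching rule and all function names are assumptions, and the authors' exact procedure may differ.

```python
import math
import statistics


def match_events(detected, truth, tolerance):
    """Greedy one-to-one matching of event times within +/- tolerance.
    Returns (true positives, false positives, false negatives)."""
    detected = sorted(detected)
    truth = sorted(truth)
    i = j = tp = 0
    while i < len(detected) and j < len(truth):
        diff = detected[i] - truth[j]
        if abs(diff) <= tolerance:
            tp += 1
            i += 1
            j += 1
        elif diff < 0:
            i += 1  # detection with no ground-truth partner
        else:
            j += 1  # ground-truth event that was missed
    return tp, len(detected) - tp, len(truth) - tp


def detection_scores(detected, truth, tolerance):
    """Precision, recall and F1 score for one recording."""
    tp, fp, fn = match_events(detected, truth, tolerance)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def cohens_d(a, b):
    """Effect size between two samples using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled)


truth = [1.0, 2.0, 3.0, 4.0]
detected = [1.01, 2.5, 3.02, 3.99, 5.0]
print(detection_scores(detected, truth, tolerance=0.05))
```

      Per-recording F1 scores from two methods can then be passed to `cohens_d` to express the difference between them as an effect size.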

      Impact:

      Accurate detection of synaptic events is crucial for the study of neural function. miniML has a great potential to become a valuable tool for this purpose as it yields highly accurate detection rates, it is robust, and is relatively easily adaptable to different experimental setups.

      Additional comments:

      Line 73: the authors describe miniML as "parameter-free". Indeed, miniML does not require the selection of pulse shape, rise/fall time, or tuning of a threshold value. Still, I would not call it "parameter-free" as there are many parameters to tune, starting with the number of CNNs, and number of units through the parameters of the NNs. A more accurate description would be that as an AI-based method, the parameters of miniML are learned via training rather than tuned by the user.

      We agree that a deep learning model is not parameter-free, and this term may be misleading. We have therefore changed this sentence in the introduction as follows: "The method is fast, robust to threshold choice, and generalizable across diverse data types [...]"

      Line 302: the authors describe miniML as "threshold-independent". The output trace of the model has an extremely high SNR so a threshold of 0.5 typically works. Since a threshold is needed to determine the time stamps of events, I think a better description would be "robust to threshold choice".

      To detect event localizations, a peak search is performed on the model output, which uses a minimum peak height parameter (or threshold). Extreme values for this parameter do indeed have a small impact on detection performance (Figure 3J). We have changed the description in the introduction and discussion according to the reviewer’s suggestion.
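
      The peak search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the miniML implementation: `find_event_peaks` and `min_height` are hypothetical names, and the prediction trace is assumed to be a plain list of values between 0 and 1.

```python
def find_event_peaks(prediction, min_height=0.5):
    """Return indices of local maxima in `prediction` that reach `min_height`."""
    peaks = []
    n = len(prediction)
    i = 1
    while i < n - 1:
        if prediction[i] >= min_height and prediction[i] > prediction[i - 1]:
            # Walk across a possible plateau of equal values.
            j = i
            while j < n - 1 and prediction[j + 1] == prediction[i]:
                j += 1
            # Accept the plateau only if the trace falls afterwards.
            if j < n - 1 and prediction[j + 1] < prediction[i]:
                peaks.append((i + j) // 2)  # centre of the plateau
            i = j + 1
        else:
            i += 1
    return peaks
```

      When the prediction trace is close to 0 outside events and close to 1 during events, a wide range of `min_height` values returns the same peaks, which is what "robust to threshold choice" means in practice.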

      Reviewer 3 (Public Review):

      The authors present miniML, a novel supervised deep learning-based method for detecting and analyzing spontaneous synaptic events, and demonstrate the advantages of their method in comparison with previous approaches. The possibility of training the architecture on different tasks using transfer learning is also an added value of the work. There are some technical aspects that would be worth clarifying in the manuscript:

      (1) LSTM Layer Justification: Please provide a detailed explanation for the inclusion of the LSTM layer in the miniML architecture. What specific benefits does the LSTM layer offer in the context of synaptic event detection?

      Our model design choice was inspired by similar approaches in the literature (Donahue et al. 2017; Islam et al. 2020; Passricha and Aggarwal 2019; Tasdelen and Sen 2021; Wang et al. 2020). Convolutional and recurrent neural networks are often combined for time-series classification problems as they allow learning spatial and temporal features, respectively. Combining the strengths of both network architectures can thus help improve the classification performance. Indeed, a CNN-LSTM architecture proved to be superior in both training accuracy and detection performance (Figure 1—figure supplement 2). Further, this architecture requires fewer free parameters than comparable model designs using fully connected layers instead. The revised manuscript shows a comparison of different model architectures (Figure 1—figure supplement 2), and we added the following description to the text (Methods, Deep learning model architecture):

      "The combination of convolutional and recurrent neural network layers helps to improve the classification performance for time-series data. In particular, LSTM layers allow learning temporal features."

      (2) Temporal Resolution: Can you elaborate on the reasons behind the lower temporal resolution of the output? Understanding whether this is due to specific design choices in the model, data preprocessing, or post-processing will clarify the nature of this limitation and its impact on the analysis.

      When running inference on a continuous recording, we choose to use a sliding window approach with stride. Therefore, the model output has a lower temporal resolution than the raw data, which is determined by the stride length (i.e., how many samples to advance the sliding window). While using a stride is not required, it significantly reduces inference time (cf. Figure 2—figure supplement 1). We recommend a stride of 20 samples, which does not impact the detection of events. Any subsequent quantification of events (amplitude, area, risetimes, etc.) is performed on raw data. Based on the reviewer’s comment, we have adapted the code to resample the prediction trace to the sampling rate of the original data. This maintains temporal precision and avoids confusion.
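
      The relationship between stride and output resolution can be sketched as below. This is a simplified illustration, not the miniML code: `score_window` is a hypothetical stand-in for the trained classifier, and linear interpolation is an assumed resampling scheme.

```python
def sliding_window_predict(trace, score_window, window_size, stride=20):
    """Score one window every `stride` samples; returns one value per window."""
    starts = range(0, len(trace) - window_size + 1, stride)
    return [score_window(trace[s:s + window_size]) for s in starts]


def resample_to_raw(prediction, stride, n_samples):
    """Linearly interpolate a strided prediction back onto the raw sample grid."""
    if not prediction:
        return []
    last = len(prediction) - 1
    resampled = []
    for i in range(n_samples):
        pos = i / stride            # fractional index into `prediction`
        lo = min(int(pos), last)    # hold the last value past the final window
        hi = min(lo + 1, last)
        frac = pos - lo
        resampled.append(prediction[lo] + frac * (prediction[hi] - prediction[lo]))
    return resampled
```

      With a stride of 20, the classifier is evaluated on roughly one twentieth as many windows as with a stride of 1, and the resampling step restores one prediction value per raw sample so that event positions can be read directly in raw-data indices.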

      The Methods now include the following statement:

      "To maintain temporal precision, the prediction trace is resampled to the sampling frequency of the raw data."

      (3) Architecture optimization: how was the architecture CNN+LSTM optimized in terms of a number of CNN layers and size?

      We performed a Bayesian optimization over a defined range of hyperparameters in combination with empirical hyperparameter tuning. We now describe this in the Methods section as follows:

      "To optimise the model architecture, we performed a Bayesian optimisation of hyperparameters. Hyperparameter ranges were chosen for the free parameters of all layers. Optimisation was then performed with a maximum number of trials of 50. Models were evaluated using the validation dataset. Because higher number of free parameters tended to increase inference times, we then empirically tuned the chosen hyperparameter combination to achieve a trade-off between number of free parameters and accuracy."

      Recommendations For The Authors

      Reviewing Editor (Recommendations For The Authors):

      Overall suggestions to the authors:

      (1) Directly compare miniML with SimplyFire (which was not cited or discussed in the original manuscript), with both idealized and actual data. Discuss the pros/cons of each software.

      We have conducted an extensive comparison between miniML and SimplyFire using both simulated and actual experimental data. This analysis is now presented in the revised Figure 3, Figure 3—figure supplement 1, and Figure 4—figure supplement 1. In addition, we have included relevant citations for SimplyFire in our manuscript. These additions provide a more comprehensive and balanced view of the available tools in the field, positioning our work within the broader context of existing solutions.

      (2) Generate a better user interface akin to MiniAnalysis or SimplyFire.

      We thank the editor and reviewers for the suggestion to improve the user interface. We have created a user-friendly graphical user interface (GUI) for miniML that is available on our GitHub repository. This GUI is now showcased in Figure 2—figure supplement 2 of the manuscript. The new interface allows users to load and analyze data through an intuitive point-and-click system, visualize results in real-time, and adjust parameters easily without coding knowledge. We have incorporated user feedback to refine the interface and improve user experience. These improvements significantly enhance the accessibility of miniML, making it more user-friendly for researchers with varying levels of programming expertise.

      Reviewer 1 (Recommendations For The Authors):

      Related to point (1) of the Public Review, we have taken the liberty of comparing electrophysiological data using miniAnalysis, SimplyFire, and miniML. From our experience with this comparison, we note the following:

      (1.1) In contrast to both SimplyFire and miniAnalysis, miniML does not currently have a user-friendly interface where the user can directly control or change the parameters of interest, nor does miniML have a user control center, so the user cannot simply type or select the mini manually. Rather, if any parameter needs to be changed, the user needs to read, understand, and change the original source code to generate the preferred change. This level of "activation energy" and required user coding expertise in computer science, which many researchers do not have, renders miniML much less accessible when directly compared to SimplyFire and miniAnalysis. Hence, unless miniML’s interface can be made more user-friendly, this is a major disadvantage, especially when compared to SimplyFire, which has many of the same features as miniML but with a much easier interface and user controls.

      As suggested by the reviewer, we have created a graphical user interface (GUI) for miniML. The GUI allows easy data loading, filtering, analysis, event inspection, and saving of results without the need to write Python code. Figure 2—figure supplement 2 illustrates the typical workflow for event analysis with miniML using the GUI and shows a screenshot of the user interface. Code to use miniML via the GUI is now included in the project’s GitHub repository. The GUI provides a simple and intuitive way to analyze synaptic events, whereas running miniML as a Python script allows for more customization and a high degree of automation.

      (1.2) We compared electrophysiological miniature events between miniML, SimplyFire, and miniAnalysis. All three yielded similar mean amplitudes in "wild type" conditions and in conditions in which mini events were enhanced or diminished, so the overall means and utilities are similar, with miniML and SimplyFire being preferred given their flexibility and much faster analysis. We did note a few differences, however. SimplyFire tends to capture a higher number of mini-events than miniML, especially in conditions of diminished mini amplitude (e.g., miniML found 76 events, while SimplyFire found 587). The mean amplitudes, however, were similar. It seems that in data with low SNR, SimplyFire classifies many events as real minis that are probably noise, while miniML is more selective, which might be an advantage of miniML. That being said, we found SimplyFire to be superior in many respects, not least of which is the user interface and experience.

      We appreciate the reviewer’s thorough comparison of miniML, SimplyFire, and MiniAnalysis. While we acknowledge SimplyFire’s user-friendly interface, our study highlights several advantages of AI-based event analysis over conventional algorithmic approaches. Our updated benchmark analysis revealed better detection performance of miniML compared with SimplyFire (revised Figure 3), which had similar performance to deconvolution. As already noted by the reviewer, high false positive rates are a major issue of the SimplyFire approach. Although a minimum amplitude cutoff can partially resolve this problem, detection performance is highly sensitive to threshold setting (revised Figure 3). Another apparent disadvantage of SimplyFire is its relatively slow runtime (Figure 3—figure supplement 1). Finally, we have enhanced miniML’s accessibility by providing a graphical user interface that is easy to use and provides additional functionality.

      Some technical comments:

      (1) Improvements to the dependency specification of miniML: The versions of Python and TensorFlow used in this study and on GitHub need to be clarified. We used Python version 3.8.19 to load the miniML model. However, with Python >=3.9, as described on the GitHub page, it is difficult to install a matching h5py version. It is also inaccurate to specify Python >=3.9, because the TensorFlow version for this framework needs to be around 2.13, whereas with Python >=3.10 only TensorFlow 2.16 is offered for download. Therefore, the dependency versions need to be specified on GitHub so that researchers can load and use the model.

      Thank you for highlighting this issue. We have now included specific version numbers in the requirements to avoid version conflicts and to ensure proper functioning of the code.

      (2) Due to the intrinsic characteristics of the trained model, every model is only suitable for analyzing data with similar attributes. It is hard for researchers without a strong computer science background to train a new model themselves for their specific data. Therefore, it would be preferred if there were more available transfer learning models on GitHub accessible for researchers to adapt to their data.

      We would like to thank the reviewer for this feedback. Trained models (such as the default model) can often be used on different data (see, e.g., Figure 4, where data from four distinct synaptic preparations were analyzed with the base model, and Figure 5—figure supplement 1). However, changes in event waveform and/or noise characteristics may necessitate transfer learning to obtain optimal results with miniML. We have revised the description and tutorial for model training on the project’s GitHub repository to provide more guidance in this process. In addition, we now provide a tutorial on how to use existing models on out-of-sample data with distinct kinetics, using resampling. We hope these updates to the miniML GitHub repository will facilitate the use of the method.

      Following the suggestion by the reviewer, we have provided the transfer learning models used for the manuscript on the project’s GitHub repository to increase the number of available machine learning models for event detection. In addition, users of miniML are encouraged to supply their custom models. We hope that this will facilitate model exchange between laboratories in the future.

      Reviewer 3:

      I congratulate all authors for the convincing demonstration of their methodology, I do not have additional recommendations.

      We would like to thank the reviewer for the positive assessment of our manuscript.

      References

      Delvendahl, I., Kita, K., & Müller, M. (2019). Rapid and sustained homeostatic control of presynaptic exocytosis at a central synapse. Proceedings of the National Academy of Sciences, 116(47), 23783–23789. https://doi.org/10.1073/pnas.1909675116

      Donahue, J., Hendricks, L. A., Rohrbach, M., Venugopalan, S., Guadarrama, S., Saenko, K., & Darrell, T. (2017). Long-term recurrent convolutional networks for visual recognition and description. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(4), 677–691. https://doi.org/10.1109/tpami.2016.2599174

      Drummond, C., & Holte, R. C. (2003). C4.5, class imbalance, and cost sensitivity: Why under-sampling beats over-sampling. https://api.semanticscholar.org/CorpusID:204083391

      Islam, M. Z., Islam, M. M., & Asraf, A. (2020). A combined deep CNN-LSTM network for the detection of novel coronavirus (COVID-19) using x-ray images. Informatics in Medicine Unlocked, 20, 100412. https://doi.org/10.1016/j.imu.2020.100412

      Passricha, V., & Aggarwal, R. K. (2019). A hybrid of deep CNN and bidirectional LSTM for automatic speech recognition. Journal of Intelligent Systems, 29(1), 1261–1274. https://doi.org/10.1515/jisys-2018-0372

      Prati, R. C., Batista, G. E. A. P. A., & Monard, M. C. (2009). Data mining with imbalanced class distributions: Concepts and methods. Indian International Conference on Artificial Intelligence. https://api.semanticscholar.org/CorpusID:16651273

      Tasdelen, A., & Sen, B. (2021). A hybrid CNN-LSTM model for pre-miRNA classification. Scientific Reports, 11(1). https://doi.org/10.1038/s41598-021-93656-0

      Tyagi, S., & Mittal, S. (2020). Sampling approaches for imbalanced data classification problem in machine learning. In P. K. Singh, A. K. Kar, Y. Singh, M. H. Kolekar, & S. Tanwar (Eds.), Proceedings of ICRIC 2019 (pp. 209–221). Springer International Publishing.

      Wang, H., Zhao, J., Li, J., Tian, L., Tu, P., Cao, T., An, Y., Wang, K., & Li, S. (2020). Wearable sensor-based human activity recognition using hybrid deep learning techniques. Security and Communication Networks, 2020, 1–12. https://doi.org/10.1155/2020/2132138

    1. eLife Assessment

      This manuscript describes a novel approach for assessing cognitive function in freely moving mice in their home-cage, without human involvement. The authors provide convincing evidence in support of the tasks they developed to capture a variety of complex behaviors and demonstrate the utility of a machine learning approach to expedite the acquisition of task demands. This work is important given its potential utility for other investigators interested in studying mouse cognition. However, additional information (e.g., detailed construction manual, code) is needed to allow other investigators to implement this system independently and use it widely.

    2. Reviewer #1 (Public review):

      Summary:

      This is a new and important system that can efficiently train mice to perform a variety of cognitive tasks in a flexible manner. It is innovative and opens the door to important experiments in the neurobiology of learning and memory.

      Strengths:

      Strengths include: high n's, a robust system, task flexibility, comparison of manual-like training vs constant training, circadian analysis, comparison of varying cue types, long-term measurement, and machine teaching.

      Weaknesses:

      I find no major problems with this report.

      Minor weaknesses:

      (1) Line 219: Water consumption per day remained the same, but the number of trials triggered increased as training continued. First, is this related to manual-type training? Also, I'm trying to understand this result quantitatively, since it seems counter-intuitive: I would assume that with more trials, more water would be consumed since accuracy should go up over training (so more water per average trial). Am I understanding this right? Can the authors give more detail on how more trials can be triggered but no more water is consumed despite training?

      (2) Figure 2J: The X-axis should have some label: at least "training type". Ideally, a legend with colors can be included, although I see the colors elsewhere in the figure. If a legend cannot be added, then the color scheme should be explained in the caption.

      (3) Figure 2K: What is the purple line? I encourage a legend here. The same legend could apply to 2J.

      (4) Supplementary Figure S2 D: I do not think the phrase "relying on" is correct. Instead, I think "predicted by" or "correlating with" might be better.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Yu et al. describes a novel approach for collecting complex and different cognitive phenotypes in individually housed mice in their home cage. The authors report a simple yet elegant design that they developed for assessing a variety of complex and novel behavioral paradigms autonomously in mice.

      Strengths:

      The data are strong, the arguments are convincing, and I think the manuscript will be highly cited given the complexity of behavioral phenotypes one can collect using this relatively inexpensive ($100/box) and high throughput procedure (without the need for human interaction). Additionally, the authors include a machine learning algorithm to correct for erroneous strategies that mice develop which is incredibly elegant and important for this approach as mice will develop odd strategies when given complete freedom.

      Weaknesses:

      (1) A limitation of this approach is that it requires mice to be individually housed for days to months. This should be discussed in depth.

      (2) A major issue with continuous self-paced tasks such as the autonomous d2AFC used by the authors is that the inter-trial intervals can vary significantly. Mice may do a few trials, lose interest, and disengage from the task for several hours. This is problematic for data analysis that relies on trial duration to be similar between trials (e.g., reinforcement learning algorithms). It would be useful to see the task engagement of the mice across a 24-hour cycle (e.g., trials started, trials finished across a 24-hour period) and approaches for overcoming this issue of varying inter-trial intervals.

      (3) Movies - it would be beneficial for the authors to add commentary to the video (hit, miss trials). It was interesting watching the mice but not clear whether they were doing the task correctly or not.

      (4) The strength of this paper (from my perspective) is the potential utility it has for other investigators trying to get mice to do behavioral tasks. However, not enough information was provided about the construction of the boxes, interface, and code for running the boxes. If the authors are not willing to provide this information through eLife, GitHub, or their own website then my evaluation of the impact and significance of this paper would go down significantly.

      Minor concerns:

      Learning rate is confusing for Figure 3 results as it actually refers to trials to reach the criterion, and not the actual rate of learning (e.g., slope).

    4. Reviewer #3 (Public review):

      Summary:

      In this set of experiments, the authors describe a novel research tool for studying complex cognitive tasks in mice, the HABITS automated training apparatus, and a novel "machine teaching" approach they use to accelerate training by algorithmically providing trials to animals that provide the most information about the current rule state for a given task.

      Strengths:

      There is much to be celebrated in an inexpensively constructed, replicable training environment that can be used with mice, which have rapidly become the model species of choice for understanding the roles of distinct circuits and genetic factors in cognition. Lingering challenges in developing and testing cognitive tasks in mice remain, however, and these are often chalked up to cognitive limitations in the species. The authors' findings, however, suggest that instead, we may need to work creatively to meet mice where they live. In some cases, it may be that mice may require durations of training far longer than laboratories are able to invest with manual training (up to over 100k trials, over months of daily testing) but the tasks are achievable. The "machine teaching" approach further suggests that this duration could be substantially reduced by algorithmically optimizing each trial presented during training to maximize learning.

      Weaknesses:

      Cognitive training and testing in rodent models fill a number of roles. Sometimes, investigators are interested in within-subjects questions - querying a specific circuit, genetically defined neuron population, or molecule/drug candidate, by interrogating or manipulating its function in a highly trained animal. In this scenario, a cohort of highly trained animals that have been trained via a method that aims to make their behavior as similar as possible is a strength.

      However, often investigators are interested in between-subjects questions - querying a source of individual differences that can have long-term and/or developmental impacts, such as sex differences or gene variants. This is likely to often be the case in mouse models especially, because of their genetic tractability. In scenarios where investigators have examined cognitive processes between subjects in mice who vary across these sources of individual difference, the process of learning a task has been repeatedly shown to be different. The authors do not appear to have considered individual differences except perhaps as an obstacle to be overcome.

      The authors have perhaps shown that their main focus is highly-controlled within-subjects questions, as their dataset is almost exclusively made up of several hundred young adult male mice, with the exception of 6 females in a supplemental figure. It is notable that these female mice do appear to learn the two-alternative forced-choice task somewhat more rapidly than the males in their cohort.

      Considering the implications for mice modeling relevant genetic variants, it is unclear to what extent the training protocols and especially the algorithmic machine teaching approach would be able to inform investigators about the differences between their groups during training. For investigators examining genetic models, it is unclear whether this extensive training experience would mitigate the ability to observe cognitive differences, or select the animals best able to overcome them - eliminating the animals of interest. Likewise, the algorithmic approach aims to mitigate features of training such as side biases, but it is worth noting that the strategic uses of side biases in mice, as in primates, can benefit learning, rather than side biases solely being a problem. However, the investigators may be able to highlight variables selected by the algorithm that are associated with individual strategies in performing their tasks, and this would be a significant contribution.

      A final, intriguing finding in this manuscript is that animal self-paced training led to much slower learning than "manual" training, by having the experimenter introduce the animal to the apparatus for a few hours each day. Manual training resulted in significantly faster learning, in almost half the number of trials on average, and with significantly fewer omitted trials. This finding does not necessarily argue that manual training is universally a better choice because it leads to more limited water consumption. However, it suggests that there is a distinct contribution of experimenter interactions and/or switching contexts in cognitive training, for example by activating an "occasion setting" process to accelerate learning for a distinct period of time. Limiting experimenter interactions with mice may be a labor-saving intervention, but may not necessarily improve performance. This could be an interesting topic of future investigation, of relevance to understanding how animals of all species learn.

    5. Author response:

      Reviewer #1 (Public review):

      Summary:

      This is a new and important system that can efficiently train mice to perform a variety of cognitive tasks in a flexible manner. It is innovative and opens the door to important experiments in the neurobiology of learning and memory.

      Strengths:

      Strengths include: high n's, a robust system, task flexibility, comparison of manual-like training vs constant training, circadian analysis, comparison of varying cue types, long-term measurement, and machine teaching.

      Weaknesses:

      I find no major problems with this report.

      (1) Line 219: Water consumption per day remained the same, but the number of trials triggered increased as training continued. First, is this related to manual-type training? Also, I'm trying to understand this result quantitatively, since it seems counter-intuitive: I would assume that with more trials, more water would be consumed since accuracy should go up over training (so more water per average trial). Am I understanding this right? Can the authors give more detail on how more trials can be triggered but no more water is consumed despite training?

      Thanks for the thoughtful comment. We would like to clarify the phenomenon described in Line 219: as training advanced, the number of trials triggered by mice per day gradually decreased (rather than increased, as stated in the comment) for both the manual and autonomous groups (Fig. 2H left). Performance, as you noted, improved over time, which increased the probability of obtaining water on each trial and thus kept daily water intake relatively stable (Fig. 2H left). We believe this stable daily intake reflects the minimum amount of water required by mice under autonomous behavioral training.
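
      The arithmetic behind this explanation can be made concrete with a short sketch. All numbers below are hypothetical and chosen only to illustrate the relationship; they are not values from the manuscript, and the model ignores omitted trials.

```python
def daily_water_intake(trials_per_day, p_reward, reward_volume_ul):
    """Expected daily water intake (µL) = trials x P(reward) x volume per reward."""
    return trials_per_day * p_reward * reward_volume_ul


# Hypothetical early training: many trials, chance-level performance.
early = daily_water_intake(trials_per_day=800, p_reward=0.5, reward_volume_ul=3)

# Hypothetical late training: fewer trials, higher accuracy.
late = daily_water_intake(trials_per_day=500, p_reward=0.8, reward_volume_ul=3)
```

      In this toy example, fewer trials at higher accuracy deliver the same total volume as more trials at chance level, which is how daily intake can stay flat while the trial count falls.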

      (2) Figure 2J: The X-axis should have some label: at least "training type". Ideally, a legend with colors can be included, although I see the colors elsewhere in the figure. If a legend cannot be added, then the color scheme should be explained in the caption.

      (3) Figure 2K: What is the purple line? I encourage a legend here. The same legend could apply to 2J.

      (4) Supplementary Figure S2 D: I do not think the phrase "relying on" is correct. Instead, I think "predicted by" or "correlating with" might be better.

      We thank the reviewer for the valuable suggestion. We will address all these points and make the necessary revisions in the next version of our manuscript.

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Yu et al. describes a novel approach for collecting complex and different cognitive phenotypes in individually housed mice in their home cage. The authors report a simple yet elegant design that they developed for assessing a variety of complex and novel behavioral paradigms autonomously in mice.

      Strengths:

      The data are strong, the arguments are convincing, and I think the manuscript will be highly cited given the complexity of behavioral phenotypes one can collect using this relatively inexpensive ($100/box) and high throughput procedure (without the need for human interaction). Additionally, the authors include a machine learning algorithm to correct for erroneous strategies that mice develop which is incredibly elegant and important for this approach as mice will develop odd strategies when given complete freedom.

      Weaknesses:

      (1) A limitation of this approach is that it requires mice to be individually housed for days to months. This should be discussed in depth.

      Thank you for raising this important point. We agree that the requirement for individual housing of mice during the training period is a limitation of our approach, and we appreciate the opportunity to discuss this in more depth. In the revised manuscript, we will add a dedicated section to the Discussion to address this limitation, including the potential impact of individual housing on the mice, the rationale for individual housing in our study, and efforts or alternatives made to mitigate the effects of individual housing.

      (2) A major issue with continuous self-paced tasks such as the autonomous d2AFC used by the authors is that the inter-trial intervals can vary significantly. Mice may do a few trials, lose interest, and disengage from the task for several hours. This is problematic for data analysis that relies on trial duration to be similar between trials (e.g., reinforcement learning algorithms). It would be useful to see the task engagement of the mice across a 24-hour cycle (e.g., trials started, trials finished across a 24-hour period) and approaches for overcoming this issue of varying inter-trial intervals.

      Thank you for your insightful comment regarding the variability in inter-trial intervals and its potential impact on data analysis. We agree that this is an important consideration for continuous self-paced tasks like the autonomous d2AFC paradigm used in our study. In the original manuscript, we showed the general task engagement across the 24-hour cycle (Fig. 2K). The distribution of inter-trial intervals was also illustrated (Fig. S3H); it shows that most trials are separated by short intervals, although some intervals are extremely long. We will include a more detailed analysis and discuss the challenges this poses for data analysis.

      Regarding approaches to mitigate the issue of varying inter-trial intervals, we will also discuss strategies to account for these effects, including trial selection and restricting task access to a defined engagement period (e.g., open only during a fixed 2-hour period each day).
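
      One possible form of trial selection is sketched below: trials are grouped into engagement bouts separated by long gaps, and only trials from sufficiently long bouts are kept. This is an assumption about how such a filter could look, not the authors' method; the function names and the default thresholds (`max_gap_s`, `min_bout_len`) are hypothetical.

```python
def split_into_bouts(trial_start_times, max_gap_s=60.0):
    """Group sorted trial start times (seconds) into bouts separated by long gaps."""
    bouts = []
    for t in trial_start_times:
        if bouts and t - bouts[-1][-1] <= max_gap_s:
            bouts[-1].append(t)   # short interval: same engagement bout
        else:
            bouts.append([t])     # long interval: start a new bout
    return bouts


def select_engaged_trials(trial_start_times, max_gap_s=60.0, min_bout_len=3):
    """Keep only trials that fall in bouts of at least `min_bout_len` trials."""
    return [
        t
        for bout in split_into_bouts(trial_start_times, max_gap_s)
        if len(bout) >= min_bout_len
        for t in bout
    ]
```

      Analyses that assume roughly regular trial spacing, such as fitting reinforcement learning models, could then be run within bouts rather than across hours-long pauses.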

      (3) Movies - it would be beneficial for the authors to add commentary to the video (hit, miss trials). It was interesting watching the mice but not clear whether they were doing the task correctly or not.

      Thanks for the reminder. We will add subtitles to the videos in the next version.

      (4) The strength of this paper (from my perspective) is the potential utility it has for other investigators trying to get mice to do behavioral tasks. However, not enough information was provided about the construction of the boxes, interface, and code for running the boxes. If the authors are not willing to provide this information through eLife, GitHub, or their own website then my evaluation of the impact and significance of this paper would go down significantly.

      Thanks for this important comment. We would like to clarify that the construction methods, GUI, and code for our system, together with the newly uploaded PCB and CAD files, have already been made publicly available at https://github.com/Yaoyao-Hao/HABITS. Additionally, we have open-sourced all the code and raw data for all training protocols (https://doi.org/10.6084/m9.figshare.27192897). We will continue to maintain these resources in the future.

      Minor concerns:

      Learning rate is confusing for Figure 3 results as it actually refers to trials to reach the criterion, and not the actual rate of learning (e.g., slope).

      Thanks for pointing this out. We will make the revision in the next version.
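
      The distinction the reviewer draws can be made concrete with a short sketch. It contrasts "trials to criterion" with a slope-based learning rate computed from the same sequence of trial outcomes. This is an illustrative example only; the window size, criterion, and block size are hypothetical defaults, not values from the manuscript.

```python
def trials_to_criterion(outcomes, window=100, criterion=0.75):
    """Return the 1-based trial number at which accuracy over the last
    `window` trials first reaches `criterion`, or None if it never does.

    outcomes: sequence of 1 (correct) / 0 (incorrect or omitted).
    """
    correct_in_window = 0
    for i, outcome in enumerate(outcomes):
        correct_in_window += outcome
        if i >= window:
            # Drop the trial that just left the sliding window.
            correct_in_window -= outcomes[i - window]
        if i + 1 >= window and correct_in_window / window >= criterion:
            return i + 1
    return None


def learning_slope(outcomes, block=100):
    """Least-squares slope of block accuracy versus block index
    (accuracy gained per block), or None with fewer than two blocks."""
    accs = [
        sum(outcomes[s:s + block]) / block
        for s in range(0, len(outcomes) - block + 1, block)
    ]
    n = len(accs)
    if n < 2:
        return None
    mean_x = (n - 1) / 2
    mean_y = sum(accs) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(accs))
    return sxy / sxx
```

      The first function returns a trial count (smaller means the criterion was reached sooner), whereas the second returns a rate of change in accuracy, which is closer to what "learning rate" usually denotes.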

      Reviewer #3 (Public review):

      Summary:

      In this set of experiments, the authors describe a novel research tool for studying complex cognitive tasks in mice, the HABITS automated training apparatus, and a novel "machine teaching" approach they use to accelerate training by algorithmically providing trials to animals that provide the most information about the current rule state for a given task.

      Strengths:

      There is much to be celebrated in an inexpensively constructed, replicable training environment that can be used with mice, which have rapidly become the model species of choice for understanding the roles of distinct circuits and genetic factors in cognition. Lingering challenges in developing and testing cognitive tasks in mice remain, however, and these are often chalked up to cognitive limitations in the species. The authors' findings, however, suggest that instead, we may need to work creatively to meet mice where they live. In some cases, it may be that mice may require durations of training far longer than laboratories are able to invest with manual training (up to over 100k trials, over months of daily testing) but the tasks are achievable. The "machine teaching" approach further suggests that this duration could be substantially reduced by algorithmically optimizing each trial presented during training to maximize learning.

      Weaknesses:

      (1) Cognitive training and testing in rodent models fill a number of roles. Sometimes, investigators are interested in within-subjects questions - querying a specific circuit, genetically defined neuron population, or molecule/drug candidate, by interrogating or manipulating its function in a highly trained animal. In this scenario, a cohort of highly trained animals that have been trained via a method that aims to make their behavior as similar as possible is a strength.

      However, often investigators are interested in between-subjects questions - querying a source of individual differences that can have long-term and/or developmental impacts, such as sex differences or gene variants. This is likely to often be the case in mouse models especially, because of their genetic tractability. In scenarios where investigators have examined cognitive processes between subjects in mice who vary across these sources of individual difference, the process of learning a task has been repeatedly shown to be different. The authors do not appear to have considered individual differences except perhaps as an obstacle to be overcome.

      The authors have perhaps shown that their main focus is highly-controlled within-subjects questions, as their dataset is almost exclusively made up of several hundred young adult male mice, with the exception of 6 females in a supplemental figure. It is notable that these female mice do appear to learn the two-alternative forced-choice task somewhat more rapidly than the males in their cohort.

      Thank you for your insightful comments and for highlighting the importance of considering both within-subject and between-subject questions in cognitive training and testing in rodent models.

      We acknowledge that our study primarily focused on highly controlled within-subject questions. However, the datasets we provided offer some evidence relevant to between-subject questions: for example, the large variability in learning rates among mice (Fig. 2I), the overall difference in learning rate between male and female subjects (Fig. 2D vs. Fig. S2G, as the reviewer already mentioned), and the varying nocturnal behavioral patterns (Fig. 2K). We recognize the value of exploring between-subject differences, and in the revised version we will discuss these points more systematically.

      (2) Considering the implications for mice modeling relevant genetic variants, it is unclear to what extent the training protocols and especially the algorithmic machine teaching approach would be able to inform investigators about the differences between their groups during training. For investigators examining genetic models, it is unclear whether this extensive training experience would mitigate the ability to observe cognitive differences, or select the animals best able to overcome them - eliminating the animals of interest. Likewise, the algorithmic approach aims to mitigate features of training such as side biases, but it is worth noting that the strategic uses of side biases in mice, as in primates, can benefit learning, rather than side biases solely being a problem. However, the investigators may be able to highlight variables selected by the algorithm that are associated with individual strategies in performing their tasks, and this would be a significant contribution.

      Thank you for the insightful comments. We acknowledge that the extensive training experience, particularly through the algorithmic machine teaching approach, could potentially influence the ability to observe cognitive differences between groups of mice with relevant genetic variants. However, our study design and findings suggest that this approach can still provide valuable insights into individual differences and strategies used by the animals during training. First, the behavioral readouts mentioned above (including learning rate and engagement pattern) can reveal a number of differences among mice. Second, detailed modelling analysis (with logistic regression) can further dissect the strategies that mice use over the course of training (Fig. S2B). We have highlighted some variables selected by the regression that are associated with individual strategies in performing the tasks (Fig. S2C), and these strategies can differ between manual and autonomous training groups (Fig. S2D). We will discuss these points further in the next version of the manuscript.

      (3) A final, intriguing finding in this manuscript is that animal self-paced training led to much slower learning than "manual" training, by having the experimenter introduce the animal to the apparatus for a few hours each day. Manual training resulted in significantly faster learning, in almost half the number of trials on average, and with significantly fewer omitted trials. This finding does not necessarily argue that manual training is universally a better choice because it leads to more limited water consumption. However, it suggests that there is a distinct contribution of experimenter interactions and/or switching contexts in cognitive training, for example by activating an "occasion setting" process to accelerate learning for a distinct period of time. Limiting experimenter interactions with mice may be a labor-saving intervention, but may not necessarily improve performance. This could be an interesting topic of future investigation, of relevance to understanding how animals of all species learn.

      Thank you for your insightful comments. We agree that the finding that manual training led to significantly faster learning than self-paced training is both intriguing and important. One possible reason is the limited duration of engagement imposed by the experimenter in manual training, which may have forced the mice to concentrate more on the trials (and thus omit fewer trials) than in autonomous training. Your suggestion that experimenter interactions might activate an "occasion setting" process is particularly interesting. In the context of our study, we could introduce, for example, a light as a cue that prompts the animals to engage; when the light is off, the task would no longer be accessible, simulating the manual training situation. We agree that this is an interesting topic for future investigation and that it might create a more conducive environment for learning, thereby accelerating learning.

    1. eLife Assessment

      The authors have undertaken a useful study to update an existing niche model of highly pathogenic avian influenza. However, there are issues regarding the conceptualisation of the ecological niche of highly pathogenic avian influenza transmission that the modelling aims to capture, raising concerns about the strength of evidence used to support the findings. There are a number of modelling assumptions that are incompletely justified. Combined with shortcomings in the communication, this dilutes the strength of the key findings of this work.

    2. Reviewer #1 (Public review):

      Summary:

      The authors aim to predict ecological suitability for the transmission of highly pathogenic avian influenza (HPAI) using ecological niche models. This class of models identify correlations between the locations of species or disease detections and the environment. These correlations are then used to predict habitat suitability (in this work, ecological suitability for disease transmission) in locations where surveillance of the species or disease has not been conducted. The authors fit separate models for HPAI detections in wild birds and farmed birds, for two strains of HPAI (H5N1 and H5Nx) and for two time periods, pre- and post-2020. The authors also validate models fitted to disease occurrence data from pre-2020 using post-2020 occurrence data.

      Strengths:

      The authors follow the established methods of Dhingra et al., 2016 to provide an updated spatial assessment of HPAI transmission suitability for two time periods, pre- and post-2020. They explore further methods of model cross-validation and consider the diversity of the bird species that HPAI has been detected in.

      Weaknesses:

      The precise ecological niche that the authors are modelling here is ambiguous: if we treat the transmission of HPAI in the wild bird population and in poultry populations as separate transmission cycles, linked by spillover events, then these transmission cycles are likely to have fundamentally different ecological niches. While an "index case" in farmed poultry is relevant to the wildlife transmission cycle, further within-farm and farm-to-farm transmission is likely to be contingent on anthropogenic factors, rather than the environment. Similarly, we would expect "index cases" in outbreaks of HPAI in mammals to be relevant to transmission risk in wild birds - this data is not included in this manuscript. Such "index cases" in farmed poultry occur under separate ecological conditions to subsequent transmission in farmed poultry, so should be separated if possible. Some careful editing of the language used in the manuscript may elucidate some of my questions related to model conceptualisation.

      The authors' handling of sampling bias in disease detection data in poultry is possibly inappropriate: one would expect the true spatial distribution of disease surveillance in poultry to be more closely correlated with poultry farming density, in contrast to human population density. This shortcoming in the modelling workflow possibly dilutes a key finding of the Results, that the transmission risk of HPAI in poultry is greatest in areas where poultry farming density is high.

    3. Reviewer #2 (Public review):

      Summary:

      This study aimed to determine which spatial factors (conceived broadly as environmental, agronomic and socio-economic) explain greater avian influenza case numbers reported since 2020 (2020--2022) by comparing similar models built with data from the period 2015--2020. The authors have chosen an environmental niche modelling approach, where detected infections are modelled as a function of spatial covariates extracted at the location of each case. These covariates are available over the entire world so that the predictions can be projected back to space in the form of a continuous map.

      Strengths:

      The authors use boosted regression trees as the main analytical tool, which always feature among the best-performing models for environmental niche models (also known as habitat suitability models). They run replicate sets of the analysis for each of their model targets (wild/domestic x pathogen variant), which can help produce stable predictions. The authors take steps to ameliorate some forms of expected bias in the detection of cases, such as geographic variation in surveillance efforts, and in general more detections near areas of higher human population density.

      Weaknesses:

      The study is not altogether coherent with respect to time. Data sets for the response (H5N1 or H5Nx case data in domestic or wild birds) are divided into two periods; 2015--2020, and 2020--2022. Each set is modelled using a common suite of covariates that are not time-varying. That suggests that causation is inferred by virtue of cases being in different geographic areas in those two time periods. Furthermore, important predictors such as chicken density appear to be informed (in the areas of high risk) from census data from before 2010. The possibility for increased surveillance effort *through time* is overlooked, as is the possibility that previously high-burden locations may implement practice changes to reduce vulnerability.

    4. Author response:

      Reviewer #1:

      Summary:

      The authors aim to predict ecological suitability for the transmission of highly pathogenic avian influenza (HPAI) using ecological niche models. This class of models identify correlations between the locations of species or disease detections and the environment. These correlations are then used to predict habitat suitability (in this work, ecological suitability for disease transmission) in locations where surveillance of the species or disease has not been conducted. The authors fit separate models for HPAI detections in wild birds and farmed birds, for two strains of HPAI (H5N1 and H5Nx) and for two time periods, pre- and post-2020. The authors also validate models fitted to disease occurrence data from pre-2020 using post-2020 occurrence data.

      Strengths:

      The authors follow the established methods of Dhingra et al., 2016 to provide an updated spatial assessment of HPAI transmission suitability for two time periods, pre- and post-2020. They explore further methods of model cross-validation and consider the diversity of the bird species that HPAI has been detected in.

      Weaknesses:

      The precise ecological niche that the authors are modelling here is ambiguous: if we treat the transmission of HPAI in the wild bird population and in poultry populations as separate transmission cycles, linked by spillover events, then these transmission cycles are likely to have fundamentally different ecological niches.

      We apologise if this aspect was not clear enough in the previous version of our manuscript, but our analyses do not assume distinct transmission cycles in wild and domestic bird species; these transmission cycles are indeed interconnected by frequent spillover events. We do, however, conduct independent ecological niche modelling analyses to estimate the ecological suitability for the risk of local circulation in domestic birds and, separately, in wild birds. This distinction does not imply that the virus circulates exclusively within one of these populations but rather allows us to identify potential differences in the environmental conditions associated with virus occurrences in each context.

      Our results indicate that these two ecological niche models capture distinct environmental patterns. Virus occurrences in wild birds were primarily associated with factors such as open water and proximity to urban areas, while occurrences in domestic birds were more strongly linked to variables like poultry density and cultivated vegetation. This finding supports the existence of two distinct ecological niches for the virus, corresponding to virus circulation in wild and domestic bird populations. We thank the Reviewer for their feedback and we will take this opportunity to further clarify this aspect in the text.

      While an "index case" in farmed poultry is relevant to the wildlife transmission cycle, further within-farm and farm-to-farm transmission is likely to be contingent on anthropogenic factors, rather than the environment. Similarly, we would expect "index cases" in outbreaks of HPAI in mammals to be relevant to transmission risk in wild birds - this data is not included in this manuscript. Such "index cases" in farmed poultry occur under separate ecological conditions to subsequent transmission in farmed poultry, so should be separated if possible. Some careful editing of the language used in the manuscript may elucidate some of my questions related to model conceptualisation.

      We agree, but index cases are particularly difficult to separate from secondary spread in the absence of field investigation. Identification of index cases based on space-time filtering has been investigated previously but depends strongly on the quality of the surveillance (i.e. an “apparent” primary case can be a secondary case of previously undetected ones), and surveillance quality cannot be assumed to be homogeneous across countries. Our ecological niche modelling approach is based on HPAI cases reported in the EMPRES-i database, which includes all documented outbreaks without distinguishing primary introductions from subsequent farm-to-farm transmissions. Thus, our ecological niche models are trained on confirmed cases that result from a combination of different transmission dynamics, including introduction events in poultry populations (which can be impacted by ecological factors) and persistence within and between poultry populations (which can be impacted by anthropogenic factors).

      For clarity, we will revise the manuscript to clarify that, while our study primarily aims to assess the environmental suitability for HPAI occurrences, the dataset does not exclude cases resulting from farm-to-farm spread. This means that our models can capture the environmental variables associated with the risk of cases associated with both primary introductions (e.g., spillover from wild birds) and secondary transmission events within poultry systems, although the latter is also influenced by anthropogenic factors such as biosecurity practices and poultry trade networks. These latter factors are not included in our models, which will be highlighted in the limitations (Discussion section) of the revised manuscript.

      In addition, we note the Reviewer's comment regarding the relevance of “index cases” in mammalian outbreaks to understanding the risk of HPAI transmission in wild birds. Although these data are not included in our current study, we will highlight the potential value of incorporating these cases into future models in order to refine risk predictions, provided that they can be identified with some reasonable level of certainty.

      The authors' handling of sampling bias in disease detection data in poultry is possibly inappropriate: one would expect the true spatial distribution of disease surveillance in poultry to be more closely correlated with poultry farming density, in contrast to human population density. This shortcoming in the modelling workflow possibly dilutes a key finding of the Results, that the transmission risk of HPAI in poultry is greatest in areas where poultry farming density is high.

      The Reviewer raises a valid point that poultry surveillance efforts may be more closely correlated with poultry farm density than with human population density. While human population density can serve as a reasonable proxy for surveillance intensity (given that disease detection is often more active in areas with stronger veterinary notification systems), we acknowledge that poultry disease surveillance can also be influenced by the spatial distribution of poultry farms, as high-density poultry areas could be prioritised for monitoring. Please note that in our study, we followed a previously established approach (Dhingra et al. 2016) and weighted pseudo-absence sampling based on human population density to account for general surveillance biases. However, we do not agree that this choice undermines the finding on poultry density. In fact, assuming a sampling bias correlated with poultry density would reduce the estimated effect of poultry density as a risk factor, whereas the current approach does not.
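
      To make the weighting step concrete, here is a minimal sketch of drawing pseudo-absence locations with probability proportional to a weight layer. It is not the authors' code. The cell identifiers, the weight values, and the choice to sample with replacement are assumptions for illustration; in the approach described above the weights would be human population density per grid cell.

```python
import random


def sample_pseudo_absences(cells, weights, n, seed=0):
    """Draw n pseudo-absence cells with probability proportional to `weights`.

    cells: candidate grid-cell identifiers (hypothetical labels here).
    weights: one non-negative weight per cell, e.g. human population density.
    Sampling is with replacement, which is an assumption of this sketch.
    """
    if len(cells) != len(weights):
        raise ValueError("cells and weights must have the same length")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return rng.choices(cells, weights=weights, k=n)


# Toy example: cell "b" is nine times as densely populated as cell "a".
draws = sample_pseudo_absences(["a", "b", "c"], [1.0, 9.0, 0.0], n=1000)
```

      Cells with zero weight are never drawn, and heavily weighted cells dominate the pseudo-absence set. This is why the choice of weight layer matters: if the weight layer were poultry density, pseudo-absences would concentrate where poultry density is high, which is the mechanism behind the authors' argument above.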

      Reviewer #2:

      Summary:

      This study aimed to determine which spatial factors (conceived broadly as environmental, agronomic and socio-economic) explain greater avian influenza case numbers reported since 2020 (2020--2022) by comparing similar models built with data from the period 2015--2020. The authors have chosen an environmental niche modelling approach, where detected infections are modelled as a function of spatial covariates extracted at the location of each case. These covariates are available over the entire world so that the predictions can be projected back to space in the form of a continuous map.

      Strengths:

      The authors use boosted regression trees as the main analytical tool, which always feature among the best-performing models for environmental niche models (also known as habitat suitability models). They run replicate sets of the analysis for each of their model targets (wild/domestic x pathogen variant), which can help produce stable predictions. The authors take steps to ameliorate some forms of expected bias in the detection of cases, such as geographic variation in surveillance efforts, and in general more detections near areas of higher human population density.

      Weaknesses:

      The study is not altogether coherent with respect to time. Data sets for the response (H5N1 or H5Nx case data in domestic or wild birds) are divided into two periods; 2015-2020, and 2020-2022. Each set is modelled using a common suite of covariates that are not time-varying. That suggests that causation is inferred by virtue of cases being in different geographic areas in those two time periods. Furthermore, important predictors such as chicken density appear to be informed (in the areas of high risk) from census data from before 2010. The possibility for increased surveillance effort *through time* is overlooked, as is the possibility that previously high-burden locations may implement practice changes to reduce vulnerability.

      We acknowledge the Reviewer's comments regarding the consistency of time periods in our study. Our approach is to divide the HPAI case data into two time periods (2015-2020 and 2020-2022) and to train ecological niche models using a common set of covariates that do not explicitly account for temporal variation. We will further clarify these aspects in the revised version of our manuscript:

      (1) Our primary objective is to assess changes in ecological suitability over time rather than infer direct causation. By comparing models trained on pre-2020 data with post-2020 occurrences, we evaluate whether pre-2020 environmental conditions can predict recent HPAI suitability. However, we acknowledge that this does not capture dynamic changes in surveillance efforts, biosecurity measures, or host-pathogen interactions over time.

      (2) Regarding predictor variables, we used poultry density data from 2015, rather than pre-2010 data. However, this dataset is not based on a single census year; instead, it represents a median estimate derived from subnational poultry census data collected between 2000 and 2019. This median year approach provides a more stable representation of poultry density than any single-year snapshot. Furthermore, while poultry production systems may exhibit some temporal variation, these changes are generally minor compared to the inter-annual variability observed in HPAI occurrence, which is largely driven by epidemic dynamics. Given the current limitations of global poultry data, distinguishing distributions from different years is not feasible with the available GLW dataset. We will clarify these points in the manuscript.

      (3) We recognise that increased surveillance efforts and adaptive changes in poultry farming practices could influence the observed HPAI case distribution. While our current models do not incorporate time-varying surveillance intensity or biosecurity policies, we will address this limitation in the Discussion section and suggest that future work integrates dynamic surveillance data to improve risk assessments.
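
      The temporal validation described in point (1) can be sketched as follows. The sketch scores how well suitability values from a model trained on pre-2020 data separate post-2020 occurrence locations from background locations. The use of the area under the ROC curve (AUC) as the metric is an assumption of this example; the response above does not specify which metric was used, and the scores below are invented.

```python
def auc(occurrence_scores, background_scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    occurrence_scores: predicted suitability at post-2020 occurrence sites.
    background_scores: predicted suitability at background/pseudo-absence sites.
    Ties count as half a win.
    """
    if not occurrence_scores or not background_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in occurrence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(occurrence_scores) * len(background_scores))


# Hypothetical suitability values from a model trained on pre-2020 data.
post_2020_occurrences = [0.9, 0.3]
background = [0.5, 0.1]
score = auc(post_2020_occurrences, background)
```

      A value near 1 would indicate that the pre-2020 model ranks recent occurrence sites above background sites, and a value near 0.5 would indicate no discrimination. As noted in point (3), such a score cannot distinguish a genuine shift in suitability from a change in surveillance effort over time.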

    1. eLife Assessment

      This study provides valuable findings on the effects of mating experience on sweet taste perception. The data as presented provide solid evidence that the dopaminergic signaling-mediated reward system underlies this mating state-dependent behavioral modulation. The work will interest neuroscientists, particularly those working on neuromodulation and the effects of internal states on behavior.

    2. Reviewer #1 (Public review):

      Wang et al. investigated how sexual failure influences sweet taste perception in male Drosophila. The study revealed that courtship failure leads to decreased sweet sensitivity and feeding behavior via dopaminergic signaling. Specifically, the authors identified a group of dopaminergic neurons projecting to the subesophageal zone that interacts with sweet-sensing Gr5a+ neurons. These dopaminergic neurons positively regulate the sweet sensitivity of Gr5a+ neurons via DopR1 and Dop2R receptors. Sexual failure diminishes the activity of these dopaminergic neurons, leading to reduced sweet-taste sensitivity and sugar-feeding behavior in male flies. These findings highlight the role of dopaminergic neurons in integrating reproductive experiences to modulate appetitive sensory responses.

      Previous studies have explored the dopaminergic-to-Gr5a+ neuronal pathways in regulating sugar feeding under hunger conditions. Starvation has been shown to increase dopamine release from a subset of TH-GAL4 labeled neurons, known as TH-VUM, in the subesophageal zone. This enhanced dopamine release activates dopamine receptors in Gr5a+ neurons, heightening their sensitivity to sugar and promoting sucrose acceptance in flies. Since the function of the dopaminergic-to-Gr5a+ circuit motif has been well established, the primary contribution of Wang et al. is to show that mating failure in male flies can also engage this circuit to modulate sugar-feeding behavior. This contribution is valuable because it highlights the role of dopaminergic neurons in integrating diverse internal state signals to inform behavioral decisions.

      An intriguing discrepancy between Wang et al. and earlier studies lies in the involvement of dopamine receptors in Gr5a+ neurons. Prior research has shown that Dop2R and DopEcR, but not DopR1, mediate starvation-induced enhancement of sugar sensitivity in Gr5a+ neurons. In contrast, Wang et al. found that DopR1 and Dop2R, but not DopEcR, are involved in the sexual failure-induced decrease in sugar sensitivity in these neurons. I wish the authors had further explored or discussed this discrepancy, as it is unclear how dopamine release selectively engages different receptors to modulate neuronal sensitivity in a context-dependent manner.

      The data presented by Wang et al. are solid and effectively support their conclusions. However, certain aspects of their experimental design, data analysis, and interpretation warrant further review, as outlined below.

      (1) The authors did not explicitly indicate the feeding status of the flies, but it appears they were not starved. However, the naive and satisfied flies in this study displayed high feeding and PER baselines, similar to those observed in starved flies in other studies. This raises the concern that sexually failed flies may have consumed additional food during the 4.5-hour conditioning period, potentially lowering their baseline hunger levels and subsequently reducing PER responses. This alternative explanation is worth considering, as an earlier study demonstrated that sexually deprived males consumed more alcohol, and both alcohol and food are known rewards for flies. To address this concern, the authors could remove food during the conditioning phase to rule out its influence on the results.

      (2) Figure 1B reveals that approximately half of the males in the Failed group did not consume sucrose, yet Figure 1-S1A suggests that the total volume consumed remained unchanged. Were the flies that did not consume sucrose omitted from the dataset presented in Figure 1-S1A? If so, does this imply that only half of the male flies experience sexual failure, or that sexual failure affects only half of males while the others remain unaffected? The authors should clarify this point.

      (3) The evidence linking TH-GAL4 labeled dopaminergic neurons to reduced sugar sensitivity in Gr5a+ neurons in sexually failed males could be further strengthened. Ideally, the authors would have activated TH-GAL4 neurons and observed whether this restored GCaMP responses in Gr5a+ neurons in sexually failed males. Instead, the authors performed a less direct experiment, shown in Figures 3-S1C and D. The manuscript does not describe the condition of the flies used in this experiment, but it appears that they were not sexually conditioned. I have two concerns with this experiment. First, no statistical analysis was provided to support the enhancement of sucrose responses following activation of TH-GAL4 neurons. Second, without performing this experiment in sexually failed males, the authors lack direct evidence to confirm that the dampened response of Gr5a+ neurons to sucrose results from decreased activity in TH-GAL4 neurons.

      (4) The statistical methods used in this study are poorly described, making it unclear which method was used for each experiment. I suggest that the authors include a clear description of the statistical methods used for each experiment in the figure legends. Furthermore, as I have pointed out, there is a lack of statistical comparisons in Figures 3-S1C and D, a similar problem exists for Figures 6E and F.

      (5) The experiments in Figure 5 lack specificity. The target neurons in this study are Gr5a+ neurons, which are directly involved in sugar sensing. However, the authors used the less specific Dop1R1- and Dop2R-GAL4 lines for their manipulations. Using Gr5a-GAL4 to specifically target Gr5a+ neurons would provide greater precision and ensure that the observed effects are directly attributable to the modulation of Gr5a+ neurons, rather than being influenced by potential off-target effects from other neuronal populations expressing these dopamine receptors.

      (6) I found the results presented in Fig. 6F puzzling. The knockdown of Dop2R in Gr5a+ neurons would be expected to decrease sucrose responses in naive and satisfied flies, given the role of Dop2R in enhancing sweet sensitivity. However, the figure shows an apparent increase in responses across all three groups, which contradicts this expectation. The authors may want to provide an explanation for this unexpected result.

      (7) In several instances in the manuscript, the authors described the effects of silencing dopamine signaling pathways or knocking down dopamine receptors in Gr5a neurons with phrases such as 'no longer exhibited reduced sweet sensitivity' (e.g., L269 and L288), 'prevent the reduction of sweet sensitivity' (e.g., L292), or 'this suppression was reversed' (e.g. L299). I found these descriptions misleading, as they suggest that sweet sensitivity in naive and satisfied groups remains normal while the reduction in failed flies is specifically prevented or reversed. However, this is not the case. The data indicate that these manipulations result in an overall decrease in sweet sensitivity across all groups, such that a further reduction in failed flies is not observed. I recommend revising these descriptions to accurately reflect the observed phenotypes and avoid any confusion regarding the effects of these manipulations.

    3. Reviewer #2 (Public review):

      Summary:

      The authors exposed naïve male flies to different groups of females, either mated or virgin. Male flies can successfully copulate with virgin females; however, they are rejected by mated females. This rejection reduces sugar preference and sensitivity in males. Investigating the underlying neural circuits, the authors show that dopamine signaling onto Gr5a sensory neurons is required for reduced sugar preference. Gr5a sensory neurons respond less to sugar exposure when they lack dopamine receptors.

      Strengths:

      The findings add another strong phenotype to the existing dataset about brain-wide neuromodulatory effects of mating. The authors use several state-of-the-art methods, such as activity-dependent GRASP to decipher the underlying neural circuitry. They further perform rigorous behavioral tests and provide convincing evidence for the local labellar circuit.

      Weaknesses:

      The authors focus on the circuit connection between dopamine and gustatory sensory neurons in the male SEZ. Therefore, it is still unknown how mating modulates dopamine signaling and what possible implications on other behaviors might result from a reduced sugar preference.

    4. Reviewer #3 (Public review):

      Summary

      In this work, the authors asked how mating experience impacts reward perception and processing. For this, they employ fruit flies as a model, with a combination of behavioral, immunostaining, and live calcium imaging approaches.

      Their study allowed them to demonstrate that courtship failure decreases the fraction of flies motivated to eat sweet compounds, revealing a link between reproductive stress and reward-related behaviors. This effect is mediated by a small group of dopaminergic neurons projecting to the SEZ. After courtship failure, these dopaminergic neurons exhibit reduced activity, leading to decreased Gr5a+ neuron activity via Dop1R1 and Dop2R signaling and, in turn, to reduced sweet sensitivity. The authors therefore showed how mating failure influences broader behavioral outputs through suppression of the dopamine-mediated reward system, and they underscore the interactions between reproductive and reward pathways.

      Concern

      My main concern regarding this study lies in the way the authors chose to present their results. If I understood correctly, they provided evidence that mating failure induces a decrease in the fraction of flies exhibiting PER. However, they also showed that food consumption was not affected (Fig. 1, supplement), suggesting that individuals who did eat consumed more. This raises questions about the analysis and interpretation of the results. Should we consider the group as a whole, with a reduced sensitivity to sweetness, or should we focus on individuals, with each one eating more? I am also concerned about how this could influence the results obtained using live imaging approaches, as the flies being imaged might or might not have been motivated to eat during the feeding assays. I would like the authors to clarify their choice of analysis and discuss this critical point, as the interpretation of the results could potentially be the opposite of what is presented in the manuscript.

    5. Author response:

      Reviewer #1 (Public review):

      Wang et al. investigated how sexual failure influences sweet taste perception in male Drosophila. The study revealed that courtship failure leads to decreased sweet sensitivity and feeding behavior via dopaminergic signaling. Specifically, the authors identified a group of dopaminergic neurons projecting to the suboesophageal zone that interacts with sweet-sensing Gr5a+ neurons. These dopaminergic neurons positively regulate the sweet sensitivity of Gr5a+ neurons via DopR1 and Dop2R receptors. Sexual failure diminishes the activity of these dopaminergic neurons, leading to reduced sweet-taste sensitivity and sugar-feeding behavior in male flies. These findings highlight the role of dopaminergic neurons in integrating reproductive experiences to modulate appetitive sensory responses.

      Previous studies have explored the dopaminergic-to-Gr5a+ neuronal pathways in regulating sugar feeding under hunger conditions. Starvation has been shown to increase dopamine release from a subset of TH-GAL4 labeled neurons, known as TH-VUM, in the suboesophageal zone. This enhanced dopamine release activates dopamine receptors in Gr5a+ neurons, heightening their sensitivity to sugar and promoting sucrose acceptance in flies. Since the function of the dopaminergic-to-Gr5a+ circuit motif has been well established, the primary contribution of Wang et al. is to show that mating failure in male flies can also engage this circuit to modulate sugar-feeding behavior. This contribution is valuable because it highlights the role of dopaminergic neurons in integrating diverse internal state signals to inform behavioral decisions.

      An intriguing discrepancy between Wang et al. and earlier studies lies in the involvement of dopamine receptors in Gr5a+ neurons. Prior research has shown that Dop2R and DopEcR, but not DopR1, mediate starvation-induced enhancement of sugar sensitivity in Gr5a+ neurons. In contrast, Wang et al. found that DopR1 and Dop2R, but not DopEcR, are involved in the sexual failure-induced decrease in sugar sensitivity in these neurons. I wish the authors had further explored or discussed this discrepancy, as it is unclear how dopamine release selectively engages different receptors to modulate neuronal sensitivity in a context-dependent manner.

      Our immunostaining experiments showed that three dopamine receptors, DopR1, Dop2R, and DopEcR, were expressed in Gr5a<sup>+</sup> neurons in the proboscis, consistent with previous RT-PCR findings (Inagaki et al 2012). As the reviewer pointed out, we found that DopR1 and Dop2R were required for courtship failure-induced suppression of sugar sensitivity, whereas Marella et al 2012 and Inagaki et al 2012 found that Dop2R and DopEcR were required for starvation-induced enhancement of sugar sensitivity. These results may suggest that different internal states (courtship failure vs. starvation) modulate the peripheral sensory system via different signaling pathways (e.g. different subsets of dopaminergic neurons, different dopamine release mechanisms, and different dopamine receptors). We will further discuss these possibilities in the revised manuscript.

      The data presented by Wang et al. are solid and effectively support their conclusions. However, certain aspects of their experimental design, data analysis, and interpretation warrant further review, as outlined below.

      (1) The authors did not explicitly indicate the feeding status of the flies, but it appears they were not starved. However, the naive and satisfied flies in this study displayed high feeding and PER baselines, similar to those observed in starved flies in other studies. This raises the concern that sexually failed flies may have consumed additional food during the 4.5-hour conditioning period, potentially lowering their baseline hunger levels and subsequently reducing PER responses. This alternative explanation is worth considering, as an earlier study demonstrated that sexually deprived males consumed more alcohol, and both alcohol and food are known rewards for flies. To address this concern, the authors could remove food during the conditioning phase to rule out its influence on the results.

      We think this is a valid concern. We will conduct courtship conditioning in the absence of food and test if courtship failure can still suppress sugar sensitivity in the revised manuscript.

      (2) Figure 1B reveals that approximately half of the males in the Failed group did not consume sucrose yet Figure 1-S1A suggests that the total volume consumed remained unchanged. Were the flies that did not consume sucrose omitted from the dataset presented in Figure 1-S1A? If so, does this imply that only half of the male flies experience sexual failure, or that sexual failure affects only half of males while the others remain unaffected? The authors should clarify this point.

      Here is a brief clarification of our experimental design and we will further clarify the details in the revised manuscript:

      After the behavioral conditioning, male flies were divided for two assays. On the one hand, we quantified PER responses of individual flies. As shown in Figure 1C, Failed males exhibited decreased sweet sensitivity (as demonstrated by the right shift of the response curve).

      On the other hand, we sought to quantify food consumption of individual flies using the MAFE assay (Qi et al 2005). When presented with 400 mM sucrose, approximately 100% of the flies in the Naïve and Satisfied groups, and 50% of the flies in the Failed group, extended their proboscis and started feeding (Figure 1B). For these flies, we could quantify the consumed volumes and found there was no change (Figure 1, S1A). These two experiments are also consistent with each other: in Figure 1C, only 50-60% of Failed males responded to 400 mM stimulation.

      These two experiments in combination suggest that sexual failure suppressed sweet sensitivity of the Failed males. Meanwhile, as long as they still initiated feeding, the volume of food consumption remained unchanged. These results led us to focus on the modulatory effect of sexual failure on the sensory system, the main topic of this present study.

      In addition, to further clarify the potential misunderstanding, we plan to examine food consumption by using 800 mM sucrose in the revised manuscript. As shown in Figure 1C, 800 mM sucrose was adequate to induce feeding in ~100% of the flies.

      (3) The evidence linking TH-GAL4 labeled dopaminergic neurons to reduced sugar sensitivity in Gr5a+ neurons in sexually failed males could be further strengthened. Ideally, the authors would have activated TH-GAL4 neurons and observed whether this restored GCaMP responses in Gr5a+ neurons in sexually failed males. Instead, the authors performed a less direct experiment, shown in Figures 3-S1C and D. The manuscript does not describe the condition of the flies used in this experiment, but it appears that they were not sexually conditioned. I have two concerns with this experiment. First, no statistical analysis was provided to support the enhancement of sucrose responses following activation of TH-GAL4 neurons. Second, without performing this experiment in sexually failed males, the authors lack direct evidence to confirm that the dampened response of Gr5a+ neurons to sucrose results from decreased activity in TH-GAL4 neurons.

      We think this is also a valid suggestion. We will directly examine whether activating TH<sup>+</sup> neurons in sexually conditioned males would enhance sugar responses of Gr5a<sup>+</sup> neurons in sexually failed males. We will also add the statistical analysis.

      Nevertheless, we would still argue that our current experiments using Naive males (Figure 3, S1C-D) are adequate to show a functional link between TH<sup>+</sup> neurons and Gr5a<sup>+</sup> neurons. Combined with the results that these neurons form active synapses (Figure 3, S1B) and that the activity of TH<sup>+</sup> neurons was dampened in sexually failed males (Figure 3G-I), our current data support the notion that sexual failure suppresses sweet sensitivity via TH-Gr5a circuitry.

      (4) The statistical methods used in this study are poorly described, making it unclear which method was used for each experiment. I suggest that the authors include a clear description of the statistical methods used for each experiment in the figure legends. Furthermore, as I have pointed out, there is a lack of statistical comparisons in Figures 3-S1C and D; a similar problem exists for Figures 6E and F.

      We will add detailed information on the statistical analysis to each figure legend.

      (5) The experiments in Figure 5 lack specificity. The target neurons in this study are Gr5a+ neurons, which are directly involved in sugar sensing. However, the authors used the less specific Dop1R1- and Dop2R-GAL4 lines for their manipulations. Using Gr5a-GAL4 to specifically target Gr5a+ neurons would provide greater precision and ensure that the observed effects are directly attributable to the modulation of Gr5a+ neurons, rather than being influenced by potential off-target effects from other neuronal populations expressing these dopamine receptors.

      We agree with the reviewer that manipulating Dop1R1 and Dop2R genes (Figure 4) and the neurons expressing them (Figure 5) might have broader impacts. In fact, we have also tested the role of Dop1R1 and Dop2R in Gr5a<sup>+</sup> neurons by RNAi experiments (Figure 6). As shown by both behavioral and calcium imaging experiments, knocking down either Dop1R1 or Dop2R in Gr5a<sup>+</sup> neurons eliminated the effect of sexual failure to dampen sweet sensitivity, further confirming the role of these two receptors in Gr5a<sup>+</sup> neurons.

      (6) I found the results presented in Fig. 6F puzzling. The knockdown of Dop2R in Gr5a+ neurons would be expected to decrease sucrose responses in naive and satisfied flies, given the role of Dop2R in enhancing sweet sensitivity. However, the figure shows an apparent increase in responses across all three groups, which contradicts this expectation. The authors may want to provide an explanation for this unexpected result.

      We agree that there might be some potential discrepancies. However, our current data are not adequate to clarify them, given that the experiments shown in Figure 6E-F and the apparent control (Figure 3C) were not conducted under identical settings at the same time (which is why we did not directly compare these results). One way to address the issue is to repeat these calcium imaging experiments with a head-to-head comparison with the control group (Gr5a-GCaMP, +/- Dop1R1 and Dop2R RNAi). We will conduct the experiments and present the data in the revised manuscript.

      (7) In several instances in the manuscript, the authors described the effects of silencing dopamine signaling pathways or knocking down dopamine receptors in Gr5a neurons with phrases such as 'no longer exhibited reduced sweet sensitivity' (e.g., L269 and L288), 'prevent the reduction of sweet sensitivity' (e.g., L292), or 'this suppression was reversed' (e.g. L299). I found these descriptions misleading, as they suggest that sweet sensitivity in naive and satisfied groups remains normal while the reduction in failed flies is specifically prevented or reversed. However, this is not the case. The data indicate that these manipulations result in an overall decrease in sweet sensitivity across all groups, such that a further reduction in failed flies is not observed. I recommend revising these descriptions to accurately reflect the observed phenotypes and avoid any confusion regarding the effects of these manipulations.

      We will change our expressions in the revised manuscript. In brief, we think that these manipulations (suppressing Dop1R1<sup>+</sup> and Dop2R<sup>+</sup> neurons) have two consequences: suppressing the overall sweet sensitivity and eliminating the effect of sexual failure.

      Reviewer #2 (Public review):

      Summary:

      The authors exposed naïve male flies to different groups of females, either mated or virgin. Male flies can successfully copulate with virgin females; however, they are rejected by mated females. This rejection reduces sugar preference and sensitivity in males. Investigating the underlying neural circuits, the authors show that dopamine signaling onto Gr5a sensory neurons is required for reduced sugar preference. Gr5a sensory neurons respond less to sugar exposure when they lack dopamine receptors.

      Strengths:

      The findings add another strong phenotype to the existing dataset about brain-wide neuromodulatory effects of mating. The authors use several state-of-the-art methods, such as activity-dependent GRASP to decipher the underlying neural circuitry. They further perform rigorous behavioral tests and provide convincing evidence for the local labellar circuit.

      Weaknesses:

      The authors focus on the circuit connection between dopamine and gustatory sensory neurons in the male SEZ. Therefore, it is still unknown how mating modulates dopamine signaling and what possible implications on other behaviors might result from a reduced sugar preference.

      We agree with the reviewer that in the current study, we did not examine how mating experience suppressed the activity of dopaminergic neurons in the SEZ. The current study mainly focused on the behavioral characterization (sexual failure suppresses sweet sensitivity) and the downstream mechanism (TH-Gr5a pathway). We think that examining the upstream modulatory mechanism may be more suitable for a separate future study.

      We believe that a sustained reduction in sweet sensitivity (not limited to sucrose but extending to other sweet compounds, Figure 1, S1B-C) upon sexual failure suggests a generalized and sustained consequence on reward-related behaviors. Sexual failure may thus resemble a state of “primitive emotion” in fruit flies. We will further discuss this possibility in the revised manuscript.

      Reviewer #3 (Public review):

      Summary

      In this work, the authors asked how mating experience impacts reward perception and processing. For this, they employ fruit flies as a model, with a combination of behavioral, immunostaining, and live calcium imaging approaches.

      Their study allowed them to demonstrate that courtship failure decreases the fraction of flies motivated to eat sweet compounds, revealing a link between reproductive stress and reward-related behaviors. This effect is mediated by a small group of dopaminergic neurons projecting to the SEZ. After courtship failure, these dopaminergic neurons exhibit reduced activity, leading to decreased Gr5a+ neuron activity via Dop1R1 and Dop2R signaling and, in turn, to reduced sweet sensitivity. The authors therefore showed how mating failure influences broader behavioral outputs through suppression of the dopamine-mediated reward system, and they underscore the interactions between reproductive and reward pathways.

      Concern

      My main concern regarding this study lies in the way the authors chose to present their results. If I understood correctly, they provided evidence that mating failure induces a decrease in the fraction of flies exhibiting PER. However, they also showed that food consumption was not affected (Fig. 1, supplement), suggesting that individuals who did eat consumed more. This raises questions about the analysis and interpretation of the results. Should we consider the group as a whole, with a reduced sensitivity to sweetness, or should we focus on individuals, with each one eating more? I am also concerned about how this could influence the results obtained using live imaging approaches, as the flies being imaged might or might not have been motivated to eat during the feeding assays. I would like the authors to clarify their choice of analysis and discuss this critical point, as the interpretation of the results could potentially be the opposite of what is presented in the manuscript.

      Here is a brief clarification of our experimental design and we will further clarify the details in the revised manuscript:

      After the behavioral conditioning, male flies were divided for two assays. On the one hand, we quantified PER responses of individual flies. As shown in Figure 1C, Failed males exhibited decreased sweet sensitivity (as demonstrated by the right shift of the response curve).

      On the other hand, we sought to quantify food consumption of individual flies using the MAFE assay (Qi et al 2005). When presented with 400 mM sucrose, approximately 100% of the flies in the Naïve and Satisfied groups, and 50% of the flies in the Failed group, extended their proboscis and started feeding (Figure 1B). For these flies, we could quantify the consumed volumes and found there was no change (Figure 1, S1A). These two experiments are also consistent with each other: in Figure 1C, only 50-60% of Failed males responded to 400 mM stimulation.

      These two experiments in combination suggest that sexual failure suppressed sweet sensitivity of the Failed males. Meanwhile, as long as they still initiated feeding, the volume of food consumption remained unchanged. These results led us to focus on the modulatory effect of sexual failure on the sensory system, the main topic of this present study.

      In addition, to further clarify the potential misunderstanding, we plan to examine food consumption by using 800 mM sucrose instead. As shown in Figure 1C, 800 mM sucrose was adequate to induce feeding in ~100% of the flies.

    1. eLife Assessment

      This useful study presents a virtual reality-based contextual fear conditioning paradigm for head-fixed mice. The approach provides a way to perform multiphoton imaging of neural circuits during behaviors that have traditionally been studied in freely moving animals. However, evidence supporting key claims is currently incomplete, particularly regarding elicitation and detection of freezing behaviors, and the impact of the study would be increased by articulating what this initial exploration of parameters offers over existing approaches.

    2. Reviewer #1 (Public review):

      The authors set out to develop a contextual fear learning (CFC) paradigm in head-fixed mice that would produce freezing as the conditioned response. Typically, lick suppression is the conditioned response in such designs, but this (1) introduces a potential confounding influence of reward learning on neural assessments of aversion learning and (2) does not easily allow comparison of head-fixed studies with extensive previous work in freely moving animals, which use freezing as the primary conditioned response.

      The first part of this study is a report on the development and outcomes of 3 variations of the CFC paradigm in a virtual reality environment. The fundamental design is strong, with head-fixed mice required to run down a linear virtual track to obtain a water reward. Once trained, the water reward is no longer necessary and mice will navigate virtual reality environments. There are rigorous performance criteria to ensure that mice that make it to the experimental stage show very low levels of inactivity prior to fear conditioning. These criteria do result in only 40% of the mice making it to the experimental stage, but high rates of activity in the VR environment are crucial for detecting learning-related freezing. It is possible that further adjustments to the procedure could improve attrition rates.

      Paradigm versions 1 and 2 vary the familiarity of the control context while paradigm versions 2 and 3 vary the inter-shock interval. Paradigm version 1 is the most promising, showing the greatest increase in conditioned freezing (~40%) and good discrimination between contexts (delta ~15-20%). Paradigm version 2 showed no clear evidence of learning - average freezing at recall day 1 was not different from pre-shock freezing. First-lap freezing showed a difference, but this single-lap effect is not useful for many of the neural circuit questions that this paradigm is meant to address. Also, the claim that mice extinguished first-lap freezing after 1 day is weak. Extinction is determined here by the loss of context discrimination, but this was not strong to begin with. First-lap freezing does not appear to be different between Recall Day 1 and 2, but this analysis was not done. Paradigm version 3 has some promise, but the magnitude of the context discrimination is modest (~10% difference in freezing). Thus, further optimization of the VR CFC will be needed to achieve robust learning and extinction. This could include factors not thoroughly tested in this study, including context pre-exposure timing and duration and shock intensity and frequency.

      The second part of the study is a validation of the head-fixed CFC VR protocol through the demonstration that fear conditioning leads to the remapping of dorsal CA1 place fields, similar to that observed in freely moving subjects. The results support this aim and largely replicate previous findings in freely moving subjects. One notable difference from previous work is that VR CFC led to the remapping of the control environment, not just the conditioning context. The authors present several possible explanations for this lack of specificity to the shock context, underscoring the need for further refinement of the CFC protocol before it can be widely applied. While this experiment examined place cell remapping after fear conditioning, it did not attempt to link neural activity to the learned association or freezing behavior.

      In summary, this is an important study that sets the initial parameters and neuronal validation needed to establish a head-fixed CFC paradigm that produces freezing behaviors. In the discussion, the authors note the limitations of this study, suggest the next steps in refinement, and point to several future directions using this protocol to significantly advance our understanding of the neural circuits of threat-related learning and behavior.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Krishnan et al devised three paradigms to perform contextual fear conditioning in head-fixed mice. Each of the paradigms relied on head-fixed mice running on a treadmill through virtual reality arenas. The authors tested the validity of three versions of the paradigms by using various parameters. As described below, I think there are several issues with the way the paradigms are designed and how the data are interpreted. Moreover, as Paradigm 3 was published previously in a study by the same group, it is unclear to me what this manuscript offers beyond the validations of parameters used for the previous publication. Below, I list my concerns point-by-point, which I believe need to be addressed to strengthen the manuscript.

      Major comments

      (1) In the analysis using the LME model (Tables 1 and 2), I am left wondering why the mice had increased freezing across recall days as well as increased generalization (increased freezing to the familiar context, where shock was never delivered). Would the authors expect freezing to decrease across recall days, since repeated exposure to the shock context should drive some extinction? This is complicated by the analysis showing that freezing was increased only on retrieval day 1 when analyzing data from the first lap only. Since reward (e.g., motivation to run) is removed during the conditioning and retrieval tests, I wonder if what the authors are observing is related to decreased motivation to perform the task (mice will just sit, immobile, not necessarily freezing per se). I think that these aspects need to be teased out.

      (2) Related to point 1, the authors actually point out that these changes could be due to the loss of the water reward. So, in line 304, is it appropriate to call this freezing? I think it will be very important for the authors to exactly define and delineate what they consider as freezing in this task, versus mice just simply sitting around, immobile, and taking a break from performing the task when they realize there is no reward at the end.

      (3) In the second paradigm, mice are exposed to both novel and (at the time before conditioning) neutral environments just before fear conditioning. There is a big chance that the mice are 'linking' the memories (Cai et al 2016) of the two contexts such that there is no difference in freezing in the shock context compared to the neutral context, which is what the authors observe (Lines 333-335). The experiment should be repeated such that exposure to the contexts does not occur on the conditioning day.

      (4) On lines 360-361, the authors conclude that extinction happens rapidly, within the first lap of the VR trial. To my understanding, that would mean that extinction would happen within the first 5-10 seconds of the test (according to Figure S1E). That seems far too fast for extinction to occur, as extinction never occurs this quickly in freely behaving mice.

      (5) Throughout the different paradigms, the authors are using different shock intensities. This can lead to differences in fear memory encoding as well as in levels of fear memory generalization. I don't think that comparisons can be made across the different paradigms as too many variables (including shock intensity - 0.5/0.6 mA can be very different from 1.0 mA) are different. How can the authors pinpoint which works best? Indeed, they find Paradigm 3 'works' better than Paradigm 2 because mice discriminate better between the neutral and shock contexts. This can definitely be driven by decreased generalization from using a 0.6 mA shock in Paradigm 3 compared to a 1.0 mA shock in Paradigm 2.

      (6) There are some differences in the calcium imaging dataset compared to other studies, and the authors should perform additional testing to determine why. This will be integral to validating their head-fixed paradigm(s) and showing they are useful for modeling circuit dynamics/behaviors observed in freely behaving mice. Moreover, the sample size (number of mice) seems low.

      (7) It appears that the authors have already published a paper using Paradigm 3 (Ratigan et al 2023). If they already found a paradigm that is published and works, it is unclear to me what the current manuscript offers beyond that initial manuscript.

      (8) As written, the manuscript is really difficult to follow with the averages and standard error reported throughout the text. This reporting was inconsistent: sometimes the values were given and other times they were not. Cleaning this reporting up throughout the paper would greatly improve the flow of the text and qualitative description of the results.

    4. Reviewer #3 (Public review):

      Summary:

      Krishnan et al. present a novel contextual fear conditioning (CFC) paradigm using a virtual reality (VR) apparatus to evaluate whether conditioned context-induced freezing can be elicited in head-fixed mice. By combining this approach with two-photon imaging, the authors aim to provide high-resolution insights into the neural mechanisms underlying learning, memory, and fear. Their experiments demonstrate that head-fixed mice can discriminate between threat and non-threat contexts, exhibit fear-related behavior in VR, and show context-dependent variability during extinction. Supplemental analyses further explore alternative behaviors and the influence of experimental parameters, while hippocampal neuron remapping is tracked throughout the experiments, showcasing the paradigm's potential for studying memory formation and extinction processes.

      Strengths:

      Methodological Innovation: The integration of a VR-based CFC paradigm with real-time two-photon imaging offers a powerful, high-resolution tool for investigating the neural circuits underlying fear, learning, and memory.

      Versatility and Utility: The paradigm provides a controlled and reproducible environment for studying contextual fear learning, addressing challenges associated with freely moving paradigms.

      Potential for Broader Applications: By demonstrating hippocampal neuron remapping during fear learning and extinction, the study highlights the paradigm's utility for exploring memory dynamics, providing a strong foundation for future studies in behavioral neuroscience.

      Comprehensive Data Presentation: The inclusion of supplemental figures and behavioral analyses (e.g., licking behaviors and variability in extinction) strengthens the manuscript by addressing additional dimensions of the experimental outcomes.

      Weaknesses:

      Characterization of Freezing Behavior: The evidence supporting freezing behavior as the primary defensive response in VR is unclear. Supplementary videos suggest the observed behaviors may include avoidance-like actions (e.g., backing away or stopping locomotion) rather than true freezing. Additional physiological measurements, such as EMG or heart rate, are necessary to substantiate the claim that freezing is elicited in the paradigm.

      Analysis of Extinction: Extinction dynamics are only analyzed through between-group comparisons within each Recall day, without addressing within-group changes in behavior across days. Statistical comparisons within groups would provide a more robust demonstration of extinction processes.

      Low Sample Sizes: Paradigm 1 includes conditions with very low sample sizes (N=1-3), limiting the reliability of statistical comparisons regarding the effects of shock number and intensity. Increasing sample sizes or excluding data from mice that do not match the conditions used in Paradigms 2 and 3 would improve the rigor of the analysis.

      Potential Confound of Water Reward: The authors critique the use of reward in conjunction with fear conditioning in prior studies but do not fully address the potential confound introduced by using water reward during the training phase in their own paradigm.

    1. eLife Assessment

      This study presents a valuable finding regarding how partner preference formation and pair bonding behavior are related to the oxytocin receptor gene expression in the NAc and paraventricular nucleus of the hypothalamus in prairie voles. The evidence supporting this claim is solid but could benefit from clarity in the framing, approach, and results. This study will be of interest to social scientists and neuroscientists who work on pair bonding and oxytocin.

    2. Reviewer #1 (Public review):

      Summary:

      In this remarkable study, the authors use some of their recently developed oxytocin receptor knockout voles (Oxtr1-/- KOs) to re-examine how oxytocin might influence partner preference. They show that shorter cohabitation times lead to decreased huddling time and partner preference in the KO voles, but with longer periods preference is still established, i.e., the KO animals have a slower rate of forming preference or are less sensitive to whatever cues or experiences lead to the formation of the pair bond as measured by this assay. This helps relate the authors' recent study to the rest of the literature on oxytocin and partner preference in prairie voles. To better understand what might lead to slower partner preference, they quantified changes to the durations and frequency of huddling. In separate assays, they also found that Oxtr1-/- KOs interacted more with stranger males than wild-type females did. In a partner choice assay, they found that wild-type males prefer wild-type females more than Oxtr1-/- KO females. They then performed bulk RNA-Seq profiling of nucleus accumbens of both wild-type and Oxtr1-/- KO males and females, either housed with animals of the same sex or paired with a wild-type of the opposite sex. 13 differentially expressed genes were identified, mostly due to downregulation in wild-type females. These genes were also identified in a module lost in the Oxtr1-/- voles by correlated expression profiling. They also compared results of transcriptional profiling in female and male wild-type vs Oxtr1-/- voles (independently of bonding state) and found hundreds of differentially expressed genes in nucleus accumbens, mostly in females and often with some relation to neural development and/or autism. Some of the reduction in the transcript was confirmed with in-situs, as well as compared to changes in transcription in the lateral septum and paraventricular nucleus (PVN) of the hypothalamus. Finally, they find fewer oxytocin+ and AVP+ neurons in the anterior PVN.

      Strengths:

      This is an important study helping to reveal the effects of oxytocin receptor knockout on behavior and gene expression. The experiments are thorough and reveal a surprising number of genetic and anatomical differences, with some sexual dimorphism as well, and the authors have more carefully examined the behavioral changes after shorter and longer periods of partner preference formation.

      Weaknesses:

      It is surprising that given all the genetic changes identified by the authors, the behavioral phenotypes are fairly mild. The extent of gene changes also might be under-reported given the variability in the behavior and relatively low number of animals profiled.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript uses a recently published oxytocin receptor null prairie vole line to examine the effects of this mutation on pair bonding behavior and PVN gene expression. Results reveal that Oxtr sex specifically influences early courtship behavior and partner preference formation as well as suppressing promiscuity toward novel potential mates. PVN gene expression varies between Oxtr null and WT prairie voles.

      Strengths:

      Behavioral analyses extend beyond the typical reporting of frequency and duration. The gene expression models and analyses are well-done and convincing. The experimental designs and approaches are strong.

      Weaknesses:

      More details and background literature explaining the role of the Oxt system in pair bonding behaviors are necessary, particularly for the Introduction. The authors overstate several times that Oxtr expression is not necessary for partner preference formation, based on their previous findings. However, it does appear to be necessary, particularly after the short cohabitation. Thus, the nuanced answer may be that Oxt accelerates partner preference formation. Improving the presentation of the statistics and figures will make the manuscript more reader-friendly.

    1. eLife Assessment

      This multimodal neuroimaging study leverages fMRI, PET, and deep learning to predict memory performance. The authors introduce the brain-cognition gap to link these different imaging modalities to cognition and evaluate their results in two independent cohorts. The results are a valuable addition to the literature and will be of interest to neuroscientists working at the interface of cognition, neuroimaging, and computational modeling. However, the evidence supporting the conclusions remains incomplete.

    2. Reviewer #1 (Public review):

      Summary:

      The authors attempted to identify whether a new deep-learning model could be applied to both resting and task state fMRI data to predict cognition and dopaminergic signaling. They found that resting state and movie watching conditions best predict episodic memory, but only movie watching predicts both episodic and working memory. A negative 'brain gap' (where the model trained on brain connectivity predicts worse performance than what is actually observed) was associated with less physical activity, poorer cardiovascular function, and lower D1R availability.

      Strengths:

      The paper should be of broad interest to the journal's readership, with implications for cognitive neuroscience, psychiatry, and psychology fields. The paper is very well-written and clear. The authors use two independent datasets to validate their findings, including two of the largest databases of dopamine receptor availability to link brain functional connectivity/activity with neurochemical signaling.

      Weaknesses:

      The deep learning findings represent a relatively small extension/enhancement of knowledge in a very crowded field.

      It's unclear from these results how much utility the brain gaps provide above and beyond observed performance. It would be helpful to take a median split of the dataset on observed performance and plot aside the current Figure 3 results to see how the cardiovascular and physical activity measures differ based on actual performance. Could the authors perform additional analyses describing how much additional variance is explained in these measures by including brain gaps?

      Some of the imaging findings require deeper analysis. For Figure 1f - Which default mode regions have high salience? DMN is a huge network with subregions having differing functions.

      Along the same lines, were the striatal D1R findings regionally specific at all? It would be informative to test whether the three nuclei (Accumbens, Caudate, Putamen) and/or voxelwise models would show something above and beyond what is achieved from averaging D1R across the striatum. What about cortical D1R, which is highly abundant, strongly associated with cognitive (especially WM) performance, and has much unique variance beyond striatal D1R? https://www.science.org/doi/full/10.1126/sciadv.1501672. The PET findings are one of the unique strengths of this paper and are underexplored. It's also unclear if the measure of brain entropy should simply be averaged across all regions.

      It is not clear from the text that the authors met the preconditions for mediation analysis (that is, demonstrating significant correlations between D1R and entropy, in addition to the correlation with brain gap). The authors should report this as well.

      Was age controlled for in the mediation analysis? I would not consider this result valid unless that is the case.
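
      The two points above (checking each pairwise association that a mediation model presupposes, and holding age constant while doing so) can be illustrated with a first-order partial correlation. This is only a minimal sketch under stated assumptions: the variable names and toy values are hypothetical, nothing here is taken from the manuscript, and it is not the authors' analysis.

```python
from statistics import mean

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den

def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / ((1 - r_xz ** 2) * (1 - r_yz ** 2)) ** 0.5

# Hypothetical toy values for six participants (not real data).
age = [25, 35, 45, 55, 65, 75]
d1r = [2.1, 1.9, 1.8, 1.5, 1.4, 1.1]            # assumed D1R availability
entropy = [0.50, 0.56, 0.55, 0.63, 0.64, 0.72]  # assumed brain entropy
gap = [0.4, 0.1, 0.3, -0.2, -0.1, -0.5]         # assumed brain-cognition gap

# The three associations a simple mediation model relies on, each with age partialled out.
paths = {
    "D1R ~ entropy | age": (d1r, entropy),
    "entropy ~ gap | age": (entropy, gap),
    "D1R ~ gap | age": (d1r, gap),
}
for name, (x, y) in paths.items():
    print(name, round(partial_corr(x, y, age), 3))
```

      In a real analysis each coefficient would come with a significance test, and the mediation model itself would include age as a covariate on every path.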

      The discussion section is long, but the authors would do better to replace some less helpful sections (e.g., the paragraph on methodological tweaks to parcellations and model alignment) with a couple of other important points, including:

      (1) Discuss the 'sweet-spot' of movie watching for behavior prediction in the context of studies showing that task states 'quench' neural variability: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007983. This may not be mutually exclusive of the discussion on dopamine and signal-to-noise ratio, but it would be helpful for the authors to discuss their potential overlap vs. unique contributions to the observed findings.

      (2) The argument that dopamine signaling increases signal-to-noise ratio is based on some preclinical data as well as correlational data using fMRI with pharmacological challenges. It is less clear how PET-derived estimates of D1R and D2R availability equate to 'dopamine signaling' as it is thought of in this context. Presumably, based on these data, higher D1R or D2R availability would be related to greater levels of tonic dopaminergic signaling. However, in the case of the COBRA dataset with D2R estimates, those are based on raclopride -- which competes with endogenous dopamine for the D2 receptor. Therefore, someone with higher levels of endogenous dopamine signaling should theoretically have lower raclopride binding and lower D2R estimates. I'm not arguing that the authors' logic is flawed or that D1R and D2R are not good measures of dopamine signaling, but I'd ask the authors to dig into the literature and describe more direct potential links for how greater receptor availability might be associated with greater dopamine signaling (and hence lower entropy). Adding this to the discussion would be very valuable for PET research.

    3. Reviewer #2 (Public review):

      Summary:

      The authors developed a deep learning model based on a DenseNet CNN architecture to predict two cognitive functions: working memory and episodic memory, from functional connectivity matrices. These matrices were recorded under three conditions: during rest, a working memory task, and a movie, and were treated as images for the CNN algorithm. They tested their model's performance across different conditions and a separate dataset with a different age distribution (using the same MRI scanner, scanning configurations, and cognitive tests). They also calculated the "brain cognition gap" based on the model trained on resting functional connectivity to predict working memory. Extending from the commonly used index "brain age," the brain cognition gap was defined as the difference between the working memory score predicted by their model (predicted working memory) and the working memory score based on the working memory test itself (observed working memory). This brain cognition gap was found to be associated with physical activity, education, and cardiovascular risk. The authors also conducted additional mediation tests to examine whether regional functional variability mediated the relationship between PET-derived measures of dopamine and the brain cognition gap.

      Strengths:

      The major strength of this manuscript is the extensive effort the authors have put into creating a new 'biomarker' that links deep learning with fMRI, PET, physical activity, education, and cardiovascular risk across two studies. This effort is impressive.

      Weaknesses:

      There are several weaknesses in the current methods and results, making many of the claims unconvincing. These weaknesses include:

      (1) The lack of baseline models to benchmark the predictive performance of their DenseNet models.

      (2) The inappropriate calculation of the brain cognition gap due to the lack of control for regression-toward-the-mean and the influence of the working memory itself (a common practice in brain age studies).

      (3) The lack of benchmarking of the brain cognition gap against the 'corrected' brain age gap and the direct prediction of physical activity, education, and cardiovascular risk.

      (4) Minimal justification for their PET mediation analysis.

      Regarding the impact of the work on the field and the utility of the methods and data to the community, I see its potential. However, addressing all the weaknesses listed above is crucial and likely to change the conclusions of the results.

      It is important to note that many statements in the manuscript are overstated, making the contribution of the manuscript seem exaggerated.

      For instance, the abstract claims "there is a lack of objective biomarkers to accurately predict cognitive function," and the discussion states, "across various studies, the correlation between predicted and actual fluid intelligence typically hovers around 0.25 (98-100)." However, a meta-analysis by Vieira and colleagues (2022 https://doi.org/10.1016/j.intell.2022.101654) found over 37 studies up to 2020 predicting cognitive abilities from fMRI with machine learning, with 24 studies published in 2019-20 alone. Since 2020, with the rise of machine learning and AI, even more studies have likely been published on this topic, all claiming to show objective biomarkers to accurately predict cognitive function. Vieira and colleagues also found an average performance of these objective biomarkers in predicting general cognition at r = .42, similar to what was found in this manuscript. Based on this alone, it is unclear how novel or superior their method is without a proper systematic benchmark.

      Similarly, the authors claim superior performance of deep learning and mischaracterize machine learning algorithms: "In particular, deep neural networks (DNN) methods have been successfully applied to behavioral and disease prediction (24-26), and have been found to outperform other machine learning approaches (27-29)," and "Deep learning approaches overcome the limitation of predictive techniques that solely rely on linear associations between connectivity and behavioral phenotypes (17)." However, the superiority of deep learning is debatable. Studies show comparable performance between machine learning (such as kernel regression) and deep learning (such as fully-connected neural networks, BrainNetCNN, Graph CNN (GCNN), and temporal CNN), e.g., He and colleagues (2019) https://doi.org/10.1016/j.neuroimage.2019.116276 and Vieira and colleagues (2024) https://doi.org/10.1101/2024.03.07.583858.

      Moreover, many non-deep learning predictive techniques are non-linear, e.g., XGBoost, CatBoost, random forest, kernel ridge, and support vector regression with non-linear kernels (such as RBF and polynomial). Thus, stating that machine learning can only model linear relationships is incorrect. Moreover, for the small amount of data the authors had, some might argue that a linear algorithm might be more appropriate to balance the bias-variance trade-off in prediction. Again, without a proper systematic benchmark, it is unclear how well their DenseNet algorithm performs compared to other algorithms.

      Regarding the Brain Age literature, the authors also misinterpreted recent findings: "However, a recent study suggests that brain age predictions contribute minimally compared to chronological age for explaining cognitive decline (65), implying that cognitive predictions are more reliable." In this study, Tetereva and colleagues (2024) (https://doi.org/10.7554/eLife.87297.4) showed that non-deep-learning machine learning can make good predictions from MRI on both chronological age (with r up to .88) and fluid cognition (with r up to .627). Using the combination of functional connectivity matrices across rest and tasks to predict fluid cognition, they found performance at r = .565, comparable to what was found in the current manuscript with deep learning. Nonetheless, while brain age predicted chronological age well (and brain cognition predicted fluid cognition well), it was problematic to predict fluid cognition from brain age. They showed that, because brain age, by design, shared so much common variance with chronological age, brain age and chronological age captured the same variance of fluid cognition. When chronological age was controlled for in the prediction of fluid cognition, brain age no longer had high predictive ability. In the case of the current manuscript, the brain cognition gap is not appropriately controlled for cognition (to be more precise, a working memory score). I expect the performance in predicting physical activity, education, and cardiovascular risk will drop dramatically once cognition is controlled for. There are at least two ways to control cognition according to Tetereva and colleagues' study (see more in the recommendations).
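
      The concern about an uncorrected gap can be made concrete with a small sketch. By analogy with the usual brain-age-gap correction, one option is to regress the raw gap on the observed score and keep the residual. This is an illustration under assumptions, not necessarily either of the two approaches described by Tetereva and colleagues, and the toy scores below are hypothetical.

```python
from statistics import mean

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den

def residualize(y, x):
    """Residuals of y after simple ordinary least-squares regression on x."""
    mx, my = mean(x), mean(y)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Hypothetical scores: predictions are shrunk toward the mean, as is typical
# of regularized or noisy predictive models (regression toward the mean).
observed = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
predicted = [13.0, 13.5, 14.5, 15.0, 16.5, 17.0]

raw_gap = [p - o for p, o in zip(predicted, observed)]
corrected_gap = residualize(raw_gap, observed)

# The raw gap is almost a mirror image of the observed score;
# the corrected gap is uncorrelated with it by construction.
print("raw gap vs observed:", round(pearson(raw_gap, observed), 3))
print("corrected gap vs observed:", round(abs(pearson(corrected_gap, observed)), 3))
```

      If associations with physical activity, education, or cardiovascular risk survive when the corrected gap is used, they cannot be attributed to the observed score alone.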

      The authors mentioned, "The third aim of the current study is to uncover the contribution of dopamine (DA) integrity to brain-cognition gaps." However, I fail to see how mediation analysis would test this. The authors also mentioned, "Insufficient DA modulation can affect neurocognitive functions detrimentally (69, 74, 76-78)." They should test if DA levels are related to working memory scores in their study, and if so, whether the relationship is mediated by the "corrected" brain-cognition gaps. See the recommendations for more on the calculation of the "corrected" brain-cognition gaps.

    4. Reviewer #3 (Public review):

      Summary:

      This paper by Esmaeili and co-authors presents a connectome prediction study to predict episodic memory and relate prediction errors to other phenotypic variables.

      Strengths:

      (1) A primary and external validation dataset.

      (2) Novel use of prediction errors (i.e., brain-cognition gap).

      (3) A wide range of data was investigated.

      Weaknesses:

      (1) Lack of comparisons to other methods for prediction.

      (2) Several different points are being investigated that don't allow any particular one to shine through.

      (3) Some choices of analysis are not well-motivated.

      (4) How do the n-back connectomes perform for prediction if the authors do not regress task activations from the n-back task?

      (5) I am a little concerned about overfitting with the convolutional neural net. For example, the drop-off in prediction performance in the external sample is stark. How does the deep learning approach used here compare to something simpler, like a connectome-based predictive model or ridge regression?

      (6) It may be nice to try the other models in the validation dataset. This would also provide a sense of the overfitting that may be going on.

      (7) While predictive models increase the power over association studies, they still require large samples to prevent overfitting. Do the authors have a sense of the power their main and external validation sample sizes provide?

      (8) I am not sure that the Mann-Whitney is the correct test for comparing the distributions of prediction performances. The distributions are dependent on each other as they are each predicting the same outcomes. Using the typical degrees of freedom formula would overestimate the degrees of freedom.

      (9) The brain cognition gap is interesting. It is very similar conceptually to the brain age gap. When associating the brain age gap with other phenotypes, typically age is regressed from the brain age gap and the other phenotype. In other words, age is typically associated with a brain age gap as individuals at the tail ages often show the largest gaps. Is the brain cognition gap correlated with episodic memory and do the group differences hold if episodic memory is controlled for?

      (10) I have the same question for the dopamine results. Particularly, in the correlations that are divided by brain cognition gap sign. I could see these types of patterns arise due to a correlation with a third variable.
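
      Point (8) above concerns dependence between the performance distributions being compared. One test that respects the pairing is an exact sign-flip permutation test on the per-fold differences, sketched below. The fold-wise correlations are hypothetical, and this is only one possible alternative to the Mann-Whitney test, not a prescription.

```python
from itertools import product

def paired_sign_flip_test(a, b):
    """Exact two-sided sign-flip permutation test on paired differences.

    Under the null hypothesis the sign of each paired difference is arbitrary,
    so every one of the 2**n sign assignments is equally likely.
    """
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        stat = abs(sum(s * d for s, d in zip(signs, diffs)))
        if stat >= observed - 1e-12:  # small tolerance for floating-point ties
            extreme += 1
    return extreme / total

# Hypothetical prediction performance (r) of two models on the same 8 folds.
model_a = [0.30, 0.35, 0.28, 0.40, 0.33, 0.37, 0.31, 0.36]
model_b = [0.25, 0.31, 0.26, 0.33, 0.30, 0.31, 0.29, 0.30]

p_value = paired_sign_flip_test(model_a, model_b)
print("paired sign-flip p-value:", p_value)
```

      Exhaustive enumeration is only practical for a small number of folds. With many repetitions, a random subset of sign assignments is normally used instead. Folds that share training data are still not fully independent, so even this test is approximate.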

    1. eLife Assessment

      This is a well-written important paper on the recovery of fauna and flora following the end-Permian extinction event in several continental sites in northern China. The convincing conclusion, a rapid recovery in tropical riparian ecosystems following a short phase of hostile environments and depauperate biota, is supported by an impressive amount of data from sedimentology, body fossils of animals and plants, and especially trace fossils.

    2. Reviewer #1 (Public review):

      Summary:

      This is a very well-written paper presenting interesting findings related to the recovery following the end-Permian event in continental settings, from N China. The finding is timely as the topic is actively discussed in the scientific community. The data provides additional insights into the faunal, and partly, floral global recovery following the EPE, adding to the global picture.

      Strengths:

      The conclusions are supported by an impressive amount of sedimentological and paleontological data (mainly trace fossils) and illustrations.

      Weaknesses:

      The occurrence of MISS (Microbially Induced Sedimentary Structures) could be discussed more in detail as these provide interesting information directly linked to the delayed recovery of the biota.

    3. Reviewer #2 (Public review):

      Summary:

      A rapid recovery of the ecosystems during the late Early Triassic, in the aftermath of the end-Permian mass extinction, is discussed based on different types of fossils.

      Strengths:

      The combined study of invertebrate trace fossils, tetrapod bones, and plant remains together with their stratigraphic distribution in different sections provides a convincing case to support a rapid recovery as the authors hypothesize.

      Weaknesses:

      The study is based on three regions with Triassic successions from the North China block. While a first-hand study of other localities of similar age would be ideal, this is of course a difficult task. Instead, the authors provide comparisons with other worldwide regions to build their case and support the initial hypothesis.

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript by Guo and colleagues features the documentation and interpretation of three successions of continental to marginal marine deposits spanning the P/T transition and their respective ichnofaunas. Based on these new data, inferences concerning the end-Permian mass extinction and Triassic recovery in the tropical realm are discussed.

      Strengths:

      The manuscript is well-written and organized and includes a large amount of new lithological and ichnological data that illuminate ecosystem evolution in a time of large-scale transition. The lithological documentations, facies interpretations, and ichnotaxonomic assignments look okay (with a few exceptions).

      Weaknesses:

      Some interpretations in Table 1 could be questioned: For facies association FA2 the interpretation as "terrestrial facies with periodical flooding" should be put into the right column and, given the fossil content, other interpretations, such as "marine facies" or "lagoonal environment" with some plant debris and (terrestrial) animal remains washed in, could also be possible. For FA3 the statement "bioturbation is absent" is in conflict with the next statement "strata are moderately reworked". For FA5 the observation of a "monospecific ichnoassemblage" contradicts the listing of several ichnotaxa.

      Concerning the structure of the manuscript, certain hypotheses related to the end-Permian mass extinction and the process of the P/T extinction and recovery, namely the existence of a long-persisting "tropic dead zone", are introduced as a foregone conclusion to which the new data are then seemingly fitted as corroborating evidence. Some of the data (e.g., the presence of a supposedly Smithian-age ichnofauna) are interpreted as a fast recovery shortening the duration of the "tropic dead zone" episode, but these data could also be interpreted as contradicting the idea of a "dead zone" sensu stricto in favour of a "normal" post-extinction environment with low diversity and occurrence of typical disaster taxa. Due to their large error bars, the early Triassic radiometric ages do not put much of a constraint on the age determination of the earliest post-extinction ichnofaunas discussed here.

      Considering the somewhat equivocal evidence and controversial ideas about the P/T transition, the introduction could be improved by describing how the idea of a "tropic dead zone" arose against the background of earlier ideas, alternative views, and conflicting data. In the discussion section, alternative interpretations of the extensive data presented here (e.g., proximal-distal shifts in lithofacies with respect to the sediment source, sea level changes, preservation bias, the local occurrence of hostile environments instead of a regional scale, etc.) should be discussed, also to avoid the impression that the authors' conclusion was driven by confirmation bias.

      Contrary to the authors' claim, Figures S7 and S8 suggest that burrow size does not vary much within the studied sections. Size decreases and increases in the Shichuanhe and Liulin sections do not occur contemporaneously, are usually within the error-bar range, and might be driven by ichnotaxa composition, i.e. the presence or absence of larger ichnotaxa, rather than by size changes in the same ichnotaxon (and producer group). Here the measurement data would be needed as well to check the basis of the authors' interpretations.

      Some arthropod tracks assigned here to Kouphichnium might not represent limulid traces but other (non-marine) arthropod taxa in accordance with their occurrence in terrestrial facies/non-marine units of the succession. More generally, the ichnotaxonomy of arthropod trackways is not yet well resolved: beyond Kouphichnium and Diplichnites, various similar-looking types may occur that can have a variety of distinct insect, crustacean, millipede, etc. producers (including larval stages).

    1. eLife Assessment

      The manuscript addresses the 3D chromatin architecture in monocytes from patients with alcohol-associated hepatitis and its relationship to enhanced transcription of innate immune genes. While the concept and methodological approach are appealing, the evidence is incomplete as a result of insufficient sample sizes as well as other significant analytical concerns.

    2. Reviewer #1 (Public review):

      Summary:

      The authors investigate the relationship between 3D chromatin architecture and innate immune gene regulation in monocytes from patients with alcohol-associated hepatitis (AH). Using Hi-C technology, they attempt to identify structural changes in the genome that correlate with altered gene expression. Their central claim is that genome restructuring contributes to the hyper-inflammatory phenotype associated with AH.

      Strengths:

      (1) The manuscript employs Hi-C technology, which, in principle, is a powerful approach for studying genome organization.

      (2) The focus on disease-relevant genes, particularly innate immune loci, provides a contextually important angle for understanding AH.

      Weaknesses:

      (1) Sample Size: The study relies on an exceptionally small cohort (4 AH patients and 4 healthy controls), rendering the results statistically underpowered and highly susceptible to variability.

      (2) Hi-C Resolution and Unpaired RNA-seq: The data are presented at a resolution of 100 kb, which is insufficient to uncover meaningful chromatin interactions at the level of individual genes. In addition, the Hi-C data are not paired with the RNA-seq data.

      (3) Functional Validation: The manuscript lacks experiments to directly link changes in chromatin architecture with gene expression or monocyte function, leaving the claims speculative.

      (4) Data Integration: The lack of integration of Hi-C with ATAC-seq and RNA-seq data handicaps the analysis and makes it superficial. In short, the study does not convincingly demonstrate a functional relationship.

      (5) Confounding Factors: The manuscript neglects critical confounding variables such as comorbidities, medications, and lifestyle factors, which could influence chromatin structure and gene expression independently of AH.

      Appraisal of the Aims and Results:

      The manuscript sets out to establish a connection between chromatin architecture and AH pathology. However, the study fails to achieve its stated aims due to inadequate methods and insufficient data. The conclusions drawn from the Hi-C analyses alone are poorly supported, and the lack of functional validation undermines the credibility of the proposed mechanisms. Overall, the results do not provide compelling evidence to substantiate the authors' claims.

      Impact on the Field and Utility to the Community:

      The work, in its current form, is unlikely to have a meaningful impact on the field. The limited scope, methodological shortcomings, and lack of robust data significantly diminish its potential utility. Without addressing these critical gaps, the study does not offer new insights into the role of genome architecture in AH or provide useful methodologies or datasets for the community.

      Additional Context:

      The manuscript would benefit from a more comprehensive analysis of potential mechanisms underlying the observed changes, including the interplay between chromatin architecture and epigenetic modifications. Furthermore, longitudinal studies or therapeutic interventions could provide insights into the dynamic aspects of genome restructuring in AH. These considerations are entirely absent from the current study.

      Conclusion:

      The manuscript does not achieve its stated goals and does not present sufficient evidence to support its conclusions. The limitations in sample size, resolution, and experimental rigor severely hinder its contribution to the field. Addressing these fundamental flaws will be essential for the work to be considered a meaningful addition to the literature.

    3. Reviewer #2 (Public review):

      Summary:

      Dr. Adam Kim and collaborators study the changes in chromatin structure in monocytes obtained from patients with alcohol-associated hepatitis (AH) compared to healthy controls (HC). Using high-throughput chromatin conformation capture technology (Hi-C), they collected data on contact frequencies between both contiguous and distal DNA windows (100 kb each), mainly within the same chromosome. From the analyses of those data in the two cohorts, the authors describe frequent pairs of regions subject to significant changes in contact frequency across cohorts. The accumulation of these pairs in specific regions of the genome, referred to as hotspots, motivated the authors to narrow down their analyses to these disease-associated regions, in many of which, the authors claim, a number of key innate immune genes can be found. Ultimately, the authors try to draw a link between the changes observed in chromatin architecture in some of these hotspots and the differential co-expression of the genes lying within those regions, as ascertained in previous single-cell transcriptomic analyses.

      Strengths:

      The main strength of this paper lies in the generation of Hi-C data from patients, a valuable asset that, as the authors emphasize, offers critical insights into the role of chromatin architecture dysregulation in the pathogenesis of alcohol-associated hepatitis (AH). If confirmed, the reported findings have the potential to highlight an important, yet overlooked, aspect of cellular dysregulation (chromatin conformation changes), not only in AH but potentially in other immune-related conditions with a component of pathological inflammation.

      Weaknesses:

      What I regard as the two most important weaknesses of the work are more methodological than conceptual. The first concerns the insufficient description of how some key types of genomic regions are defined, such as topologically associated domains, DNA hotspots, or even DNA loci showing significant changes in contact frequency between AH and HC. Despite the importance of these concepts in the paper, the current version of the manuscript provides no operational, explicit description of how they are defined from a statistical point of view.

      Without these definitions, some of the claims that the authors make in their work become hard to sustain. Some examples are the claim that randomizing samples does not lead to significant differences between cohorts; the claim that most of the changes in contact frequency happen locally; or the claim that most changes do not alter the structure of TADs, but appear either within or between TADs. In my view, specific descriptions and implementation of proper tests to check these hypotheses and back up the mentioned claims, along with the inclusion of explicit results on these matters, would contribute very significantly to strengthening the overall message of the paper.

      The second notable weakness of the study pertains to the characterization of the changes observed around immune genes in relation to genome-wide expectations. Although the authors suggest that certain hotspots contain a high number of immune-related genes, no enrichment analysis is provided to verify whether these regions indeed harbor a higher concentration of such genes compared to other genomic areas. It would be important for readers to be promptly informed if no such enrichment is observed, for in that case, the presence of some immune genes within these hotspots would carry more limited implications.

      Additionally, the criteria used to define a hotspot are not clearly outlined, making it difficult to assess whether the changes in contact frequencies around the immune genes highlighted in figures 5-8 are truly more pronounced than what would be expected genome-wide.
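      The enrichment check requested above could be sketched as a one-sided hypergeometric test. This is only an illustration under stated assumptions: the function name and all gene counts below are hypothetical and not taken from the manuscript, and a real analysis would also need a defined gene universe and immune gene set.

```python
from math import comb


def hypergeom_enrichment_p(total_genes, immune_genes, hotspot_genes, immune_in_hotspots):
    """One-sided hypergeometric p-value, P(X >= immune_in_hotspots).

    Treats the genes inside hotspots as a draw without replacement from the
    annotated genome and asks how often at least the observed number of
    immune genes would be drawn by chance.
    """
    upper = min(immune_genes, hotspot_genes)
    denominator = comb(total_genes, hotspot_genes)
    tail = sum(
        comb(immune_genes, k) * comb(total_genes - immune_genes, hotspot_genes - k)
        for k in range(immune_in_hotspots, upper + 1)
    )
    return tail / denominator


# Hypothetical counts, for illustration only: 20,000 annotated genes,
# 1,000 of them immune-related, 400 genes inside hotspots, 40 of those immune.
p_value = hypergeom_enrichment_p(20000, 1000, 400, 40)
```

      With these made-up counts, 20 immune genes would be expected in hotspots by chance, so a small p-value would indicate enrichment; a p-value near 1 would support the reviewer's caveat that the presence of immune genes carries limited implications.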

    4. Reviewer #3 (Public review):

      In this manuscript, the authors use HiC to study the 3D genome of CD14+ CD16+ monocytes from the blood of healthy individuals and of patients with alcohol-associated hepatitis.

      Overall, the authors perform a cursory analysis of the HiC data and conclude that there are a large number of changes in 3D genome architecture between healthy and AH patient monocytes. They highlight some specific examples that are linked to changes in gene expression. The analysis is of such a preliminary nature that I would usually expect to see the data from all figures in just one or two figures.

      In addition, I have a number of concerns regarding the experimental design and the depth of the analyses performed that I think must be addressed.

      (1) There is a myriad of literature that describes the existence of cell type-specific 3D genome architecture. In this manuscript, there is an assumption by the authors that the CD14+ CD16+ monocytes represent the same population from both healthy and diseased patients. Therefore, the authors conclude that the differences they see in the HiC data are due to disease-related changes in the equivalent cell types. However, I am concerned that the AH patient monocytes may have differentiated due to their environment so that they are in fact akin to a different cell type and the 3D genome changes they describe reflect this. This is supported by published articles for example: Dhanda et al., Intermediate Monocytes in Acute Alcoholic Hepatitis Are Functionally Activated and Induce IL-17 Expression in CD4+ T Cells. J Immunol (2019) 203 (12): 3190-3198, in which they show an increased frequency of CD14+ CD16+ intermediate monocytes in AH patients that are functionally distinct.

      I suggest that if the authors would like to study the specific effects of AH on 3D genome architecture, then they should carefully FACS-sort the equivalent monocyte populations from the healthy individuals and AH patients.

      (2) The analysis of the HiC data is quite preliminary. In the 3D genome field, it is usual to report the different scales of genome architecture, for example, compartments, topologically associated domains (TADs), and loops. I think that reporting this information and how it changes in AH patients in the appropriate cell types would be of great interest to the field.

    1. eLife Assessment

      This valuable study combined whole-head magnetoencephalography (MEG) and subthalamic (STN) local field potential (LFP) recordings in patients with Parkinson's disease undergoing deep brain stimulation surgery. The paper provides convincing evidence that cortical and STN beta oscillations are sensitive to movement context.

    2. Reviewer #1 (Public review):

      Summary:

      Winkler et al. present brain activity patterns related to complex motor behaviour by combining whole-head magnetoencephalography (MEG) with subthalamic local field potential (LFP) recordings from people with Parkinson's disease. The motor task involved repetitive circular movements with stops or reversals associated with either predictable or unpredictable cues. Beta and gamma frequency oscillations are described, and the authors found complex interactions between recording sites and task conditions. For example, they observed stronger modulation of connectivity in unpredictable conditions. Moreover, STN power varied across patients during reversals, which differed from stopping movements. The authors conclude that cortex-STN beta modulation is sensitive to movement context, with potential relevance for movement redirection.

      Strengths:

      This study employs a unique methodology, leveraging the rare opportunity to simultaneously record both invasive and non-invasive brain activity to explore oscillatory networks.

      Weaknesses:

      It is difficult to interpret the role of the STN in context of reversals, because no consistent activity pattern emerged.

      Comments on revisions: The authors have adequately addressed my comments.

    3. Reviewer #2 (Public review):

      Summary:

      This study examines the role of beta oscillations in motor control, particularly during rapid changes in movement direction among patients with Parkinson's disease. The researchers utilized magnetoencephalography (MEG) and local field potential (LFP) recordings from the subthalamic nucleus to investigate variations in beta band activity within the cortex and STN during the initiation, cessation, and reversal of movements, as well as the impact of external cue predictability on these dynamics. The primary finding indicates that beta oscillations more effectively signify the start and end of motor sequences than transitions within those sequences. The article is well-written, clear, and concise.

      Strengths:

      The use of a continuous motion paradigm with rapid reversals extends the understanding of beta oscillations in motor control beyond simple tasks. It offers a comprehensive perspective on subthalamo-cortical interactions by combining MEG and LFP.

      Comments on revisions: I am satisfied with the revisions. I do not have further comments on the revised manuscript.

    4. Reviewer #3 (Public review):

      Summary:

      The study highlights how the initiation, reversal, and cessation of movements are linked to changes in beta synchronization within the basal ganglia-cortex loops. It was observed that different movement phases, such as starting, stopping briefly, and stopping completely, affect beta oscillations in the motor system.

      It was found that unpredictable cues lead to stronger changes in STN-cortex beta coherence. Additionally, specific patterns of beta and gamma oscillations related to different movement actions and contexts were observed. Stopping movements was associated with a lack of the expected beta rebound during brief pauses within a movement sequence.

      Overall, the results underline the complex and context-dependent nature of motor control and emphasize the role of beta oscillations in managing movement according to changing external cues.

      Strengths:

      The paper is very well written, clear and appears methodologically sound.

      The use of continuous movement (turning) with reversals is more naturalistic than many previous button-push paradigms.

      Weaknesses:

      The generalizability of the findings is somewhat curtailed by the fact that this was performed peri-operatively during the period of the microlesion effect. Given the availability of sensing-enabled DBS devices now and HD-EEG, does MEG offer a significant enough gain in spatial localizability to offset the fact that it has to be done shortly postoperatively with externalized leads, with attendant stun effect? Specifically, for paradigms that are not asking very spatially localized questions as a primary hypothesis?

      Further investigation of the gamma signal seems warranted, even though it has a slightly lower proportional change in amplitude in beta. Given that the changes in gamma here are relatively wide band, this could represent a marker of neural firing that could be interestingly contrasted against the rhythm account presented.

      Comments on revisions: I congratulate the authors on their paper and their revisions and I have no further comments. I look forward to seeing the continuous analyses in the future. Good luck!

    5. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This valuable study combined whole-head magnetoencephalography (MEG) and subthalamic (STN) local field potential (LFP) recordings in patients with Parkinson's disease undergoing deep brain stimulation surgery. The paper provides solid evidence that cortical and STN beta oscillations are sensitive to movement context and may play a role in the coordination of movement redirection.

      We are grateful for the expert assessment by the editor and the reviewers. Below we provide point-by-point replies to both public and private reviews. We have tried to keep the answers in the public section short and concise, not citing the changed passages unless the point does not re-appear in the recommendations. There, we did include all of the changes to the manuscript, such that the reviewers need not go back and forth between replies and manuscript.

      The reviewer comments have not only led to numerous improvements of the text, but also to new analyses, such as Granger causality analysis, and to methodological improvements e.g. including numerous covariates in the statistical analyses. We believe that the article improved substantially through the feedback, and we thank the reviewers and the editor for their effort.

      Public Reviews

      Reviewer #1 (Public review):

      Summary:

      Winkler et al. present brain activity patterns related to complex motor behaviour by combining whole-head magnetoencephalography (MEG) with subthalamic local field potential (LFP) recordings from people with Parkinson's disease. The motor task involved repetitive circular movements with stops or reversals associated with either predictable or unpredictable cues. Beta and gamma frequency oscillations are described, and the authors found complex interactions between recording sites and task conditions. For example, they observed stronger modulation of connectivity in unpredictable conditions. Moreover, STN power varied across patients during reversals, which differed from stopping movements. The authors conclude that cortex-STN beta modulation is sensitive to movement context, with potential relevance for movement redirection.

      Strengths:

      This study employs a unique methodology, leveraging the rare opportunity to simultaneously record both invasive and non-invasive brain activity to explore oscillatory networks.

      Weaknesses:

      It is difficult to interpret the role of the STN in the context of reversals because no consistent activity pattern emerged.

      We thank the reviewer for the valuable feedback to our study. We agree that the interpretation of the role of the STN during reversals is rather difficult, because reversal-related STN activity was highly variable across patients. Although there seem to be consistent patterns in sub-groups of the current cohort, with some patients showing event-related increases (Fig. 3b) and others showing decreases, the current dataset is not large enough to substantiate or even explain the existence of such clusters. Thus, we limit ourselves to acknowledging this limitation and discussing potential reasons for the high variability, namely variability in electrode placement and insufficient spatial resolution for the separation of specialized cell ensembles within the STN (see Discussion, section Limitations and future directions).

      Reviewer #2 (Public review):

      Summary:

      This study examines the role of beta oscillations in motor control, particularly during rapid changes in movement direction among patients with Parkinson's disease. The researchers utilized magnetoencephalography (MEG) and local field potential (LFP) recordings from the subthalamic nucleus to investigate variations in beta band activity within the cortex and STN during the initiation, cessation, and reversal of movements, as well as the impact of external cue predictability on these dynamics. The primary finding indicates that beta oscillations more effectively signify the start and end of motor sequences than transitions within those sequences. The article is well-written, clear, and concise.

      Strengths:

      The use of a continuous motion paradigm with rapid reversals extends the understanding of beta oscillations in motor control beyond simple tasks. It offers a comprehensive perspective on subthalamo-cortical interactions by combining MEG and LFP.

      Weaknesses:

      (1) The small and clinically diverse sample size may limit the robustness and generalizability of the findings. Additionally, the limited exploration of causal mechanisms reduces the depth of its conclusions and focusing solely on Parkinson's disease patients might restrict the applicability of the results to broader populations.

      We thank the reviewer for the insightful feedback. We address these issues one by one in our responses to points 2, 4 and 6, respectively.

      (2) The small sample size and variability in clinical characteristics among patients may limit the robustness of the study's conclusions. It would be beneficial for the authors to acknowledge this limitation and propose strategies for addressing it in future research. Additionally, incorporating patient-specific factors as covariates in the ANOVA could help mitigate the confounding effects of heterogeneity.

      Thank you for this comment. The challenges associated with recording brain activity peri-operatively can be a limiting factor when it comes to sample size and cohort stratification. We now acknowledge this in the revised discussion (section Limitations and future directions). Furthermore, we suggest using sensing-capable devices in the future as a measure to increase sample sizes (Discussion, section Limitations and future directions). Lastly, we appreciate the idea of adding patient-specific factors as covariates to the ANOVAs and have thus included age, disease duration and pre-surgical UPDRS score into our models. This did not lead to any qualitative changes of statistical effects.

      (3) The author may consider using standardized statistics, such as effect size, that would provide a clearer picture of the observed effect magnitude and improve comparability.

      Thanks for the suggestion. As measures of effect size, we have added partial eta squared (η<sub>p</sub><sup>2</sup>) to the results of all ANOVAs and Cohen’s d to all follow-up t-tests.

      (4) Although the study identifies relevance between beta activity and motor events, it lacks causal analysis and discussion of potential causal mechanisms. Given the valuable datasets collected, exploring or discussing causal mechanisms would enhance the depth of the study.

      We appreciate this idea and have conducted Granger causality analyses in response to this comment. This new analysis reveals that there is a strong cortical drive to the STN for all movements of interest and predictability conditions in the beta band. The detailed results can be viewed on p. 16 in the section on Granger causality. For statistical testing, we conducted an rmANCOVA, similar to those for power and coherence (see p. 46-48 and 54-56 for the corresponding tables), as well as t-tests assessing directionality (Figure 6-figure supplement 2 on p. 35). In the discussion section, we connect these results with prior findings suggesting that the frontal cortex drives the STN in the beta band, likely through hyperdirect pathway fibers (p. 17).

      (5) The study cohort focused on senior adults, who may exhibit age-related cortical responses during movement planning in neural mechanisms. These aspects were not discussed in the study.

      We appreciate the comment and agree that age may have impacted neural oscillatory activity of patients in the present study. We now acknowledge this in the limitations section, and point out that our approach to handling these effects was including age as a covariate in the statistical analyses.

      (6) Including a control group of patients with other movement disorders who also undergo DBS surgery would be beneficial, because we cannot exclude the possibility that the observed findings are specific to PD rather than generalizable. Additionally, the current title and the article, which are oriented toward understanding human motor control, may not be appropriate.

      We thank the reviewer for this comment and fully agree that it cannot be ruled out that the present findings are, in part, specific to PD. We acknowledge this limitation in the Limitations and future directions section (p. 20-21). Indeed, including a control group of patients with other disorders would be ideal, but the scarcity of patients with diseases other than PD who receive STN DBS in our centre makes this an unfeasible option in practical terms. We do suggest that future research may address this issue by extending our approach to different disorders or healthy participants on the cortical level (p. 21). Lastly, we appreciate the idea to adjust the title of the present article. The adjusted title is: “Context-Dependent Modulations of Subthalamo-Cortical Synchronization during Rapid Reversals of Movement Direction in Parkinson’s Disease”.

      That being said, we do believe that our findings at least approximate healthy functioning and are not solely related to PD. For one, patients were on their usual dopaminergic medication and dopamine has been found to normalize pathological alterations of beta activity. Further, the general pattern of movement-related beta and gamma oscillations reported here has been observed in numerous diseases and brain structures, including cortical beta oscillations measured non-invasively in healthy participants.

      Reviewer #3 (Public review):

      Summary:

      The study highlights how the initiation, reversal, and cessation of movements are linked to changes in beta synchronization within the basal ganglia-cortex loops. It was observed that different movement phases, such as starting, stopping briefly, and stopping completely, affect beta oscillations in the motor system.

      It was found that unpredictable cues lead to stronger changes in STN-cortex beta coherence. Additionally, specific patterns of beta and gamma oscillations related to different movement actions and contexts were observed. Stopping movements was associated with a lack of the expected beta rebound during brief pauses within a movement sequence.

      Overall, the results underline the complex and context-dependent nature of motor control and emphasize the role of beta oscillations in managing movement according to changing external cues.

      Strengths:

      The paper is very well written, clear, and appears methodologically sound.

      The use of continuous movement (turning) with reversals is more naturalistic than many previous button-push paradigms.

      Weaknesses:

      The generalizability of the findings is somewhat curtailed by the fact that this was performed perioperatively during the period of the microlesion effect. Given the availability of sensing-enabled DBS devices now and HD-EEG, does MEG offer a significant enough gain in spatial localizability to offset the fact that it has to be done shortly postoperatively with externalized leads, with an attendant stun effect? Specifically, for paradigms that are not asking very spatially localized questions as a primary hypothesis?

      We appreciate the reviewer’s feedback and acknowledge the valid point raised on the timing of our measurements. Indeed, sensing-enabled devices offer a valid alternative to peri-operative recordings, circumventing the stun effect. We acknowledge this in the revised discussion, section Limitations and future directions (p. 23): “Additionally, future research could capitalize on sensing-capable devices to circumvent the necessity to record brain activity peri-operatively, facilitating larger sample sizes and circumventing the stun effect, an immediate improvement in motor symptoms arising as a consequence of electrode implantation (Mann et al., 2009).” This alternative strategy, however, was not an option here because we did not have a sufficient number of patients implanted with sensing-enabled devices at the time when the data collection was initialized.

      That being said, we would like to highlight that in the present study, our goal was not to study pathology related to Parkinson’s disease. Rather, we aimed to learn about motor control in general. The stun effect may have facilitated motor performance in our patients, which is actually beneficial to the research goals at hand.

      Further investigation of the gamma signal seems warranted, even though it has a slightly lower proportional change in amplitude in beta. Given that the changes in gamma here are relatively wide band, this could represent a marker of neural firing that could be interestingly contrasted against the rhythm account presented.

      We appreciate the reviewer’s interest and we have extended the investigation of gamma oscillations. We now provide statistics regarding the influence of predictability on gamma power and gamma coherence (no significant effects) and explore Granger causality in the gamma (and beta) band (see comment 4 of reviewer 2). Unfortunately, we cannot measure spiking via the DBS electrode, and therefore we cannot investigate correlations between gamma oscillatory activity and action potentials. We do agree with the reviewer, however, that action potentials rather than oscillations form the basis of motor control in the brain. This view of ours is now reflected in the revised discussion, section Limitations and future directions (p. 21): “Lastly, given the present study’s focus on understanding movement-related rhythms, particularly in the beta range, future research could further explore the role of gamma oscillations in continuous movement and their relation to action potentials in motor areas (Fischer et al., 2020; Igarashi, Isomura, Arai, Harukuni, & Fukai, 2013), which form the basis of movement encoding in the brain.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      This is a well-conducted study and overall the results are clear. I only have one minor suggestion for improvement of the manuscript. I found the order of appearance of the results somewhat confusing, switching from predictability-related behavioral effects to primarily stopping and reversal-related neurophysiological effects, back to predictability but starting with coherence. I would suggest that the authors try to follow a systematic order focused on the questions at hand. E.g. perhaps readability could be improved if the results section is split into reversal vs. stopping related effects, reporting behavior, power, and coherence in this order, followed by a predictability section, again reporting behavior, power, and coherence. Obviously, this is an optional suggestion. Apart from that, I just missed a more direct message related to the absence of statistical significance related to STN power changes during reversal. I think this could be made more clear in the text.

      We thank the reviewer for the feedback to our study. In order to ease reading, we modified the order and further added additional sub-titles to the results section. We start with Behavior (p. 4) and then move on to Power (general movement effects on power – movement effects on STN power – movement effects on cortical power – predictability effects on power). Next, we move on to Connectivity (movement effects on connectivity – predictability effects on connectivity – Granger causality). We hope that these adaptations will help guide the reader.

      Additionally, we thank the reviewer for noting that we did not explicitly mention the lack of statistical significance of reversal-related beta power modulations in the STN. We have adapted the section on modulation of STN beta power associated with reversals (p. 8) to: “In the STN, reversals were associated with a brief modulation of beta power, which was weak in the group-average spectrum and did not reach significance (Fig. 3a).”

      Reviewer #2 (Recommendations for the authors):

      (1) The small sample size and variability in clinical characteristics among patients may limit the robustness of the study's conclusions. It would be beneficial for the authors to acknowledge this limitation and propose strategies for addressing it in future research. Additionally, incorporating patient-specific factors as covariates in the ANOVA could help mitigate the confounding effects of heterogeneity.

      Thank you for this comment. The challenges associated with recording brain activity peri-operatively can be a limiting factor when it comes to sample size. We now acknowledge this in the revised discussion, section Limitations and future directions (p. 20):

      “Invasive measurements of STN activity are only possible in patients who are undergoing or have undergone brain surgery. Studies drawing from this limited pool of candidate participants are typically limited in terms of sample size and cohort stratification, particularly when carried out in a peri-operative setting. Here, we had a sample size of 20, which is rather high for a peri-operative study, but still low in terms of absolute numbers.”

      Furthermore, we suggest using sensing-capable devices in the future as a measure to increase sample sizes (p. 21):

      “Additionally, future research could capitalize on sensing-capable devices to circumvent the necessity to record brain activity peri-operatively, facilitating larger sample sizes and circumventing the stun effect, an immediate improvement in motor symptoms arising as a consequence of electrode implantation (Mann et al., 2009).”

      Lastly, we appreciate the idea of adding patient-specific factors as covariates to the ANOVAs and have thus included age, disease duration and pre-surgical UPDRS score into our models. This did not lead to any qualitative changes of statistical effects.

      Revised article

      Methods, Statistical analysis:

      “To account for their potential influence on brain activity, we added age, pre-operative UPDRS score, and disease duration as covariates to all ANOVAs. Covariates were standardized by means of z-scoring.”
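      For readers unfamiliar with the standardization step quoted above, z-scoring a covariate can be sketched as follows. The ages are hypothetical values, and the use of the sample standard deviation is an assumption; the response does not state which estimator was used.

```python
from statistics import mean, stdev


def zscore(values):
    """Standardize values to mean 0 and (sample) standard deviation 1."""
    centre = mean(values)
    spread = stdev(values)  # sample SD (n - 1 denominator); an assumption
    return [(value - centre) / spread for value in values]


# Hypothetical patient ages in years, for illustration only.
ages = [61.0, 58.0, 70.0, 66.0]
z_ages = zscore(ages)
```

      Standardizing age, UPDRS score, and disease duration in this way puts all three covariates on a common scale before they enter the model.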

      (2) The author may consider using standardized statistics, such as effect size, that would provide a clearer picture of the observed effect magnitude and improve comparability.

      Thanks for this useful suggestion. As measures of effect size, we have added partial eta squared (η<sub>p</sub><sup>2</sup>) to the results of all ANOVAs and Cohen’s d to all follow-up t-tests.

      (3) Although the study identifies relevance between beta activity and motor events, it lacks causal analysis and discussion of potential causal mechanisms. Given the valuable datasets collected, exploring or discussing causal mechanisms would enhance the depth of the study.

      We appreciate this idea and have conducted Granger causality analyses in response to this comment. This new analysis reveals that there is a strong cortical drive to the STN for all movements of interest and predictability conditions in the beta band, but no directed interactions in the gamma band. For statistical testing, we conducted an rmANCOVA, similar to the analysis of power and coherence (see p. 46-48 and 54-56 for the corresponding tables), as well as t-tests assessing directionality (Figure 6-figure supplement 2 on p. 35). In the discussion section, we connect these results with prior findings suggesting that the frontal cortex drives the STN in the beta band, likely through hyperdirect pathway fibers (p. 17).

      Revised article

      Methods Section, Granger Causality Analysis

      “We computed beta and gamma band non-parametric Granger causality (Dhamala, Rangarajan, & Ding, 2008) between cortical ROIs and the STN in the hemisphere contralateral to movement for the post-event time windows (0 – 2 s with respect to start, reversal, and stop). Because estimates of Granger causality are often biased, we compared the original data to time-reversed data to suppress non-causal interactions. True directional influence is reflected by a higher causality measure in the original data than in its time-reversed version, resulting in a positive difference between the two, the opposite being the case for a signal that is “Granger-caused” by the other. Directionality is thus reflected by the sign of the estimate (Haufe, Nikulin, Müller, & Nolte, 2013). Because rmANCOVA results indicated no significant effects for predictability and movement type, and post-hoc tests did not detect significant differences between hemispheres, we averaged Granger causality estimates over movement types, hemispheres and predictability conditions in Figure 6-figure supplement 2.”
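      The time-reversal contrast described in this passage can be illustrated with a toy example. Note the substitution: the sketch below uses a lag-1, time-domain (parametric) Granger estimate on simulated white-noise signals, not the non-parametric spectral estimate the authors cite. The function names, the coupling strength of 0.8, and the signal labels are all hypothetical.

```python
import math
import random


def granger_lag1(source, target):
    """Lag-1 time-domain Granger causality from source to target.

    Returns ln(RSS_restricted / RSS_full), where the restricted model predicts
    the target from its own previous sample and the full model adds the
    previous sample of the source. Signals are assumed to be zero-mean.
    """
    y, y_prev, x_prev = target[1:], target[:-1], source[:-1]
    s_yy = sum(a * a for a in y_prev)
    s_xx = sum(a * a for a in x_prev)
    s_xy = sum(a * b for a, b in zip(x_prev, y_prev))
    b_y = sum(a * b for a, b in zip(y_prev, y))
    b_x = sum(a * b for a, b in zip(x_prev, y))
    # Restricted model: least squares with one regressor.
    beta = b_y / s_yy
    rss_restricted = sum((t - beta * p) ** 2 for t, p in zip(y, y_prev))
    # Full model: solve the 2x2 normal equations by Cramer's rule.
    det = s_yy * s_xx - s_xy ** 2
    beta_y = (b_y * s_xx - b_x * s_xy) / det
    beta_x = (b_x * s_yy - b_y * s_xy) / det
    rss_full = sum(
        (t - beta_y * p - beta_x * q) ** 2 for t, p, q in zip(y, y_prev, x_prev)
    )
    return math.log(rss_restricted / rss_full)


def time_reversed_difference(source, target):
    """Granger estimate on the original data minus the estimate on reversed data."""
    return granger_lag1(source, target) - granger_lag1(source[::-1], target[::-1])


# Toy simulation: signal "a" drives signal "b" with a one-sample delay.
rng = random.Random(0)
n_samples = 5000
signal_a = [rng.gauss(0, 1) for _ in range(n_samples)]
signal_b = [rng.gauss(0, 1)] + [
    0.8 * signal_a[t - 1] + rng.gauss(0, 1) for t in range(1, n_samples)
]

diff_a_to_b = time_reversed_difference(signal_a, signal_b)  # true direction
diff_b_to_a = time_reversed_difference(signal_b, signal_a)  # opposite direction
```

      In this toy system the difference is positive in the true driving direction and negative in the opposite direction, which is the sign convention the quoted passage describes.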

      Results, Granger causality

      “In general, cortex appeared to drive the STN in the beta band, regardless of the movement type and predictability condition. This was reflected in a main effect of ROI on Granger causality estimates (F<sub>ROI</sub>(7,9) = 3.443, p<sub>ROI</sub> = 0.044, η<sub>p</sub><sup>2</sup> = 0.728; refer to Supplementary File 4 for the full results of the ANOVA). In the hemisphere contralateral to movement, follow-up t-tests revealed significantly higher Granger causality estimates from M1 to the STN (t = 3.609, one-sided p < 0.001, d = 0.807) and from MSMC to the STN (t = 2.051, one-sided p < 0.027, d = 0.459) than the other way around. The same picture emerged in the hemisphere ipsilateral to movement (M1 to STN: t = 3.082, one-sided p = 0.003, d = 0.689; MSMC to STN: t = 1.833, one-sided p < 0.041, d = 0.410). In the gamma band, we did not detect a significant drive from one area to the other (F<sub>ROI</sub>(7,9) = 0.338, p<sub>ROI</sub> = 0.917, η<sub>p</sub><sup>2</sup> = 0.208, Supplementary File 6). Figure 6-figure supplement 2 demonstrates the differences in Granger causality between original and time-reversed data for the beta and gamma band.”
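      The effect sizes in this passage can be cross-checked against the reported test statistics with two standard conversions: partial eta squared from F and its degrees of freedom, and Cohen's d from a one-sample or paired t statistic. The sketch below assumes these conversions and n = 20 (the sample size stated elsewhere in the response); whether the authors computed the values this way is not stated.

```python
import math


def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def cohens_d_from_t(t_value, n):
    """Cohen's d for a one-sample or paired t-test with n observations."""
    return t_value / math.sqrt(n)


# Test statistics as quoted in the passage above; n = 20 is an assumption.
eta_beta = partial_eta_squared(3.443, 7, 9)
eta_gamma = partial_eta_squared(0.338, 7, 9)
d_m1_contralateral = cohens_d_from_t(3.609, 20)
d_msmc_contralateral = cohens_d_from_t(2.051, 20)
```

      Under these assumptions the results round to 0.728, 0.208, 0.807, and 0.459, matching the quoted values.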

      Discussion, The dynamics of STN-cortex coherence

      “Considering the timing of the increase observed here, the STN’s role in movement inhibition (Benis et al., 2014; Ray et al., 2012) and the fact that frontal and prefrontal cortical areas are believed to drive subthalamic beta activity via the hyperdirect pathway (Chen et al., 2020; Oswal et al., 2021), it seems plausible that the increase of beta coherence reflects feedback of sensorimotor cortex to the STN in the course of post-movement processing. In line with this idea, we observed a cortical drive of subthalamic activity in the beta band.”

      (4) The study cohort consisted of senior adults, who may exhibit age-related changes in cortical responses during movement planning. These aspects were not discussed in the study.

      We appreciate the comment and agree that age may have impacted neural oscillatory activity of patients in the present study. We now acknowledge this in the limitations section, and point out that our approach to handling these effects was including age as a covariate in the statistical analyses.

      Revised article

      Discussion, Limitations and Future Directions

      “Further, most of our participants were older than 60 years. To diminish any confounding effects of age on movement-related modulations of neural oscillations, such as beta suppression and rebound (Bardouille & Bailey, 2019; Espenhahn et al., 2019), we included age as a covariate in the statistical analyses.”

      (5) Including a control group of patients with other movement disorders who also undergo DBS surgery would be beneficial, because we cannot otherwise determine whether the observed findings are specific to PD or can be generalized. Additionally, the current title and the article, which are oriented toward understanding human motor control, may not be appropriate.

      We thank the reviewer for this comment and fully agree that it cannot be ruled out that the present findings are, in part, specific to PD. We acknowledge this limitation in the Limitations and future directions section (p. 20-21). Indeed, including a control group of patients with other disorders would be ideal, but the scarcity of patients with diseases other than PD who receive STN DBS makes this an unfeasible option. We do suggest that future research may address this issue by extending our approach to different disorders or healthy participants on the cortical level (p. 21). Lastly, we appreciate the idea to adjust the title of the present article. The adjusted title is: “Context-Dependent Modulations of Subthalamo-Cortical Synchronization during Rapid Reversals of Movement Direction in Parkinson’s Disease”.

      That being said, we do believe that our findings at least approximate healthy functioning and are not solely related to PD. For one, patients were on their usual dopaminergic medication for the study and dopamine has been found to normalize pathological alterations of beta activity. More importantly, the general pattern of movement-related beta and gamma oscillations has been observed in numerous diseases and brain structures, including cortical beta oscillations measured non-invasively in healthy participants. Thus, it is not unlikely that the new aspects discovered here are also general features of motor processing.

      Revised article

      Discussion, Limitations and future directions

      “Furthermore, we cannot be sure to what extent the present study’s findings relate to PD pathology rather than general motor processing. We suggest that our approach at least approximates healthy brain functioning as patients were on their usual dopaminergic medication. Dopaminergic medication has been demonstrated to normalize power within the STN and globus pallidus internus, as well as STN-globus pallidus internus and STN-cortex coherence (Brown et al., 2001; Hirschmann et al., 2013). Additionally, several of our findings match observations made in other patient populations and healthy participants, who exhibit the same beta power dynamics at movement start and stop (Alegre et al., 2004) that we observed here. Notably, our finding of enhanced cortical involvement in the face of uncertainty aligns well with established theories of cognitive processing, given the cortex's prominent role in managing higher cognitive functions (Altamura et al., 2010). Yet, transferring our approach and task to patients with different disorders, e.g. obsessive compulsive disorder, or examining young and healthy participants solely at the cortical level, could contribute to elucidating whether the synchronization dynamics reported here are indeed independent of PD and age.”

      Reviewer #3 (Recommendations for the authors):

      Despite the strengths of the "rhythm" account of cognitive processes, the paper could possibly be improved by making it less skewed to rhythms explaining all of the movement encoding.

      Thank you for this comment - the point is well taken. There is a large body of literature relating neural oscillations to spiking in larger neural populations, which itself is likely the most relevant signal with respect to motor control. In our eyes, it is this link that justifies the rhythm account, i.e. we agree with the reviewer that action potentials are the basis of movement encoding in the brain, not oscillations. Unfortunately, we cannot measure spiking with the method at hand.

      To better integrate this view into the current manuscript, we make the following suggestion for future research in the Limitations and future directions section (p. 21): “Lastly, given the present study’s focus on understanding movement-related rhythms, particularly in the beta range, future research could further explore the role of gamma oscillations in continuous movement and their relation to action potentials in motor areas (Fischer et al., 2020; Igarashi, Isomura, Arai, Harukuni, & Fukai, 2013), which form the basis of movement encoding in the brain.”

      In Figure 5 - is the legend correct? Is it really just a 0.2% change in power only? That would be a very surprisingly small effect size.

      We thank the reviewer for noting this. Indeed, the numbers on the scale quantify relative change (post - pre)/pre and should be multiplied by 100 to obtain %-change. We have adjusted the color bars accordingly.
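
      As a small illustration of the rescaling described in this reply, the sketch below converts a relative change, (post - pre)/pre, into a percentage. The function name `percent_change` and the example power values are hypothetical and are not taken from the article.

```python
import numpy as np


def percent_change(pre, post):
    """Relative change (post - pre) / pre, expressed in percent."""
    pre = np.asarray(pre, dtype=float)
    post = np.asarray(post, dtype=float)
    return 100.0 * (post - pre) / pre


# Hypothetical power values: a relative change of 0.2 is a 20 % change.
change = percent_change(pre=1.0, post=1.2)
```

      Read this way, a value of 0.2 on the original color bar would correspond to a 20 % change in power rather than 0.2 %.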

      The dissociation between the effects of unpredictable cues in coherence versus raw power is interesting and could potentially be directly contrasted further in the discussion (here they are presented separately with separate discussions, but this seems like a pretty important and novel finding as beta coherence and power usually go in the same direction).

      We appreciate the reviewer’s interest in our findings on the predictability of movement instructions. In case of coherence, the difference between pre- and post-event was generally more positive in the unpredictable condition, meaning that suppressions (negative pre-post difference) were diminished whereas increases (positive pre-post difference) were enhanced. With respect to power, we also observed less suppression in the unpredictable condition at movement start. Therefore, the direction of change is in fact the same. We made this clearer in the revised version by adapting the corresponding sections of the abstract, results and discussion (see below).

      The only instance of coherence and power diverging (on a qualitative level) was observed during reversals: here, we noted post-event increases in coherence and post-event decreases in M1 power in the group-average spectra. However, when comparing the pre- and post-event epochs statistically by means of permutation testing, the coherence increase did not reach significance. Hence, we did not highlight this aspect.

      Revised version

      Abstract

      “… Event-related increases of STN-cortex beta coherence were generally stronger in the unpredictable than in the predictable condition. …”

      Results, Effects of predictability on beta power  

      “With respect to the effect of predictability of movement instructions on beta power dynamics (research aim 2), we observed an interaction between movement type and condition (F<sub>cond*mov</sub> (2,14) = 4.206, p<sub>cond*mov</sub> = 0.037, η<sub>p</sub><sup>2</sup> = 0.375), such that the beta power suppression at movement start was generally stronger in the predictable (M = -0.170, SD = 0.065) than in the unpredictable (M = -0.154, SD = 0.070) condition across ROIs (t = -1.888, one-sided p = 0.037, d = -0.422). We did not observe any modulation of gamma power by the predictability of movement instructions (F<sub>cond</sub> (1,15) = 0.792, p<sub>cond</sub> = 0.388, η<sub>p</sub><sup>2</sup> = 0.050, Supplementary File 5).”

      Effects of predictability on STN-cortex coherence

      “With respect to the effect of predictability of movement instructions on beta coherence (research aim 2), we found that the pre-post event differences were generally more positive in the unpredictable condition (main effect of predictability condition; F<sub>cond</sub>(1,15) = 8.684, p<sub>cond</sub> = 0.010, η<sub>p</sub><sup>2</sup> = 0.367; Supplementary File 3), meaning that the suppression following movement start was diminished and the increases following stop and reversal were enhanced in the unpredictable condition (Fig. 6a). This effect was most pronounced in the MSMC (Fig. 6b). When comparing region-average TFRs between the unpredictable and the predictable condition, we observed a significant difference only for stopping (t<sub>clustersum</sub> = 142.8, p = 0.023), suggesting that the predictability effect was mostly carried by increased beta coherence following stops. When repeating the rmANCOVA for pre-event coherence, we did not observe an effect of predictability (F<sub>cond</sub>(1,15) = 0.163, p<sub>cond</sub> = 0.692, η<sub>p</sub><sup>2</sup> = 0.011), i.e. the effect was most likely not due to a shift of baseline levels. The increased tendency for upward modulations and decreased tendency for downward modulations rather suggests that the inability to predict the next cue prompted intensified event-related interaction between STN and cortex. STN-cortex gamma coherence was not modulated by predictability (F<sub>cond</sub>(1,15) = 0.005, p<sub>cond</sub> = 0.944, η<sub>p</sub><sup>2</sup> = 0.000, Supplementary File 5).”

      Discussion, Beta coherence and beta power are modulated by predictability

      “In the present paradigm, patients were presented with cues that were either temporally predictable or unpredictable. We found that unpredictable movement prompts were associated with stronger upward modulations and weaker downward modulations of STN-cortex beta coherence, likely reflecting the patients adopting a more cautious approach, paying greater attention to instructive cues. Enhanced STN-cortex interactions might thus indicate the recruitment of additional neural resources, which might have allowed patients to maintain the same movement speed in both conditions. […]”

      With respect to power, we observed reduced beta suppression in the unpredictable condition at movement start, consistent with the effect on coherence, likely demonstrating a lower level of motor preparation.

      Given that you have a nice continuous data task here - the turning of the wheel, it might be interesting to cross-correlate the circular position (and separately - velocity) of the turning with the envelope of the beta signal. This would be a nice finding if you could also show that the beta is modulated continuously by the continuous movements. In the natural world, we rarely do a continuous movement with a sudden reversal or stop; most of the time we are in continuous movement. Looking at this might also be a strength of your dataset.

      We could not agree more. In fact, having a continuous behavioral output was a major motivation for choosing this particular task. We are very interested in state space models such as preferential subspace identification (Sani et al., 2021), for example. These models relate continuous brain signals to continuous behavioral target variables and should be of great help for questions such as: do oscillations relate to moment-by-moment adaptations of continuous movement? Which frequency bands and brain areas are important? Is angular position encoded by different brain areas/frequency bands than angular speed? These analyses are in fact ongoing. This project, however, is too large to fit into the current article.

    1. eLife Assessment

      This important study by Wong et al. addresses a longstanding question in the field of associative learning regarding how a motivationally relevant event can be inferred from prior learning based on neutral stimulus-stimulus associations. The research provides convincing behavioral and neurophysiological evidence to address this question. The manuscript will be interesting for researchers in behavioral and cognitive neuroscience.

    2. Reviewer #1 (Public review):

      Summary:

      This study is an important follow-up to their prior work - Wong et al. (2019), starting with clear questions and hypotheses, followed by a series of thoughtful and organized experiments. The method and results are convincing. Experiment 1 demonstrated the sensory preconditioned fear with few (8) or many (32) sound-light pairings. Experiments 2A and 2B showed the role of PRh NMDA receptors during conditioning for online integration, revealing that this contribution is present only after few sound-light pairings, not after many sound-light pairings. Experiments 3A and 3B showed the contribution of PRh-BLA communication to online integration, again only after few but not after many. Contrary to Experiments 3A and 3B, Experiments 4A and 4B showed the contribution of PRh-BLA communication to integration at test only after many but not few sound-light pairings.

      Strengths:

      Throughout the manuscript, the methods and results are clearly organized and described, and the use of statistics is solid, all contributing to the overall clarity of the research. The discussion section was also well written, effectively comparing the current research with the prior work and offering insightful interpretations and potential future directions for this line of research.

      All my previous concerns have been well addressed in this revised version. I do not have further concerns about the current version of this manuscript.

    3. Reviewer #2 (Public review):

      This manuscript builds on the authors' earlier work, most recently Wong et al. 2019, in which they showed the importance of the perirhinal cortex (PRh) during the first-order conditioning stage of sensory preconditioning. Sensory preconditioning requires learning between two neutral stimuli (S2-S1) and subsequent development of a conditioned response to one of the neutral stimuli after pairing of the other stimulus with a motivationally relevant unconditioned stimulus (S1-US). One highly debated question regarding the mechanisms of learning of sensory preconditioning has been whether conditioned responses evoked by the indirectly trained stimulus (S2) occur through a mediated representation at the time of the first-order US training, or whether the conditioned responses develop through a chained evoked representation (S2--> S1 --> US) at the time of test. The authors' prior findings provided strong evidence for PRh being involved in mediated learning during the first-order training. They showed that protein synthesis was required during the first-order S1-US learning to support the conditioned response to the indirectly trained stimulus (S2) at test.

      One question remaining following the previous paper was whether certain conditions may promote a chaining mechanism over mediated learning, as there is some evidence for chained representations at the time of test. In this paper, the authors directly address this important question and find unambiguous results that the extent of training during the preconditioning stage impacts the involvement of PRh during the first-order conditioning or stage 2. They show that putative blockade of synaptic changes in PRh, using an NMDA antagonist, disrupts responding to the preconditioned cue at test during shorter duration preconditioning training (8 trials), but not during extended training (32 trials). They also show that this is the case for communication between the PRh and BLA during the same stage of training using a contralateral inactivation approach. This confirms their previous findings in 2019 of connectivity between these regions for the short duration training, while they observe here for the first time that this is not the case for extended training. Finally, they show that with extended training, communication between BLA and the PRh is required at the final test of the preconditioned stimulus, but not for the short duration training.

      Strengths:

      The results are clear and extremely consistent across experiments within this paper as well as with earlier work. The experiments here are thorough, well-conceived, and address an important and highly debated question in the field regarding the neural and psychological mechanisms underlying sensory preconditioning. This work is highly impactful for the field as the debate over mediated versus chaining mechanisms has been an important topic for more than 70 years.

      Comments on revisions:

      Thank you for addressing all of my concerns in considerable detail. I have no more suggestions for the authors. This is a fantastic paper both in the experimental design and the execution as well as in the high quality of writing.

    4. Reviewer #3 (Public review):

      The authors tested whether the number of stimulus-stimulus pairings alters whether preconditioned fear depends on online integration during formation of the stimulus-outcome memory or during the probe test/mobilization phase, when the original stimulus, which was never paired with aversive events, elicits fear via chaining of stimulus-stimulus and stimulus-outcome memories. They found that sensory preconditioning was successful with either 8 or 32 stimulus-stimulus pairings. Perirhinal cortex NMDA receptor blockade during stimulus-outcome learning impaired preconditioning following 8 but not 32 pairings during preconditioning. Therefore, perirhinal cortex NMDA activity is required for online integration or mediated learning. Perirhinal-basolateral amygdala disconnection had nearly identical effects with the same interpretation: these areas communicate during stimulus-outcome learning, and this online communication is required for later expressing preconditioned fear. Disconnection prior to the probe test, when chaining might occur, had different effects: it impaired the expression of preconditioned fear in rats that received 32, but not 8, pairings during preconditioning. The study has several strengths and provides a thoughtful discussion of future experiments. The study is highly impactful and significant; the authors were successful in describing the behavioral and neurobiological mechanisms of mediated learning versus chaining in sensory preconditioning, which is often debated in the learning field. Therefore this study will have a significant impact on the behavioral neurobiology and learning fields.

      Strengths:

      Careful, rigorous experimental design and statistics

      The discussion leaves open questions that are very much worth exploring. For example - why did perirhinal-amygdala disconnection prior to the probe have no effect in the 8-pairing group, when bilateral perirhinal inactivation did (in Wong et al, 2019)? The authors propose that perirhinal cortex outputs bypass the amygdala during the probe test, which is an excellent hypothesis to test.

      The experiments are very explicitly hypothesis-driven, and the authors provide evidence of how and why mediated learning and chaining occur during sensory-sensory learning.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study is an important follow-up to their prior work - Wong et al. (2019), starting with clear questions and hypotheses, followed by a series of thoughtful and organized experiments. The method and results are convincing. Experiment 1 demonstrated the sensory preconditioned fear with few (8) or many (32) sound-light pairings. Experiments 2A and 2B showed the role of PRh NMDA receptors during conditioning for online integration, revealing that this contribution is present only after a few sound-light pairings, not after many sound-light pairings. Experiments 3A and 3B showed the contribution of PRh-BLA communication to online integration, again only after a few but not after many. Contrary to Experiments 3A and 3B, Experiments 4A and 4B showed the contribution of PRh-BLA communication to integration at test only after many but not few sound-light pairings.

      Strengths:

      Throughout the manuscript, the methods and results are clearly organized and described, and the use of statistics is solid, all contributing to the overall clarity of the research. The discussion section was also well-written, effectively comparing the current research with the prior work and offering insightful interpretations and potential future directions for this line of research. I have only a limited number of concerns about some results and some details of experiments/statistics.

      We thank the reviewer for their positive assessment.

      Weaknesses:

      Could you provide further interpretation regarding line 171: the observation that sensory preconditioned fear increased with the number of sound-light pairings? Was this increase due to better sound-light association learning during Stage 1? Additionally, were there any experimental differences between Experiment 1 and the other experiments that might explain why freezing was higher in the P32 group compared to the P8 group? This pattern seemed to be absent in the other experiments. If we consider the hypothesis that the online integration mechanism is more active with fewer pairings and the chaining mechanism at the test is more prominent with many pairings, we wouldn't expect a difference between the P8 and P32 groups. Given the relatively small sample size in Experiment 1, the authors might consider conducting a cross-experiment analysis or something similar to investigate this further.

      We appreciate the reviewer’s point and thank them for the question. The heightened level of sensory preconditioned fear among rats that received many sound-light pairings in the initial control experiment (Group P32) may reflect the combined effects of both mediated learning and chaining at test. We are, however, reluctant to offer a strong interpretation of this result as it was not replicated in the subsequent experiments: i.e., the levels of freezing to the sensory preconditioned stimulus at test were almost identical among vehicle-injected controls that received either few (8) or many (32) sound-light pairings in Experiments 2A and 2B; and this was also true in Experiments 3A and 3B, and again in Experiments 4A and 4B. A key difference between the initial and subsequent experiments is that, in contrast to the initial experiment, rats in subsequent experiments underwent surgery for one reason or another (implantation of cannulas, lesion of the perirhinal cortex). The implication is that surgical interventions in the perirhinal cortex and/or basolateral amygdala might affect the way that rats integrate the sound-light and light-shock associations in sensory preconditioning: i.e., they may force rats to rely on one type of integration strategy or the other. This is, of course, purely speculative – it will be addressed in future research.

      Reviewer #2 (Public review):

      This manuscript builds on the authors' earlier work, most recently Wong et al. 2019, in which they showed the importance of the perirhinal cortex (PRh) during the first-order conditioning stage of sensory preconditioning. Sensory preconditioning requires learning between two neutral stimuli (S2-S1) and subsequent development of a conditioned response to one of the neutral stimuli after pairing of the other stimulus with a motivationally relevant unconditioned stimulus (S1-US). One highly debated question regarding the mechanisms of learning of sensory preconditioning has been whether conditioned responses evoked by the indirectly trained stimulus (S2) occur through a mediated representation at the time of the first-order US training, or whether the conditioned responses develop through a chained evoked representation (S2--> S1 --> US) at the time of test. The authors' prior findings provided strong evidence for PRh being involved in mediated learning during the first-order training. They showed that protein synthesis was required during the first-order S1-US learning to support the conditioned response to the indirectly trained stimulus (S2) at the test.

      One question remaining following the previous paper was whether certain conditions may promote a chaining mechanism over mediated learning, as there is some evidence for chained representations at the time of the test. In this paper, the authors directly address this important question and find unambiguous results that the extent of training during the preconditioning stage impacts the involvement of PRh during the first-order conditioning or stage 2. They show that putative blockade of synaptic changes in PRh, using an NMDA antagonist, disrupts responding to the preconditioned cue at test during shorter duration preconditioning training (8 trials), but not during extended training (32 trials). They also show that this is the case for communication between the PRh and BLA during the same stage of training using a contralateral inactivation approach. This confirms their previous findings in 2019 of connectivity between these regions for the short-duration training, while they observe here for the first time that this is not the case for extended training. Finally, they show that with extended training, communication between BLA and the PRh is required at the final test of the preconditioned stimulus, but not for the short duration training.

      The results are clear and extremely consistent across experiments within this paper as well as with earlier work. The experiments here are thorough and well-conceived, and address an important and highly debated question in the field regarding the neural and psychological mechanisms underlying sensory preconditioning. This work is highly impactful for the field as the debate over mediated versus chaining mechanisms has been an important topic for more than 70 years.

      We thank the reviewer for their kind assessment.

      Reviewer #3 (Public review):

      The authors tested whether the number of stimulus-stimulus pairings alters whether preconditioned fear depends on online integration during the formation of the stimulus-outcome memory or during the probe test/mobilization phase, when the original stimulus, which was never paired with aversive events, elicits fear via chaining of stimulus-stimulus and stimulus-outcome memories. They found that sensory preconditioning was successful with either 8 or 32 stimulus-stimulus pairings. Perirhinal cortex NMDA receptor blockade during stimulus-outcome learning impaired preconditioning following 8 but not 32 pairings during preconditioning. Therefore, perirhinal cortex NMDA activity is required for online integration or mediated learning. Perirhinal-basolateral amygdala disconnection had nearly identical effects with the same interpretation: these areas communicate during stimulus-outcome learning, and this online communication is required for later expressing preconditioned fear. Disconnection prior to the probe test, when chaining might occur, had different effects: it impaired the expression of preconditioned fear in rats that received 32, but not 8, pairings during preconditioning. The study has several strengths and provides a thoughtful discussion of future experiments. The study is highly impactful and significant; the authors were successful in describing the behavioral and neurobiological mechanisms of mediated learning versus chaining in sensory preconditioning, which is often debated in the learning field. Therefore this study will have a significant impact on the behavioral neurobiology and learning fields.

      Strengths:

      Careful, rigorous experimental design and statistics.

      The discussion leaves open questions that are very much worth exploring. For example - why did perirhinal-amygdala disconnection prior to the probe have no effect in the 8-pairing group, when bilateral perirhinal inactivation did (in Wong et al, 2019)? The authors propose that perirhinal cortex outputs bypass the amygdala during the probe test, which is an excellent hypothesis to test.

      The authors provide evidence that both mediated learning and chaining occur.

      Thank you for the positive assessment – we fully intend to identify the circuitry that regulates retrieval/expression of sensory preconditioned fear when it is based on mediated learning in stage 2.

      Weaknesses:

      This is inherent to all neural interference and behavioral experiments: biological/psychological functions do not typically operate binarily. There is no single clear number or parameter at which mediated learning or chaining happens, and both probably happen to some extent. Addressing this is even more difficult given behavioral variability across subjects, implant sites, etc. Thus, this is not so much a weakness particular to this study as much as an existential problem, which the authors were able to work around with careful experimental design and appropriate controls.

      We completely agree with the point raised here and thank the reviewer for their assessment.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) It appears that the method description for Sensory Preconditioning was copied from their previous Wong et al. (2019) paper, which is fine, but in the current research, the authors use 8 or 32 presentations, which is not reflected in the description.

      Thank you for bringing this to our attention. This is now addressed in the method section on page 27 (beginning at line 655):

      “Rats received either eight presentations of the sound and eight of the light in a single session, or 32 presentations of the sound and 32 of the light across four daily sessions. On Day 3, all rats received eight presentations of the sound and eight of the light. Each presentation of the sound was 30 s in duration and each presentation of the light was 10 s in duration. The first stimulus presentation occurred five min after rats were placed into the chambers. The offset of one stimulus co-occurred with the onset of the other stimulus for groups that received paired presentations of the sound and the light, while these stimuli were presented separately for groups that received explicitly unpaired presentations. The interval between each paired presentation was five min while the interval between each separately presented stimulus was 150 s. After the last stimulus presentation, rats remained in the chambers for an additional one min. They were then returned to their home cages. This training was repeated on Days 4-6 for rats that received 32 presentations of the sound and 32 of the light. All rats proceeded to first-order conditioning (details below) the day after their final session of sound and light exposures, which was Day 4 for rats exposed to eight presentations of the sound and light and Day 7 for rats exposed to 32 presentations of the sound and light.”

      (2) Line 148: Could the authors clarify how the "significant linear increase" was assessed? From similar descriptions in later experiments, it seems it was based on a comparison of freezing across the four presentations, but the F(1,26) statistic suggests there seemed to be a half-split test. The same questions exist in all the experiments. Please clarify.

      Conditioning data were analysed using contrasts with repeated measures in ANOVA. The repeated measures (or within-subject) factor was “trial” as all rats were exposed to four light-shock pairings in this stage of training. We examined whether there was a significant linear increase in freezing across trials using a standard within-subject contrast. The specific coefficients for this contrast, given the four trials, were -3, -1, 1, and 3. The reason that the degrees of freedom remain 1 and 26 in this analysis is because the within-subject contrast is part of a set of planned orthogonal contrasts. That is, in any planned analysis of the sort conducted here, the df1 will always be 1, indicating the very nature of the analysis. There was no splitting of the data, or comparisons between the split halves.
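      The contrast described above can be illustrated with a minimal sketch. It assumes that each rat's four trial scores are collapsed into a single linear-trend score using the weights -3, -1, 1, 3, and that the average of those scores is tested against the error pooled within groups, which gives df = (1, N - number of groups). The function names and data layout are hypothetical and this is not the authors' analysis software; 30 rats in four groups is one arrangement that yields 26 error degrees of freedom.

```python
from statistics import mean

# Linear trend weights for four equally spaced trials, as given in the response.
LINEAR_WEIGHTS = (-3, -1, 1, 3)


def contrast_score(trials):
    """Collapse one rat's four trial scores into a single linear-trend score."""
    if len(trials) != len(LINEAR_WEIGHTS):
        raise ValueError("expected one freezing score per trial")
    return sum(w * x for w, x in zip(LINEAR_WEIGHTS, trials))


def linear_trend_test(groups):
    """Test the linear trend averaged over groups (hypothetical helper).

    `groups` maps a group label to a list of rats, each rat being a list of
    four freezing scores. Returns (F, df1, df2).
    """
    scores = {g: [contrast_score(r) for r in rats] for g, rats in groups.items()}
    n_total = sum(len(s) for s in scores.values())
    n_groups = len(scores)

    # Error term: variance of the contrast scores pooled within groups.
    ss_within = sum(sum((x - mean(s)) ** 2 for x in s) for s in scores.values())
    df_error = n_total - n_groups
    ms_error = ss_within / df_error

    # Estimate: unweighted average of the group means of the contrast scores.
    estimate = mean(mean(s) for s in scores.values())
    se_squared = ms_error * sum(1 / len(s) for s in scores.values()) / n_groups ** 2

    # A single contrast always has one numerator degree of freedom.
    return estimate ** 2 / se_squared, 1, df_error
```

      Because every rat contributes exactly one contrast score, no splitting of the data is involved: the numerator df is 1 because a single contrast is tested, and the denominator df depends only on the number of rats and groups.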

      (3) Line 154: Could the authors clarify what is meant by "other main effects and their interactions"? It is not clearly inferable from the context.

      Apologies for the confusion here. “Other main effects” refer to the two between-subject factors in isolation: i.e., the overall comparison of freezing to the light (averaged across the four trials) between groups that received either paired or unpaired stimulus presentations in stage 1 (factor 1 → main effect 1), and between groups that received either eight or 32 sound and light exposures in stage 1 (factor 2 → main effect 2). “Their interaction” refers to the assessment of whether the overall difference in freezing to the light (averaged across the four trials) between Groups P8 and U8 differs from the overall difference in freezing to the light (averaged across the four trials) between Groups P32 and U32. We have edited the text near line 153 to indicate that:

      “The overall comparisons of freezing to the light (averaged across the four conditioning trials) between groups that received either paired or unpaired stimulus presentations in stage 1 (factor 1), and between groups that received either eight or 32 sound and light exposures in stage 1 (factor 2), were not significant (Fs < .45, p > .508). The interaction between these two between-subject factors was also not significant (F < .45, p > .508).”
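      The two main effects and the interaction described here can be written as contrast weights over the four group means. The sketch below is illustrative only: the weights follow the verbal definitions above, while the helper names and any group means passed in are hypothetical. Orthogonality of the weights in this simple dot-product sense assumes equal group sizes.

```python
# Group order used throughout: P8, U8, P32, U32.
GROUPS = ("P8", "U8", "P32", "U32")

CONTRASTS = {
    "pairing": (1, -1, 1, -1),      # paired vs unpaired (main effect 1)
    "exposure": (1, 1, -1, -1),     # 8 vs 32 exposures (main effect 2)
    "interaction": (1, -1, -1, 1),  # (P8 - U8) - (P32 - U32)
}


def contrast_value(weights, means):
    """Weighted sum of group means for one contrast."""
    return sum(w * means[g] for w, g in zip(weights, GROUPS))


def are_orthogonal(a, b):
    """Two contrasts are orthogonal (equal n) if their weights have zero dot product."""
    return sum(x * y for x, y in zip(a, b)) == 0
```

      With equal paired-minus-unpaired differences at both exposure levels, the interaction contrast evaluates to zero, which is the pattern a non-significant interaction would be consistent with.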

      (4) The use of sound and light as preconditioned and conditioned cues are counterbalanced. Was there any difference in the increase of freezing during conditioning depending on the type of conditioned cues? Was there any difference in the preconditioned fear? While it is hard to assess statistical significance due to the sample size limit, even observing a trend could be interesting.

      We examined whether the levels of freezing to the conditioned and preconditioned stimuli depend on their physical identity. In general, there was a slight trend towards more freezing to the preconditioned stimulus when it was a tone, and less freezing to the conditioned stimulus when it was a tone. These are, however, simply indications. None of the statistical comparisons between rats for which the preconditioned stimulus was the tone (and, thereby, conditioned stimulus was the light) and rats for which the preconditioned stimulus was the light (and, thereby, conditioned stimulus was the tone) reached the conventional level of significance.

      (5) General suggestion on reporting non-significant statistics: the authors reported a small F statistic value a few times to suggest non-significance. But without clearly specifying degrees of freedom, it is hard to get a sense of statistical significance (e.g. Line 227, largest F<3.10). I recommend adding p values alongside the F statistics and reporting exact statistics whenever possible.

      Apologies for the omission. The p values have now been included alongside all non-significant F statistics.

      (6) Another general suggestion is to use non-parametric statistical testing with such small sample sizes. I recommend using the Kruskal-Wallis H test (the non-parametric equivalent of F-statistic) to replace the ANOVA result. Also, given many tests only involve comparing two independent groups, using Mann-Whitney U test (the non-parametric equivalent of independent t-test) would be sufficient.

      We understand that small sample sizes can occasionally lead to unequal variances between groups, which necessitates the use of non-parametric statistics. However, as non-parametric statistics raise a different set of issues for data analysis (e.g., power) and interpretation, our general view for the type of data collected in this study is that parametric analyses are appropriate and should be retained (particularly in the absence of unequal variances between groups). We hold this view for two reasons. First, the hypotheses tested in the present series were derived from past work in which parametric analyses revealed meaningful patterns of results at the same level of statistical power. Second, the application of these analyses then yielded results consistent with our hypotheses: for the most part, we observed between-group differences where we expected there to be such differences and did not observe between-group differences where we did not expect there to be such differences. As such, we have not switched from a parametric to non-parametric analysis strategy. We do, however, appreciate the suggestion and will apply a non-parametric approach where it is warranted in our future work.

      Reviewer #2 (Recommendations for the authors):

      I have a few very minor comments for the authors regarding the discussion and interpretation of the very nice experimental results.

      (1) In Figures 4 and 5, the authors provide a schematic of the experiment. It's very clearly indicated whether the BLA inactivation is ipsi- or contralateral, but the unilateral PRh lesion isn't mentioned. I'd recommend including that here so that someone reading through the figures can more easily understand the experiment. The hypothesis is clear and the experiment is so well designed that a read through of the figures can relay most information to an experienced reader.

      Thank you for this suggestion – we have included information about the unilateral PRh lesion in the schematic for Figures 4 and 5.

      (2) The authors have an extended description of backward conditioning in the discussion. It seems like the authors are suggesting this as an important future direction, but they never explicitly say this, resulting in a bit of confusion as to what this section refers to. Also, Ward-Robinson and Hall 1996 showed backward sensory preconditioning using a serial auditory-visual association and argued for a mediated solution based on their results. It may be worth citing that paper here.

      Apologies for the lack of clarity. We have revised this point in the discussion (page 18, beginning line 434) and referenced Ward-Robinson and Hall (1996):

      “Why does increasing the number of sound-light pairings change the way that rats integrate the sound-light and light-shock memories? One possibility is that increasing the number of sound-light pairings in stage 1 reduces the ability of each stimulus to activate the memory of the other. This is consistent with findings by Holland (1998), who showed that the likelihood of mediated learning in rats decreases with the amount of training (see also Holland, 2005); but inconsistent with our findings that, after extended training, rats continue to integrate the sound-light and light-shock associations through chaining at the time of testing (as chaining is predicated on the sound activating the memory of the light after extended training). Instead, we propose that the change in integration occurs because the increased number of sound-light pairings allows the rats to learn about the order in which the sound and light are presented (Figure 1; for evidence that rats acquire order information in sensory preconditioning, see Barnet et al., 1997; Hart et al., 2022; Leising et al., 2007; Miller & Barnet, 1993). This order hypothesis is consistent with evidence showing that the way in which animals represent an audio-visual compound changes across repeated compound exposures (e.g., Bellingham & Gillette, 1981; Holmes & Harris, 2009). It can be tested using a so-called “backward” sensory preconditioning protocol, which reverses the order of stimulus presentations in stage 1 (e.g., Ward-Robinson & Hall, 1996). That is, rather than rats being exposed to the “forward” sound-light pairings used here and by Wong et al. (2019), rats in a backward protocol are exposed to light-sound pairings. Increasing the number of light-sound pairings in this protocol should result in rats learning that the light is followed by the sound (light→sound) and that the sound is followed by nothing (sound→nothing). 
Hence, during the session of light-shock pairings in stage 2, the light should continue to activate the memory of the sound, resulting in formation of the mediated sound-shock association (e.g., Ward-Robinson & Hall, 1996). That is, if our order hypothesis is correct, increasing the number of light-sound pairings in the backward protocol should preserve the likelihood of mediated learning in stage 2 and, if anything, diminish the likelihood of chaining at test in stage 3 (as the sound is never followed by a light). Hence, PRh manipulations that fail to affect fear of the sound when administered after many sound-light pairings (e.g., infusion of DAP5) should disrupt that fear when administered after many light-sound pairings in the backward protocol. This will be assessed in future work.”

      (3) Line 467 in the discussion suggests that the results are surprising that PRh-BLA communication is not needed at test when learning putatively occurs through a mediated mechanism during first-order conditioning. I was a bit surprised by this comment since I was under the assumption that only BLA was required at this point after consolidation of the mediated learning. Holmes et al., 2013 showed that BLA is required for extinction to S2 after first-order conditioning. In that experiment they inactivated BLA during S2- presentations (typically considered the extinction test), and showed that reduction to S2 did not occur the subsequent day, indicating the memory was stored in BLA and may not necessarily require PRh-BLA communication.

      The result noted here was somewhat surprising as our past studies showed that silencing activity in the PRh prior to testing attenuates freezing to a sensory preconditioned stimulus (i.e., an S2). We took this to mean that the PRh is necessary for retrieval/expression of fear to S2 and supposed that this retrieval/expression would be achieved through communication between the PRh and BLA. However, the results of the PRh-BLA disconnection at test show that this communication is not required, leaving us to speculate that retrieval/expression of fear to S2 may be achieved through communication between the PRh and CeA.

      We have edited the opening of the relevant paragraph to clarify why the result noted here was surprising (page 20, beginning line 485):

      “While the PRh and BLA clearly communicate to support mediated learning about the sound, this communication is not required for retrieval/expression of the mediated sound-shock association at the time of testing. This result is somewhat surprising as activity in the PRh is needed for expression of fear to the sound (Holmes et al., 2013; Wong et al., 2019) and raises the question: how does the PRh-dependent sound-shock association come to be expressed in fear responses?”

      (4) The authors reference Holland 1981 and 1998, yet there's not much discussion of these findings. I think there should be a bit more emphasis on these studies since they show how mediated learning greatly depends on the extent of training. Also, it may be worth considering Holland's theory of why mediated conditioning is more effective with shorter training. His theory may be consistent with the authors, but I believe he suggests that early in training a stronger mediated representation is evoked which tends to dissipate with time. I think this is a valid hypothesis to consider in this paper.

      The Holland papers show that rats form mediated associations (Holland, 1981) and that the likelihood of them doing so decreases with the amount of training (Holland, 1998). These findings are paralleled by those reported in the present series of experiments. However, the protocols used by Holland were very different to those used in the present study; and the explanation for his 1998 findings (which is the more relevant of the two papers) simply does not apply to the case of sensory preconditioning.

      To be clear: Holland (1998) exposed rats to either “few” or “many” tone-food pairings in stage 1, tone-lithium chloride pairings in stage 2 and, finally, tested rats with the food alone in stage 3. He predicted and showed that those exposed to few tone-food pairings showed an aversion to the food at test (i.e., they consumed less of the food than controls) whereas those exposed to many tone-food pairings showed no such aversion (i.e., they consumed the same amount of food as the controls). This was taken to mean that, across the series of tone-lithium pairings, the tone activated the memory of food among rats in the few condition, resulting in a mediated food-lithium association; but failed to do so among rats in the many condition, resulting in no food-lithium association. According to Holland, the tone failed to activate the memory of food in the many condition because, by the end of training in stage 1, it was not needed for them to know what to do when the tone was presented: they simply had to run to the magazine to collect the food when delivered. That is, the tone eventually associated with the responses that rats emitted in the training situation, thereby obviating any need for activation of the food memory.

      While this explanation is both elegant and interesting, it cannot be applied to the results obtained in the present study where the initial stage of training involved few or many sound-light pairings. That is, unlike in the Holland study where rats in the many condition eventually learned a stimulus-“run to magazine” association that maintained performance in the absence of any mental image of food, in the present study, any stimulus-response association acquired in stage 1 (e.g., orienting responses towards the sources of the auditory and visual stimuli) cannot have contributed to the expression of sensory preconditioned fear at test. Hence, stimulus-response learning in the many condition cannot be invoked to explain the pattern of results in the present study, even if it adequately explains what-appears-to-be a similar finding in the Holland study.

      Nonetheless, we have included a reference to the general style of explanation that was considered and rejected by Holland in his 1998 and 2005 papers. This appears on page 18 (beginning line 434) and reads:

      “Why does increasing the number of sound-light pairings change the way that rats integrate the sound-light and light-shock memories? One possibility is that increasing the number of sound-light pairings in stage 1 reduces the ability of each stimulus to activate the memory of the other. This is consistent with findings by Holland (1998), who showed that the likelihood of mediated learning in rats decreases with the amount of training (see also Holland, 2005); but inconsistent with our findings that, after extended training, rats continue to integrate the sound-light and light-shock associations through chaining at the time of testing (as chaining is predicated on the sound activating the memory of the light after extended training). Instead, we propose that the change in integration occurs because the increased number of sound-light pairings allows the rats to learn about the order in which the sound and light are presented (Figure 1; for evidence that rats acquire order information in sensory preconditioning, see Barnet et al., 1997; Hart et al., 2022; Leising et al., 2007; Miller & Barnet, 1993)…”

      (5) There is also a Holland 2005 paper in which he tests whether extended training of the initial stimulus associations may result in a reduced associability of those stimuli. This would potentially result in lower mediated learning due to a decreased associability of the mediated representation, thereby explaining why extended training reductions in mediated learning occur. Using a probabilistic design, Holland shows that this reduction in mediated learning is likely not due to a change in associability.

      We appreciate the note re Holland (2005) and have included a reference to it in our General Discussion. We agree with Holland that the reduction in mediated learning across extended training is not due to reduced associability of the retrieved stimulus representation. If this were the case, it would remain to explain why stimulus representations continue to be activated at test, which must occur for successful chaining of the sound-light and light-shock associations upon presentations of the sound alone. This is included in the modified text on page 18 (beginning line 434), which is part of our response to point 4.

      Reviewer #3 (Recommendations for the authors):

      (1) I think the 4th intro paragraph is essentially saying that more pairings during preconditioning encourage chaining as opposed to mediated learning - I might recommend clarifying this a bit. It took me a while to put it together.

      Apologies for the confusion. We have clarified the argument at this point in the Introduction with the following insertion on page 4 (beginning line 84):

      “That is, increasing the number of sound-light pairings may allow rats to encode information about stimulus order in stage 1 and, thereby, shift the locus of integration from mediated conditioning in stage 2 to chaining at test in stage 3 (Holmes et al., 2022).”

      (2) In analyzing test data I am assuming percent freezing is the average of the entire 30s or 10s CS period - could this be clarified?

      This is correct and has been clarified in the section for ‘Scoring and Statistics’ on page 29 (beginning line 708):

      “Freezing data were collected using a time-sampling procedure in which each rat was scored as either ‘freezing’ or ‘not freezing’ every two seconds by an observer blind to the rat’s group allocation. A percentage score was then calculated by dividing the number of samples scored as freezing by the total number of samples. The baseline level of freezing was established by scoring the first two min at the start of each experimental session: i.e., we divided the total number of samples scored as freezing by the total number of observed samples, which was 60. The levels of freezing to the 10 s conditioned stimulus and 30 s preconditioned stimulus were established in a similar manner: we scored the entire period of each stimulus presentation and divided the number of samples scored as freezing by the total number of observed samples, which was 5 for each presentation of the conditioned stimulus and 15 for each presentation of the preconditioned stimulus.”
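      The scoring arithmetic in this passage can be checked with a short sketch. It assumes one observation every two seconds, as stated above; the function names are hypothetical and the observations passed in would be the observer's freezing/not-freezing judgments.

```python
SAMPLE_INTERVAL_S = 2  # one observation every two seconds


def samples_in(duration_s):
    """Number of time-sampled observations in a scoring period."""
    return duration_s // SAMPLE_INTERVAL_S


def percent_freezing(observations):
    """Percentage of observations scored as freezing (True = freezing)."""
    if not observations:
        raise ValueError("no observations to score")
    return 100 * sum(observations) / len(observations)
```

      Under these assumptions, the two-minute baseline yields 60 samples, each 10 s conditioned stimulus yields 5, and each 30 s preconditioned stimulus yields 15, matching the counts given in the quoted text.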

      (3) Complementary to the above - during the probe test is there a difference during the first/last 2s of the CS? This would be interesting with respect to understanding the associative structure encoded.

      We have previously examined whether freezing responses change across the duration of a 30 s preconditioned stimulus and a 10 s conditioned stimulus. We have never seen any such changes: in our past work and in the present series of experiments, the expression of freezing is largely uniform across each presentation of a preconditioned or conditioned stimulus.

      (4) It is sort of unclear to me why more CS-CS pairings produced stronger preconditioned fear - is it that both mediated learning and chaining occur and giving 32 pairings permits both processes more than 8 pairings?

      This is a very reasonable explanation for the heightened level of sensory preconditioned fear among rats that received many sound-light pairings in the initial control experiment. We are, however, reluctant to offer a strong interpretation of this result as it was not replicated across subsequent experiments in the series: i.e., the levels of freezing to the sensory preconditioned stimulus at test were largely the same among vehicle-injected controls that received either few (8) or many (32) sound-light pairings in Experiments 2A and 2B, and again in Experiments 3A and 3B as well as Experiments 4A and 4B.

      (5) I would suggest individual data points overlaid on the bars, violin plots, or box and whisker plots to provide a better visualization of the data.

      We appreciate the suggestion. Individual data points are now overlaid on the bars in each histogram.

      (6) There are other citations that would strengthen arguments for the idea that unidirectional/temporal associative structure can be acquired during (appetitive) sensory preconditioning: Leising 2007 Learning and Behavior, Hart 2022 Current Biology, for example.

      Thank you for these citations. We have included references to the Leising et al. (2007) and Hart et al. (2022) papers in our discussion on pages 18-19 (beginning line 442):

      “Instead, we propose that the change in integration occurs because the increased number of sound-light pairings allows the rats to learn about the order in which the sound and light are presented (Figure 1; for evidence that rats acquire order information in sensory preconditioning, see Barnet et al., 1997; Hart et al., 2022; Leising et al., 2007; Miller & Barnet, 1993)…”

      Editor's note:

      We agree with the suggestions about full statistical reporting for non-significant results and about putting individual data points, perhaps coded to identify sex, on top of the bar graphs. Both will increase the transparency of the rigor of the work for readers.

      We thank the editors and reviewers for their suggestions. We have included full statistical reporting for non-significant results and overlaid individual data points on the bars in each histogram.

    1. eLife Assessment

      This important work investigates the mechanism that underlies the switch between feeding and mating behaviors in the oriental fruit fly, Bactrocera dorsalis. Using a variety of approaches, the authors show that this switch is mediated by the neuropeptide, sulfakinin, acting peripherally through the sulfakinin receptor 1 to regulate the expression of antennal odorant receptors. The evidence is solid in support of the hypothesis that sulfakinin signaling mediates changes in the periphery, although additional sites of action may also contribute to these changes.

    2. Joint Public Review:

      Summary:

      The behavioral switch between foraging and mating is important for resource allocation in insects. This study characterizes the role of sulfakinin and the sulfakinin receptor 1 in changes in olfactory responses associated with foraging versus mating behavior in the oriental fruit fly (Bactrocera dorsalis), a significant agricultural pest. This pathway regulates food consumption and mating receptivity in other species; here the authors use genetic disruption of sulfakinin and sulfakinin receptor 1 to provide strong evidence that changes in sulfakinin signaling modulate antennal responses to food versus pheromonal cues and alter the expression of ORs that detect relevant stimuli.

      Strengths:

      The authors utilize multiple complementary approaches including CRISPR/Cas9 mutagenesis, behavioral characterization, electroantennograms, RNA sequencing and heterologous expression to convincingly demonstrate the involvement of the sulfakinin pathway in the switch between foraging and mating behaviors. The use of both sulfakinin peptide and receptor mutants is a strength of the study and implicates specific signaling actors.

      Weaknesses:

      The authors demonstrate that SKR is expressed in olfactory neurons; however, there are additional potential sites of action that may contribute to these results.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Joint Public Review:

      Summary:

      The behavioral switch between foraging and mating is important for resource allocation in insects. This study investigated the role of the neuropeptide, sulfakinin, and of its receptor, the sulfakinin receptor 1 (SkR1), in mediating this switch in the oriental fruit fly, Bactrocera dorsalis. The authors use genetic disruption of sulfakinin and of SkR1 to provide strong evidence that changes in sulfakinin signaling alter odorant receptor expression profiles and antennal responses and that these changes mediate the behavioral switch. The combination of molecular and physiological data is a strength of the study. Additional work would be needed to determine whether the physiological and molecular changes observed account for the behavioral changes observed.

      Strengths:

      (1) The authors show that sulfakinin signaling in the olfactory organ mediates the switch between foraging and mating, thereby providing evidence that peripheral sensory inputs contribute to this important change in behavior.

      (2) The authors' development of an assay to investigate the behavioral switch and their use of different approaches to demonstrate the role of sulfakinin and SkR1 in this process provides strong support for their hypothesis.

      (3) The manuscript is overall well-organized and documented.

      Weaknesses:

      (1) The authors claim that sulfakinin acts directly on SkR1-positive neurons to modulate the foraging and mating behaviors in B. dorsalis. The authors also indicated in the schematic that satiation suppresses SkR1 expression. Additional experiments and a more detailed discussion of the results would help support these claims.

      (2) The findings reported could be strengthened with additional experimental details regarding time of day versus duration of starvation effects and additional genetic controls, amongst others.

      Recommendations for the authors:

      Major issues

      (1) As written the introduction is somewhat fragmented and does not lay out a clear rationale for the current study in the species used by the authors. Others, including Guo et al. (2021) and Wang et al. (2022), have previously shown that sulfakinin signaling pathways are important for feeding and receptivity regulation in D. melanogaster. Thus, the novelty of this study should be more clearly articulated.

      The Introduction has been substantially revised to lay out the rationale for the study more clearly (lines 60-66 in the revision).

      (2) In addition, the Introduction should provide more specific background information on the pheromonal activity of oriental fruit fly body extract, the odor-preferences, and the sex pheromone of this species compared to that of model insects such as Drosophila melanogaster.

      The revised Introduction now contains a paragraph on the chemical ecology of the oriental fruit fly as it relates to this study (lines 67-75).

      (3) It isn't clear what the first image in Figure 1C represents - is this a schematic of the area or does it represent data?

      Figure 1C and its caption have been revised. The track colors were changed to make the trajectories easier to see, and the caption now reads: “Representative foraging trajectories in the 100 mm diameter arenas within a 15-min observation period of flies starved for different durations.”

      (4) The authors should include examples of the EAG recordings following the stimulation with food volatiles or pheromones, not only the results of their analyses. This could be included in the main figures or even in supporting information.

      As suggested, we added examples of the EAG recordings following stimulation with food odors and body extracts to Figure 1 and Figure 3.

      (5) The demonstration that removal of the antennae severely impairs mating is dispensable because the antennae are required for other functions in addition to olfaction.

      We agree that the antennae likely serve functions beyond olfaction. As suggested, we have removed these data.

      (6) It is currently difficult to understand how the authors measured successful rates of foraging. Please provide more details.

      In the revision, we added a sentence describing in detail how the success rates of foraging were measured (see lines 269-273).

      (7) The expression of sulfakinin does not change significantly in the antennae following starvation (Figure 2A). Do the authors know whether they change in the central nervous system under these conditions? Have the authors (or has anyone else) checked the expression pattern of sulfakinin in the antennae? This information would help determine whether the sulfakinin signal that acts on SkR1 is released from neurons in the central nervous system (Figure S4C) or whether it is also released from the neurons in the olfactory organs. Based on the immunochemistry results shown in Figure S4C, it would also be interesting to determine whether the intensity of anti-sulfakinin immunoreactivity changes before versus after starvation. This could help establish whether sulfakinin is released during starvation.

      We added expression data showing that the mRNA level of Sk in the head is higher after refeeding (Fig. S3). This change in Sk expression is also described in the text (lines 107-110). We were unable to identify Sk neurons in the antennae, which suggests that humoral Sk may act directly on the antennae.

      (8) In Figure 2A, the authors show that the expression levels of some neuropeptides system components change during starvation. However, it would be helpful if the authors could include more detailed information on how the results are shown in the figure legends (e.g., the expression level of each candidate in fed flies was set as 1, etc).

      We revised the caption of Figure 2 to explain how the expression values are presented.

      (9) In Figure 2D, null mutant males of sulfakinin and SkR1 consume more food at all times compared to the wild type. However, the corresponding mutant females consume more food only at night. Is this because the wild-type female flies eat more food during the day? In a related issue, Figure 2D shows differences in food consumption measured at different times of day, however, this is not directly addressed in the text, which instead mentions that "the amount of excess food consumed by the mutants was dependent on the duration of the starvation period in both sexes".

      Thank you for the important suggestions. We speculate that mutant females consume more food than wild-type females only at night because the high basal feeding rate of females during the daytime masks the increase in feeding caused by knocking out Sk signaling. As suggested, we have added a description of the difference in food consumption. In addition, we changed the Y-axis scale in the figure to allow a fair comparison between males and females. See lines 123-128.

      (10) It isn't clear how the time of day relates to the duration of starvation. This suggests that mutant females only consume more at 21:00 (presumably at night) whereas males consume more throughout the day. Does this suggest an interaction with the circadian system? What is the duration of starvation in Figure 3A? In a related issue, in Figure 4 it would be useful to know what time of day the EAG analysis was done because the data shown in Figure 2D suggests that the time of day significantly impacts behavioral responses. And does the red versus blue color scheme of the OR subunits represent up/downregulated levels in wild-type animals? Please define this for the reader.

      Regarding feeding amounts in females, please also see our response to point 9. As the reviewer noted, there was indeed a diurnal difference in the amount of food consumed by B. dorsalis. However, we have not investigated whether this is related to circadian rhythms. For the food intake measurements at all three times of day, we ensured that the duration of starvation was the same (12 h). The duration of starvation in Figure 3A is also 12 h. We have mentioned this in the manuscript. See lines 267-268.

      The EAG responses to sex pheromones and body surface extracts were measured from 21:00-23:00, and the responses to food odor were measured from 9:00-11:00. The times of the experiments are described in the revision. See lines 309-311.

      Accordingly, we revised the figure caption to explain the colored fonts. Red marks a set of ORs related to foraging and blue marks a set of ORs related to mating. The ORs in red were upregulated in starved wild-type flies, and the ORs in blue were downregulated in starved wild-type flies. We have defined this in the revised manuscript. See lines 672-673.

      (11) The authors convincingly show that SKR1 is present in the antennae and is co-expressed with orco. It would be useful to discuss whether this receptor is also expressed in other tissues where there may be additional sites of action of this pathway.

      Indeed, SkR1 is also expressed in the Drosophila brain. We added the discussion on the expression and additional sites of action of SKR1 within the central nervous system. See line 200-205.

      (12) It isn't clear what the dotted arrows in the model shown in Figure 5 represent.

      Dashed arrows represent the additional possible pathways that have not been tested in this study, but not excluded in the model. Please see the discussion for details of additional possible factors modulating odorant sensitivity relevant to satiety. See line 210-229.

      (13) In Figure 5, the authors indicate that satiation suppresses SkR1 expression. It would be helpful if the authors tested the expression level of SkR1 in re-fed flies (by feeding the flies after 12h starvation) to see whether levels of expression are rapidly restored to the levels seen in satiated animals. Such a result could further support the claims made by the authors.

      Thank you for your suggestion. Indeed, refeeding after 12 h starvation significantly decreased SkR1 expression. We added the result to the supporting information (Fig. S3; see line 713). For the results, see lines 107-110.

      (14) The authors show that locomotor activity is unaffected in the mutants but body size comparison would be more useful here since this could also contribute to baseline differences in meal size.

      In the revision, we provide a body size comparison between WT and Sk-/- flies in the supplementary data. The results show that mutant flies have the same body size as WT flies (Fig. S7; see line 742). For the results, see lines 120-121.

      (15) Have the authors tested the behavioral phenotypes of heterozygotes mutant of both Sk and SkR1 flies? This may reveal whether a reduced expression of Sk-SkR1 will also cause significant changes in the foraging and mating behaviors seen during starvation.

      We tested the behavioral phenotypes of heterozygous Sk mutant flies. The results showed that the foraging and mating behaviors of Sk heterozygous mutants were unaffected during starvation, suggesting that the mutation is completely recessive. We have added the results to the supporting information (Fig. S8; see line 746). For the results, see lines 132-135.

      (16) It would be useful to provide information about which SK peptide is detected by the antibody used in Figure S4C. In Figures S4C and S5D, it would be useful to include a counterstain to show that the general morphology is unaffected in the mutants.

      As suggested, we added a detailed description of the rabbit anti-BdSk antibody. See lines 362-363. We have improved the background of the images so that the general structure is visible; counterstaining should therefore not be essential.

      (17) The figure legends for supporting figures need to be improved as they are currently difficult to understand. For example, in S2: what is the meaning of "different removal of antennae"? In S3: it isn't clear how the authors evaluated the responses in EAG experiments; in S4A: there are several DNA sequences that do not appear in the main text of the manuscript; in S4C: the meaning of the boxes and the dots is unclear, as is the figure to the left; in S5D, the authors explain only the suppression of SKR1, yet the figure indicates some images for SKR IHC. These are only a few examples; we ask that the authors revise and improve the legends for supporting figures.

      For S2, we removed the data as suggested. For S3, we added a sentence describing the measurement method in detail. See lines 707-709. For S4, the figure has been substantially changed in the revision and a detailed description has been added to the legend (lines 717-724 in the revision). For S5, we have improved our description. See lines 731-734. In addition, we have checked all the figure legends of our manuscript, and the changes are shown in the tracked-changes version.

      Minor issues

      (1) It isn't clear what the meaning of "the complexity of sulfakinin pathways" is. Please explain.

      We have rewritten the sentence in the revised manuscript by adding the description as “…complexity of Sk pathways, special and temporal dynamics and multiple ligands and receptors, is…”. See line 61-65.

      (2) Please double-check the calls to the various figures in the text.

      We have double-checked the calls to all the figures in the text to make sure they were correct.

      (3) L125: What is the meaning of "olfactory reprogramming"? Please explain.

      We rephrased it to “alteration of olfactory sensitivities”. See line 145.

      (4) L135: After mentioning qRT-PCR the authors should include a call to a figure that shows these results.

      Thank you for your suggestion, the qRT-PCR results are shown in Figure 4B, and we have added it as suggested. See line 154.

      (5) L270: Details are provided for the extraction of the pheromone. However, more details are needed on how the EAG and other functional assays were done.

      We have described the assay procedures in detail in the Materials and Methods section. See lines 298-311.

      (6) Figure 2B. Please remove the period(".") at the C-terminal end of WT sk.

      We are sorry for our mistake. We have corrected it.

    1. eLife Assessment

      This study addresses an important and longstanding question regarding the molecular mechanism of protein misfolding in Ig light chain (LC) amyloidosis (AL), a life-threatening condition. By combining advanced techniques, including small-angle X-ray scattering, molecular dynamics simulations, and hydrogen-deuterium exchange mass spectrometry, the authors provide convincing evidence that the "H state" distinguishes amyloidogenic from non-amyloidogenic LCs. These findings not only offer novel insights into LC structural dynamics but also hold promise for guiding therapeutic strategies in amyloidosis and will be of particular interest to structural biologists, biophysicists, and many others working on amyloid diseases.

    2. Reviewer #1 (Public review):

      The study investigates light chains (LCs) using three distinct approaches, with a focus on identifying a conformational fingerprint to differentiate amyloidogenic light chains from multiple myeloma light chains. The study's major contribution is the identification of a low-populated "H state," which the authors propose as a unique marker for AL-LCs. While this finding is promising, the review highlights several strengths and weaknesses. Strengths include the valuable contribution of identifying the H state and the use of multiple approaches, which provide a comprehensive understanding of LC structural dynamics. Weaknesses include a lack of physical insights explaining the changes.

    3. Reviewer #2 (Public review):

      Summary:

      This well-written manuscript addresses an important but recalcitrant problem - molecular mechanism of protein misfolding in Ig light chain (LC) amyloidosis (AL), a major life-threatening form of systemic human amyloidosis. The authors use expertly recorded and analyzed small-angle X-ray scattering (SAXS) data as a restraint for molecular dynamics simulations (called M&M). Six patient-based LC proteins are explored, including four AL and two non-AL. The authors report a partially populated "H-state" determined computationally, wherein the two domains in an LC molecule acquire a straight rather than bent conformation, with an extended interdomain linker; this H-state distinguishes AL from non-AL LCs. H-D exchange mass spectrometry is used to support this conclusion. This is a novel and interesting finding with potentially important translational implications.

      Strengths:

      Expertly recorded and analyzed SAXS data combined with clever M&M simulations lead to a novel and interesting conclusion, which is supported by limited H-D exchange data. Stabilization of the CL-CL interface is a good idea that may help protect a subset of AL LCs from misfolding in amyloid.

      Computational M&M evidence is convincing and is supported by SAXS data, which are used as restraints for simulations. Although Kratky plots reported in the main MS Fig. 1 show significant differences between the data and the structural model for only one AL protein, AL-55, H-state is also inferred for other AL proteins.

      Apparent limitations:

      HDX MS results show that residues 35-50 from VL-VL and VL-CL dimerization interface are less protected in AL vs. non-AL proteins, which is consistent with the H-state. However, the small number of proteins yielding useful HDX data (three AL and one non-AL) suggests that this conclusion should be treated with caution. It is unclear whether the conformational heterogeneity depicted in M&M simulations is consistent with HDX results, and whether prior HDX studies of AL and MM LCs are consistent with the conclusions that a particular domain-domain interface is weakened in AL vs. non-AL LCs. The butterfly plots in Fig. 5 could benefit from the X-axis labeling with the peptide fragments.

    4. Reviewer #3 (Public review):

      Summary:

      This study identifies conformational fingerprints of amyloidogenic light chains that set them apart from the non-amyloidogenic ones.

      Strengths:

      The research employs a comprehensive combination of structural and dynamic analysis techniques, providing evidence that conformational dynamics at the VL-CL interface and structural expansion are distinguishing features of amyloidogenic LCs.

      Weaknesses:

      The sample size is limited, which may affect the generalizability of the findings. Additionally, the study could benefit from deeper analysis of specific mutations driving this unique conformation to further strengthen therapeutic relevance.

      Furthermore, the p-value (statistical significance) of the Rg difference should be computed. Finally, the significance of mutations (SHM?) at the interface, such as A40G, should be compared with previous observations (Garofalo et al., 2021).

    5. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This important study identifies the "H-state" as a potential conformational marker distinguishing amyloidogenic from non-amyloidogenic light chains, addressing a critical problem in protein misfolding and amyloidosis. By combining advanced techniques such as small-angle X-ray scattering, molecular dynamics simulations, and H-D exchange mass spectrometry, the authors provide convincing evidence for their novel findings. However, incomplete experimental descriptions, limitations in SAXS data interpretation, and the way HDX MS data is presented affect the strength and generalizability of the conclusions. Strengthening these aspects would enhance the impact of this work for researchers in amyloidosis and protein misfolding.

      We thank eLife editors and reviewers for their constructive feedback. The manuscript has been improved to provide a more complete description of the experiments and to strengthen the interpretation and presentation of all data. Updated Figures (Figure 2 and Figure 5) and a new Table (Table 2) in the main text provide a more complete and clearer comparison of the SAXS data with MD simulations as well as a clearer representation of the HDX MS data. Additional figures have been added in SI. The text has been extended accordingly and complete materials and methods are now included in the main text. Abstract, introduction and discussion have been revised to improve the overall readability of the manuscript.

      Public Reviews:

      Reviewer #1 (Public review):

      The study investigates light chains (LCs) using three distinct approaches, with a focus on identifying a conformational fingerprint to differentiate amyloidogenic light chains from multiple myeloma light chains. The study's major contribution is identifying a low-populated "H state," which the authors propose as a unique marker for AL-LCs. While this finding is promising, the review highlights several strengths and weaknesses. Strengths include the valuable contribution of identifying the H state and using multiple approaches, which provide a comprehensive understanding of LC structural dynamics. However, the study suffers from weaknesses, particularly in interpreting SAXS data, lack of clarity in presentation, and methodological inconsistencies. Critical concerns include high error margins between SAXS profiles and MD fits, unclear validation of oligomeric species in SAXS measurements, and insufficient quantitative cross-validation between experimental (HDX) and computational data (MD). This reviewer calls for major revisions including clearer definitions, improved methodology, and additional validation, to strengthen the conclusions.

      We thank the reviewer for the supportive comments. In the revised version of the manuscript we have focused on improving the clarity and completeness of our work. For example, we regret not having made sufficiently clear that the comparison of SAXS with MD simulations was not the one shown in the main text in Figure 1 and Table 1 (which is the comparison with single structures) but the one reported in the SI (previously Figure S1 and Table S2, showing very good fits). These data have been moved to the main text in the reworked Figure 2 and new Table 2. We have also improved the presentation of the HDX MS data in Figure 5 and in the text, and added further analysis to the SI. Materials and methods have now been moved entirely to the main text. We have also revised the manuscript throughout for clarity.

      Reviewer #2 (Public review):

      Summary:

      This well-written manuscript addresses an important but recalcitrant problem - the molecular mechanism of protein misfolding in Ig light chain (LC) amyloidosis (AL), a major life-threatening form of systemic human amyloidosis. The authors use expertly recorded and analyzed small-angle X-ray scattering (SAXS) data as a restraint for molecular dynamics simulations (called M&M) and to explore six patient-based LC proteins. The authors report that a highly populated "H-state" determined computationally, wherein the two domains in an LC molecule acquire a straight rather than bent conformation, is what distinguishes AL from non-AL LCs. They then use H-D exchange mass spectrometry to verify this conclusion. If confirmed, this is a novel and interesting finding with potentially important translational implications.

      We thank the reviewer for the supportive comments.

      Strengths:

      Expertly recorded and analyzed SAXS data combined with clever M&M simulations lead to a novel and interesting conclusion. Regardless of whether or not the CL-CL domain interface is destabilized in AL LCs explored in this (Figure 6) and other studies, stabilization of this interface is an excellent idea that may help protect at least a subset of AL LCs from misfolding in amyloid. This idea increases the potential impact of this interesting study.

      We thank the reviewer for the supportive comments.

      Weaknesses:

      The HDX analysis could be strengthened.

      We have extended the analysis and improved the presentation of the HDX data. Figure 5 has been reworked, the text has been improved accordingly, and additional analyses have been reported in the SI.

      Reviewer #3 (Public review):

      Summary:

      This study identifies conformational fingerprints of amyloidogenic light chains, that set them apart from the non-amyloidogenic ones.

      We thank the reviewer for the supportive comments.

      Strengths:

      The research employs a comprehensive combination of structural and dynamic analysis techniques, providing evidence that conformational dynamics at the VL-CL interface and structural expansion are distinguished features of amyloidogenic LCs.

      We thank the reviewer for the supportive comments.

      Weaknesses:

      The sample size is limited, which may affect the generalizability of the findings. Additionally, the study could benefit from deeper analysis of specific mutations driving this unique conformation to further strengthen therapeutic relevance.

      We agree; we tried to maximise the sample size and this was the best we could do. With respect to the analysis of the mutations, we tried to discuss some of them, also in view of previous work. However, because our set covers multiple germlines instead of focusing on a single one, our ability to discuss single point mutations systematically is limited. At the same time, single point mutations have been the focus of many recent works, while our approach provides a different point of view.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      This study provides an investigation of light chains (LCs) using three distinct approaches, focusing primarily on identifying a conformational fingerprint to distinguish amyloidogenic light chains (AL-LCs) from multiple myeloma light chains (MM-LCs). The authors propose that the presence of a low-populated "H state," characterized by an extended quaternary structure and a perturbed CL-CL interface, is unique to AL-LCs. This finding is validated through hydrogen-deuterium exchange mass spectrometry (HDX-MS). The study makes a valuable contribution to understanding the structural dynamics of light chains, particularly with the identification of the H state in AL-LCs. However, significant concerns regarding the interpretation of the SAXS data, clarity in presentation, and methodological rigor must be addressed. I recommend major revisions and resubmission of the work.

      Major concerns:

      (1) A critical concern is how the authors ensure that the SAXS profiles represent only dimeric species, given the high propensity of LCs to aggregate. If higher-order aggregates or monomers were present, this would significantly impact the SAXS data and SAXS-MD integration. Some measurements are bulk SAXS, while others are SEC-SAXS, making the study questionable. The authors need to clarify how only dimeric species were measured for the SEC-SAXS analysis, and all assessments of the dimeric state should be shown in the SI. Additionally, complementary techniques such as DLS or SEC-MALS should be used to verify the oligomeric state of the samples. Without this validation, the SAXS profiles may not be reliable.

      We added SEC-MALS and SEC-SAXS data to the SI (Figures S20 and S21), as well as the SAXS curves shown as log-log plots (Figure S1), which display a flat trend at low q that excludes aggregation. SAXS is very sensitive to oligomers and aggregates, and our data do not indicate the presence of those species. When we had an indication of possible aggregation in a sample, we used SEC-SAXS.

      (2) A major problem with the paper is that the claim of the "H state," which is the novelty of the study and serves as a marker of aggregation, is derived from samples where the error between the SAXS profiles and MD fits is extremely high. This casts doubt on whether the structure is indeed resolved by MD. The main conclusion of the paper is derived from weak consistency between experiment and simulation. In AL55, the error between experiment and simulation is greater than 5; for H7, it is higher than 2.8. The residuals show significant error at mid-q values, suggesting that long-range distance correlations (20-10 Å, CL, VL positioning) are not consistent between simulation and experiment. Furthermore, the FES plots of two independent replicas show deviation in the existence of the H state. One shows a minimum in that region, while the other does not. So, how robust is this conclusion? What is the chi-squared value if each replica is used independently? A separate experimental cross-validation is necessary to claim the existence of the H state.

      We apologise for the misunderstanding underlying this reviewer comment. The poor agreement mentioned is not between the SAXS data and the MD simulations, but between the SAXS data and the individual structures. This disagreement led us to perform MD simulations, which are in much better agreement with the data (previously Fig. S1 and Table S2). To avoid this misunderstanding, which would indeed weaken our work, we have now moved both the figure and the table to the main text as the updated Figure 2 and the new Table 2.

      Regarding the robustness of the sampling, we believe that Table 3 (previously Table 2) clearly shows the statistical convergence of the data; differences in the presentation of the free energy are purely interpolation issues. The chi-squares of each replicate are reported in Table 2 (previously Table S2).
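
      For readers less familiar with the metric under discussion, the following is a minimal sketch of how a chi-square between an experimental SAXS profile and a computed one is commonly evaluated, and of why an ensemble-averaged profile can fit better than any single structure. It is an illustration only, not the pipeline used in the study (the response states that crysol was used for analysis). The function names, the free scale factor, and the N - 1 normalisation are assumptions made for this sketch.

```python
def ensemble_profile(profiles, weights):
    """Weighted average of per-conformer SAXS profiles (hypothetical helper).

    profiles: list of intensity lists, all sampled on the same q grid.
    weights:  one non-negative statistical weight per conformer.
    """
    total = sum(weights)
    n_q = len(profiles[0])
    return [
        sum(w * p[k] for w, p in zip(weights, profiles)) / total
        for k in range(n_q)
    ]


def saxs_chi2(i_exp, sigma, i_calc):
    """Reduced chi-square between experimental and computed intensities.

    The computed profile is first rescaled by the least-squares optimal
    factor, since absolute intensities are usually not comparable.
    Returns (chi2, scale). Normalising by N - 1 is an assumption here.
    """
    if not (len(i_exp) == len(sigma) == len(i_calc)):
        raise ValueError("all profiles must share the same q grid")
    num = sum(e * c / s**2 for e, c, s in zip(i_exp, i_calc, sigma))
    den = sum(c * c / s**2 for c, s in zip(i_calc, sigma))
    scale = num / den
    chi2 = sum(((e - scale * c) / s) ** 2 for e, c, s in zip(i_exp, i_calc, sigma))
    return chi2 / (len(i_exp) - 1), scale
```

      In this picture, the comparison with individual structures corresponds to calling `saxs_chi2` on one conformer's profile, whereas the comparison with the simulations corresponds to calling it on the output of `ensemble_profile`.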

      (3) There is insufficient discussion about SAXS computations from MD trajectories. The accuracy of these calculations is crucial to deriving the existing conclusions, and the study's reliance on the PLUMED plugin, which is known to give inaccurate results for SAXS computations, raises concerns. How the solvent is treated in the SAXS computations needs to be explained. Alternative methods like WAXSiS or Crysol should be explored to check whether the SAXS profiles derived from the MD trajectory are consistent across other SAXS computation methods for the major conformers of the proteins.

      We have now clarified that, while the SAXS calculations used to perform the Metainference MD simulations were done with PLUMED (which, to our knowledge, is as accurate as crysol), the SAXS curves used for analysis were calculated with crysol.

      (4) The HDX and MD results do not seem to correlate well, and there is a disconnect between Figure 2 (SAXS profiles) and Figure 5 (HDX structural interpretation). The authors should quantitatively assess residue-level dynamics by comparing HDX signals with MD-derived HDX signals for each protein. This would provide a cross-validation between the experimental and computational data.

      In our opinion, our SAXS, MD and HDX MS data provide a consistent picture. Our HDX-MS data do not provide per-residue information, which puts a quantitative comparison out of scope. RMSF data do not necessarily need to correlate with the deuterium uptake.

      (5) MD simulations are only used to refine the structure of AlphaFold predictions, but the trajectories could help explain why these structures differ, what stabilizes the dimer, or what leads to the conformational transition of the H state. A lack of analysis regarding the physical mechanism behind these structural changes is a weakness of the study. The authors should dedicate more effort to analyzing their data and provide physical insights into why these changes are observed.

      Our aim was to identify a property that could discriminate between AL and MM LCs. We used MD simulations, not to refine structures, but to explore the conformational dynamics of LCs (starting from either X-ray structures, homology or AlphaFold models), because SAXS data suggested that conformational dynamics could discriminate between AL- and MM-LCs. Simulations allowed us to propose a hypothesis, which we tested by HDX MS. While more insight is always welcome, we believe that we have achieved our goal for now. In the discussion, we present additional analysis of the simulations to connect with previous literature. We agree that more analysis can be done and, also for this reason, all our data are publicly available.

      Minor concerns

      (6) The abstract leans heavily on describing the problem and methods but lacks a clear presentation of key results. Providing a concise summary of the main findings (e.g., the identification of the H state) would better balance the abstract.

      We agree with the reviewer and we rewrote the abstract.

      (7) In the abstract, the term "experimental structure" is used ambiguously. Since SAXS also provides an experimental structure, it is unclear what the authors are referring to. This should be clarified.

      We agree with the reviewer and we rewrote the abstract.

      (8) Abbreviations such as VL (variable domain) and CL (constant domain) are not defined, making it harder for readers unfamiliar with the field to follow. Abbreviations should be defined when first mentioned.

      We agree with the reviewer and we rewrote the abstract.

      (9) The introduction provides a good general context but fails to explicitly define the knowledge gap. Specifically, the structural and dynamic determinants of LC amyloidogenicity are not well established, and this study could be framed as addressing that gap.

      We thank the reviewer and we agree this could be better framed, we improved the introduction accordingly.

      (10) The introduction does not present the novel discovery of the H state early enough. The unique contribution of identifying this state as a marker for AL-LCs should be mentioned upfront to guide the reader through the significance of the study.

      We thank the reviewer and we have now made more explicit what we found.

      (11) The therapeutic implications of this research should be highlighted more clearly in the discussion. Examples of how these findings could be utilized in drug design or therapeutic approaches would enhance the study's impact.

      We thank the reviewer, but while we think that the H-state could be targeted for drug design, since we do not have data yet we do not want to stress this point more than what we are already doing.

      (12) There is an overwhelming use of abbreviations such as H3, H7, H18, M7, and M10 without proper introduction. This makes it difficult for readers to follow the results, and the average reader may become lost in the details. An introductory figure summarizing the sequences under study, along with a schematic of the dimeric structure defining VL and CL domains, would significantly aid comprehension.

      We agree and we tried to better introduce the systems and simplify the language without adding a figure that we think would be redundant.

      (13) In Figure 1, add labels to each SAXS curve to indicate which protein they correspond to. Also, what does online SEC-SAXS mean?

      Done

      (14) The caption of Figure 3 is unclear, particularly with abbreviations like Lb, Ls, G, and H, which are not mentioned in the captions. The authors should define these terms for clarity.

      Done

      (15) The study claims that the dominant structure of the dimer changes between different LCs. However, Figure 5 shows identical structures for all proteins, raising questions about the consistency between the SAXS and HDX data. This inconsistency is a general problem between the MD and HDX sections, where cross-communication and comparisons are not properly addressed.

      We do not claim that the dominant structure of the dimer changes between different LCs; this would also contradict the current literature. We claim a difference in a low-populated state. From this point of view, always using the same structure is consistent and should simplify the representation of the results. We agree that the manuscript may not always be easy to follow, and we thank the reviewer for helping us improve it.

      (16) The authors show I(q) vs q and residuals for each protein. The Kratky plots are not sufficient to compare the SAXS computations with the measured profile.

      Showing Kratky and residuals is a standard and complementary way to present and compare SAXS data to structures. Chi-square values are also reported. Log-log plots have been added to SI in response to previous comments.

      (17) The authors need to explain how they estimate the Rg values (from simulation or SAXS profiles). If they are using simulations, they should compute the Rg values from the simulations for comparison.

      Rg values reported in Table 1 are derived from SAXS. Rg from simulations have been added in Table 2.
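
      The two routes to Rg distinguished here can be sketched as follows. This is a hedged illustration rather than the authors' procedure: one common way to obtain Rg from a SAXS curve is the Guinier approximation, ln I(q) = ln I(0) - (Rg^2 / 3) q^2, valid only at low q (roughly q·Rg below about 1.3), while Rg from a simulation frame is the mass-weighted root-mean-square distance of the atoms from their centre of mass. Whether the study used a Guinier fit is not stated in the response, and both function names are hypothetical. Note also that the two quantities are not strictly identical, because the SAXS value is sensitive to electron density contrast and to the hydration layer.

```python
import math


def guinier_rg(q, intensity):
    """Estimate (Rg, I0) from a least-squares line through ln I versus q^2.

    Only points in the Guinier region should be passed in; selecting that
    region is left to the caller in this sketch.
    """
    x = [qi * qi for qi in q]
    y = [math.log(i) for i in intensity]
    x_mean = sum(x) / len(x)
    y_mean = sum(y) / len(y)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx  # equals -Rg^2 / 3 in the Guinier approximation
    if slope >= 0:
        raise ValueError("non-negative Guinier slope; no Rg can be extracted")
    intercept = y_mean - slope * x_mean
    return math.sqrt(-3.0 * slope), math.exp(intercept)


def coordinate_rg(coords, masses):
    """Mass-weighted radius of gyration of one conformer.

    coords: list of (x, y, z) tuples; masses: one mass per atom.
    """
    total = sum(masses)
    center = [
        sum(m * r[k] for m, r in zip(masses, coords)) / total for k in range(3)
    ]
    msd = sum(
        m * sum((r[k] - center[k]) ** 2 for k in range(3))
        for m, r in zip(masses, coords)
    ) / total
    return math.sqrt(msd)
```

      An ensemble Rg from a simulation would then be an average of `coordinate_rg` over frames, using the same statistical weights as for the SAXS profiles.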

      (18) The evolution of the sampling is unclear. The authors need to show the initial starting conformation in each case and the most likely conformation after M&M in the SI, to demonstrate that their approach indeed caused changes in the initial predictions.

      Our approach is not structure refinement, and as such the proposed analysis would be misleading. Metainference is meant to generate a statistical ensemble representing the equilibrium conformations that, as a whole, reproduce the data. Differences (or the lack of them) between initial and selected configurations will not be particularly informative in this context.

      (19) The authors should also provide a running average of chi-squared values over time to demonstrate that the conformational ensemble converged toward the SAXS profile.

      Our simulations are not driven to improve the agreement with SAXS over time; this is not structure refinement. Metainference is meant to generate a statistical ensemble representing the equilibrium conformations that, as a whole, reproduce the data. The suggested analysis would be a misinterpretation of our simulations. The comparison with SAXS is provided in Figure 2 and Table 2, as mentioned above.

      (20) The aggregate simulation time of 120 microseconds is misleading, as each replica was only run for 2-3 microseconds. This should be clarified.

      The number reported in the text is accurate and represents the aggregated sampling. The number of replicas for each metainference simulation and their lengths are reported in Table 2, now moved for clarity from the SI to the main text.

      (21) It is not clear how the replicas were weighted to compute the SAXS profiles and FES. There are two independent runs in each case, and each run has about 30 replicas. How these replicas are weighted needs to be discussed in the SI.

      Done

      (22) The methods section is unevenly distributed, with detailed explanations of LC production and purification, while other key methodologies like SAXS+MD integration and HDX are not even mentioned in the main text (they are in the Supporting Information). The authors should provide a brief overview of all methodologies in the main text or move everything to the SI for consistency.

      We agree with the reviewer, all methods are now in main text.

      Reviewer #2 (Recommendations for the authors):

      (1) Computational M&M evidence is strong (Figure 3) and is supported by SAXS (used as restraints). However, Kratky plots reported in the main MS Figure 1 show significant diHerences between the data and the structural model only for one protein, AL-55. It is hard for the general reader to see how these SAXS data support a clear diHerence between AL and non-AL proteins. If possible, please strengthen the evidence; if not, soften the conclusions.

      We thank the reviewer for the comments. The chi-square (Table 1) and the residuals (Figure 1) are a strong indication of the difference. To strengthen the evidence, and also following the comment from reviewer 3, we calculated the p-value (<10⁻⁵) for the significance of the radius of gyration in discriminating AL and MM LCs. We agree that SAXS alone was not enough, and this is indeed what prompted us to perform MD simulations.

      (2) HDX MS results are cursory and not very convincing as presented. The butterfly plots in Figure 5 are too small to read and are unlabeled so it is unclear which protein is which.

      Figure 5 has been reworked for readability. More data have been added in SI.

      (3) What labeling time was selected to construct these plots and why?

      The deuterium uptake at 30 min HDX time showed the most pronounced differences between the different proteins, so this time point was chosen to illustrate the key structural features in the main figure panel (Figure 5).

      How different are the results at other labeling times? Showing uptake curves (with errors) for more than just two peptides in the supplement Figure S12 might be helpful.

      We found a continuous increase in deuterium uptake as we increased the exchange time from 0.5 to 240 min, which reached saturation at 120 min. Therefore, the exchange follows the same pattern at all time points. Butterfly plots at the different HDX times from 0.5 to 240 min are shown in a gradient from light blue to dark blue, which clearly shows the pattern of deuterium uptake at increasing incubation times (Figure 5). The HDX uptake kinetics of selected peptides with corresponding error bars are shown in Figure S12.

      How redundant are the data, i.e. how good is the peptide coverage/resolution in key regions at the domain-domain interface that the authors deem important? Mapping the maximal deuterium uptake on the structures in Figure 5 is not very helpful. Perhaps mapping the whole range of uptake using a gradient color scheme would be more informative.

      Overall coverage and redundancy for all four proteins are > 90% and > 4.0, respectively, and the average error margin in fractional uptake among all peptides is 0.04-0.05 Da, which suggests that our data are reliable (Table S3). We modified the main panel figures to show the gradient of deuterium uptake in blue-white-red, for 0 to 30% deuterium uptake, on chain A of the dimeric LCs.

      (3) Is the conformational heterogeneity depicted in M&M simulations consistent with HDX results? The authors may want to address this by looking at the EX1/EX2 exchange kinetics for AL vs. non-AL proteins. Do AL proteins show more EX1?

      No, we don’t see any EX1 exchange kinetics in our analysis. This is compatible with the prediction that the H-state is a native-like state and not an unfolded or partially folded state.

      (4) Perhaps the main conclusion could be softened given the small number of proteins (six), esp. since only four (3 AL and 1 non-AL) could be explored by HDX. Are other HDX MS data of AL LCs from the same Lambda6 family (e.g. PMID: 34678302) consistent with the conclusions that a particular domain-domain interface is weakened in AL vs. non-AL LCs?

      We thank the reviewer for this suggestion. A difference in HDX MS data is indeed visible between AL and MM proteins for peptide 33-47 in the suggested paper (Figures 4, S5 and S8). The difference is reduced by the mutation identified in the paper as driving the aggregation in that specific case. We now mention this in the discussion.

      (5) Please clarify if the H* state is the same for a covalent vs. non-covalent LC dimer.

      We do not know, because our data are only for covalent dimers. Interestingly, however, the state is very similar to what was observed for a model kappa light chain in Weber et al.; we have better highlighted this point in the discussion.

      (6) Please try and better explain why a smaller distance between CL domains in H7 protein and a larger distance in other AL proteins both promote protein misfolding.

      We do not have elements to discuss this point in more detail.

      (7) Please comment on the Kratky plots data vs. model agreement (see comments above).

      Done.

      (8) Please find a better way to display, describe, and interpret the HD exchange MS data.

      We have generated new main-text (new Figure 5) and SI figures that we think allow the reader to better appreciate our observations. The corresponding results sections have also been improved.

      Minor points:

      (9) Is the population of the H-state with perturbed CL-CL domain interface, which was obtained in M&M simulations, sufficient to be observable by HDX MS?

      While populations alone are not enough to determine what is observable by HDX MS, a 10% population corresponds roughly to 6 kJ/mol of ΔG and is compatible with EX2 kinetics. Previous works suggested that HDX-MS data should be sensitive to subpopulations of the order of 10% (https://doi.org/10.1016/j.bpj.2020.02.005, https://doi.org/10.1021/jacs.2c06148).
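
      The correspondence between a 10% population and a free energy of roughly 6 kJ/mol can be checked with a simple two-state Boltzmann estimate. This is only a sketch: it assumes a two-state equilibrium between the H-state (population p_H) and the remaining ensemble, and a temperature of about 298 K, neither of which is stated in the response.

      ```latex
      \Delta G = -RT \ln\frac{p_H}{1 - p_H}
               = -\left(8.314\times10^{-3}\ \mathrm{kJ\,mol^{-1}\,K^{-1}}\right)\left(298\ \mathrm{K}\right)\ln\frac{0.10}{0.90}
               \approx 5.4\ \mathrm{kJ\,mol^{-1}}
      ```

      Under these assumptions the estimate is of the same order as the approximately 6 kJ/mol quoted above; the exact value depends on the temperature and on how the reference state is defined.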

      (10) Typically, an excited intermediate in protein unfolding is a monomer, while here it is an LC dimer. Is this unusual?

      This is a good point. We think that intermediates have mostly been studied in monomeric proteins because these are more commonly used as model systems, but we prefer not to discuss this point in the manuscript.

      (11) Low deuterium uptake is consistent with a rigid structure but may also reflect buried structure and/or structure that moves on a time scale greater than the labeling time.

      We agree.

      Reviewer #3 (Recommendations for the authors):

      (1) The p-value (statistical significance) of Rg difference should be computed.

      We thank the reviewer for the suggestion; we calculated the p-value, which turned out to be quite significant.

      (2) The significance of mutations (SHM?) at the interface, such as A40G should be compared with previous observations. (Garrofalo et al., 2021).

      We thank the reviewer for the suggestion; a sentence has been added in the discussion.

    1. eLife Assessment

      This important work combines molecular genetics and behavioral analyses to identify inhibitory neurons in the female medial preoptic area as a neural locus that is activated following male ejaculation and whose prolonged activity plays a key role in the regulation of female sexual motivation. These experiments are rigorous and well-performed. The data are compelling and demonstrate that a subpopulation of neurons in the medial preoptic area are selectively activated following the completion of mating in females. The medial preoptic area has long been implicated as critical to sexual behavior in both sexes; however the use of a self-paced mating assay for females provides fine control over manipulating and monitoring cellular activity in this region during more naturalistic behavior. In addition, this study may act to inspire others to further explore the additional brain regions found to show upregulation of neural activity (Fos) during mating completion in females using the datasets generated here.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript by Ishii et al. utilizes a classical, but extremely understudied, female self-paced assay to directly address aspects of female sexual motivation independent of the male's behavior. This allowed for clear separation of appetitive and consummatory events, for which unbiased whole-brain activity was mapped. Mating completion in females was then localized to the medial preoptic nucleus, where the authors performed a rigorous set of single-cell GCaMP recordings in populations marked by Vglut2 and Vgat, finding that the latter display stronger and prolonged activity after the onset of mating completion. Finally, they demonstrate a function for these Fos-TRAPed completion cells, showing their capacity to suppress female sexual behavior.

      Strengths:

      This manuscript sought to explicitly explore female mating drive as dictated by the female, a very rare angle for those studying mating behavior, which is almost always controlled by the male's behavior. To achieve this, the authors went back to old literature and modified a classical paradigm in which approach and avoidance of male conspecifics can be measured in female mice using a self-paced mating assay. Strengths include a detailed quantification of female behaviors demonstrating a robust attenuation of sexual motivation in females after mating completion. To determine the neural basis behind this, a brain-wide analysis of cells responding to mating completion in the female brain was conducted, which revealed numerous anatomical regions displaying increased Fos activity, including the MPOA, on which the authors concentrated the remainder of their study. Employing microendoscopic imaging, the authors discovered that this mating completion signal was strongly represented in the MPOA. The single-cell data analyses are of very high quality, as is the number of individual cells resolved. While they identified both excitatory and inhibitory cell types that were activated by mating completion, they found the latter exhibited stronger and more persistent activity. Segmentation into individual mating behaviors reinforced the importance of GABAergic completion cells, which display prolonged activity late after the onset of mating completion. This information provides a potential mechanism for how female mice suppress further mating activity following completion. The authors then definitively demonstrate this function by TRAP'ping completion cells with chemogenetic actuators and show that CNO-induced activation of these cells specifically and strongly suppresses female sexual behavior. All experiments were extremely well designed and performed carefully and expertly, with the necessary controls solidifying the conclusions.

      Weaknesses:

      While there are no glaring weaknesses in this study, it should be noted that a great deal of literature has pinpointed the MPOA (and specifically inhibitory cells in this area) as being critical to sexual behavior, including female mating. However, no study to my knowledge has explored self-paced female mating with such fine control over manipulating and monitoring cellular activity in this region. In addition, this study may act to inspire others to further explore the additional brain regions found to show upregulation of neural activity (Fos) during mating completion in the female using the data sets generated here.

      Comments on revisions: The data has been provided in a public database.

    3. Reviewer #2 (Public review):

      Summary:

      In this set of studies, the authors identify cFos activation in neurons in female mice that mated with males and experienced male sexual behavior that was either restricted to appetitive behavior or included ejaculation. The medial preoptic nucleus was identified as an area with high cFos induction following ejaculation. Characterization of neurochemical phenotypes of cFos-expressing neurons showed a heterogeneous distribution of activated neurons in the MPOA, including both inhibitory and excitatory cell types. Next, in vivo calcium imaging was used to show activation of Vgat and Vglut neurons in the MPOA of female mice after sniffing of the male or experiencing male appetitive or male consummatory sexual behavior, demonstrating significantly higher activation, and activation of a larger subpopulation, of Vgat neurons than of Vglut neurons. Moreover, the greatest activation of Vgat neurons was detected following ejaculation, and ejaculation activated different subpopulations of MPOA cells than the consummatory or appetitive sexual behaviors experienced by the female. Finally, pharmaco-genetic activation of the subpopulation of MPOA neurons that were previously activated following ejaculation resulted in a significant reduction of approach behavior by the female mice towards the male, interpreted as suppression of female sexual motivation. In conclusion, a subpopulation of inhibitory cells in the MPOA is activated in female mice after experiencing ejaculation, in turn contributing to suppression of sexual approach behavior.

      Strengths:

      The current set of studies replicates previous findings that ejaculation causes longer latencies to initiate interactions with a male in a paced mating paradigm, which is widely validated and extensively used to investigate sexual behavior in female rodents. The studies also confirm that ejaculation increases cFos expression in the MPOA, while extending prior findings with a careful analysis of the neurochemical phenotype of activated neurons. A major strength of the studies is the use of cell-specific in vivo imaging and pharmaco-genetic activation to reveal a functional role of a specific neuronal ensemble within the MPOA in post-ejaculatory female sexual behavior.

      Weaknesses:

      The authors include an elegant manipulation of ejaculation-activated neurons in the MPOA using DREADD. However, this study was limited to showing that activation of previously activated cells was sufficient to reduce approach behavior in a paced mating paradigm and the receipt of intromissions in a home cage mating paradigm. An inhibition approach using DREADD would have been a great complement to this study, as it would have examined whether activation of the cells was required. Moreover, additional tests for sexual motivation would have greatly strengthened the overall conclusions.

    4. Reviewer #3 (Public review):

      Summary:

      Ishii et al. used molecular genetics, behavioral analyses, in vivo neural activity imaging, and neural activity manipulations in mice to study the functional role of a subset of medial preoptic area (MPOA) neurons in the regulation of female sexual drive. They first employed a self-paced mating assay, during which a female could control the amount of interaction time with a male, to assess female sexual drive after completion of mating. The authors observed that after mating completion (i.e., male ejaculation) females spend significantly less time interacting with males, indicating that their sexual drive is reduced. Next, the authors performed a brain-wide analysis of neurons activated following male ejaculation and identified the MPOA as a strong candidate region. One caveat is that the activity labeling was not exclusive to neurons activated following male ejaculation but included all neurons activated before, during, and after the mating encounter. However, in this revised version of the manuscript, the authors have included a key control group that labels all neurons activated up to but not including male ejaculation. Comparison of the number of activated neurons in these two groups revealed a significant additional set of neurons in the female MPOA following ejaculation. Importantly, the authors also provided in vivo calcium imaging data showing that a subset of MPOA neurons responds significantly and specifically to male ejaculation and not to other behaviors during the social encounter. The authors performed these studies in both excitatory and inhibitory populations of the MPOA. Their analysis identified a subpopulation of inhibitory neurons that exhibit sustained increased activity for 90 sec following male ejaculation. Finally, the authors used chemogenetics to activate MPOA neurons during home cage mating, conditioned place preference, pup retrieval, and the self-paced mating assay. They found that activation of female MPOA neurons that were previously activated following male ejaculation significantly reduces mating behaviors and time spent interacting with a male during the self-paced mating assay. In contrast, activation of female MPOA neurons that were previously activated during consummatory behaviors but not male ejaculation does not alter mating behaviors or time spent interacting with a male. Therefore, MPOA neurons activated following ejaculation are sufficient to suppress female sexual motivation.

      The authors' experimental execution is rigorous and well performed. Their data identify inhibitory neurons in the female MPOA as a neural locus that is activated following male ejaculation and whose prolonged activity plays a key role in the regulation of female sexual motivation. The addition of some key control groups to this revised version of the manuscript greatly strengthens the interpretation of the authors' findings.

      Strengths:

      (1) The use of the self-paced mating assay in combination with neural imaging and manipulation to assess female sexual drive is innovative. The authors correctly assert that relatively little is known about how male ejaculation affects sexual motivation in females as compared to males. Therefore, the data collected from these studies is important and valuable.

      (2) The authors provide convincing histological data and analyses to verify and validate their brain-wide activity labeling, neural imaging, and chemogenetic studies.

      (3) The single-cell in vivo calcium imaging data are well performed and analyzed. They provide key insights into the activity profiles of both excitatory and inhibitory neurons in the female MPOA during mating encounters. The authors' identification of an inhibitory subpopulation of female MPOA neurons that is selectively activated following completion of mating is fundamental for future experiments, which could potentially find a molecular marker for this population and specifically manipulate these neurons to understand their role in female sexual motivation in greater detail.

      (4) The authors provide convincing evidence that activation of female MPOA neurons activated following male ejaculation is sufficient to suppress female sexual motivation. Importantly, the authors' addition of the consummatory-hM3Dq group demonstrates that activation of female MPOA neurons activated during mating behaviors prior to male ejaculation is not sufficient to suppress female sexual motivation.

      Weaknesses:

      In this revised version of the manuscript, the authors have added important controls as well as additional clarifying text that adequately address the weaknesses that were present in the original version of the manuscript.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      […] Weaknesses:

      While there are no glaring weaknesses in this study, it should be noted that a great deal of literature has pinpointed the MPOA (and specifically inhibitory cells in this area) as being critical to sexual behavior, including female mating. However, no study to my knowledge has explored self-paced female mating with such fine control over manipulating and monitoring cellular activity in this region. In addition, this study may act to inspire others to further explore the additional brain regions found to show upregulation of neural activity (Fos) during mating completion in the female using the data sets generated here.

      Reviewer #2 (Public Review):

      […] Weaknesses:

      The authors include an elegant manipulation of ejaculation-activated neurons in the MPOA using DREADD. However, this study was limited to showing that activation of previously activated cells was sufficient to reduce approach behavior in a paced mating paradigm and the receipt of intromissions in a home cage mating paradigm. An inhibition approach using DREADD would have been a great complement to this study, as it would have examined whether activation of the cells was required. Moreover, additional tests for sexual motivation would have greatly strengthened the overall conclusions.

      Reviewer #3 (Public Review):

      […] Weaknesses:

      (1) Their activity-dependent labeling strategy is not exclusive to mating completion but instead includes all neurons active before, during, and after the social encounter. In the manuscript, the authors did not discuss the time course of Fos activation or the timeframe of the FosTRAP labeling strategy. Fos continues to be expressed and is detectable for hours following neural activation. Therefore, the FosTRAP strategy also labels neurons that were activated 3 hours before the injection of 4-OHT. The original FosTRAP2 paper which is cited in this manuscript (DeNardo et al, 2019) performed a detailed analysis of the labeling window in Supplementary Figure 2 of that paper. Here is quoted text from that paper: "Resultant patterns of tdTomato expression revealed that the majority of TRAPing occurred within a 6-hour window centered around the 4-OHT injection." Thus, the FosTRAP "mating completion" groups throughout this manuscript also include neurons activated 3 hours before mating completion, which includes neurons activated during appetitive and consummatory mating behaviors.

      This makes all of the FosTRAP data very difficult to interpret. Compounding this is the issue that the two groups the authors compare in their experiments are females administered 4-OHT following appetitive investigation behaviors (with the male removed before mating behaviors occurred) and females administered 4-OHT following mating completion. The "appetitive" group labeled neurons activated only during appetitive investigation, but the "completion" group labeled neurons activated during appetitive investigations, consummatory mating bouts, and mating completion. Therefore, in the brain-wide analysis of Figure 2, it is impossible to identify brain regions that were activated exclusively by mating completion and not by consummatory mating behaviors. This could have been achieved if the "completion" group was compared to a group of females that had commenced consummatory mating behaviors but were separated from the male before mating was completed. Then, any neurons labeled by the "completion" FosTRAP but not the "consummatory" FosTRAP would be neurons specifically activated by mating completion. In the current brain-wide analysis experiments, neurons activated by consummatory behaviors and mating completion can not be disassociated.

      This same issue is present in the interpretation of the chemogenetic activation data in Figure 6. In the experiments of Figure 6, the authors are activating neurons naturally activated during consummatory mating behaviors as well as those activated during mating completion.

      We appreciate the reviewer's comments and concerns about the TRAP method.

      First, we agree that the FosTRAP method does not have the sensitivity to separate ensembles that are activated within a short time window. In our preliminary results, we observed that injecting 4-OHT after mating completion induces more tdTomato+ cells in the MPN than injecting it after appetitive behavior or consummatory behavior (Author response image 1).

      To further compare the “consummatory” and “completion” ensembles, we included an additional cohort in which we TRAPed cells responding to consummatory behavior. This cohort is added to Figures 2, 6, S3, S4, S9, S10 and S11. From the whole-brain mapping of TRAP cells, we found that many hypothalamic and extended amygdala areas, including the medial preoptic area and the bed nucleus of the stria terminalis, had significantly higher tdTomato+ cell density in the completion group than in the appetitive group, while the consummatory group also tended to have a higher cell density than the appetitive group. In the Gq-DREADD experiment, we found that the Completion-hM3Dq group, but not the Consummatory-hM3Dq group, showed a reduction in the sexual motivation of the female mouse in the self-paced mating assay (Figure 6). The Completion-hM3Dq group, but not the Consummatory-hM3Dq group, also showed significantly fewer intromission events and tended to show lower receptivity in the home cage mating assay (Figure S10). Furthermore, post-hoc histological analysis showed that the number of c-Fos+ and TRAP-labeled cells in the MPN tended to be larger in the Completion-hM3Dq group than in the Consummatory-hM3Dq group (Figure S9). These results, together with the in vivo calcium imaging experiments in Figures 3, 4 and 5, suggest that the MPN contains male-ejaculation-responsive cells that are distinct from the male-mounting-responsive cells and that they are sufficient to suppress female sexual motivation.

      However, it is true that, with the current state of mouse genetic tools, we do not have any method with higher temporal precision. We have discussed the limitations of the FosTRAP method regarding its low temporal sensitivity in the Discussion section.

      Author response image 1.

      Representative image showing TRAP labeling in the MPN after mating completion and intromission

      (2) This study does not definitively show that the female mice used in this study display decreased sexual motivation after the completion of mating. The females exhibit reduced interaction with males that had also just completed mating, but it is unclear if the females would continue to show reduced interaction time if given the choice to interact with a male that was not in the post-ejaculatory refractory period. Perhaps, these females have a natural preference to interact more with sexually motivated males compared to recently mated (not sexually motivated) males. To definitively show that these females exhibit decreased sexual motivation the authors should perform two control experiments: 1) provide the females with access to a fully sexually motivated male after the females have completed mating with a different male to see if interaction time changes, and 2) compare interaction time toward mated and non-mated males using the self-paced mating assay. These controls would show that the reduction in the interaction time is because the females have reduced sexual motivation and not because these females just naturally interact with sexually motivated males more than males in the post-ejaculatory refractory period.

      We highly appreciate the reviewer's comments regarding the interpretation of the self-paced mating assay. To address these concerns, we added an experiment in which the female subjects were introduced to a novel, sexually motivated male mouse in the self-paced mating assay immediately after receiving ejaculation (Figure S2). As a result, we found that, similar to the self-paced mating assay using the same male animal, the female subject spends significantly more time in the isolation zone on the post-ejaculation day than on the pre-ejaculation day.

      (3) It is unclear how the transient 90-second response of these MPOA neurons following the completion of mating causes the prolonged reduction in female sexual motivation that is at the minutes to hours timeframe. No molecular or cellular mechanism is discussed.

      (4) The authors discuss potential cell types and neural population markers within the MPOA and go into some detail in Figure S3. However, their experiments are performed with only the larger excitatory and inhibitory MPOA neural populations.

      While the molecular or cellular mechanism of prolonged activity of MPOA neurons is critical for understanding how sustained neural activity in the MPOA suppresses female sexual motivation, it is beyond the scope of the current manuscript and is a subject for future studies. We have added a section to the discussion to further discuss the potential molecular mechanisms.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      If the authors haven't already, it would be useful if the authors could make the brain-wide analysis of Fos activity publicly available.

      We have deposited the data at https://dandiarchive.org/

      I would also make sure the n's are included in each Figure Legend for each panel (some are missing in the Supplementals).

      We appreciate the comment; we have added the number of subjects to Figures 3, 4, and 5.

      It would also be best to provide clearer labels to some of the Figures, for example, Figure 5D, the Types should also be labeled with what behaviors they correspond to.

      We appreciate the comment. Figure 5 is focused on post-ejaculation neural activity. The cell types are categorized based on neural activity after experiencing male ejaculation; they do not correspond to any behaviors.

      Reviewer #2 (Recommendations For The Authors):

      (1) A first recommendation is to replace the use of the term "mating completion" with "ejaculation". Male and female rodents display a period of reduced approach behavior following display or experiencing ejaculation, which is referred to as the post-ejaculatory interval. The current studies investigate the neural ensemble that contributes to this post-ejaculatory interval in female mice. In addition, male and female rodents will display a prolonged period of sexual inactivity referred to as satiety, which is typically observed after repeated display or experience of ejaculations. The current studies do not investigate satiety. Moreover, in the current studies, female mice appeared to display approach behavior (time in the interaction zone) even within the 10 minutes following experiencing ejaculation (Fig 1F). Hence, the term "completion" is not accurate and should be replaced by "ejaculation" in all figures and throughout the manuscript. Replacing completion with ejaculation will also clarify what defines "onset of completion", which this reviewer assumes refers to the onset of ejaculatory behavior observed in the male.

      Thank you for the comment. We agree that the term “mating completion” was inappropriate. We have changed the wording to “ejaculation” or “post-ejaculatory period”.

      (2) Likewise, a variety of other terms and descriptions need to be adjusted for consistency and accuracy. For example, "room" when referring to the interaction or isolation zones; "onset of mating completion" when referring to ejaculation; "male intruder" to refer to the introduction of the male mating partner, but using a term typically used for an intruder-resident aggression test. Replacing these terms will aid in reducing confusion for the reader and more accurately describe the behavioral parameters.

      We appreciate the comment. We have updated the terms “male intruder” to “partner”, “room” to “area” or “zone”.

      (3) The use of the paced mating paradigm is a strength of these studies. This paradigm has been widely used and validated to study female sexual behavior in rodents. Please refer to recent reviews and landmark papers using this paradigm in addition to the current cited papers to better reflect the vast wealth of studies that previously reported the behavioral data that were replicated in this study.

      We have added a section discussing the self-paced mating assay, its merits and caveats (P8).

      (4) In the paced mating test, females can pace the receipt of sexual stimulation, and latencies to withdraw and return to the male-containing chamber are considered indicators of sexual motivation. Female withdrawal will increase with the intensity of the sexual stimulation and latency to return is longer following ejaculation. Paced mating is thus a balance of approach and withdrawal behaviors that increases reward and likelihood of pregnancy for females. Moreover, ejaculation-induced withdrawal and longer latencies to return and approach are altered by hormonal status and by the introduction of a novel male partner. Thus, female sexual behavior is complex and withdrawal behavior (in this paper measured as time spent in an isolation zone) needs to be interpreted with caution and not simply referred to as sexual motivation. I recommend expanding the description of the paradigm to highlight the strengths and limitations of this paradigm and use caution to interpret time spent in the isolation zone as a lack of sexual motivation. I also recommend referring to the period after ejaculation as the post-ejaculatory interval (instead of completion).

      Thank you for the comment. We have changed the wording in the manuscript to adjust the way it refers to sexual motivation.

      (5) In the current paper, time in the isolation zone and the number of transitions are used as the behavioral measures. Latencies, which are typically included in paced mating studies, were missing from the data. If data are available for latencies to withdraw and return to the interaction zone after mount, intromission, and ejaculation, please add these data. If such data were not collected or are not available, please recognize this caveat.

      Thank you for the comment. In Figure 1, in which all animals experienced male ejaculation, we added a latency analysis (Figure 1I and 1P). The results indicate that, as suggested in the literature, female mice took significantly longer to return to the interaction zone after male ejaculation.

      (6) The brain-wide mapping study of cFos expression after ejaculation confirms and extends prior findings, mostly in rats. Please reference prior papers in female rodents showing cFos after ejaculation and discuss how the current data replicate or differ from prior data.

      In the manuscript P8 L351, we have referred to Pfaus et al., 1993 to discuss the similarity in the c-Fos expression pattern studied in rats. We have further added descriptions to emphasize the similarity between the two datasets.

      (7) A paragraph describing the specific cell types that are activated in the MPOA is an essential part of the study and is described in detail, but only shown in supplementary figures. Given the emphasis on this particular part of the study, a recommendation is to incorporate these data as a regular figure instead of supplementary material.

      While we greatly appreciate the comment, we consider that the molecular characterization of MPOA neurons is not the main focus of the paper, and we have decided to keep it in the supplementary figures.

      (8) Calcium imaging studies were performed in the home cage for obvious practical reasons. However, in the home cage testing, the females withdraw from the males using a different approach and do not exit an interaction zone through a division. There may also be differences in the male sexual behavior patterns and thus the stimulation that females receive from the male. Yet, it appears that ejaculation induces similar patterns of neural activation in this paradigm. Thus, it is likely that neuron activation is a result of receiving ejaculation, rather than withdrawal behavior. Please briefly discuss the comparisons between the cFos and calcium imaging conclusions in these two different paradigms.

      We have added a section discussing the self-paced mating assay, its merits and caveats (P8). Withdrawal, latency, and their interpretation are discussed in this section.

      (9) The final study includes the manipulation of ejaculation-activated neurons in the MPOA using DREADD. This study was limited to show that activation of previously activated cells was sufficient to reduce approach behavior in a paced mating paradigm and receiving intromissions in a home cage mating paradigm. An inhibition approach using DREADD would have been a great complement to this study as it would have shown if activation of the cells was required. Moreover, additional tests for sexual motivation, such as partner preference tests would have greatly strengthened the results since a lack of entering an interaction zone can also be explained by impaired sensory processing or locomotor behavior. Finally, CNO also appeared to impact time in the isolation zone for a subset of animals in the ejaculation (completion) control group and the appetitive group. These effects didn't reach statistical significance, but groups also had low sample sizes (n=6-7) and may thus have been underpowered. The recommendation is to include these caveats and shortcomings in the discussion of these results.

      We appreciate the comments. We first added an inhibitory approach to test the necessity of MPOA neurons. As a result, we found that the inhibition of these neurons did not affect the behavior in the self-paced mating assay but increased the subjects' sexual receptivity (Figure S11). Regarding the low sample size, we have added a power analysis to the statistical section.

      (10) The studies utilized ovariectomized females with hormone priming. Since sexual receptivity in females is highly dependent on the hormonal milieu, the authors are encouraged to add an explanation of why ovariectomized females were used and if the results may have differed in cycling females.

      We appreciate the comments. The female subjects used in the TRAP experiment need to experience male ejaculation twice: once to label the cells and once during reactivation. To avoid pregnancy during the first experience, we ovariectomized the females and controlled their hormonal conditions. This method has been used successfully in other sexual behavior studies (Yang et al., 2013; Ring, 1944). This is described on P11. We have further demonstrated in Figure 1N-T that female mice that were not ovariectomized and were under the natural estrous cycle showed similar suppression of sexual interaction after the completion of mating. The manuscript was updated to discuss that the behavior change after mating completion is not dependent on the ovary.

      (11) Overall, the paper lacks references to relevant prior studies. For example, many studies have been reported over the past 2-3 decades about the effects of female rodent sexual behavior on activation in the brain and the effects of different vaginocervical stimulation on pregnancy and fertility. It is absolutely the case that much remains unknown about the complex neural circuitries that control behavior during the post-ejaculatory interval and sexual satiety in both male and female rodents, but studies have indicated roles for hypothalamic areas, bed nucleus of the stria terminalis, ventral tegmental area, posterior thalamus, and prefrontal cortex. Hence, the current introduction and discussion do not adequately summarize or acknowledge these prior investigations and therefore do not place these new findings in the context of what was previously known.

      We appreciate the comment and have added references at P2 L65 and P8 L355-357 to discuss existing literature about c-Fos mapping analysis after ejaculation or genital stimulation in female rats.

      (12) Finally, sample sizes appear to be modest, ranging n=4-8 (except n=14 in the completion group in Figure S7) and vary between groups within and between studies. Please explain in the methods section how sample sizes were pre-determined and acknowledge if studies may have potentially been underpowered.

      The sample sizes for the behavior experiments in this study were n = 6-9. These were predetermined based on previous studies examining female sexual behavior (Ishii et al. 2017, Liu et al. 2022, Yin et al. 2022). To further examine the number of animals required for our behavioral experiments, we pooled data used in this study (n = 111 pooled data, control n = 94, stim n = 17) and conducted a power analysis using the variance calculated from the pooled average time in the isolation zone. Control data were pooled from control animals in each experiment (e.g., animals injected with GFP control virus, saline, etc.). The average time in the isolation zone after ejaculation or after reactivation of the completion cells was 420 ± 210 seconds, and 49 ± 91 seconds in the control group (mean ± s.d.). Within this population, we found that 5 animals were sufficient to detect the difference (p < 0.05, power = 0.8) in Student's t-test. We have added this explanation to the supplemental experimental procedures (P18, lines 817-827).
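
      The sample-size reasoning above can be illustrated with a minimal sketch. It assumes a two-sided, two-sample comparison and uses the normal approximation rather than the t-based calculation reported in the response, so it is not the authors' procedure. The function name `n_per_group` and its parameters are hypothetical; only the group means and standard deviations are taken from the text.

```python
import math
from statistics import NormalDist


def n_per_group(mean_a, sd_a, mean_b, sd_b, alpha=0.05, power=0.8):
    """Approximate animals needed per group for a two-sided, two-sample test.

    Normal approximation (assumption): n = (z_{1-alpha/2} + z_{power})^2
    * (sd_a^2 + sd_b^2) / (mean_a - mean_b)^2, rounded up.
    """
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # critical value for two-sided alpha
    z_beta = z.inv_cdf(power)             # quantile matching the target power
    delta = mean_a - mean_b               # difference in group means
    n = (z_alpha + z_beta) ** 2 * (sd_a ** 2 + sd_b ** 2) / delta ** 2
    return math.ceil(n)


# Time in isolation zone (seconds, mean and s.d.) as quoted in the response:
# 420 +/- 210 after ejaculation or reactivation, 49 +/- 91 in controls.
n = n_per_group(420, 210, 49, 91)
print(n)
```

      Because the normal approximation ignores the extra uncertainty of estimating variances from small samples, it tends to give a lower bound; a t-based calculation such as the one the authors describe would be expected to require somewhat more animals per group.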

      Reviewer #3 (Recommendations For The Authors):

      The authors should discuss the fact that the FosTRAP2 strategy labels neurons activated 3 hours before the 4-OHT injection. As the manuscript is written, it seems to suggest that the 4-OHT injection given following mating completion only labeled neurons activated during mating completion. This is very misleading. I respect the amount of work and rigor that went into these experiments. The single-cell imaging, implementation of the FosTRAP strategy, and behavioral analysis are all well executed. Novel insights into the neural regulation of female sexual drive can be gleaned from the neural imaging experiments. Unfortunately, the limitations of the FosTRAP strategy make those studies very difficult to interpret, and therefore, a more candid discussion and re-interpretation of the data from the FosTRAP experiments is needed.

      We appreciate the reviewer's comments and concerns about the TRAP method.

      First, we agree that the FosTRAP method does not have the sensitivity to separate ensembles that are active within a short time window. In our preliminary results, we observed that injecting 4-OHT after mating completion induces more tdTomato cells in the MPN than injecting it after appetitive behavior or consummatory behavior (Author response image 1).

      To further compare the “consummatory” and “completion” ensembles, we included an additional cohort in which we TRAPed cells responding to consummatory behavior. This cohort is added to Figures 2, 6, S3, S4, S9, S10 and S11. From the whole-brain mapping of TRAP cells, we found that many hypothalamic and extended amygdala areas, including the medial preoptic area and the bed nucleus of the stria terminalis, had significantly larger tdTomato+ cell density in the completion group than in the appetitive group, while the consummatory group also tended to have larger cell density than the appetitive group. In the Gq-DREADD experiment, we found that the Completion-hM3Dq group but not the Consummatory-hM3Dq group showed a reduction of sexual motivation of the female mouse in the self-paced mating assay (Figure 6). The Completion-hM3Dq group but not the Consummatory-hM3Dq group also showed significantly fewer intromission events and tended to show lower receptivity in the home cage mating assay (Figure S10). Furthermore, post-hoc histological analysis also showed that the c-Fos+ and TRAP-labeled cells in the MPN tended to be larger in the Completion-hM3Dq group than in the Consummatory-hM3Dq group (Figure S9). These results, together with the in vivo calcium imaging experiments in Figures 3, 4 and 5, suggest that the MPN contains male-ejaculation-responsive cells that are distinct from the male-mounting-responsive cells and that they are sufficient to suppress female sexual motivation.

      However, it is true that with the current state of mouse genetic tools, we do not have any methods with higher temporal resolution. We have discussed the limitations of the FosTRAP method regarding its low temporal resolution in the Discussion section.

      Editor notes:

      Should you choose to revise your manuscript, please include full statistical reporting in the main text, including test statistic, degrees of freedom, and exact P value.

      Thank you for the comment. The statistical values were added to the manuscript.

    1. eLife Assessment

      Yamamoto and Matano provide convincing evidence that a G63E/R CD8+ T-cell escape mutation in the accessory viral protein Nef promotes the induction of neutralizing antibody (nAb) responses in rhesus macaques infected with SIVmac239, which is usually largely resistant to neutralization. Functional analyses support that this mutation specifically impairs Nef's ability to stimulate PI3K/Akt/mTORC2 signalling. This important study suggests that the accessory viral protein Nef impairs B cell function and effective humoral immune responses and is of interest for researchers and physicians interested in HIV/AIDS and vaccine development.

    2. Reviewer #3 (Public review):

      Human and simian immunodeficiency viruses (HIV and SIV, respectively) evolved numerous mechanisms to compromise effective immune responses but the underlying mechanisms remain incompletely understood. Here, Yamamoto and Matano examined the humoral immune response in a large number of rhesus macaques infected with the difficult-to-neutralize SIVmac239 strain and identified a subgroup of animals showing significant neutralizing Ab responses. Sequence analyses revealed that in most of these animals (7/9) but only a minority in the control group (2/19) SIVmac variants containing a CD8+ T-cell escape mutation of G63E/R in the viral Nef gene emerged. Functional analyses revealed that this change attenuates the ability of Nef to stimulate PI3K/Akt/mTORC2 signalling. The authors propose that this improved induction of SIVmac239 nAb is reciprocal to antibody dysregulation caused by a previously identified human PI3K gain-of-function mutation associated with impaired anti-viral B-cell responses. Altogether, the results suggest that PI3K signalling plays a role in B-cell maturation and generation of effective nAb responses. Preliminary data indicate that Nef might be transferred from infected T cells to B cells by direct contact. However, the exact mechanism and the relevance for vaccine development require further studies.

      The strengths of the study are that the authors analyzed a large number of SIVmac-infected macaques to unravel the biological significance of the known effect of the interaction of Nef with PI3K/Akt/mTORC2 signaling. This is interesting and may provide a novel means to improve humoral immune responses to HIV. In the revised version the authors made an effort to address previous concerns. Especially, they provide data supporting that Nef might be transferred to B cells by direct cell-cell contact. In addition, they provide some evidence that G63R, which also emerged in most animals, does not share the disruptive effect of G63E, although experimental examination and discussion of why G63R might emerge remain poor. A weakness that remains is that some effects of the G63E mutation are modest and effects were not compared to SIVmac constructs lacking Nef entirely. The evidence for a role of Nef G63E mutation on PI3K and the association with improved nAb responses is convincing and it is appreciated that the authors provide additional evidence for a potential impact of "soluble" Nef on neighboring B cells. The presentation of the experimental set-up and the results has been improved but is in part still challenging to comprehend. It seems that direct cell-cell contact is required and membranes are exchanged. Since Nef is associated with cellular membranes this might lead to some transfer of Nef to B cells. However, the immunological and functional consequences of this largely remain to be determined. Alternatively, Nef-mediated manipulation of helper CD4 T cells might also impact B cell function and effective humoral immune responses. Additional editing of the manuscript has been performed to make the results accessible to a broad readership.

    3. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #3 (Public review):

      Human and simian immunodeficiency viruses (HIV and SIV, respectively) evolved numerous mechanisms to compromise effective immune responses but the underlying mechanisms remain incompletely understood. Here, Yamamoto and Matano examined the humoral immune response in a large number of rhesus macaques infected with the difficult-to-neutralize SIVmac239 strain and identified a subgroup of animals showing significant neutralizing Ab responses. Sequence analyses revealed that in most of these animals (7/9) but only a minority in the control group (2/19) SIVmac variants containing a CD8+ T-cell escape mutation of G63E/R in the viral Nef gene emerged. Functional analyses revealed that this change attenuates the ability of Nef to stimulate PI3K/Akt/mTORC2 signalling. The authors propose that this improved induction of SIVmac239 nAb is reciprocal to antibody dysregulation caused by a previously identified human PI3K gain-of-function mutation associated with impaired anti-viral B-cell responses. Altogether, the results suggest that PI3K signalling plays a role in B-cell maturation and generation of effective nAb responses. Preliminary data indicate that Nef might be transferred from infected T cells to B cells by direct contact. However, the exact mechanism and the relevance for vaccine development require further studies.

      Strengths of the study are that the authors analyzed a large number of SIVmac-infected macaques to unravel the biological significance of the known effect of the interaction of Nef with PI3K/Akt/mTORC2 signaling. This is interesting and may provide a novel means to improve humoral immune responses to HIV. In the revised version the authors made an effort to address previous concerns. Especially, they provide data supporting that Nef might be transferred to B cells by direct cell-cell contact. In addition, they provide some evidence that G63R, which also emerged in most animals, does not share the disruptive effect of G63E, although experimental examination and discussion of why G63R might emerge remain poor. Another weakness that remains is that some effects of the G63E mutation are modest and effects were not compared to SIVmac constructs lacking Nef entirely. The evidence for a role of Nef G63E mutation on PI3K and the association with improved nAb responses was largely convincing and it is appreciated that the authors provide additional evidence for a potential impact of "soluble" Nef on neighboring B cells. However, the experimental set-up and the results are difficult to comprehend. It seems that direct cell-cell contact is required and membranes are exchanged. Since Nef is associated with cellular membranes this might lead to some transfer of Nef to B cells. However, the immunological and functional consequences of this remain largely elusive. Alternatively, Nef-mediated manipulation of helper CD4 T cells might also impact B cell function and effective humoral immune responses. As previously noted, the presentation of the results and conclusions was in part very convoluted and difficult to comprehend. While the authors made attempts to improve the writing, parts of the manuscript are still challenging to follow. This applies even more to the rebuttal (complex words combined with poor grammar), which made it difficult to assess which concerns have been satisfactorily addressed.

      We are grateful for the visionary comments. Based on the suggestions, we have edited the writing throughout and added remarks in the Discussion section on certain points raised. For points that need experimentation, we would like to address them in a follow-up study now under preparation.

      Reviewer #3 (Recommendations for the authors):

      Additional editing of the manuscript is highly recommended to make the results accessible for a broad readership.

      We are grateful for the important suggestion. Accordingly, we have edited the manuscript with a broad readership in mind.

    1. eLife Assessment

      This valuable study presents data suggesting the critical roles of two ancient proteins, XAP5 and XAP5L, in regulating the transcriptional program of ciliogenesis during mouse spermatogenesis. The supporting data are solid, and this work will be of interest to biomedical researchers studying ciliogenesis and reproduction.

    2. Reviewer #1 (Public review):

      Summary:

      Wang et al. generate XAP5 and XAP5L knockout mice and find that the males are infertile due to spermatogonial/meiotic arrest and reduced sperm motility, respectively. CUT & Tag data were added in this revision to support the claim that XAP5 and XAP5L are antagonistic transcription factors of ciliogenesis.

      Strengths:

      Knockout mouse models provided strong evidence to indicate that XAP5 and XAP5L are critical for spermatogenesis. RNA-seq and CUT & Tag are valuable sources to further explore their molecular mechanisms.

      Weaknesses:

      The authors claim that XAP5 and XAP5L transcriptionally regulate sperm flagella development; however, expression, physiological role, and molecular evidence do not support this concept well. This reviewer still thinks the physiological roles of XAP5 and XAP5L are different. (i) XAP5 is expressed in spermatogonia within testes while XAP5L is localized in round/elongating spermatids (their expression patterns are different). (ii) Spermatogenesis was arrested at the spermatogonia/early spermatocyte stage in Xap5-KO mice while sperm abnormalities were observed in Xap5l-KO mice (their roles are different). This reviewer still cannot follow the authors' viewpoint that XAP5 and XAP5L have an 'antagonistic relationship' in regulating sperm flagella development. RNA-seq and CUT & Tag data are a valuable resource; however, this reviewer suggests exploring how XAP5 regulates spermatogonia differentiation and how XAP5L regulates sperm flagella motility.

    3. Reviewer #2 (Public review):

      In this study, Wang et al. report the significance of XAP5L and XAP5 in spermatogenesis, where they are involved in transcriptional regulation of ciliary genes in testes. In a previous study, the authors demonstrated that XAP5 is a transcription factor required for flagellar assembly in Chlamydomonas. Continuing from their previous study, the authors examined the conserved role of XAP5 and XAP5L, which are the orthologue pair in mammals.

      XAP5 and XAP5L are expressed ubiquitously and testis-specifically, respectively, and their absence in testes causes male infertility with defective spermatogenesis. Interestingly, XAP5 deficiency arrests germ cell development at the pachytene stage, whereas XAP5L absence causes impaired flagellar formation. RNA-seq analyses demonstrated that XAP5 deficiency suppresses ciliary gene expression including Foxj1 and Rfx family genes in early testis. By contrast, in XAP5L deficiency, Foxj1 and Rfx transcripts abnormally remain in mature sperm. From the results, the authors conclude that XAP5 and XAP5L are antagonistic transcription factors that function upstream of Foxj1 and Rfx family genes.

      The current version of the manuscript addresses this reviewer's initial concerns well and supports the authors' claims. Key transcription factors for ciliogenesis, Foxj1 and Rfx2, are direct downstream targets of XAP5 and XAP5L, and their common motifs explain well their antagonistic function in sperm flagellar development. All the results demonstrate that the ancient transcription factors XAP5 and XAP5L are upstream transcription factors that modulate flagellar development in the male mammalian germ line.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Wang et al. generate XAP5 and XAP5L knockout mice and find that the males are infertile due to meiotic arrest and reduced sperm motility, respectively. RNA-Seq was subsequently performed and the authors concluded that XAP5 and XAP5L are antagonistic transcription factors of ciliogenesis (in XAP5-KO P16 testis: 554 genes were upregulated and 1587 genes were downregulated; in XAP5L-KO sperm: 2093 genes were upregulated and 267 genes were downregulated).

      We are grateful for the comprehensive summary.

      Strengths:

      Knockout mouse models provided strong evidence to indicate that XAP5 and XAP5L are critical for spermatogenesis and male fertility.

      Thank you for your positive comment.

      Weaknesses:

      The key conclusions are not supported by evidence. First, the authors claim that XAP5 and XAP5L transcriptionally regulate sperm flagella development; however, detailed molecular experiments related to transcription regulation are lacking. How do XAP5 and XAP5L regulate their targets? Only RNA-Seq is not enough. Second, the authors declare that XAP5 and XAP5L are antagonistic transcription factors; however, how do XAP5 and XAP5L regulate sperm flagella development antagonistically? Only RNA-Seq is not enough. Third, I am concerned about whether XAP5 really regulates sperm flagella development. XAP5 is specifically expressed in spermatogonia and XAP5-cKO mice are in meiotic arrest, indicating that XAP5 regulates meiosis rather than sperm flagella development.

      Thank you for the critical comments. To strengthen our conclusions, we have included XAP5/XAP5L CUT&Tag data in our revised manuscript. This highly sensitive method has allowed us to identify direct target genes of XAP5 and XAP5L (Table S1, Figure S6). Notably, our results demonstrate that both FOXJ1 and RFX2 are occupied by XAP5 (Figure 4G). Additionally, real-time PCR validation confirmed that RFX2 is also associated with XAP5L, even though enriched peaks for the RFX2 gene were not detected in the initial CUT&Tag data (Figure 4G). These findings indicate that XAP5 and XAP5L regulate the expression of FOXJ1 and RFX2 by directly binding to these genes. De novo motif analyses revealed that XAP5 and XAP5L shared a conserved binding sequence (CCCCGCCC/GGGCGGGG) (Figure S6C), and the bound regions of FOXJ1 and RFX2 contain this sequence. Further analysis shows that many XAP5L target genes are also targets of XAP5 (Figure S6G), despite the limited number of identified XAP5L target genes. This differential binding and regulation of shared target genes underscore the antagonistic relationship between XAP5 and XAP5L. Collectively, these findings provide additional support for the idea that XAP5 and XAP5L function as antagonistic transcription factors, acting upstream of transcription factor families, including FOXJ1 and RFX factors, to coordinate ciliogenesis during spermatogenesis.

      While we agree that XAP5 primarily regulates meiosis during spermatogenesis, our data also indicate that many cilia-related genes, including key transcription regulators of spermiogenesis such as RFX2 and SOX30, are downregulated in XAP5-cKO mice and are bound by XAP5 (Figure 4, Figures S4 and S6). It is important to note that genes coding for flagella components are expressed sequentially and in a germ cell-specific manner during development. When we refer to "regulating sperm flagella development", we mean the spatiotemporal regulation. We have revised the manuscript to clarify this point.

      Reviewer #2 (Public Review):

      In this study, Wang et al. report the significance of XAP5L and XAP5 in spermatogenesis, where they are involved in transcriptional regulation of ciliary genes in testes. In previous studies, the authors demonstrated that XAP5 is a transcription factor required for flagellar assembly in Chlamydomonas. Continuing from their previous study, the authors examine the conserved role of XAP5 and XAP5L, which are the orthologue pair in mammals.

      XAP5 and XAP5L are expressed ubiquitously and testis-specifically, respectively, and their absence in the testes causes male infertility with defective spermatogenesis. Interestingly, XAP5 deficiency arrests germ cell development at the pachytene stage, whereas XAP5L absence causes impaired flagellar formation. RNA-seq analyses demonstrated that XAP5 deficiency suppresses ciliary gene expression including Foxj1 and Rfx family genes in early testis. By contrast, in XAP5L deficiency, Foxj1 and Rfx transcripts abnormally remain in mature sperm. From the results, the authors conclude that XAP5 and XAP5L are the antagonistic transcription factors that function upstream of Foxj1 and Rfx family genes.

      This reviewer thinks the overall experiments are performed well and that the manuscript is clear. However, the current results do not directly support the authors' conclusion. For example, the transcriptional function of XAP5 and XAP5L requires more evidence. In addition, this reviewer wonders about the conserved XAP5 function of ciliary/flagellar gene transcription in mammals - the gene is ubiquitously expressed despite its functional importance in flagellar assembly in Chlamydomonas. Thus, this reviewer thinks authors are required to show more direct evidence to clearly support their conclusion with more descriptions of its role in ciliary/flagellar assembly.

      Thank you for your thoughtful review of our work. We appreciate your positive feedback on the overall quality of the experiments and the clarity of the manuscript. In response to your concerns, we have included new experimental data and made revisions to the manuscript (lines 193-217) to better support our conclusions, particularly regarding the transcriptional function of XAP5 and XAP5L. Additionally, we have expanded on the role of XAP5 in ciliary and flagellar assembly to provide more direct evidence for its functional importance. Thank you for your insights.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The title (Control of ciliary transcriptional programs during spermatogenesis by antagonistic transcription factors) is not specific and does tend to exaggerate.

      Thank you for the comment, and we appreciate the opportunity to clarify the appropriateness of the title. Our paper extensively investigates the transcriptional regulation of ciliary genes during spermatogenesis. It demonstrates that XAP5/XAP5L are key transcription factors involved in this process. The title reflects our primary focus on the transcriptional programs that govern ciliary gene expression. Moreover, our paper shows that XAP5 positively regulates the expression of ciliary genes, particularly during the early stages of spermatogenesis, while XAP5L negatively regulates these genes. This antagonistic relationship is a crucial aspect of the study and is effectively conveyed in the title. In addition, our revised paper provides detailed insights into how XAP5/XAP5L control ciliary gene expression during spermatogenesis.

      Figure 4C: FOXJ1 and RFX2 are absent in sperm from WT mice. Are you sure? They are highly expressed in WT testes.

      Thank you for your careful review. While FOXJ1 and RFX2 are indeed highly expressed in the testes of wild-type (WT) mice, our data show that they are not detectable in mature sperm. This observation is consistent with published single-cell RNA-seq data (Jung et al., 2019), which indicate that FOXJ1 and RFX2 are primarily expressed in spermatocytes but not in spermatids (Figure S7). This expression pattern aligns with that of IFT-particle proteins, which are essential for the formation but not the maintenance of mammalian sperm flagella (San Agustin, Pazour, & Witman, 2015).

      XAP5 is specifically expressed in spermatogonia and XAP5-cKO mice are in meiotic arrest, indicating that XAP5 regulates meiosis rather than sperm flagella development.

      We appreciate your insightful comments. As mentioned above, we agree that XAP5 primarily regulates meiosis during spermatogenesis. When we mentioned "regulating sperm flagella development," we were referring to the spatiotemporal regulation of these processes. We have revised the manuscript to clarify this distinction. Thank you for your understanding.

      The title of Figure 2 (XAP5L is required for normal sperm formation) is not accurate because the progress of spermatogenesis and sperm count is normal in XAP5L-KO mice (only sperm motility is reduced).

      We apologize for any confusion caused by the previous figure. It did not accurately convey the changes in sperm count. In the revised Figure 2B, we clearly demonstrate that the sperm count in XAP5L-KO mice is indeed lower than that in WT mice. This revision aims to provide a more accurate representation of the effects of XAP5L deficiency on spermatogenesis. Thank you for bringing this to our attention.

      Reviewer #2 (Recommendations For The Authors):

      (1) Although XAP5 and XAP5L deficiency alters the transcription of Foxj1 and Rfx family genes, which are the essential transcription factors for ciliogenesis, current data do not directly support that XAP5 and XAP5L are the upstream transcription factors. The authors need to show more direct evidence such as ChIP-seq data.

      Thank you for your valuable feedback! In this revised manuscript, we have included data identifying candidate direct targets of XAP5 and XAP5L using the highly sensitive CUT&Tag method (Kaya-Okur et al., 2019). Our results show that XAP5 occupies both FOXJ1 and RFX2 (Figure 4G). Furthermore, real-time PCR validation of the CUT&Tag experiments confirmed that RFX2 is also occupied by XAP5L (Figure 4G), despite the initial CUT&Tag data not revealing enriched peaks for the RFX2 gene (Table S1). Unfortunately, the limited number of enriched peaks identified for XAP5L (Table S1) suggests that the XAP5L antibody used in the CUT&Tag experiment might have suboptimal performance, which prevented us from detecting occupancy on the FOXJ1 promoter. Nevertheless, these additional data provide strong evidence that XAP5 and XAP5L function as upstream transcription factors for FOXJ1 and RFX family genes, supporting their essential roles in ciliogenesis.

      (2) Shared transcripts that are altered by the absence of either XAP5 or XAP5L do not clearly support they are antagonistic transcription factors.

      Thank you for your insightful comment. In our revised manuscript, we performed CUT&Tag analysis to identify target genes of XAP5 and XAP5L. Motif enrichment analysis revealed conserved binding sequences for both factors (Figure S6C), indicating a subset of shared downstream genes between XAP5 and XAP5L. Among the downregulated genes in XAP5 cKO germ cells, 891 genes were bound by XAP5 (Figure S6D). Although the number of enriched peaks identified for XAP5L was limited, 75 of the upregulated genes in XAP5L KO sperm were bound by XAP5L (Figure S6E). Importantly, of these 75 XAP5L target genes, approximately 30% (22 genes) were also identified as targets of XAP5 (Figure S6G), further supporting the idea that XAP5 and XAP5L function as antagonistic transcription factors.

      (3) XAP5 seems to be an ancient transcription factor for cilia and flagellar assembly. However, XAP5 expresses ubiquitously in mice. How can this discrepancy be explained? Is it also required for primary cilia assembly? Are their expression also directly linked to ciliogenesis in other types of cells?

      Thank you for the thoughtful questions. The ubiquitous expression of XAP5 in mice can be understood in light of its role as an ancient transcription factor for cilia and flagellar assembly. Given that cilia are present on nearly every cell type in the mammalian body (O'Connor et al., 2013), this broad expression pattern makes sense. In fact, XAP5 serves not only as a master regulator of ciliogenesis but also as a critical regulator of various developmental processes (Kim et al., 2018; Lee et al., 2020; Xie et al., 2023).

      Our current unpublished work demonstrates that XAP5 is essential for primary cilia assembly in different cell lines. The loss of XAP5 protein results in abnormal ciliogenesis, further supporting its vital role in ciliary formation across different cell types.

      We believe that the widespread expression of XAP5 reflects its fundamental importance in multiple cellular processes, including ciliogenesis, development, and potentially other cellular functions yet to be discovered.

      (4) Loss of XAP5L impairs flagellar assembly. Have the authors observed any other physiological defects in the absence of XAP5L in mouse models? Such as hydrocephalus and/or tracheal defects?

      Thank you for the questions. We have carefully examined XAP5L KO mice for other physiological defects. To date, we have not observed any additional physiological abnormalities. Specifically, we assessed the condition of tracheal cilia in XAP5L KO mice and found no significant differences compared to wild-type (WT) mice, as illustrated in Author response image 1 below.

      Author response image 1.

      References

      Jung, M., Wells, D., Rusch, J., Ahmad, S., Marchini, J., Myers, S. R., & Conrad, D. F. (2019). Unified single-cell analysis of testis gene regulation and pathology in five mouse strains. eLife, 8. doi:10.7554/eLife.43966

      Kaya-Okur, H. S., Wu, S. J., Codomo, C. A., Pledger, E. S., Bryson, T. D., Henikoff, J. G., . . . Henikoff, S. (2019). CUT&Tag for efficient epigenomic profiling of small samples and single cells. Nat Commun, 10(1), 1930. doi:10.1038/s41467-019-09982-5

      Kim, Y., Hur, S. W., Jeong, B. C., Oh, S. H., Hwang, Y. C., Kim, S. H., & Koh, J. T. (2018). The Fam50a positively regulates ameloblast differentiation via interacting with Runx2. J Cell Physiol, 233(2), 1512-1522. doi:10.1002/jcp.26038

      Lee, Y.-R., Khan, K., Armfield-Uhas, K., Srikanth, S., Thompson, N. A., Pardo, M., . . . Schwartz, C. E. (2020). Mutations in FAM50A suggest that Armfield XLID syndrome is a spliceosomopathy. Nature Communications, 11(1). doi:10.1038/s41467-020-17452-6

      O'Connor, A. K., Malarkey, E. B., Berbari, N. F., Croyle, M. J., Haycraft, C. J., Bell, P. D., . . . Yoder, B. K. (2013). An inducible CiliaGFP mouse model for in vivo visualization and analysis of cilia in live tissue. Cilia, 2(1), 8. doi:10.1186/2046-2530-2-8

      San Agustin, J. T., Pazour, G. J., & Witman, G. B. (2015). Intraflagellar transport is essential for mammalian spermiogenesis but is absent in mature sperm. Mol Biol Cell, 26(24), 4358-4372. doi:10.1091/mbc.E15-08-0578

      Xie, X., Li, L., Tao, S., Chen, M., Fei, L., Yang, Q., . . . Chen, L. (2023). Proto-Oncogene FAM50A Can Regulate the Immune Microenvironment and Development of Hepatocellular Carcinoma In Vitro and In Vivo. Int J Mol Sci, 24(4). doi:10.3390/ijms24043217

    1. eLife Assessment

      This important study employed multiple orthogonal techniques and tissue samples to investigate the interaction between the NRL transcription factor and RNA-binding proteins in the retina. The findings are convincing to support an interaction between NRL and the DHX9 helicase. The significance of the study could be enhanced with functional experiments of NRL-R-loop interactions in the developing retina and their potential role in photoreceptor health and gene regulation.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Corso-Diaz et al. focus on the NRL transcription factor (TF), which is critical for retinal rod photoreceptor development and function. The authors profile NRL's protein interactome, revealing several RNA-binding proteins (RBPs) among its components. Notably, many of these RBPs are associated with R-loop biology, including DHX9 helicase, which is the primary focus of this study. R-loops are three-stranded nucleic acid structures that frequently form during transcription. The authors demonstrate that R-loop levels increase during photoreceptor maturation and establish an interaction between NRL TF and DHX9 helicase. The association between NRL and RBPs like DHX9 suggests a cooperative regulation of gene expression in a cell-type-specific manner, an intriguing discovery relevant to photoreceptor health. Since DHX9 is a key regulator of R-loop homeostasis, the study proposes a potential mechanism where a cell-type-specific TF controls the expression of certain genes by modulating R-loop homeostasis. The authors also identify another R-loop resolvase, DDX5, as having a weaker interaction with NRL, perhaps due to an indirect mechanism.

      This is a very interesting study providing genome-wide mapping of R-loops in mammalian retina, which shows an enrichment of R-loops over intergenic regions as well as genes encoding neuronal function factors. The R-loop-enriched genes are longer than genes without R-loops, which supports previous findings from studies in neuronal cells. This is a very relevant study highlighting the possible mechanism of gene expression regulation via interactions between TFs, RNA binding proteins, and R-loops. In that regard, it would be very interesting to uncover the biological relevance of such gene regulation. The authors provide adequate evidence of interaction between R-loops and NRL TF via in vitro IP assay and genomic co-localization, however, this interaction can be mediated via multiple R-loop or RNA-binding proteins. Thus, follow-up studies would be appropriate to characterize this interaction in more detail.

    3. Reviewer #2 (Public review):

      Summary:

      The Authors utilize biochemical approaches to determine and validate NRL protein-protein interactions to further understand the mechanisms by which the NRL transcription factor controls rod photoreceptor gene regulatory networks. Observations that NRL displays numerous protein-protein interactions with RNA-binding proteins, many of which are involved in R-loop biology, led the authors to investigate the role of RNA and R-loops in mediating protein-protein interactions and profile the co-localization of R-loops with NRL genomic occupancy.

      Strengths:

      Overall, the manuscript is well-written, providing succinct explanations of the observed results and potential implications. Additionally, the Authors use multiple orthogonal techniques and tissue samples to reproduce and validate that NRL interacts with DHX9 and DDX5. Experiments also utilize specific assays to understand the influence of RNA and R-loops on protein-protein interactions. The Authors also use state-of-the-art techniques to profile R-loop localization within the retina and integrate multiple previously established datasets to correlate R-loop presence with transcription factor binding and chromatin marks in an attempt to understand the significance of R-loops in the retina.

      Weaknesses:

      In general, the Authors provide interpretations of the data that fit a narrative about NRL and the perceived significance of interactions with RNA binding proteins. Large-scale screens for NRL protein interactions were conducted, but not all of the data are reported. For example, NRL IP-Mass Spec was performed, but the authors only provide interaction/detection data for identified interactions with known RNA binding proteins. We cannot assess the enrichment of interactions or specificity of interactions with RNA binding proteins based on the reported results. Additionally, the lack of experiments testing the functional significance of Nrl interactions with R-loops within the developing retina fails to provide novel biological insights into the regulation of gene regulatory networks. While this provides additional avenues for research in the future, it is unclear whether NRL interactions with R-loops have physiological relevance for photoreceptor health or function.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Corso-Diaz et al. focus on the NRL transcription factor (TF), which is critical for retinal rod photoreceptor development and function. The authors profile NRL's protein interactome, revealing several RNA-binding proteins (RBPs) among its components. Notably, many of these RBPs are associated with R-loop biology, including DHX9 helicase, which is the primary focus of this study. R-loops are three-stranded nucleic acid structures that frequently form during transcription. The authors demonstrate that R-loop levels increase during photoreceptor maturation and establish an interaction between NRL TF and DHX9 helicase. The association between NRL and RBPs like DHX9 suggests a cooperative regulation of gene expression in a cell-type-specific manner, an intriguing discovery relevant to photoreceptor health. Since DHX9 is a key regulator of R-loop homeostasis, the study proposes a potential mechanism where a cell-type-specific TF controls the expression of certain genes by modulating R-loop homeostasis. This study also presents the first data on R-loop mapping in mammalian retinas and shows the enrichment of R-loops over intergenic regions as well as genes encoding neuronal function factors. While the research topic is very important, there is some concern regarding the data presented: there are substantial data supporting the interaction between NRL and DHX9, including pull-down experiments and proximity ligation assay (PLA); however, the data showing an interaction between NRL and DDX5, another R-loop-associated helicase, are inadequate. Importantly, the data supporting the claim that NRL interacts with R-loops are absolutely insufficient and, at best, correlative. The next concerns are regarding the R-loop mapping data analysis and visualization.

      Strengths:

      There is compelling evidence that the NRL transcription factor interacts with several RNA binding proteins, and specifically, sufficient data supporting the interaction of NRL with DHX9 helicase.

      A major strength is the use of the single-stranded R-loop mapping method in the mouse retina.

      Weaknesses:

      (1) Figure S1A: There is a strong band in GST-IP (control IP) for either HNRNPUL1 or HNRNPU, although the authors state in their results that there is a strong interaction of these two RBPs with NRL.

      Under our experimental conditions, most RNA-binding proteins displayed higher binding to glutathione beads (Fig. S1A). However, GST-NRL purifications showed much stronger signals for the respective RBPs. In the case of HNRNPU and HNRNPUL1, white bands that are indicative of substrate depletion due to higher protein levels are observed in GST-NRL lanes. Additionally, in Figures 1B and 1C, there is a clear enrichment of HNRNPU and HNRNPUL1 above the background signal. We added this to the text. See page 5.

      Both DHX9 and DDX5 samples have a faint band in the GST-IP.

      RNA-binding proteins may display some background as observed in other studies (e.g. PMID: 32704541). We think that showing the raw data without decreasing the exposure time is useful and that there is a clear enrichment compared to controls. In addition, we tested the interaction in multiple systems.

      There is an extremely faint band for HNRNPA2B1 in the GST-NRL IP lane. Given this is a pull-down with added benzonase treatment to remove all nucleic acids, these data suggest that previously observed NRL interactions with these particular RBPs are mediated via nucleic acids. Similarly, there is a loss of band signal for HNRNPM in this assay, although it was identified as an NRL-interacting protein in three assays, which again suggests that nucleic acids mediate the interaction.

      Thank you for highlighting this point. We mention in the manuscript that the interaction of NRL with HNRNPM and HNRNPA1 depends on nucleic acids, as noted by the reviewer, since there is no obvious band after the pull-down. We have now added that the interaction of NRL with HNRNPA2B1 is likely dependent on nucleic acids as well, given its weak signal. See page 5.

      (2) The data supporting NRL-DDX5 interaction in rod photoreceptor nuclei is very weak. In Figure 2D, the PLA signal for DDX5-NRL is very weak in the adult mouse retina and is absent in the human retina, as shown in Figure 2H.

      We agree with the reviewer. We think that the signal for DDX5 is weak, and we addressed this in the text. We noted on page 7: “Taken together, these findings suggest a strong interaction between NRL and DHX9 throughout the nuclear compartment in the retina and that a transient and/or more regulated interaction of NRL with DDX5 may require additional protein partners.” We have modified this sentence to add that the data also suggest a transient interaction or the requirement of additional protein partners for a stable interaction. See page 7.

      Given that there is no NRL-KO available for the human PLA assay, the control experiments using single-protein antibodies should be included in the assay. Similarly, the single-protein antibody control PLA experiments should be included in the experimental data presented in Figure 2J.

      Thank you for the suggestion. We performed PLAs using both DHX9 and IgG in the human retina and observed no specific amplification signal. Some background is observed outside the nucleus and in the extracellular space. We added these results to the text and to the supplementary information. See page 7 and Fig. S2B.

      (3) The EMSA experiment using a probe containing NRL binding motif within the DHX9 promoter should include incubation with retina nuclear extracts depleted for NRL as a control.

      In EMSA experiments, we used bovine retina to obtain sufficient quantities of protein. As suggested by the reviewer, using NRL-depleted extract would increase the specificity of the observed gel shift and complement our pre-immune serum as a negative control. However, removal of all the NRL protein using the antibodies available was not feasible. In the future, we will use enough mice to obtain large quantities of protein for this experiment and will collect retinas from Nrl knockout mice as a negative control.

      (4) There is a reduced amount of DHX9 pulled down in NRL-IP in HEK293 cells, but there is no statistically significant difference in the reciprocal IP (DHX9-IP and blotting for NRL) (Figure 4C).

      We believe the reviewer is referring to the data in Figure 4C showing that RNase H treatment led to significantly reduced pulldown of DHX9 as compared to control, but the reciprocal IP in Figure 4D showed no statistical significance between control and RNase H treatment. In Figure 4D, we hypothesize that NRL may account for only a small proportion of DHX9’s interactome, so the change in NRL levels could not be detected due to the sensitivity of our assay. DHX9 likely constitutes a large proportion of NRL’s interactome in HEK293 cells, hence the change in DHX9 level was more obvious when pulling down with NRL. We added this information to the results. See page 8.

      (5) The only data supporting the claim that NRL interacts with R-loops are presented in Figure 5A.

      Additional evidence that NRL interacts with R-loops comes from DRIP-Seq experiments where signals from R-loops overlap with NRL ChIP-Seq signals (Figure 7A). This shows that R-loops and NRL co-occur on multiple genomic regions. In addition, indirect evidence of NRL and R-loops’ interaction is shown in pull down experiments and PLA assays where R-loops influence DHX9 and NRL binding. We clarified this in the discussion. See page 14.

      This is a co-IP of R-loops and then blotting for NRL, DHX9, and DDX5. Here, there is no signal for DDX5, quantification of DHX9 signal shows no statistically significant difference between RNase H treated and untreated samples, while NRL shows a signal in RNase H treated sample. These data are not sufficient to make the statement regarding the interaction of NRL with R-loops.

      Thank you for this comment. We respectfully disagree, as we observe statistically significant enrichment for both NRL and DHX9 in these experiments (see Fig. 5A). Some NRL continues to bind to DNA that is pulled down nonspecifically, which may be expected since NRL is a transcription factor. See, for example, R-loop binding by the transcription factor Sox2 (PMID: 32704541). However, binding to R-loops is evidenced by an enrichment compared to the RNase H-treated sample. We clarified this in the Results section (see page 9).

      (6) Regarding R-loop mapping, the data analysis is quite confusing. The authors perform two different types of analyses: either overall narrow and broad peak analysis or strand-specific analysis. Given that the authors used ssDRIP-seq, which is a method designed to map R-loops strand specifically, it is confusing to perform different types of analyses.

      Thank you for highlighting this point. This has enhanced the clarity of the methods and enriched the discussion. We aimed to identify R-loops as accurately as possible. We conducted two types of analyses to capture different aspects of R-loops: one that looks at overall patterns (narrow and broad peaks) and another that focuses on specific strands of DNA.

      Using ssDRIP-seq, which is designed to map R-loops on specific strands, allowed us to examine R-loops formed in only one strand and those formed on both strands. To identify strand-specific R-loops, we filtered our RNase-H enriched peaks for those enriched on one strand compared to the opposite strand. We clarified the analysis in the results section, and Figure 6B. See page 10 and methods section page 25.
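
      The strand filter described above can be illustrated with a minimal sketch. It assumes each RNase H-validated peak carries a summed signal for the plus and minus strands, and it labels a peak as stranded when the log2 ratio between strands passes a cutoff. The function name, the cutoff of 1.0, the pseudocount, and the toy peak values are all hypothetical and are not taken from the manuscript's pipeline.

```python
import math

def classify_peak(plus_signal, minus_signal, min_log2_ratio=1.0, pseudocount=1.0):
    """Label a peak by comparing signal on the two strands.

    min_log2_ratio and pseudocount are illustrative assumptions, not the
    thresholds used in the study.
    """
    ratio = math.log2((plus_signal + pseudocount) / (minus_signal + pseudocount))
    if ratio >= min_log2_ratio:
        return "stranded_plus"
    if ratio <= -min_log2_ratio:
        return "stranded_minus"
    # Comparable signal on both strands: treated here as "unstranded"
    return "unstranded"

# Toy peaks (hypothetical values): (plus-strand signal, minus-strand signal)
peaks = {"peak_a": (120.0, 10.0), "peak_b": (8.0, 95.0), "peak_c": (60.0, 55.0)}
labels = {name: classify_peak(p, m) for name, (p, m) in peaks.items()}
print(labels)
```

      In this toy example, peaks with signal concentrated on one strand are labeled stranded, while a peak with similar signal on both strands falls into the unstranded group.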

      Next, the peak analysis is usually performed based on the RNase H treated R-loop mapping; what does it mean then to have a pool of "Not R-loops", see Figure 6B?

      The “Not R-loop” group refers to peaks called using the opposite strand that are not observed when calling peaks using RNase H as control. We modified this figure for clarity (Figure 6B).

      In that regard, what does the term "unstranded" R-loops mean? Based on the authors' definition, these are R-loops that do not fall within the group of strand-specific R-loops. The authors should explain the reasons behind these types of analyses and explain, what the biological relevance of these different types of R-loops is.

      Thank you for helping us clarify this point. Unstranded R-loops are DNA regions containing DNA:RNA hybrids on both plus and minus strands and possibly representing bidirectional transcription by Pol II. We observed that unstranded R-loops are enriched only in intergenic regions, H3K9me3 regions, and downstream of the transcriptional termination site (TTS). We added to the discussion the possible implications of these enrichments, including regulation of Pol II termination and transcription of long genes. See page 13.

      (7) It would be more useful to show the percent distribution of R-loops over the different genomic regions, instead of showing p-value enrichment, see Figure 6C.

      Since most of the genome is non-coding, plotting the distribution as a proportion was not informative, as the vast majority of the data falls in intergenic regions. However, we created a new figure showing the observed vs. expected ratio, which seems to be more informative, and moved the current p-value figure to the supplement in the revised version. See Figures 6C and S6D.
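
      One common way to compute an observed vs. expected ratio is sketched below, assuming the expectation is simply the fraction of the genome covered by each feature class. This is an illustrative definition, not necessarily the one used for Figure 6C, and the base-pair counts are toy numbers chosen for easy arithmetic.

```python
def observed_expected_ratio(peak_bp_in_feature, total_peak_bp, feature_bp, genome_bp):
    """Ratio of the observed share of peak bases in a feature to the share
    expected if peaks were spread uniformly across the genome."""
    observed = peak_bp_in_feature / total_peak_bp
    expected = feature_bp / genome_bp
    return observed / expected

# Toy numbers (hypothetical): a 1,000 bp "genome" with 1,000 bp of peaks in total
promoter_ratio = observed_expected_ratio(250, 1000, 125, 1000)    # 0.25 / 0.125
intergenic_ratio = observed_expected_ratio(500, 1000, 500, 1000)  # 0.5 / 0.5
print(promoter_ratio, intergenic_ratio)
```

      A ratio above 1 indicates enrichment relative to the feature's genomic footprint, which is why a class can hold most peaks in absolute terms (as intergenic regions do) and still show no enrichment.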

      (8) Based on the model presented, NRL regulates R-loop biology via interaction with RBPs, such as DHX9, a known R-loop resolution helicase. Given that the gene targets of NRL TF are known, it would be useful to then analyze the R-loop mapping data across this gene set.

      Thank you for this suggestion. We performed an analysis of R-loops on NRL-regulated genes. Interestingly, NRL target genes have an enrichment of stranded R-loops at the promoter/TSS and unstranded R-loops on the gene body compared to all Ensembl genes (Figure S7B). We added a table containing all NRL-regulated genes we used for this analysis (Table S5) and a figure showing this result (Fig. S7B).

      Reviewer #2 (Public review):

      Summary:

      The authors utilize biochemical approaches to determine and validate NRL protein-protein interactions to further understand the mechanisms by which the NRL transcription factor controls rod photoreceptor gene regulatory networks. Observations that NRL displays numerous protein-protein interactions with RNA-binding proteins, many of which are involved in R-loop biology, led the authors to investigate the role of RNA and R-loops in mediating protein-protein interactions and profile the co-localization of R-loops with NRL genomic occupancy.

      Strengths:

      Overall, the manuscript is very well written, providing succinct explanations of the observed results and potential implications. Additionally, the authors use multiple orthogonal techniques and tissue samples to reproduce and validate that NRL interacts with DHX9 and DDX5. Experiments also utilize specific assays to understand the influence of RNA and R-loops on protein-protein interactions. The authors also use state-of-the-art techniques to profile R-loop localization within the retina and integrate multiple previously established datasets to correlate R-loop presence with transcription factor binding and chromatin marks in an attempt to understand the significance of R-loops in the retina.

      Weaknesses:

      In general, the authors provide superficial interpretations of the data that fit a narrative but fail to provide alternative explanations or address caveats of the results. Specifically, many bands are present in interaction studies either in control lanes (GST controls) of Westerns or large amounts of background in PLA experiments.

      We have added additional information to the text regarding the presence of background signals in pull downs. We wish to note that experimental samples always exceeded background signals. We believe that reporting these raw findings (rather than showing shorter exposures) is valuable for the scientific community. We did not observe any background in the proximity ligation assay (PLA) that exceeded what is typically expected, and the signals were clearly discernible. Cases where signals are weaker, such as with DDX5, have been highlighted. In addition, we added a DHX9-IgG negative control for the human PLA experiment. See page 5 and Fig. S2B.

      Additionally, the lack of experiments testing the functional significance of Nrl interactions or R-loops within the developing retina fails to provide novel biological insights into the regulation of gene regulatory networks other than, 'This could be a potentially important new mechanism'.

      We agree that functional experiments are necessary to understand the molecular mechanisms behind R-loop regulation in the retina; however, we believe it goes beyond the scope of this initial characterization (as this is the first report on R-loops in the retina). We are currently pursuing these studies.

      We performed a new analysis on NRL-regulated genes as suggested by Reviewer 1. We show that NRL target genes have an enrichment of stranded R-loops at the promoter/TSS and unstranded R-loops on the gene body compared to all Ensembl genes (Figure S7B), providing further evidence of the functional interaction between NRL and R-loops. See Table S5, Fig. S7B, and the discussion.

      Additionally, the authors test the necessity of RNA for NRL/DHX9 interactions but don't show RNA binding of NRL or DHX9 or the sufficiency of RNA to interfere/mediate protein-protein interactions. Recent work has highlighted the prevalence of RNA binding by transcription factors through Arginine Rich Motifs that are located near the DNA binding domains of transcription factors.

      We agree that the role of RNA in these complexes is very exciting, and we are currently pursuing these studies. However, we believe that they fall outside the scope of this initial report on R-loops in the retina.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      There are a couple of minor comments:

      (1) Unfinished sentence; page 11, the end of the first paragraph.

      Thank you for catching this error. We removed the unfinished text.

      (2) Page 6: Figure S2A should be Figure S2.

      In general, the manuscript would benefit from a deeper explanation of the biological relevance of R-loop formation and the connection to NRL TF and the expression of genes regulated by NRL. In this regard, a more substantial description of the model would be useful.

      We have modified the discussion for clarity and included new ideas on possible roles of R-loops in gene regulation of photoreceptors.

      Reviewer #2 (Recommendations for the authors):

      (1) The specificity of interactions needs to be addressed:

      - Figure 1B - HNRNPUL1 bands present in GST control.

      - Figure 1C - Bands present in the Empty Vector control IP for HNRNPU and DHX9.

      - Supplemental Figure 1A - most proteins are present in GST control suggesting prevalent binding to GST and lack of specificity for other interactions.

      Thank you for your comment. RNA-binding proteins can have more background as observed in other studies (e.g. PMID: 32704541) but there is always a higher signal in experimental samples compared to controls. While we agree that we can enhance the conditions for immunoprecipitation (IP) by optimizing washing buffers, exposure and other parameters, we believe the current methods tell the story. We have added additional text explaining this. See page 5.

      (2) Use of the term 'Strongest' interaction - IPs don't directly address the strength of interaction, but depend on levels of expression AND affinity. The strength of interaction should be tested using techniques like an OCTET or SPR assay. One can also quantify the effect that RNA would have in such an assay.

      Thank you for your suggestion. We replaced the term 'stronger' with “higher signal” and “robust” at most places. The source of protein lysates is the same for experiments and controls, thus the amount of protein is consistent in both conditions, and not dependent on level of gene expression.

      (3) In supplemental tables, please use the proper gene names, not the UniProt peptide name. For example, there are no genes named ELAV1-ELAV4. These should be ELAVL1-ELAVL4. A short glance identifies >10 gene name errors.

      Thank you for the suggestion. We updated current gene names in all tables.

      (4) Please provide the rationale for the choice of DNA sequence for the DHX9 nucleotide sequence used for EMSA assays. In the human DHX9 locus, the NRL ChIP-seq peak looks to be contained in Intron1 whereas the NRL ChIP-seq peak in mouse DHX9 looks to be in the proximal upstream promoter. Did the authors choose an evolutionarily conserved sequence in the promoter region that contained the NRL motif or does the probe sequence arise from the sequence that has known NRL binding as assayed by NRL ChIP-seq? A zoomed-in image of the NRL ChIP-seq pile-ups in the DHX9 locus in each species would be beneficial.

      Thank you for this suggestion. The probe was chosen by scanning for NRL binding motifs on the ChIP-Seq peak at the human DHX9 promoter. We added a zoomed-in image of the ChIP-Seq or CUT&RUN reads for NRL on both human and mouse retinas. Figure 3D shows NRL binding in both species in regions containing the homologous motif. The sequence is partially conserved and shown in the figure.

      (5) Normalization in RNaseH/RNaseA Co-IP experiments. Why does RNAseH treatment result in increased NRL IP (increased NRL expression?) or does RNaseA treatment cause reduced IP of DHX9? These differences seem to cause a 'denominator' effect, leading the Authors to conclude decreased co-IP of DHX9 with NRL when R-loops are inhibited or increased co-IP of NRL with DHX9 when RNA is degraded. An alternate interpretation would be that inhibiting the R-loop binding of NRL unmasks the epitope for antibody recognition. The authors should test NRL binding to RNA and determine if RNA binding affects the co-IP of NRL with DHX9.

      We agree that removing total RNA by RNase A or R-loops by RNase H may alter the accessibility of our antibodies to the epitopes, resulting in the differences in the level of total protein pulled down. However, we quantified the level of the associating protein relative to the total immunoprecipitated protein and confirmed, in reciprocal assays, that RNase A treatment led to increased interaction between NRL and DHX9. The quantification was not consistent between the reciprocal IPs upon RNase H treatment. We reason that in Figure 4D, as NRL may account for only a small proportion of DHX9’s interactome, the change in NRL level could not be detected due to the sensitivity of our assay. Conversely, DHX9 can constitute a larger proportion of NRL’s interactome in HEK293 cells, hence the change in DHX9 level was more obvious. We added this information to the text. See page 8.
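
      The normalization described above can be sketched as follows, assuming densitometry values for the co-precipitated protein (prey) and the immunoprecipitated protein (bait) in each lane. The function names and band intensities are hypothetical and only illustrate why dividing by the bait guards against the "denominator" effect raised by the reviewer.

```python
def relative_coip(prey_intensity, bait_intensity):
    """Co-precipitated (prey) band intensity normalized to the pulled-down bait."""
    if bait_intensity <= 0:
        raise ValueError("bait intensity must be positive")
    return prey_intensity / bait_intensity

def fold_change(treated, control):
    """Fold change in normalized co-IP signal; each argument is (prey, bait)."""
    return relative_coip(*treated) / relative_coip(*control)

# Toy densitometry values (hypothetical): more bait is recovered after treatment,
# yet the prey signal per unit of bait is halved.
control_lane = (50.0, 100.0)   # prey / bait = 0.5
treated_lane = (30.0, 120.0)   # prey / bait = 0.25
change = fold_change(treated_lane, control_lane)
print(change)
```

      Normalizing in this way separates a change in how much bait the antibody recovers from a change in how much prey associates with each unit of bait, although it cannot by itself rule out epitope-masking effects.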

      (6) Figure 7 - Malat1 - there doesn't seem to be an overlap of NRL with Stranded R-loop peaks in this image. Nrl seems to flank the region of R-loops.

      We replaced Malat1 with Mplkip, which shows a direct overlap of Nrl binding and R-loops. See Figure 7C.

      (7) Results end with 'A Model'. Seems like some concluding remarks and references to Figure 8 were mistakenly left out.

      Thank you for catching this typo. We removed the misplaced text.

      (8) Model and Discussion - authors should show raw data for RHO with respect to NRL binding and R-loops. No evidence was provided regarding R-loops (or lack thereof) in the Rhodopsin locus. Additionally, conclusions stating that "R-loops... are specifically depleted from genes, such as Rhodopsin, with high expression levels" go against Figures 7B and 7C. Malat1 is one of the highest expressed genes in the retina and contains R-loops.

      Thank you for helping us clarify our hypothesis. We added a genome browser view of Rhodopsin showing the absence of R-loops (Fig. S8). We hypothesize that R-loops could interfere with achieving higher rates of transcription; however, we did not mean to say that all highly expressed genes lack R-loops. We have rephrased the discussion to clarify this point.

      (9) Neuronal genes, particularly those involved in synaptic transmission are known to be, on average, longer than most genes (Gabel, 2015; PMID: 25762136). Is it possible that R-loops are detected at genes involved in synaptic function/structure solely because of transcript length, as it takes longer for transcription termination to resolve in genes that are longer? A plot showing R-loop enrichment and transcript length would address this.

      We added a plot showing gene length in relation to R-loops and expression levels. We observed that R-loops are more common over long genes regardless of their expression levels. We also observed that the concomitant presence of stranded and unstranded R-loops is restricted to the longest genes in most cases. We added this to Figure 7D.

    1. eLife Assessment

      This study is a valuable contribution to the field of neuronal modeling, providing a method for rapidly obtaining neuronal physiology parameters from electrophysiological recordings. While the approach seems promising, in its current form it is incomplete, since the generated models often diverge from the data and the comparison with existing methods raises concerns.

    2. Reviewer #1 (Public review):

      The work by Kim et al. shows that a parameter generator for biophysical HH-like models can be trained through a GAN-based approach to reproduce experimentally measured voltage responses and IV curves. A particularly interesting aspect of this generator is that, once it has been learned, it can be applied to new recordings to generate appropriate parameter sets at a low computational cost, a feature missing from more commonplace evolutionary approaches.

      I appreciate the changes the authors have made to the manuscript. The authors have clarified their inverse gradient method. They also provide a better validation and a rich set of ablations. However, I still have major concerns that should be addressed.

      Major concerns:

      (1) The bad equilibria of the model remain a concern, as do other features, such as the transient overshoots, that do not match the data. I think the authors could achieve more accuracy here by assigning more weight to such specific features, adding them explicitly as separate objectives for the generator. The traces contain a five-second current step, plus one second before and one second after the training step. This means that in the RMSE, the current step amplitude will dominate as a feature, as this is simply the state for which the data trace contains the most time-points. Note that this is further exacerbated by using the IV curve as an auxiliary objective. I believe a better exploration of specific response features, incorporated as independently weighted loss terms for the generator, could improve the fit. For example, one auxiliary term could be the equilibrium before and after the current step, another could penalise response traces that do not converge back to their initial equilibrium, etc.
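      The idea of independently weighted feature terms can be sketched as follows. This is only an illustration, not the authors' loss: the function name `feature_weighted_loss`, the index arguments, the `tail` window and the unit weights are all assumptions introduced here, and the traces are plain Python lists sampled on a common time grid.

```python
def rmse(sim, data):
    """Root mean square error between two equally sampled traces."""
    return (sum((s - d) ** 2 for s, d in zip(sim, data)) / len(data)) ** 0.5


def mean(values):
    return sum(values) / len(values)


def feature_weighted_loss(sim, data, step_start, step_end, tail=5,
                          weights=(1.0, 1.0, 1.0, 1.0)):
    """Hypothetical loss: whole-trace RMSE plus three equilibrium terms.

    step_start / step_end are the sample indices at which the current
    step begins and ends (step_start must be > 0); `tail` is the number
    of final post-step samples used as the post-stimulus equilibrium.
    """
    w_trace, w_pre, w_post, w_return = weights
    sim_pre, data_pre = mean(sim[:step_start]), mean(data[:step_start])
    sim_post = mean(sim[step_end:][-tail:])
    data_post = mean(data[step_end:][-tail:])

    trace_term = rmse(sim, data)             # dominated by the long step
    pre_term = abs(sim_pre - data_pre)       # equilibrium before the step
    post_term = abs(sim_post - data_post)    # equilibrium after the step
    return_term = abs(sim_post - sim_pre)    # failure to return to rest
    return (w_trace * trace_term + w_pre * pre_term
            + w_post * post_term + w_return * return_term)
```

      Raising `w_pre`, `w_post` or `w_return` relative to `w_trace` makes equilibrium errors count even though the current step contributes most of the samples.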

      (2) The explanation of what the authors mean by 'inverse gradient operation' is clear now. However, this term is mathematically imprecise, as the inverse gradient does not exist because the gradient operator is not injective. The method is simply forward integration under the assumption that the derivative of the voltage is known at the grid time-points, and should be described as such.
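      Written out, the forward integration described in this comment is the explicit Euler recursion below, where $\Delta t$ is the grid spacing and $\dot V(t_k)$ is the assumed-known derivative at grid point $t_k$ (notation introduced here for illustration, not taken from the manuscript). The second line shows why the gradient operator is not injective: traces that differ by a constant share the same derivative, so the initial value $V_0$ must be supplied to pick out a single trace.

```latex
\begin{align}
V(t_{k+1}) &= V(t_k) + \Delta t \, \dot V(t_k), \qquad V(t_0) = V_0, \quad k = 0, \dots, N-1, \\
\frac{d}{dt}\bigl(V(t) + c\bigr) &= \frac{d}{dt} V(t) \quad \text{for every constant } c .
\end{align}
```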

      (3) I appreciate that the authors' method provides parameters of models at a minimal computational cost compared to running an evolutionary optimization for every new recording. I also believe that with some tweaking of the objective, the method could improve in accuracy. However, I share reviewer 2's concerns that the evolutionary baseline methods are not sufficiently explored, as these methods have been used to successfully fit considerably more complex response patterns. One way out of the dilemma is to show that the EP-GAN estimated parameters provide an initial guess that considerably narrows the search space for the evolutionary algorithm. In this context, the authors should also discuss recent gradient-based methods such as Deistler et al. (https://doi.org/10.1101/2024.08.21.608979) or Jones et al. (https://doi.org/10.48550/arXiv.2407.04025).

    3. Reviewer #2 (Public review):

      Summary:

      Generating biophysically detailed computational models that capture the characteristic physiological properties of biological neurons for diverse cell types is an important and difficult problem in computational neuroscience. One major challenge lies in determining the large number of parameters of such models, which are notoriously difficult to fit to experimental data. Thereby, the computational and energy costs can be significant. The study 'ElectroPhysiomeGAN: Generation of Biophysical Neuron Model Parameters from Recorded Electrophysiological Responses' by Kim et al. describes a computationally efficient approach for predicting model parameters of Hodgkin-Huxley neuron models using Generative Adversarial Networks (GANs) trained on simulation data. The method is applied to generate models for 9 non-spiking neurons in C. elegans based on electrophysiological recordings. While the generated models capture the responses of these neurons to some degree, they generally show significant deviations from the empirically observed responses in important features. Although EP-GAN shows clear benefits under limited compute, the results do not yet demonstrate the quality needed to match other state-of-the-art methods. Future work examining extended training, larger datasets, or hybrid approaches would help clarify whether EP-GAN can generate models of high quality. If so, this would indeed be a major step forward; if not, the computationally more expensive methods will remain essential.

      Strengths:

      The authors work on an important and difficult problem. A noteworthy strength of their approach is that once trained, the GANs can generate models from new empirical data with very little computational effort. The generated models reproduce the response to current injections reasonably well.

      Weaknesses:

      Major 1: Models do not faithfully capture empirical responses. While the models generated with EP-GAN reproduce the average voltage during current injections reasonably well, the dynamics of the response are generally not well captured. For example, for the neuron labeled RIM (Figure 2), the most depolarized voltage traces show an initial 'overshoot' of depolarization, i.e. they depolarize strongly within the first few hundred milliseconds but then fall back to a less depolarized membrane potential. In contrast, the empirical recording shows no such overshoot. Similarly, for the neuron labeled AFD, all empirically recorded traces slowly ramp up over time. In contrast, the simulated traces are mostly flat. Furthermore, all empirical traces return to the pre-stimulus membrane potential, but many of the simulated voltage traces remain significantly depolarized, far outside of the ranges of empirically observed membrane potentials. The authors trained an additional GAN (EP-GAN Extended) to improve the fit to the resting membrane potential. Interestingly, for one neuron (AWB), this improved the response during stimulation, which now reproduced the slowly rising membrane potentials observed empirically; however, the neuron still does not reliably return to its resting membrane potential. For the other two neurons, the authors report a decrease in accuracy in comparison to EP-GAN. While such deviations may appear small in the Root Mean Square Error (RMSE), they likely indicate a large mismatch between the model and the electrophysiological properties of the biological neuron. The authors added a second metric during the revision: the percentage of predicted membrane potential trajectories within the empirical range. I appreciate this additional analysis. As the empirical ranges across neurons are far larger than the magnitude of dynamical properties of the response ('slow ramps', etc.), this metric doesn't seem to be well suited to quantify to which degree these dynamical properties are captured by the models.

      Major 2: Comparison with other approaches is potentially misleading. Throughout the manuscript, the authors claim that their approach outperforms the other approaches tested. But compare the responses of the models in the present manuscript (neurons RIM, AFD, AIY) to the ones provided for the same neurons in Naudin et al. 2022 (https://doi.org/10.1371/journal.pone.0268380). Naudin et al. present models that seem to match empirical data far more accurately than any model presented in the current study. Naudin et al. achieved this using DEMO, an algorithm that in the present manuscript is consistently shown to be among the worst of all algorithms tested. I therefore strongly disagree with the authors' claim that a "Comparison of EP-GAN with existing estimation methods shows EP-GAN advantage in the accuracy of estimated parameters". This may be true in the context of the benchmark performed in the study (i.e., a condition of very limited compute resources: 18 generations with a population size of 600, compared to the 2000 generations recommended in Naudin et al.), but while EP-GAN wins under these specific conditions (and yes, here the authors convincingly show that their EP-GAN produces by far the best results!), other approaches seem to win with respect to the quality of the models they can ultimately generate.

      Major 3: As long as the quality of the models generated by the EP-GAN cannot be significantly improved, I am doubtful that it indeed can contribute to the 'ElectroPhysiome', as it seems likely that dynamics that are currently poorly captured, like slow ramps, or the ability of the neuron to return to its resting membrane potential, will critically affect network computations. If the authors want to motivate their study based on this very ambitious goal, they should illustrate that single neuron model generation with their approach is robust enough to warrant well-constrained network dynamics. Based on the currently presented results, I find the framing of the manuscript far too bold.

      Major 4: The conclusion of the ablation study 'In addition the architecture of EP-GAN permits inference of parameters even when partial membrane potential and steady-state currents profile are given as inputs' does not seem to be justified given the voltage traces shown in Figure 3. For example, for RIM, the resting membrane potential stays around 0 mV, but all empirical traces are around -40 mV. For AFD, all simulated traces have a negative slope during the depolarizing stimuli, but all empirically observed traces have a positive slope. For AIY, the shape of hyperpolarized traces is off. While it may be that by their metric neurons in the 25% category are classified as 'preserving baseline accuracy', this doesn't seem justified given the voltage traces presented in the manuscript. It appears the metric is not strict enough.

    4. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their valuable feedback and comments. Based on the feedback, we revised the manuscript and believe that we have addressed most of the points raised by the reviewers. Below we include a summary of key revisions and point-by-point responses to the reviewers' comments.

      Abstract/Introduction

      We further emphasized EP-GAN's strength in parameter inference for detailed neuron models vs specialized models with reduced parameters.

      Results

      We further elaborated on the method of training EP-GAN on synthetic neurons and validating on both synthetic and experimental neurons.

      We added a new section Statistical Analysis and Loss Extension which includes:

      - Statistical evaluation of baseline EP-GAN and other methods on neurons with multi-recording membrane potential response/steady-state current data: AWB, URX, HSN

      - Evaluation of EP-GAN with added resting potential loss + longer simulations to ensure stability of membrane potential (EP-GAN-E)

      Methods

      We added a detailed explanation of the "inverse gradient process".

      We added detailed current/voltage-clamp protocols for both synthetic and experimental validation and prediction scenarios (Table 6)

      Supplementary

      We added error distribution and representative samples for synthetic neuron validations (Fig S1)

      We added membrane potential response statistical analysis plots for existing methods for AWB, URX, HSN (Fig S6)

      We added steady-state currents statistical analysis plots on EP-GAN + existing methods for AWB, URX, HSN (Fig S7)

      We added mean membrane potential errors for AWB, URX, HSN normalized by empirical standard deviations for all methods (Table S4)

      Please see our point-by-point responses to specific feedback and comments below.

      Reviewer 1:

      First, at the methodological level, the authors should explain the inverse gradient operation in more detail, as the reconstructed voltage will not only depend on the evaluation of the right-hand side of the HH-equations, as they write but also on the initial state of the system. Why did the authors not simply simulate the responses?

      We thank the reviewer for the feedback regarding the need for further explanation. We have revised the Methods section to provide a more detailed description of the inverse gradient process. The process uses a discrete integration method, similar to Euler’s formula, which takes the system’s initial conditions into account. For the EP-GAN baseline, the initial states were picked soon after the start of the stimulus to reconstruct the voltage during the stimulation period. For EP-GAN with extended loss (EP-GAN-E), introduced in this revision in the sub-section Statistical Analysis and Loss Extension, initial states before/after stimulation were also taken into account to incorporate resting voltage states into the target loss.

      Since EP-GAN is a neural network and we want the inverse gradient process to be part of the training process (i.e., making EP-GAN a “model informed network”), the process needs to be implemented as a differentiable function of the generated parameters p. This enables the derivatives from the reconstructed voltages to be traced back to all network components via the back-propagation algorithm.

      Computationally, this requires the implementation of the process as a combination of discrete array operations with “auto-differentiation”, which allows automatic computation of derivatives for each operation. While explicit simulation of the responses using ODE solvers provides more accurate solutions, the algorithms used by these solvers typically do not support such specialized arrays nor are they compatible with neural network training. We thus utilized PyTorch tensors [54], which support both auto-differentiation and vectorization to implement the process.
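      A minimal sketch of why an Euler-style reconstruction is a differentiable function of a parameter. It assumes a toy single-conductance leak model (not the HH model used in the study) and propagates the sensitivity dV/dg by hand in pure Python; with PyTorch tensors, automatic differentiation would supply this derivative without the explicit sensitivity update. The function name and all constants are hypothetical.

```python
def euler_with_sensitivity(g, v0=-60.0, e_leak=-60.0, i_stim=5.0,
                           cap=1.0, dt=1e-3, n_steps=5000):
    """Forward Euler for the toy model  C dV/dt = -g (V - E_leak) + I.

    Returns the voltage trace and the sensitivity trace dV/dg, both
    obtained with the same discrete update rule.
    """
    v, s = v0, 0.0          # the initial state does not depend on g
    trace, sens = [v], [s]
    for _ in range(n_steps):
        dv_dt = (-g * (v - e_leak) + i_stim) / cap
        # derivative of dv_dt with respect to g (chain rule through v)
        ds_dt = (-(v - e_leak) - g * s) / cap
        v, s = v + dt * dv_dt, s + dt * ds_dt
        trace.append(v)
        sens.append(s)
    return trace, sens


if __name__ == "__main__":
    trace, sens = euler_with_sensitivity(0.5)
    print("final voltage:", trace[-1])
    print("d(final voltage)/dg:", sens[-1])
```

      Because every step is a simple arithmetic operation, the derivative of the final voltage with respect to `g` is well defined at each step, which is the property that back-propagation relies on.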

      The authors did not allow the models time to equilibrate before starting their reconstruction simulations, as testified by the large transients observed before stimulation onset in their plots. To get a sense of whether the models reproduce the equilibria of the measured responses to a reasonable degree, the authors should allow sufficient time for the models to equilibrate before starting their stimulation protocol.

      In the added Statistical Analysis and Loss Extension sub-section under Results, we added results for EP-GAN-E, where we simulate the voltage responses with an added 5-second stabilization period at the beginning of the simulations. The added period mitigates the voltage fluctuations observed during the initial simulation phase, and we observe that the simulated voltage responses indeed reach a stable equilibrium both prior to stimulation and for the zero-stimulus current-clamp protocol (Figure 5 bottom, Column 3).

      In fact, why did the authors not explicitly include the equilibrium voltage as a target loss in their set of loss functions? This would be an important quantity that determines the opening level of all the ion channels and therefore would influence the associated parameter values.

      The EP-GAN baseline does include equilibrium voltage as a target loss, since all current-clamp protocols used in the study (both synthetic and experimental) include a membrane potential trace where the stimulus amplitude is zero throughout the entire recording duration (see added Table 6 for current-clamp protocols), thus forcing EP-GAN to optimize the resting membrane potential alongside the other, non-zero stimulus current-clamp scenarios.

      To further study EP-GAN’s accuracy in predicting the resting potential, we supplemented EP-GAN with a resting potential target loss and evaluated its performance in the sub-section Statistical Analysis and Loss Extension. The added loss, combined with 5 seconds of additional stabilization period, improved accuracy in predicting resting potentials by mitigating voltage fluctuations during the early simulation phase, and significantly improved the predicted AWB membrane potential responses, where the EP-GAN baseline overshot the resting potential.

      The authors should provide a more detailed evaluation of the models. They should explicitly provide the IV curves (this should be easy enough, as they compute them anyway), and clearly describe the time-point at which they compute them, as their current figures suggest there might be strong transient changes in them.

      We included predicted IV-curve vs ground truth plots in addition to the voltages in the supplementary materials (Figure S2, S5) in the original submitted version of the manuscript. In this revision, we added additional IV-curve plots with statistical analysis for the neurons with multi-recording data (AWB, URX, HSN) in the supplementary materials (Figure S7).

      For the evaluation of predicted membrane potential responses, we added further details in Validation Scenarios (Synthetic) under the Results section so that it clearly explains the current-clamp protocols used for both synthetic and experimental neurons and the time interval over which the RMSE evaluations were performed.

      In the sub-section Statistical Analysis and Loss Extension, we introduced a new statistical metric in addition to RMSE, applied to neurons AWB, URX, HSN, which evaluates the percentage of predicted voltages that fall within the empirical range (i.e., mean ± 2 std), as well as the voltage error normalized by empirical standard deviations (Table S4).

      The authors should assess the stability of the models. Some of the models exhibit responses that look as if they might be unstable if simulated for sufficiently long periods of time. Therefore, the authors should investigate whether all obtained parameter sets lead to stable models.

      In the sub-section Statistical Analysis and Loss Extension, we included individual voltage traces generated by both EP-GAN baseline and EP-GAN-E (extended) with longer simulation (+5 seconds) to ensure stability. EP-GAN-E is able to produce equilibrium voltages that are indeed stable and within empirical bounds throughout the simulations for the zero-stimulus current-clamp scenario (column 3) for the 3 tested neurons (AWB, URX, HSN).

      Minor:

      The authors should provide a description of the model and its trainable parameters. At the moment, it is unclear which parameters of the ion channels are actually trained by the methodology.

      The detailed description of the model and its ion channels can be found in [7]. The supplementary materials also include an Excel table, predicted parameters, which lists all EP-GAN fitted parameters for the 9 neurons (+3 new parameter sets for AWB, URX, HSN using EP-GAN-E) included in the study, the labels for trainability, and their respective lower/upper bounds used during training data generation. In the revised manuscript, we further elaborated on the above information in the second paragraph of the Results section.

      Reviewer 2:

      Major 1: While the models generated with EP-GAN reproduce the average voltage during current injections reasonably well, the dynamics of the response are not well captured. For example, for the neuron labeled RIM (Figure 2), the most depolarized voltage traces show an initial 'overshoot' of depolarization, i.e. they depolarize strongly within the first few hundred milliseconds but then fall back to a less depolarized membrane potential. In contrast, the empirical recording shows no such overshoot. Similarly, for the neuron labeled AFD, all empirically recorded traces slowly ramp up over time. In contrast, the simulated traces are mostly flat. Furthermore, all empirical traces return to the pre-stimulus membrane potential, but many of the simulated voltage traces remain significantly depolarized, far outside of the ranges of empirically observed membrane potentials. While these deviations may appear small in the Root mean Square Error (RMSE), the only metric used in the study to assess the quality of the models, they likely indicate a large mismatch between the model and the electrophysiological properties of the biological neuron.

      EP-GAN’s main contribution is targeted towards inference of detailed neuron model parameters in a compute-efficient manner. This is a difficult problem to address even with current state-of-the-art fitting algorithms. While EP-GAN is not perfect in capturing the dynamics of the responses and RMSE does not fully reflect the quality of predicted electrophysiological properties, RMSE is a generic error metric for time series that is easily interpretable and applicable to all methods. Using this metric, our studies show that EP-GAN’s overall prediction quality exceeds that of existing methods when given identical optimization goals in a compute-normalized setup.

      In our revised manuscript, we included a new section, Statistical Analysis and Loss Extension, under the Results section, where we performed additional statistical evaluations (e.g., % of predicted responses within the empirical range) of EP-GAN’s predictions for neurons with multi-recording data. The results show that predicted voltage responses from the EP-GAN baseline (introduced in the original manuscript) are, in general, within the empirical range, with ~80% of its responses falling within ± 2 empirical standard deviations, which is higher than for existing methods: DEMO (57.9%), GDE3 (37.9%), NSDE (38%), NSGA2 (60.2%).

      Major 2: Other metrics than the RMSE should be incorporated to validate simulated responses against electrophysiological data. A common approach is to extract multiple biologically meaningful features from the voltage traces before, during and after the stimulus, and compare the simulated responses to the experimentally observed distribution of these features. Typically, a model is only accepted if all features fall within the empirically observed ranges (see e.g. https://doi.org/10.1371/journal.pcbi.1002107). However, based on the deviations in resting membrane potential and the return to the resting membrane potential alone, most if not all the models shown in this study would not be accepted.

      In our original manuscript, because each neuron had only a single set of recording data, RMSE was chosen as the most generic and interpretable error metric. We conducted additional electrophysiological recordings for 3 neurons in prediction scenarios (AWB, URX, HSN) and performed statistical analysis of the generated models in the sub-section Statistical Analysis and Loss Extension. Specifically, we evaluated the percentage of predicted voltage responses that fall within the empirical range (empirical mean ± 2 std, p ~ 0.05), encompassing the responses before, during and after the stimulus (Figure 5, Table 5), and the mean membrane potential error normalized by empirical standard deviations (Table S4).

      The results show that the EP-GAN baseline achieves an average of ~80% of its predicted responses falling within the empirical range, which is higher than the other methods: DEMO (57.9%), GDE3 (37.9%), NSDE (38%), NSGA2 (60.2%). Supplementing EP-GAN with an additional resting potential loss (EP-GAN-E) increased the percentage to ~85%, with noticeable improvements in reproducing dynamical features for AWB (Figure 5). Evaluations of membrane potential errors normalized by empirical standard deviations showed similar results, where the EP-GAN baseline and EP-GAN-E have average errors of 1.0 std and 0.7 std respectively, outperforming DEMO (1.7 std), GDE3 (2.0 std), NSDE (3.0 std) and NSGA2 (1.5 std) (Table S4).
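      The two statistics described in this response could be computed along the following lines. This is a sketch under stated assumptions rather than the authors' implementation: the function names are hypothetical, traces are plain lists on a shared time grid, and the sample standard deviation (`statistics.stdev`) is assumed, since the response does not specify which estimator was used.

```python
from statistics import mean, stdev


def empirical_band(recordings, n_std=2.0):
    """Per-time-point band mean ± n_std * std across repeated recordings."""
    lower, upper = [], []
    for samples in zip(*recordings):      # samples at one time point
        m, s = mean(samples), stdev(samples)
        lower.append(m - n_std * s)
        upper.append(m + n_std * s)
    return lower, upper


def percent_within_band(predicted, lower, upper):
    """Percentage of predicted samples lying inside the empirical band."""
    inside = sum(lo <= v <= hi for v, lo, hi in zip(predicted, lower, upper))
    return 100.0 * inside / len(predicted)


def normalized_error(predicted, recordings):
    """Mean absolute error in units of the empirical standard deviation.

    Assumes the recordings differ at every time point (non-zero std).
    """
    errors = [abs(v - mean(samples)) / stdev(samples)
              for v, samples in zip(predicted, zip(*recordings))]
    return mean(errors)
```

      Note that a wide band can contain traces whose shape differs from every recording, so a high percentage does not by itself show that slow dynamical features are reproduced.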

      Major 3: Abstract and introduction imply that the 'ElectroPhysiome' refers to models that incorporate both the connectome and individual neuron physiology. However, the work presented in this study does not make use of any connectomics data. To make the claim that ElectroPhysiomeGAN can jointly capture both 'network interaction and cellular dynamics', the generated models would need to be evaluated for network inputs, for example by exposing them to naturalistic stimuli of synaptic inputs. It seems likely that dynamics that are currently poorly captured, like slow ramps, or the ability of the neuron to return to its resting membrane potential, will critically affect network computations.

      In the paper, EP-GAN is introduced as a parameter estimation method that can aid the development of ElectroPhysiome, which is a network model - these are two different method types and we do not claim EP-GAN is a model that can capture network dynamics. To avoid possible confusion, we made further clarifications in the abstract/introduction that EP-GAN is a machine learning approach for neuron HH-parameter estimation.

      I find it hard to believe that the methods EP-GAN is compared to could not perform any better. For example, multi-objective optimization algorithms are often successful in generating models that match empirical observations very well, but features used as target of the optimization need to be carefully selected for the optimization to succeed. Likely, each method requires extensive trial and error to achieve the best performance for a given problem. It is therefore hard to do a fair comparison. Given these complications, I would like to encourage the authors to rethink the framing of the story as a benchmark of EP-GAN vs. other methods. Also, the number of parameters does not seem that relevant to me, as long as the resulting models faithfully reproduce empirical data. What I find most interesting is that EP-GAN learns general relationships between electrophysiological responses and biophysical parameters, and likely could also be used to inspect the distribution of parameters that are consistent with a given empirical observation.

      We thank the reviewer for providing this perspective. While it is indeed difficult to have a completely fair comparison between existing optimization methods vs EP-GAN due to the fundamental differences in their algorithms, we believe that the current comparisons with other methods are justified as they provide baseline performance metrics to test EP-GAN for its intended use cases.

      The main strength of EP-GAN, as previously mentioned, is in its ability to efficiently navigate large detailed HH-models with many parameters so that it can aid in the development of nervous system models such as ElectroPhysiome, potentially fitting hundreds of neurons in a time efficient manner.

      While EP-GAN’s ability to learn the general relationship between electrophysiological responses and parameter distributions is indeed interesting and warrants a more careful examination, it is not the main focus of the paper, since in this work we focus on introducing EP-GAN as a methodology for parameter inference.

      In this context, we believe that comparisons with other methods conducted in a compute-normalized manner (i.e., each method is given the same number of simulations) and with identical optimization targets provide an adequate framework for evaluating the aforementioned EP-GAN aim. Indeed, while EP-GAN excels with larger HH-models, it performs slightly worse than DE for smaller models such as the one used by [16], despite being more compute-efficient (Table S2).

      To emphasize the EP-GAN aim, we revised the main manuscript description to focus on its intended use in parameter inference of detailed neuron parameters vs specialized models with reduced parameters.

      I could not find important aspects of the methods. What are the 176 parameters that were targeted as trainable parameters? What are the parameter bounds? What are the remaining parameters that have been excluded? What are the Hodgkin-Huxley models used? Which channels do they represent? What are the stimulus protocols?

      The detailed description and development of the HH-model that we use and its ion channel list can be found in [7]. The supplementary materials also include an Excel table, predicted parameters, which lists all EP-GAN fitted parameters for the 9 neurons (+3 new parameter sets for AWB, URX, HSN using EP-GAN-E), the labels for trainability, and the parameter bounds used during the generation of training data.

      We also added a new Table which details the current/voltage clamp protocols used for 9 neurons including the ones used for evaluating EP-GAN-E, which was supplemented with longer simulation time to ensure voltage stability (please see Table 6).

      I could not assess the validation of the EP-GAN by modeling 200 synthetic neurons based on the data presented in the manuscript since the only reported metric is the RMSE (5.84mV and 5.81mV for neurons sampled from training data and testing data respectively) averaged over all 200 synthetic neurons. Please report the distribution of RMSEs, include other biologically more relevant metrics, and show representative examples. The responses should be carefully investigated for the types of mismatches that occur, and their biological relevance should be discussed. For example, is the EP-GAN biased to generate responses with certain characteristics, like the 'overshoot' discussed in Major 1? Is it generally poor at fitting the resting potential?

      We thank the reviewer for the feedback regarding the need for additional supporting data for synthetic neuron validations. In the revised supplementary materials Figure S1, we included the distribution of RMSE errors for both groups of synthetic neuron validations (validation/test set) and representative samples for both EP-GAN baseline and EP-GAN-E. Notably, the inaccuracies observed during the experimental neuron predictions (e.g., resting potential, voltage overshoot) do not necessarily generalize to synthetic neurons, indicating that such mismatches could stem from the differences between synthetic neurons used for training and experimental neurons for predictions. While synthetic neurons are generated according to empirically determined parameter bounds, some experimental neuron types are rarer than the others and may also involve other channels that have not been recorded or modeled in [7], which can affect the quality of predicted parameters (see 2nd and 4th paragraphs of Discussions section for more detail). Also, properties such as recording error/noise that are often present in experimental neurons are not fully accounted for in synthetic neurons.

      To further study how these mismatches can be mitigated, in the revision we added an extended version of EP-GAN where the target loss was supplemented with additional resting potential and a 5-second stabilization period during simulations (EP-GAN-E described in Statistical Analysis and Loss Extension). With such extensions, EP-GAN-E was able to improve its accuracy on both resting potentials and dynamical features, with the most notable improvements on AWB, where predicted voltage responses closely match the slowly rising voltage response during stimulation. EP-GAN-E is an example of further extensions to the loss function that account for additional experimental features.

      Furthermore, the conclusion of the ablation study ('EP-GAN preserves reasonable accuracy up to a 25% reduction in membrane potential responses') does not seem to be justified given the voltage traces shown in Figure 3. For example, for RIM, the resting membrane potential stays around 0 mV, but all empirical traces are around -40mV. For AFD, all simulated traces have a negative slope during the depolarizing stimuli, but a positive slope in all empirically observed traces. For AIY, the shape of hyperpolarized traces is off.

      Since EP-GAN baseline optimizes voltage responses during the stimulation period, RMSE was also evaluated with respect to this period. From these errors, we evaluated whether the predicted voltage error for each ablation scenario fell within the 2 standard deviations from the mean error obtained from synthetic neuron test data (i.e. the baseline performance). We found that for input ablation for voltage responses, the error was within such range up to 25% reduction whereas for steady-state current input ablation, all 25%, 50% and 75% reductions resulted in errors within the range.
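
      The acceptance criterion described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names (`rmse_during_stimulation`, `within_baseline_range`), the sample indices marking the stimulation window, and the two-sided reading of the two-standard-deviation band are all assumptions made for the example.

```python
import math
import statistics


def rmse_during_stimulation(predicted, recorded, start, stop):
    """RMSE (mV) between two voltage traces over samples [start, stop).

    `start` and `stop` are hypothetical sample indices that delimit the
    stimulation period; samples outside this window are ignored.
    """
    squared = [
        (p - r) ** 2
        for p, r in zip(predicted[start:stop], recorded[start:stop])
    ]
    return math.sqrt(sum(squared) / len(squared))


def within_baseline_range(error, baseline_errors, n_sd=2.0):
    """True if `error` lies within n_sd standard deviations of the baseline mean.

    `baseline_errors` stands in for the RMSEs obtained on the synthetic
    test neurons. The band is treated as two-sided (an assumption).
    """
    mean = statistics.mean(baseline_errors)
    sd = statistics.stdev(baseline_errors)
    return abs(error - mean) <= n_sd * sd
```

      Under this reading, an ablation level would count as preserving accuracy when `within_baseline_range` returns `True` for the RMSE computed on the ablated input.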

      We extended the “Ablation Studies” sub-section so that the above reasoning is better communicated to the readers.

      Additionally, I found a number of minor issues:

      Minor 1: Table 1 lists the number of HH simulations as '32k (11k · 3)'. Should it be 33k, since 11.000 times 3 is 33.000? Please specify the exact number of samples.

      Minor 2: x- and y-ticks are missing in Fig 2, Fig 3, Fig S1, Fig S2, Fig S3 and Fig S4.

      Minor 3: All files in the supplementary zip file should be listed and described.

      Minor 4: Code for training the GAN, generation of training datasets and for reproducing the figures should be provided.

      Minor 5: In the reference (Figure 3A, Table 1 Row 2): should this refer to Table 2?

      Minor 6: 'the ablation is done on stimulus space where a 50% reduction corresponds to removing half of the membrane potential responses traces each associated with a stimulus.' - which half is removed?

      We thank the reviewer for pointing out these errors in the original manuscript. The revised manuscript includes corrections for these items. We will publish the Python code reproducing the results in a public repository in the near future.

    1. eLife Assessment

      This study presents valuable findings on the role of the small GTPase Rab3A in homeostatic plasticity. While the study demonstrates that Rab3A is required for homeostatic scaling, the evidence supporting the model put forward by the authors is incomplete. The work will be of interest to researchers in the field of synaptic transmission and plasticity.

    2. Reviewer #1 (Public review):

      Koesters and colleagues investigated the role of the small GTPase Rab3A in homeostatic scaling of miniature synaptic transmission in primary mouse cortical cultures using electrophysiology and immunohistochemistry. The major finding is that TTX incubation for 48 hours does not induce an increase in the amplitude of excitatory synaptic miniature events in neuronal cortical cultures derived from Rab3A KO and Rab3A Earlybird mutant mice. NASPM application had comparable effects on mEPSC amplitude in control and after TTX, implying that Ca2+-permeable glutamate receptors are unlikely modulated during synaptic scaling. Immunohistochemical analysis revealed no significant changes in GluA2 puncta size, intensity, and integral after TTX treatment in control and Rab3A KO cultures. Finally, they provide evidence that loss of Rab3A in neurons, but not astrocytes, blocks homeostatic scaling. Based on these data, the authors propose a model in which neuronal Rab3A is required for homeostatic scaling of synaptic transmission, potentially through GluA2-independent mechanisms.

      The major finding - impaired homeostatic up-scaling after TTX treatment in Rab3A KO and Rab3 earlybird mutant neurons - is supported by data of high quality. However, the paper falls short of providing any evidence or direction regarding potential mechanisms. The data on GluA2 modulation after TTX incubation are likely statistically underpowered, and do not allow drawing solid conclusions, such as GluA2-independent mechanisms of up-scaling.

      The study should be of interest to the field because it implicates a presynaptic molecule in homeostatic scaling, which is generally thought to involve postsynaptic neurotransmitter receptor modulation. However, it remains unclear how Rab3A participates in homeostatic plasticity.

      Major (remaining) point:

      (1) Direct quantitative comparison between electrophysiology and GluA2 imaging data is complicated by many factors, such as different signal-to-noise ratios. Hence, comparing the variability of the increase in mini amplitude vs. GluA2 fluorescence area is not valid. Thus, I recommend removing the sentence "We found that the increase in postsynaptic AMPAR levels was more variable than that of mEPSC amplitudes, suggesting other factors may contribute to the homeostatic increase in synaptic strength." from the abstract.
      Similarly, the data do not directly support the conclusion of GluA2-independent mechanisms of homeostatic scaling. Statements like "We conclude that these data support the idea that there is another contributor to the TTX-induced increase in quantal size." should be thus revised or removed.

    3. Reviewer #2 (Public review):

      I thank the authors for their efforts in the revision. In general, I believe the main conclusion that Rab3A is required for TTX-induced homeostatic synaptic plasticity is well-supported by the data presented, and this is an important addition to the repertoire of molecular players involved in homeostatic compensations. I also acknowledge that the authors are more cautious in making conclusions based on the current evidence, and the structure and logic have been much improved.

      The only major concern I have still falls on the interpretation of the mismatch between GluA2 cluster size and mEPSC amplitude. The authors argue that they are only trying to say that changes in the cluster size are more variable than those in the mEPSC amplitude, and they provide multiple explanations for this mismatch. It seems incongruous to state that the simplest explanation is a presynaptic factor when you have all these alternative factors that very likely have contributed to the results. Further, the authors speculate in the discussion that Rab3A does not regulate postsynaptic GluA2 but instead regulates a presynaptic contributor. Do the authors mean that, in their model, the mEPSC amplitude increases can be attributed to two factors- postsynaptic GluA2 regulation and a presynaptic contribution (which is regulated by Rab3A)? If so, and Rab3A does not affect GluA2 whatsoever, shouldn't we see GluA2 increase even in the absence of Rab3A? The data in Table 1 seems to indicate otherwise.

      I also question the way the data are presented in Figure 5. The authors first compare 3 cultures and then 5 cultures altogether; if these experiments all aim to answer the same research question, then they should be pooled together. Interestingly, the additional two cultures both show increases in GluA2 clusters, which makes the decrease in culture #3 even more perplexing, for which the authors comment in line 261 that this is due to other factors. Shouldn't this be an indicator that something unusual has happened in this culture? Data in this figure is sufficient to support that GluA2 increases are variable across cultures, which hardly adds anything new to the paper or to the field. The authors further cite a study with comparable sample sizes, which shows a similar mismatch based on p values (Xu and Pozzo-Miller 2007), yet the effect sizes in this study actually match quite well (both ~160%). P values cannot be used to show whether two effects match, but effect sizes can. Therefore, the statement in lines 411-413 "... consistently leads to an increase in mEPSC amplitudes, and sometimes leads to an increase in synaptic GluA2 receptor cluster size" is not very convincing, and can hardly be used to support "the idea that there are additional sources contributing to the homeostatic increase in quantal size".

      I would suggest simply showing mEPSC and immunostaining data from all cultures in this experiment as additional evidence for homeostatic synaptic plasticity in WT cultures, and leave out the argument for "mismatch". The presynaptic location of Rab3A is sufficient to speculate a presynaptic regulation of this form of homeostatic compensation.

      Minor concerns:

      (1) Line 214, I see the authors cite literature to argue that GluA2 can form homomers and can conduct currents. While GluA2 subunits edited at the Q/R site (they are in nature) can form homomers with very low efficiency in exogenous systems such as HEK293 cells (as done in the cited studies), it's unlikely for this to happen in neurons (they can hardly traffic to synapses if possible at all).

      (2) Lines 221-222, the authors may have misinterpreted the results in Turrigiano 1998. This study does not show that the increase in receptors is most dramatic in the apical dendrite, in fact, this is the only region they have tested. The results in Figures 3b-c show that the effect size is independent of the distance from soma.

      (3) Lines 309-310 (and other places mentioning TNFa), the addition of TNFa to this experiment seems out of place. The authors have not performed any experiment to validate the presence/absence of TNFa in their system (citing only 1 study from another lab is insufficient). Although it's convincing that glia Rab3A is not required for homeostatic plasticity here, the data does not suggest Rab3A's role (or the lack of) for TNFa in this process.

    4. Reviewer #3 (Public review):

      This manuscript presents a number of interesting findings that have the potential to increase our understanding of the mechanism underlying homeostatic synaptic plasticity (HSP). The data broadly support that Rab3A plays a role in HSP, although the site and mechanism of action remain uncertain.

      The authors clearly demonstrate that Rab3A plays a role in HSP at excitatory synapses, with substantially less plasticity occurring in the Rab3A KO neurons. There is also no apparent HSP in the Earlybird Rab3A mutation, although baseline synaptic strength is already elevated. In this context, it is unclear if the plasticity is absent, already induced by this mutation, or just occluded by a ceiling effect due to the synapses already being strengthened. Occlusion may also occur in the mixed cultures when Rab3A is missing from neurons but not astrocytes. The authors do appropriately discuss these options. The authors have solid data showing that Rab3A is unlikely to be active in astrocytes. Finally, they attempt to study the linkage between changes in synaptic strength and AMPA receptor trafficking during HSP, and conclude that trafficking may not be solely responsible for the changes in synaptic strength during HSP.

      Strengths:

      This work adds another player into the mechanisms underlying an important form of synaptic plasticity. The plasticity is likely only reduced, suggesting Rab3A is only partially required and perhaps multiple mechanisms contribute. The authors speculate about some possible novel mechanisms, including whether Rab3A is active pre-synaptically to regulate quantal amplitude.

      As Rab3A is primarily known as a pre-synaptic molecule, this possibility is intriguing. However, it is based on the partial dissociation of AMPAR trafficking and synaptic response and lacks strong support. On average, they saw a similar magnitude of change in mEPSC amplitude and GluA2 cluster area and integral, but the GluA2 data was not significant due to higher variability. It is difficult to determine if this is due to biology or methodology - the imaging method involves assessing puncta pairs (GluA2/VGlut1) clearly associated with a MAP2 labeled dendrite. This is a small subset of synapses, with usually less than 20 synapses per neuron analyzed, which would be expected to be more variable than mEPSC recordings averaged across several hundred events. However, when they reduce the mEPSC number of events to similar numbers as the imaging, the mEPSC amplitudes are still less variable than the imaging data. The reason for this remains unclear. The pool of sampled synapses is still different between the methods and recent data has shown that synapses have variable responses during HSP. Further, there could be variability in the subunit composition of newly inserted AMPARs, and only assessing GluA2 could mask this (see below). It is intriguing that pre-synaptic changes might contribute to HSP, especially given the likely localization of Rab3A. But it remains difficult to distinguish if the apparent difference in imaging and electrophysiology is a methodological issue rather than a biological one. Stronger data, especially positive data on changes in release, will be necessary to conclude that pre-synaptic factors are required for HSP, beyond the established changes in post-synaptic receptor trafficking.

      Other questions arise from the NASPM experiments, used to justify looking at GluA2 (and not GluA1) in the immunostaining. First, there is a strong frequency effect that is unclear in origin. One would expect NASPM to merely block some fraction of the post-synaptic current, and not affect pre-synaptic release or block whole synapses. But the change in frequency seems to argue (as the authors do) that some synapses only have CP-AMPARs, while the rest of the synapses have few or none. Another possibility is that there are pre-synaptic NASPM-sensitive receptors that influence release probability. Further, the amplitude data show a strong trend towards smaller amplitude following NASPM treatment (Fig 3B). The p value for both control and TTX neurons was 0.08 - it is very difficult to argue that there is no effect. The decrease on average is larger in the TTX neurons, and some cells show a strong effect. It is possible there is some heterogeneity between neurons on whether GluA1/A2 heteromers or GluA1 homomers are added during HSP. This would impact the conclusions about the GluA2 imaging as compared to the mEPSC amplitude data.

      To understand the role of Rab3A in HSP will require addressing two main issues:

      (1) Is Rab3A acting pre-synaptically, post-synaptically or both? The authors provide good evidence that Rab3A is acting within neurons and not astrocytes. But where it is acting (pre or post) would aid substantially in understanding its role. The general view in the field has been that HSP is regulated post-synaptically via regulation of AMPAR trafficking, and considerable evidence supports this view. More concrete support for the authors' suggestion of a pre-synaptic site of control would be helpful.

      (2) Rab3A is also found at inhibitory synapses. It would be very informative to know if HSP at inhibitory synapses is similarly affected. This is particularly relevant as at inhibitory synapses, one expects a removal of GABARs or a decrease in GABA release (ie the opposite of whatever is happening at excitatory synapses). If both processes are regulated by Rab3A, this might suggest a role for this protein more upstream in the signaling; an effect only at excitatory synapses would argue for a more specific role just at those synapses.

    5. Author response:

      The following is the authors’ response to the previous reviews.

      Since multiple Reviewers requested that the results describing effects of TTX treatment on GluA2 receptor levels detected by immunofluorescence and confocal imaging be revised, we have made substantial changes, which are described below. We believe the changes have greatly improved the manuscript and thank the reviewers for their comments.

      Lack of significant increase in GluA2 receptor data is due to too few cultures sampled; anything could have happened [in one] particular dissociation. A concern that the TTX effect might vary greatly from culture to culture was why we felt it was important to match the receptor measurements on the same cultures that we recorded mEPSCs. We now present the culture means in Figure 5A (mEPSCs) and 5B (GluA2 receptor cluster size). These plots make it clear that the variability in the GluA2 receptor cluster size effect is not attributable to a failure of that culture to show a homeostatic effect. That is, the variability in GluA2 receptor effect is independent of the variability in mEPSC effect. To increase sample size, we examined 2 additional cultures for synaptic GluA2 receptor levels in control vs. TTX treatment. These cultures showed very modest increases (Figure 5C). When cell means from these experiments were pooled with those from the 3 matched cultures, the TTX effect was still not statistically significant (Figure 5G).

      Lack of significant increase in GluA2 receptor data is due to the choice to restrict our analysis to the primary dendrite, close to the cell body. We restricted our analysis to the primary dendrite because Figure 3 in Turrigiano et al, 1998, shows the increased response to exogenously applied glutamate after TTX treatment is greatest close to the cell body and wanes as the glutamate is applied further away (added to Results, new lines 388-389).

      Variability in GluA2 receptor data is due to the much smaller number of synapses sampled, compared to mEPSCs. We matched the sampling for mEPSC amplitude data to that of imaging data by taking only 20 samples from each electrophysiological recording. Each mEPSC represents one synapse; in a set of 20 mEPSCs some might come from the same synapse, so that we are sampling from ≤ 20 synapses. The effect of TTX on mEPSC amplitudes remained significant despite the reduced samples per cell (Figure 5A).
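
      The sampling match described above amounts to drawing a fixed number of events from each recording before averaging. A minimal sketch, assuming sampling without replacement and a per-cell mean as the summary statistic; the function name `subsampled_cell_mean`, the seed, and the default of 20 events are illustrative choices and are not taken from the manuscript's analysis code:

```python
import random
import statistics


def subsampled_cell_mean(amplitudes, n_events=20, seed=0):
    """Mean amplitude (pA) of n_events mEPSCs drawn from one recording.

    Events are drawn without replacement (an assumption). A fixed seed
    keeps the draw reproducible for the purposes of this sketch.
    """
    if len(amplitudes) < n_events:
        raise ValueError("recording has fewer events than requested")
    rng = random.Random(seed)
    return statistics.mean(rng.sample(amplitudes, n_events))
```

      Applying the function to every recording yields one value per cell based on the same number of samples as the imaging data, so the two measures can be compared at matched sample sizes.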

      Why do we fail to show a significant increase in receptors when this has been shown in many studies?

      We have added to our discussion the point that several studies, including Wang et al. 2019, use the number of puncta, rather than the number of cells, as the sample number. We ran an analysis of GluA2 receptor cluster size where we sampled multiple synapses per cell, and used the number of clusters as the sample n. We found that even with as few as 6 synapses randomly selected from each cell, the effect of TTX on GluA2 receptor cluster size became highly significant (p = 0.001 for data from 3 cultures and p = 0.005 for data from 5 cultures) (see new lines 400-406 in Discussion). In sum, our data are not very different from that of some previous studies. We are not arguing that receptors do not increase. Instead, our point is that the increase is more variable than the increase in mEPSC amplitude and thus takes a much bigger sample size to detect. That is, the difference between the mEPSC data and the receptor data is that the mEPSC data consistently show a ~20-25% increase, whereas the receptor data do not always show an increase and sometimes the increase is only ~10%. Finally, we added two matched culture experiments examining synaptic GluA1 receptor cluster characteristics. GluA1 receptor cluster size decreased in one culture, and increased very modestly in the other (Supplemental Figure 1B), whereas mEPSC amplitude robustly increased (Supplemental Figure 1A; Results, new lines 265-268).
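
      The dependence of significance on the choice of sample unit can be illustrated with a toy calculation. This sketch is an illustration only: the cluster sizes are invented, the helper names (`welch_t`, `per_cell`, `per_cluster`) are hypothetical, and Welch's t statistic is used here for convenience rather than because it is the test applied in the manuscript.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic for two independent samples (mean of b minus mean of a)."""
    var_a = statistics.variance(a) / len(a)
    var_b = statistics.variance(b) / len(b)
    return (statistics.mean(b) - statistics.mean(a)) / math.sqrt(var_a + var_b)


def per_cell(cells):
    """One value per cell: the mean cluster size of that cell."""
    return [statistics.mean(clusters) for clusters in cells]


def per_cluster(cells):
    """One value per cluster, pooled across all cells."""
    return [size for clusters in cells for size in clusters]


# Invented cluster sizes (arbitrary units): three cells, three clusters each.
control = [[0.9, 1.0, 1.1], [1.1, 1.2, 1.3], [0.7, 0.8, 0.9]]
ttx = [[1.1, 1.2, 1.3], [1.3, 1.4, 1.5], [0.9, 1.0, 1.1]]

t_cells = welch_t(per_cell(control), per_cell(ttx))           # n = 3 per group
t_clusters = welch_t(per_cluster(control), per_cluster(ttx))  # n = 9 per group
```

      In this toy data set the mean difference is identical in both analyses, but the statistic is larger when each cluster is counted as a sample, because the standard error shrinks as n grows.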

      We conclude that these data support the idea that there is another contributor to the TTX-induced increase in quantal size.

      Other changes in presentation of GluA2 receptor results: Since the effects on intensity and integral are of lesser magnitude than that on cluster size, we have removed these results from the graphs, although they are presented in Table 1. We have removed Figure 6, the presentation of individual culture results, since these results are now conveyed in Figure 5A-C. We have removed graphs depicting GluA2 receptor cluster size in response to TTX in Rab3A-/- cultures, but these data are still presented in Table 1.

      We address other detailed comments below.

      Public Reviews:

      Reviewer #1 (Public review):

      (2) The effects of Rab3A on TTX-induced mini frequency modulation remains unclear, because TTX does not induce a change in mini frequency in the Rab3A+/Ebd control (Fig. 2). The respective conclusions should be revised accordingly (l. 427).

      The effects on mini frequency were added for completeness, but given the lack of consistently significant changes with TTX treatment or changes in the KO or Rab3AEbd/Ebd cultures, we have removed comment on these results from the Discussion.

      (3) The model is still not supported by the data. In particular, data supporting a negative regulation of Rab3A by APs, Rab3A-dependent release of a tropic factor, or a Rab3A-dependent increase in GluA2 abundance are not presented.

      We have removed the model from the manuscript.

      (4) Data points are not overlapping and appear "quantal" in most box plots. How were the data rounded?

      The appearance of quantal variation in cell amplitude means is due to the binning that is part of the creation of the box plot. We have not remade the figures without binning, because the binning provides a visual depiction of the distribution of the data points. We have added the bin sizes to the appropriate figure legends.

      Reviewer #2 (Public review):

      However, the authors still have not provided further investigation of the mechanisms behind the role of Rab3A in this form of plasticity, and the revision therefore has added little to the significance of the study. Moreover, the experimental design for the investigation of the mismatch between mEPSC amplitude and GluA2 cluster fluorescence remains questionable, making it difficult to draw any credible conclusions from groups of data that not only look similar to the eye but also show no significance statistically.

      To our knowledge, no other study has matched measurements of mEPSC amplitude in the same cultures where synaptic receptor levels were assessed. As stated above, we have revised the presentation of GluA2 receptor results, concluding from the lack of significant effects on receptor levels that the mEPSC amplitude increase cannot be fully explained by the receptor data (which is strengthened by addition of two more cultures analyzed for GluA2 immunofluorescence). This is an important addition to the significance of the study.

      In summary, this study establishes that neuronal Rab3A plays a role in homeostatic synaptic plasticity, but so do a number of other molecules that have been implicated in homeostatic synaptic plasticity in the past two decades (only will grow with the new techniques such as RNAseq). Without going beyond this finding and demonstrating how exactly Rab3A participates in the induction and/or expression of this form of plasticity, or maybe the potential Rab3A-mediated functional and behavioral defects in vivo, the contribution of the current study to the field is limited. However, given the presynaptic location of Rab3A, this finding could serve as a starting point for researchers interested in pre-postsynaptic cross-talk during homeostatic plasticity in general.

      We previously published a review in which we list 19 molecules known at that time to be important for homeostatic synaptic plasticity (see Table 2, Koesters et al., 2024), and they fall into two categories: molecules involved in glutamate receptor expression or trafficking, and signaling molecules. Rab3A is the first synaptic vesicle protein to be implicated in homeostatic plasticity of quantal size. We have added this point to the Discussion, new lines 473-476. By demonstrating that Rab3A is not acting in glia (which release TNF, which regulates receptor expression), and that GluA2 receptor levels do not explain the homeostatic mEPSC increase in our experimental conditions, we have ruled out two major mechanisms.

      Reviewer #3 (Public review):

      Other questions arise from the NASPM experiments, used to justify looking at GluA2 (and not GluA1) in the immunostaining. First, there is a frequency effect that is unclear in origin. One would expect NASPM to merely block some fraction of the post-synaptic current, and not affect pre-synaptic release or block whole synapses. However the change in frequency seems to argue (as the authors do) that some synapses only have CP-AMPARs, while the rest of the synapses have few or none. Another possibility is that there are pre-synaptic NASPM-sensitive receptors that influence release probability. Further, the amplitude data show a strong trend towards smaller amplitude following NASPM treatment (Fig 3B). The p value for both control and TTX neurons was 0.08 - it is very difficult to argue that there is no effect. The decrease on average is larger in the TTX neurons, and some cells show a strong effect. It is possible there is some heterogeneity between neurons on whether GluA1/A2 heteromers or GluA1 homomers are added during HSP. This would impact the weakly supported conclusions about the GluA2 imaging vs mEPSC amplitude data.

      We cannot rule out that the NASPM-induced decrease in mEPSC frequency is due to a loss of presynaptic glutamate receptor enhancement of release probability, and have added this statement to the Results, new lines 202-204. Regarding the p value of 0.08: we are not arguing that NASPM has no effect on mEPSC amplitude, only that it has no effect on the homeostatic increase in amplitude after TTX treatment. An increase in GluA1/A2 heteromers should have been detected in our imaging studies.

      Unaddressed issues that would greatly increase the impact of the paper:

      (1) Is Rab3A acting pre-synaptically, post-synaptically or both? The authors provide good evidence that Rab3A is acting within neurons and not astrocytes. But where it is acting (pre or post) would aid substantially in understanding its role. They could use sparse knockdown of Rab3A, or simply mix cultures from KO and WT mice (with appropriate tags/labels). The general view in the field has been that HSP is regulated post-synaptically via regulation of AMPAR trafficking, and considerable evidence supports this view. The more support for their suggestion of a pre-synaptic site of control, the better.

      We agree that doing co-cultures of Rab3A-/- and Rab3A+/+ neurons is the definitive experiment to determine the locus of action of Rab3A in homeostatic synaptic plasticity. We hope to examine this question in a future manuscript.

      (2) Rab3A is also found at inhibitory synapses. It would be very informative to know if HSP at inhibitory synapses is similarly affected. This is particularly relevant as at inhibitory synapses, one expects a removal of GABARs (ie the opposite of whatever is happening at excitatory synapses). If both processes are regulated by Rab3A, this might suggest a role for this protein more upstream in the signaling; an effect only at excitatory synapses would argue for a more specific role just at these synapses.

      We agree that it would be very interesting to determine if the homeostatic decrease in mIPSCs after activity blockade depends on Rab3A. We hope to address this question in the future.

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      Minor points:

      The abstract is a bit repetitive in places. Some editing would be advised.

      We did not identify anything repetitive in the abstract except the parallel construction referring to the previous findings at the NMJ and current findings in cortical neurons. However, we have eliminated a section in the introduction which went into detail about the receptor imaging results (previous lines 103-110).

      Line 77: 'shift toward early awakening' is unclear; do you mean shorter sleep/wake cycle? Other circadian issues? A more complete description is needed.

      We have moved the additional detail about the Earlybird mutation’s effect on circadian period from the Results to the Introduction, new lines 77 to 79.

      The results section has many passages that seem more like discussion, offering various interpretation and alternatives for the data. While some commentary is appropriate, to justify the next series of experiments and maintain a logical flow, this manuscript has rather a high amount of this. Some editing and shifting material to the discussion might be warranted.

      We have reduced the commentary in the Results section.

      Line 245: GluA2 homomers are really unlikely, as they won't pass current (unless unedited) and don't often if ever form. But GluA2/A3 heteromers are likely (and detected by their methods).

      GluA2 homomers do conduct current, albeit less than heteromers (Swanson et al., 1997; Oh and Derkach, 2005; Coombs et al., 2019). [The Oh and Derkach paper shows a GluA2 homomer current in Supplementary Figure 3]. We have modified the text to acknowledge that the GluA2 receptor imaging will detect heteromers and homomers (Results, new lines 214 to 215).

      Line 258: If the number of synaptic pairs analyzed was usually <20, what was the average and range of pairs? This gets into the sampling issue.

      We have added the average number of synaptic sites (20.4 ± 6.5) and range (11-38) to the text, Results, new line 229.

      Are the stats of the baseline mEPSC amplitude and frequency shifts (WT vs KO on WT feeder layer) given somewhere (lines 398-402)? If not, please add them.

      These stats have been added to the text: mEPSC amplitude (CON, WT on WT, 13.3 ± 0.5 pA; CON, KO on WT, 15.2 ± 1.1 pA, p = 0.23, Kruskal-Wallis test), new lines 325-326, and frequency (CON, WT on WT, 2.54 ± 0.57 sec⁻¹; CON, KO on WT, 4.46 ± 1.21 sec⁻¹, p = 0.23, Kruskal-Wallis test), new lines 329-330.

      25mM K+ is going to be much more than 'mildly' depolarizing (line 697). Should just skip that word.

      ‘mildly’ has been removed.

      The section on MiniAnalysis seems overly argumentative, and there is no need to discuss flaws in the Wu paper. The important thing (a bit buried at the end of this section) is that the manual mini selection was done blind to condition, which is the normal way of dealing with potential bias. It would be better to limit the methods to describing what was done.

      The bulk of the justification of manual analysis has been removed from the text.

      The discussion of potential conductance changes (lines 534-6) seems somewhat unwarranted.

      Modification of GluA1 phosphorylation in the GluA1/A2 heteromer would not be detected by NASPM (and the NASPM data being a bit inconclusive anyway). Further, auxiliary subunits (like TARPs) can alter conductance of any of the AMPARs. So I don't think they have enough data to exclude such a possibility.

      The discussion of contributions of conductance has been removed from the text.

      Coombs ID, Soto D, McGee TP, Gold MG, Farrant M, Cull-Candy SG (2019) Homomeric GluA2(R) AMPA receptors can conduct when desensitized. Nat Commun 10:4312.

      Oh MC, Derkach VA (2005) Dominant role of the GluR2 subunit in regulation of AMPA receptors by CaMKII. Nat Neurosci 8:853-854.

      Swanson GT, Kamboj SK, Cull-Candy SG (1997) Single-channel properties of recombinant AMPA receptors depend on RNA editing, splice variation, and subunit composition. J Neurosci 17:5869.

    1. eLife Assessment

      This study provides valuable information on a novel gene that regulates meiotic progression in both male and female meiosis. The evidence supporting the conclusions of the authors is solid. This study will be of interest to developmental and reproductive biologists.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors investigate the role of BEND2, a novel regulator of meiosis, in both male and female fertility. Huang et al have created a mouse model where the full-length BEND2 transcript is depleted but the truncated BEND2 version remains. This mouse model is fertile, and the authors used it to study the role of BEND2 on both male and female meiosis. Overall, the full-length BEND2 appears dispensable for male meiosis. The more interesting phenotype was observed in females. Females exhibit a lower ovarian reserve suggesting that full-length BEND2 is involved in the establishment of the primordial follicle pool.

      Strengths:

      The authors generated a mouse model that enabled them to study the role of BEND2 in meiosis. The role of BEND2 in female fertility is novel and enhances our knowledge of genes involved in the establishment of the primordial follicle pool.

      Weaknesses:

      The manuscript extensively explores the role of BEND2 in male meiosis; however, a more interesting result was obtained from the study of female mice.

    3. Reviewer #2 (Public review):

      In their manuscript entitled "BEND2 is a crucial player in oogenesis and reproductive aging", the authors present their findings that full-length BEND2 is important for repair of meiotic double-strand breaks in spermatocytes, regulation of LINE-1 elements in spermatocytes, and proper oocyte meiosis and folliculogenesis in females. The manuscript utilizes an elegant system to specifically ablate the full-length form of BEND2, which has been historically difficult to study due to its location on the X chromosome and male sterility of global knockout animals.

      The authors have been extremely responsive to reviewer critiques and have presented strong data and appropriate conclusions, making it an excellent addition to the field.

    4. Reviewer #3 (Public review):

      Huang et al. investigated the phenotype of Bend2 mutant mice that express a truncated isoform. Males carrying the Bend2 deletion were fertile, which enabled the authors to analyze BEND2 function in females. They showed that Bend2 deletion in females decreased follicle number, which may lead to loss of ovarian reserve.

      Strengths:

      They found the truncated isoform of Bend2 and the depletion of this isoform showed decreasing follicle number at birth.

      Weaknesses:

      The authors showed novel factors that impact ovarian reserve. Although the number of follicles and conception rate are reduced in mutant mice, the in vitro fertilization rate is normal and follicles remain at 40 weeks of age. It is difficult to know how critical this is when applied to the human case.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors investigate the role of BEND2, a novel regulator of meiosis, in both male and female fertility. Huang et al have created a mouse model where the full-length BEND2 transcript is depleted but the truncated BEND2 version remains. This mouse model is fertile, and the authors used it to study the role of BEND2 on both male and female meiosis. Overall, the full-length BEND2 appears dispensable for male meiosis. The more interesting phenotype was observed in females. Females exhibit a lower ovarian reserve suggesting that full-length BEND2 is involved in the establishment of the primordial follicle pool.

      Strengths:

      The authors generated a mouse model that enabled them to study the role of BEND2 in meiosis. The role of BEND2 in female fertility is novel and enhances our knowledge of genes involved in the establishment of the primordial follicle pool.

      Weaknesses:

      The manuscript extensively explores the role of BEND2 in male meiosis; however, a more interesting result was obtained from the study of female mice. Only a few experiments were performed using female mice, therefore, more experiments should be performed to complete the story of the role of BEND2 on female fertility. In addition, the title and abstract of the manuscript do not align with the story, as female fertility is only a small portion of the data compared to the male fertility section.

      We appreciate the reviewer’s thoughtful summary, recognition of the strengths of our study, and constructive feedback. In the revised manuscript, we have performed additional experiments to enhance our understanding of the role of BEND2 in female gametogenesis. These new experiments provide further insights into the establishment of the ovarian reserve and the role of BEND2 in female fertility.

      Additionally, we have rewritten the title, abstract, and introduction to better align with the content of the manuscript and to reflect the balance between the male and female fertility results. We believe these changes address the reviewer’s concerns and improve the overall clarity and focus of the manuscript.

      Reviewer #1 (Recommendations For The Authors):

      • I recommend that the authors re-organize their abstract and introduction to accurately reflect the manuscript's primary focus on male fertility. Right now, the title of the manuscript is misleading. The manuscript does not investigate reproductive aging; rather, it primarily describes the depletion of primordial follicle number. The mechanism behind this depletion and whether this phenotype accelerates reproductive aging, are not explored. Clarifying these points will help align the title and content of the manuscript more accurately.

      We thank the reviewer for this suggestion. We agree that the original title and abstract did not fully capture the focus of the study. In response, we have rewritten the title, abstract, and introduction to better align with the results presented, focusing more clearly on the implications of the effects of the full-length BEND2 depletion for spermatogenesis and oogenesis. These revisions ensure that the title, the abstract, and the manuscript's introduction are now more accurately reflective of the work performed.

      • Figure 1: I couldn't find the validation of the polyclonal antibody against BEND2 that the authors generated.

      Regarding this query about the validation of the polyclonal antibody against BEND2, we apologize for any confusion. We would like to clarify that this validation is indeed presented in Figure 2 of our manuscript. To ensure this information is easily accessible, we have revised the text to explicitly mention the validation in Figure 2.

      • Figure 2A: Could you provide the actual numbers for the weight of the mice testis?

      In response to this question regarding Figure 2A and the weights of the mice testis, we have now included this data in a graph in Fig 2A and Table S1 and added this information in the results section.

      • Figure 2C and D: I am confused by the fact that in the WB we can appreciate a high expression of the p75 protein, but the signal is very low in the IF (Figure 2D).

      We thank the reviewer for raising this point. We acknowledge the apparent discrepancy between the strong p75 signal observed in the Western blot (Fig. 2C) and the weaker signal seen in the immunofluorescence (Fig. 2D). We think several factors could contribute to this difference, such as differences in sensitivity and detection methods, epitope accessibility, protein localization or differences in sample preparation, antibody affinity, and experimental conditions between Western blot and IF.

      • In the same figure, the authors also mention that the p75 protein is functional. On what basis do they rely on reaching this conclusion?

      We acknowledge that we cannot definitively confirm the functionality of the p75 protein. Our assumption was based on the observed fertility of the male mice and existing literature indicating that BEND2 is essential for completing meiosis (Ma et al., 2022). However, we understand the importance of clarity in our claims. To avoid any potential confusion, we have revised the sentence to read: "The p75 BEND2 protein—likely corresponding to an exon 11-skipped transcript—is present and might be functional in our mutant testis, based on the observed phenotype (see below)."

      • The phenotype in females is very interesting. The authors conclude that BEND2 influences primordial follicle formation, oocyte quality, fertility, and reproductive aging by (1) performing follicle counts, (2) analyzing the litter size, and (3) analyzing meiotic progression. Given that the authors build their story around these experiments, I strongly encourage them to expand the section on female fertility, or reorganize the manuscript, or be more cautious with some of their conclusions. They might consider performing additional experiments such as:

      - Oocyte quality: To determine whether BEND2 impacts oocyte quality, mice should be stimulated with hormones and oocyte quality should be analyzed (GV, MI, MII progression, spindle morphology and/or fertilization, and embryo development). Does the decrease in primordial follicles correlate with the number of ovulated oocytes, or is the impact only on oocyte quality?

      We appreciate the reviewer's suggestion to assess the impact of BEND2 on oocyte quality. Following the reviewer’s recommendation, we stimulated three control and three mutant mice. We analyzed the number of ovulated oocytes, their fertilization rate, and the percentage of embryos that developed to the blastocyst stage. These new results are included in the revised manuscript (see Results section and new Table 1). Our analyses indicate that for all parameters assessed, control and mutant oocytes behaved similarly. Specifically, there were no significant differences in the number of ovulated oocytes, fertilization rates, or the ability of embryos to progress to the blastocyst stage between the control and mutant groups. These findings suggest that mutant oocyte quality is comparable to control mice of a similar age. We have incorporated these new results into the manuscript.

      - Reproductive aging: A fertility trial would provide more information on whether BEND2 depletion triggers an acceleration of reproductive aging. In addition, the oldest mice used by the authors are 9 months old, and at this point, fertility has not declined yet.

      We appreciate the reviewer's suggestion regarding the assessment of reproductive aging. However, we respectfully disagree with the assertion that fertility has not declined by 9 months of age. In our colony, we have observed a significant decline in fertility around 10 months of age. Specifically, out of 18 10-month-old female mice placed in breeding cages, we observed only three pregnancies within the first 30 days (N.N. and I.R., data not published). Based on these observations, we determined that fertility begins to decline around this age in our colony, which informed our decision to use 9-month-old mice as the oldest age group for our analysis. Thus, this age is appropriate for evaluating the potential effects of BEND2 depletion on reproductive aging in our specific mouse population.

      - The observation that the primordial follicle pool is already diminished in mice that are 1 week old is very interesting. Some experiments that the authors could perform to figure out the mechanism are: (1) Analyzing apoptosis. Are the primordial follicles dying during the pool's establishment, or is this an ongoing apoptotic process throughout the mice's lifespan? (2) If the authors still have ovaries from mice younger than 1 week of age (when the primordial pool is forming), they could perform DDX4 staining and quantify the number of oocytes in follicles and the total number of oocytes. These experiments would provide mechanistic insights into whether BEND2 impacts the formation of the primordial follicle pool or if the pool forms but is then depleted.

      We appreciate the reviewer's suggestion to further explore the mechanism behind the reduced primordial follicle pool. In response, we have analyzed the number of DDX4-positive cells (DDX4 labels oocytes) in newborn mutant and wild-type animals. Our results show that mutant ovaries contain significantly fewer oocytes compared to controls (see new Fig. 5). This finding supports the hypothesis that BEND2 is critical for the establishment of a normal ovarian reserve. We are grateful for this suggestion, as these additional data reinforce our conclusion that BEND2 is required to determine a normal ovarian reserve in mice.

      • What is the red signal in Supplementary Figure 1C?

      This image depicts the BEND2 staining pattern in 16 days post-coitum (dpc) wild-type mouse ovaries. To clarify this and prevent any confusion, we have updated the figure legend to explicitly state that the sample shown is from a wild-type mouse.

      • Please spell out the full term of all the acronyms.

      We apologize for the oversight in not fully spelling out some acronyms in the original manuscript. We have carefully reviewed the entire manuscript and have ensured that all acronyms are now spelled out in full upon their first use in the revised version. We want to thank the reviewer for bringing this to our attention.

      • Is LINE-1 also dysregulated in the ovary? This was one of the main findings from the male part. It would be interesting to perform the same analysis in the ovary since LINE-1 has a role in establishing the ovarian reserve (PMID: 31949138).

      We thank the reviewer for this insightful suggestion. We have analyzed the number of LINE-1- and SYCP3-positive cells in wild-type and mutant newborn ovaries (new Fig. S4). Our results show no significant difference between the two genotypes, suggesting that LINE-1 is not dysregulated in newborn Bend2 mutant oocytes. These findings indicate that, at least in the context of the newborn ovary, LINE-1 does not appear to be affected by BEND2 depletion.

      Reviewer #2 (Public Review):

      In their manuscript entitled "BEND2 is a crucial player in oogenesis and reproductive aging", the authors present their findings that full-length BEND2 is important for repair of meiotic double-strand breaks in spermatocytes, regulation of LINE-1 elements in spermatocytes, and proper oocyte meiosis and folliculogenesis in females. The manuscript utilizes an elegant system to specifically ablate the full-length form of BEND2, which has been historically difficult to study due to its location on the X chromosome and male sterility of global knockout animals.

      While the manuscript is an overall excellent addition to the field, it would significantly benefit from a few additional experiments, as well as some additional clarification/elaboration.

      The claim that BEND2 is required for ovarian reserve establishment is not supported, as the authors only look at folliculogenesis and oocyte abundance starting at one week of age, after the reserve is formed. Analysis of earlier time points would be much more convincing and would parse the role of BEND2 in the establishment vs. maintenance of this cell population. In spermatocytes, the authors demonstrate a loss of nuclear BEND2 in their mutant but do not comment on the change in localization (which is now cytoplasmic) of the remaining protein in these animals. This may have true biological significance and a discussion of this should be more thoroughly explored.

      We thank the reviewer for their thoughtful feedback and constructive suggestions to improve our manuscript.

      In response to the comment regarding the establishment of the ovarian reserve, we have now analyzed Bend2 mutant and control newborn ovaries. Our results show a significant reduction in the number of DDX4-positive cells in mutant ovaries compared to controls. These findings demonstrate that BEND2 is required for the establishment of the ovarian reserve, as the reduction is evident at birth.

      Regarding the cytoplasmic staining of BEND2 in mutant spermatocytes, we did perform secondary-antibody-only controls using goat anti-rabbit Cy3 to address the specificity of the signal. The staining observed in the Bend2 mutants closely resembles background staining, suggesting that the cytoplasmic signal is nonspecific. Therefore, we do not believe this represents a meaningful change in the localization of BEND2 protein in the mutants. We have clarified this in the revised manuscript to address this point.

      We hope these additional experiments and clarifications strengthen the manuscript and address the reviewer’s concerns.

      Reviewer #2 (Recommendations For The Authors):

      Major points:

      (1) The title of the manuscript does not accurately capture the content of the work. The vast majority of the data presented here is from the male, which is not reflected at all in the title - perhaps considering revising it?

      Thank you for your valuable suggestion. We agree that the original title did not fully reflect the focus of the manuscript. In response, we have revised the title, along with the abstract and introduction, to more accurately capture the content of the study and the emphasis on the male data. These changes ensure that the manuscript more clearly aligns with the results presented.

      (2) In Figure 2D, the authors demonstrate that WT BEND2 expression and localization are lost in the mutant, but staining is still apparent, just in the cytoplasm. Did the authors perform secondary-antibody-only controls to determine if this was background staining or real staining? If real, can they comment on the change in localization of the protein?

      We thank the reviewer for this insightful question. We have indeed performed secondary antibody-only controls using goat anti-rabbit Cy3. The staining observed in the Bend2 mutants closely resembles background staining, suggesting that the signal in the cytoplasm is not specific. Therefore, we do not believe this staining represents any real or meaningful expression of the BEND2 protein in the mutants.

      (3) In Figure S2A, the authors show Ku70 staining and describe that it is similar between the genotypes, but - to my eye - it looks quite distinctly different. It appears to stain in patches in WT SYCP3+ spermatocytes, versus staining in patches in the more mature, SYCP3- germ cells closer to the lumen in the mutant. Can the authors please clarify, or provide arrows to point which foci they are referring to?

      We apologize for the confusion caused by the image provided in the original submission. Upon review, we realized that the mutant image was not fully representative of the staining pattern observed in the majority of mutant samples. We have replaced this image with a new one in the revised manuscript, which more accurately reflects the similarity in Ku70 staining between wild-type and mutant testis. In this updated Figure S2, we have also included arrowheads to indicate the relevant foci, making it clearer to the reader. We have updated the figure legend to correspond with these changes as well.

      (4) The authors state that BEND2 is "required to establish the ovarian reserve during oogenesis" but this has not been demonstrated. The authors do show a reduced density of primordial follicles at one week of age. While this is compelling data, the ovarian reserve is established earlier in the mouse, around postnatal days 0-1, so it is not clear from this manuscript whether BEND2 is required for the maintenance of this population after PND1, leading to reduced numbers by 1 week of age, OR if it is required for the establishment of this population, which would result in reduced numbers of oocytes around the time of birth. This is a critical experiment that should be performed in order to determine which of these possibilities is likely the case. Ideally, looking at embryonic through early postnatal time points during ovarian development would be very helpful.

      We thank the reviewer for raising this important point. As mentioned earlier in response to Reviewer 1, we have performed the experiment suggested by Reviewer 2 and analyzed the number of DDX4-positive cells in newborn ovaries. Our results show that Bend2 mutant ovaries have fewer oocytes at birth than wild-type controls (Fig. 5H). This finding reinforces our conclusion that BEND2 is indeed required to establish the ovarian reserve, as the reduction in oocyte number is evident at the time of birth. We agree that this additional data strengthens our original claim, so we have included these results in the revised manuscript.

      Reviewer #3 (Public Review):

      Summary:

      Huang et al. investigated the phenotype of Bend2 mutant mice which expressed a truncated isoform. Mutant males showed increased apoptosis due to unrepaired double-strand breaks. However, these males were fertile, which enabled the authors to analyze Bend2 function in females. They revealed that Bend2 mutation in females decreased follicle numbers, which leads to loss of ovarian reserve.

      Strengths:

      Since their Bend2 mutant males were fertile, they were able to analyze the function of Bend2 in females and they revealed that loss of Bend2 causes less follicle formation.

      Weaknesses:

      Why the phenotype of their mutant male is different from previous work (Ma et al.) is not clear enough although they discuss it.

      We appreciate the reviewer’s comment regarding the differences between our Bend2 mutant male phenotype and the previously reported phenotype by Ma et al., 2022. We believe this discrepancy is due to the fact that the Bend2 locus encodes two BEND2 isoforms: p140 and p80. In contrast to the previous study, where both proteins were ablated by the mutation employed (the deletion of exons 12 and 13), our exon 11 deletion specifically ablates p140 expression while allowing the expression of p80 in the testis.

      Based on the distinct phenotypes observed in the two Bend2 mutant mouse models, we hypothesize that p80 is sufficient to fulfill BEND2’s roles in meiosis, which could explain why our Bend2 mutant males remain fertile. We have rewritten the relevant sections in the results and discussion to better articulate this hypothesis and clarify the potential mechanisms behind the observed phenotypic differences.

      We hope these clarifications and additional details adequately address the reviewer’s concerns.

      Reviewer #3 (Recommendations For The Authors):

      (1) The authors showed that Bend2 mutant females had decreased fertility. This may be due to decreased ovarian reserve. Did the authors check if the mutant mice decreased or lost fertility faster than WT? If the authors have the data, please refer to it in the manuscript.

      We followed the breeding performance of a small number of control and Bend2 mutant females, and preliminary observations suggested no clear differences between the two groups. However, due to the limited sample size, we felt that these data were not conclusive enough to be included in the manuscript. We agree that a more thorough analysis of fertility decline over time would be valuable, and we plan to address this question in a future study.

      (2) In Figure 1 A, there is no exon1 in the upper figure.

      We thank the reviewer for pointing this out. We have revised Figure 1A to include exon 1 and ensure the schematic is accurate. The updated figure is included in the revised version of the manuscript.

      (3) Figure 3A, it would be nice to show several tubules of the testis section as well as an enlarged one.

      Following the reviewer's advice, we have revised Figure 3A to include new images showing several tubules and an enlarged view of one section of a tubule. These updates are included in the revised manuscript to better represent the testis sections.

      (4) Please be consistent with the format of the graph, especially Supplemental figures 2C and 4D.

      We have revised the figures, including Supplemental Figures 2C and 4D, to ensure consistency in the format throughout the manuscript. We have made modifications to the figures to align them more closely and improve the overall presentation.

    1. eLife Assessment

      This important study identifies a new key factor in orchestrating the process of glial wrapping of axons in Drosophila wandering larvae. The evidence supporting the claims of the authors is convincing and the EM studies are of outstanding quality. However, the quantification of the wrapping index, the role of Htl/Uif/Notch signaling in differentiation vs growth/wrapping, and the mechanism of how Uif "stabilizes" a specific membrane domain capable of interacting with specific axons might require further clarification or discussion. The work will be of interest to neuroscientists working on glial cell biology.

    2. Reviewer #1 (Public review):

      Summary:

      A central function of glial cells is the ensheathment of axons. Wrapping of larger-diameter axons involves myelin-forming glial classes (such as oligodendrocytes), whereas smaller axons are covered by non-myelin-forming glial processes (such as olfactory ensheathing glia). While we have some insights into the underlying molecular mechanisms orchestrating myelination, our understanding of the signaling pathways at work in non-myelinating glia remains limited. As non-myelinating glial ensheathment of axons is highly conserved in both vertebrates and invertebrates, the nervous system of Drosophila melanogaster, and in particular the larval peripheral nerves, have emerged as a powerful model to elucidate the regulation of axon ensheathment by a class of glia called wrapping glia. Using this model, this study seeks to specifically address the question, as to which molecular mechanisms contribute to the regulation of the extent of glial ensheathment focusing on the interaction of wrapping glia with axons.

      Strengths and Weaknesses:

      For this purpose, the study combines state-of-the-art genetic approaches with high-resolution imaging, including classic electron microscopy. The genetic methods involve RNAi-mediated knockdown, acute Crispr-Cas9 knock-outs, and genetic epistasis approaches to manipulate gene function with the help of cell-type specific drivers. The successful use of acute Crispr-Cas9 mediated knockout tools (which required the generation of new genetic reagents for this study) will be of general interest to the Drosophila community.

      The authors set out to identify new molecular determinants mediating the extent of axon wrapping in the peripheral nerves of third-instar wandering Drosophila larvae. They could show that over-expressing a constitutive-active version of the Fibroblast growth factor receptor Heartless (Htl) causes an increase in wrapping glial branching, leading to the formation of swellings in nerves close to the cell body (named bulges). To identify new determinants involved in axon wrapping acting downstream of Htl, the authors next conducted an impressive large-scale genetic interaction screen (which has become rare, but remains a very powerful approach), and identified Uninflatable (Uif) in this way. Uif is a large single-pass transmembrane protein that contains a whole series of extracellular domains, including Epidermal growth factor-like domains. Linking this protein to glial branch formation is novel, as it has so far been mostly studied in the context of tracheal maturation and growth. Intriguingly, a knock-down or knock-out of uif reduces branch complexity and also suppresses htl over-expression defects. Importantly, uif over-expression causes the formation of excessive membrane stacks. Together these observations are in line with the notion that htl may act upstream of uif.

      Further epistasis experiments using this model implicated also the Notch signaling pathway as a crucial regulator of glial wrapping: reduction in Notch signaling reduces wrapping, whereas over-activation of the pathway increases axonal wrapping (but does not cause the formation of bulges). Importantly, defects caused by the over-expression of uif can be suppressed by activated Notch signaling. Knock-down experiments in neurons suggest further that neither Delta nor Serrate act as neuronal ligands to activate Notch signaling in wrapping glia, whereas knock-down of Contactin, a GPI anchored Immunoglobulin domain-containing protein led to reduced axon wrapping by glia, and thus could act as an activating ligand in this context.

      Based on these results the authors put forward a model proposing that Uif normally suppresses Notch signaling, and that activation of Notch by Contactin leads to suppression of Htl, to trigger the ensheathment of axons. While these are intriguing propositions, future experiments would need to conclusively address whether and how Uif could "stabilize" a specific membrane domain capable of interacting with specific axons. Moreover, to obtain evidence for Uif suppression by Notch to inhibit "precocious" axon wrapping and for a "gradual increase" of Notch signaling that silences uif and htl, (1) reporters for N and Htl signaling in larvae, (2) monitoring of different stages at a time point when branch extension begins, and (3) a reagent enabling to visualize Uif expression could be important next tools/approaches. Considering the qualitatively different phenotypes of reduced branching, compared to excessive membrane stacks close to cell bodies, it would perhaps be worthwhile to explore more deeply how membrane formation in wrapping glia is orchestrated at the subcellular level by Uif.

      Finally, in light of the importance of correct ensheathment of axons by glia for neuronal function, this study will be of general interest to the glial biology community.

    3. Reviewer #2 (Public review):

      The FGF receptor Heartless has previously been implicated in Drosophila peripheral glial growth and axonal wrapping. Here, the authors perform a large-scale screen of over 2600 RNAi lines to find factors that control the downstream signaling in this process. They identify a transmembrane protein Uninflatable to be necessary for the formation of plasma membrane domains. They further find that a Uif regulatory target, Notch, is necessary for glial wrapping. Interestingly, additional evidence suggests Notch itself regulates uif and htl, suggesting a feedback system. Together, they propose that Uif functions as a "switch" to regulate the balance between glial growth and wrapping of axons.

      Little is known about how glial cell properties are coordinated with axons, and the identification of Uif is a promising link to shed light on this orchestration. The manuscript is well-written, and the experiments are generally well-controlled. The EM studies in particular are of outstanding quality and really help to mechanistically dissect the consequences of Uif and Notch signaling in the regulation of glial processes. Together, this valuable study provides convincing evidence of a new player coordinating the interactions controlling the glial wrapping of axons.

    4. Author response:

      We are grateful to the reviewers and editors for their time and positive assessment of our manuscript. We will incorporate all their comments to further improve our work. In the revised version of the manuscript, we will provide a more detailed description of the quantification of the wrapping index and further explain the differential roles of Htl and Uif during cell growth versus the role of Notch during axon wrapping. In addition, we will perform further experiments using combinations of reporters and antibodies to further explore the relationship between Htl, Uif and Notch. The discussion will be expanded and possible mechanisms by which Uif 'stabilises' a specific membrane domain will be included.

    1. eLife Assessment

      In this important study, the authors test the model that a type of vascular lesion caused by the inactivation of one gene in the cells that line blood vessels requires the activity of a second gene for the lesions to form. The evidence supporting the conclusions is solid.

    2. Reviewer #2 (Public review):

      Summary:

      Previously, the authors developed a zebrafish model for cerebral cavernous malformations (CCMs) via CRISPR/Cas9-based mosaic inactivation of the ccm2 gene. This model yields CCM-like lesions in the caudal venous plexus of 2 days post-fertilization embryos and classical CNS cavernomas in 8-week fish that depend, like the mouse model, on the upregulation of the KLF2 transcription factor. Remarkably, the morpholino-based knockdown of the gene encoding the Beta1 adrenergic receptor or B1AR (adrb1; a hemodynamic regulator) in fish and treatment with the anti-adrenergic S enantiomer of propranolol in both fish and mice reduce the frequency and size of CCM lesions.

      In the present study, the authors aim to test the model that adrb1 is required for CCM lesion development using adrb1 mutant fish (rather than morpholino-mediated knockdown) and pharmacological treatments with the anti-adrenergic S enantiomer of propranolol or a racemic mix of metoprolol (a selective B1AR antagonist).

      Strengths:

      The goal of the work is important, and the findings are potentially highly relevant to cardiovascular medicine.

      Comments on latest version:

      This reviewer is largely satisfied and congratulates the authors on their updated work. However, the comments regarding the caveats of morpholino use and lack of validation that the morphants phenocopy the mutants using the readouts that they employ still stand (for instance, the tnnt2a MO has been extensively validated for phenocopying lack of cardiac contractility, not for the phenotypes under study). Finally, while using the cytosolic red line to mask a nuclear green readout is suboptimal (not for FRET reasons), this is now a minor issue given that all comparisons are made using this method and the increase in sample size.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This work seeks to provide genetic evidence for a role for beta-adrenergic receptors that regulate heart rate and blood flow on cavernous malformation development using a zebrafish model, and to extend information regarding beta-adrenergic drug blockade in cavernous malformation development, with the idea that these drugs may be useful therapeutically.

      Strengths:

      The work shows that genetic loss of a specific beta-adrenergic receptor in zebrafish, adrb1, prevents embryonic venous malformations and CCM in adult zebrafish brains. Two drugs, propranolol and metoprolol, also blunt CCM in the adult fish brain. These findings are predicted to potentially impact the treatment of human CCM, and they increase understanding of the factors leading to CCM.

      Response 1: We are grateful for the reviewer’s acknowledgment of this study’s potential translational significance.

      Weaknesses:

      There are minor weaknesses that detract slightly from enthusiasm, including poor annotation of the Figure panels and lack of a baseline control for the study of Klf2 expression (Figure 4).

      Response 2: We agree. Annotations were added to the Figure panels, and a baseline control for the study of klf2a expression (Figure 4) was added. Details are described in the response to the “Recommendations for the authors”.

      Reviewer #2 (Public review):

      Summary:

      Previously, the authors developed a zebrafish model for cerebral cavernous malformations (CCMs) via CRISPR/Cas9-based mosaic inactivation of the ccm2 gene. This model yields CCM-like lesions in the caudal venous plexus of 2 days post-fertilization embryos and classical CNS cavernomas in 8-week fish that depend, like the mouse model, on the upregulation of the KLF2 transcription factor. Remarkably, the morpholino-based knockdown of the gene encoding the Beta1 adrenergic receptor or B1AR (adrb1; a hemodynamic regulator) in fish and treatment with the anti-adrenergic S enantiomer of propranolol in both fish and mice reduce the frequency and size of CCM lesions.

      In the present study, the authors aim to test the model that adrb1 is required for CCM lesion development using adrb1 mutant fish (rather than morpholino-mediated knockdown) and pharmacological treatments with the anti-adrenergic S enantiomer of propranolol or a racemic mix of metoprolol (a selective B1AR antagonist).

      Strengths:

      The goal of the work is important, and the findings are potentially highly relevant to cardiovascular medicine.

      Response 3: We are grateful for the reviewer’s acknowledgment of this study’s scientific importance and clinical relevance.

      Weaknesses:

      (1) The following figures do not report sample sizes, making it difficult to assess the validity of the findings: Figures 1B and D (the number of scored embryos is missing), Figures 2G and 3B (should report both the number of fish and lesions scored, with color-coding to label the lesions corresponding to individual fish in which they were found).

      Response 4: We agree. Sample sizes for Figures 1B and D were added to the figures and figure legends. Sample sizes for Figures 2G and 3B were added to their respective figure legends. The lesion volume in Figures 2G and 3B is the total lesion volume in each brain.

      (2) Figure 4 has a few caveats. First, the use of adrb1 morphants (rather than mutants) is at odds with the authors' goal of using genetic validation to test the involvement of adrb1 in CCM2-induced lesion development.

      Second, the authors should clarify if they have validated that the tnnt (tnnt2a) morpholino phenocopies tnnt2a mutants in the context in which they are using it (this reviewer found that the tnnt2a morpholino blocks the heartbeat just like the mutant, but induces additional phenotypes not observed in the mutants).

      Response 5: We appreciate the reviewer’s comments; however, generating adrb1-/- and tnnt2a-/- klf2a reporter fish, while also ensuring the presence of only one EGFP transgene allele for intensity measurement, would require prohibitively time-consuming breeding efforts.

      The use of morpholinos for tnnt2a and adrb1, as well as their effects on the heart, have been well-documented in previous studies (Sehnert AJ et al., Nat Genet. 2002;31:106-10; Steele SL et al., J Exp Biol. 2011;214:1445-57).

      Third, the data in Figure 4E is from just two embryos per treatment, a tiny sample size. Furthermore, judging from the number of points in the graph, only a few endothelial PCV cells appear to have been sampled per embryo. Also, judging from the photos and white arrowheads and arrows (Figure 4A-D), only the cells at the ventral side of the vessel were scored (if so, the rationale behind this choice requires clarification).

      Response 6: We have increased the sample size, as described in the Figure 4 legend. Regarding the scoring of endothelial nuclei, we focused on the ventral side of the vessel because nuclei on the dorsal side often reside at branching points of the venous plexus. This positional variance could influence klf2a expression levels; thus, we focused on the ventral surface to limit this potential confounding variable.

      Fourth, it is unclear whether and how the Tg(kdrl:mcherry)is5 endothelial reporter was used to mask the signals from the klf2a reporter. The reviewer knows by experience that accuracy suffers if a cytosolic or cell membrane signal is used to mask a nuclear green signal.

      Response 7: We agree that it is theoretically possible for Förster resonance energy transfer (FRET) to occur, as the emission spectrum of EGFP (495-550 nm in our filter setup) overlaps with the absorption spectrum of mCherry. However, several factors reduce the likelihood of FRET in our experimental setup:

      (1)  Without a nuclear localization signal, the majority of mCherry is localized in the cytoplasm, although small amounts may passively diffuse into the nucleus.

      (2)  EGFP, on the other hand, is predominantly localized in the nucleus due to the presence of a nuclear localization signal.

      (3)  FRET requires two fluorophores to be within a proximity of 8-10 nanometers or less for efficient energy transfer. The nuclear envelope, with a typical thickness of 30-50 nanometers, separates nuclear EGFP from cytoplasmic mCherry, and FRET efficiency is inversely proportional to the sixth power of the distance between donor and acceptor. Thus, the theoretical likelihood of significant energy transfer under these conditions is low.
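      To put rough numbers on point (3), the distance dependence can be sketched with the standard Förster relation E = 1 / (1 + (r/R0)^6), which reduces to the inverse sixth-power scaling quoted above when r is much larger than R0. This is a minimal illustration only: the function name is hypothetical, and the Förster radius of 5 nm is an assumed, order-of-magnitude value for an EGFP/mCherry pair, not a figure taken from the manuscript.

```python
def fret_efficiency(distance_nm: float, forster_radius_nm: float) -> float:
    """Förster transfer efficiency E = 1 / (1 + (r / R0)**6).

    distance_nm: donor-acceptor separation r, in nanometers.
    forster_radius_nm: Förster radius R0 (distance at which E = 0.5).
    """
    if distance_nm < 0 or forster_radius_nm <= 0:
        raise ValueError("distance must be >= 0 and Förster radius must be > 0")
    return 1.0 / (1.0 + (distance_nm / forster_radius_nm) ** 6)


# ASSUMPTION: illustrative Förster radius for an EGFP/mCherry pair (not from the source).
ASSUMED_R0_NM = 5.0

if __name__ == "__main__":
    # Separations from R0 itself up to the nuclear-envelope thickness range (30-50 nm).
    for r_nm in (5.0, 10.0, 30.0, 50.0):
        e = fret_efficiency(r_nm, ASSUMED_R0_NM)
        print(f"r = {r_nm:4.0f} nm -> E = {e:.2e}")
```

      Under this assumed R0, efficiency is 0.5 at r = R0 by definition and falls to roughly 2 x 10^-5 at a 30 nm separation, which is consistent with the authors' argument that transfer across the nuclear envelope should be negligible.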

      To empirically examine potential FRET between nuclear EGFP and mCherry in our experimental setup, we scanned and scored Tg(klf2a:H2b-EGFP; kdrl:mcherry) double transgenic embryos and Tg(klf2a:H2b-EGFP) embryos for EGFP intensity. The result is attached here:

      Author response image 1.

      42 endothelial nuclei from 7 embryos were scored as described in the Experimental Procedures of the manuscript. A two-tailed t-test was performed (P = 0.4529).

      Finally, the text and legend related to Figure 4 could be more explicit. What do the authors mean by a mosaic pattern of endothelial nuclear EGFP intensity, and how is that observation reflected in graph 4E? When I look at the graph, I understand that klf2a is decreased in C-D compared to A-B. Are some controls missing? Suppose the point is to show mosaicism of Klf2a levels upon ccm2 CRISPR. Don't you need embryos without ccm2 CRISPR to show that Klf2a levels in those backgrounds have average levels that vary within a defined range and that in the presence of ccm2 mosaicism, some cells have values significantly outside that range? Also, in 4A-D, what are the white arrowheads and arrows? The legend does not mention them.

      Response 8: We have revised our description of Figure 4 to better convey that mosaic expression of klf2a is evidenced by the wide variability of klf2a reporter intensity in endothelial cells in ccm2 CRISPR embryos. A baseline control for the study of klf2a expression was added to Figure 4. The arrowheads and arrows in Figure 4A-D are explained in the Figure 4 legend.

      Given the practical relevance of the findings to cardiovascular medicine, increasing the strength of the evidence would greatly enhance the value of this work.

      Recommendations for the authors:

      Reviewing Editor:

      Concerns about the labeling of figures and sample sizes should both be addressed, as detailed in the reviews, as this will be important to ensure the robustness of the claims.

      Reviewer #1 (Recommendations for the authors):

      Overall a strong research advance that provides rigorous genetic analysis and further drug testing in the zebrafish CCM model. There are some minor issues that, if addressed, would strengthen the work.

      Minor issues:

      (1) Figures in general are very poorly annotated and labeled. None of the images in Figures 1-3 show the reporter used to visualize vessels/CM, and the scale bars are not sized in the Figures or legends. Figure 1B is an experiment where the effects of a drug that increases heart rate are evaluated in mutants and controls, but the drug is not mentioned in the figure panel. Figure 1D shows the percentage of embryos with CVP dilation, but the graph and accompanying description do not define whether the percent is relative to the total embryos from the intercross or the percent of that category having the CVP dilation.

      Response 9: Changes were made in Figures and Figure legends. The transgenic reporter line Tg(fli1:EGFP) was annotated in Figures 1-3. Scale bars were sized in the Figures and Figure legends. The chemical used for Figure 1B was annotated in the Figure. The percentage of CVP dilation in the graph was explained in the Figure legend.

      (2) Figure 4 does not include baseline data in unmanipulated embryos scored at the same time to show the increase in Klf2 expression with mosaic ccm2 deletion. This is important as the result in E is interpreted as a lack of change in the increase.

      Response 10: A baseline control for the study of klf2a expression in Figure 4 was added.

      Reviewer #2 (Recommendations for the authors):

      SUGGESTIONS FOR EXPERIMENTS, DATA, OR ANALYSES

      (1) For maximum rigor, in the Figure 4 experiment, use adrb1 mutants and tnnt2a (silent heart) mutants (or verify that the adrb1 and tnnt2a morpholinos faithfully copy the phenotype of interest). See: Guidelines for morpholino use in zebrafish (PMID: 29049395; PMCID: PMC5648102).

      Response 11: See Response 5.

      (2) Increase sample sizes if appropriate.

      Response 12: In the revised version of the manuscript, we have increased the sample size, as described in the Figure 4 legend.

      (3) The imaging and fluorescence intensity analysis methods require more detail for reproducibility's sake. Please provide this information. See as a guideline: Guillermo Marqués, Thomas Pengo, Mark A Sanders (2020) Science Forum: Imaging methods are vastly underreported in biomedical research. eLife 9:e55133.

      Response 13: We added detailed procedures for the “Airyscan imaging and fluorescence intensity analysis” in the “Experimental Procedures”.

      (4) I suggest further clarifying how inhibition of B1AR prevents cavernoma formation. Given that lesion formation is suppressed in adrb1 mutants (which have slow blood flow) and 2,3-BDM treatment (which also slows blood flow) has a similar effect, the beneficial effects of propranolol and metoprolol might be due to the slowing of blood flow via B1AR targeting rather than reflecting that B1AR is a critical component of the genetic circuit for cavernoma formation. Indeed, in prior work by the same first author and collaborators (Elife 2021 May 20:10:e62155), the investigators observed reduced cavernoma formation in embryos devoid of cardiac contractility and thus lacking blood flow (tnnt2a morphants). Such a scenario does not take away the value of a pharmacological treatment. Still, it implies a different mechanism and allows potentially many other drugs with similar effects on blood flow to be effective.

      Discussing how B1AR activity is regulated and outlining future experiments would be helpful. Suggestions for the latter include testing the effect of normalizing blood flow in adrb1 mutants with a drug or providing exogenous B1AR in the myocardium or the endothelium to test the model further.

      Response 14: We are grateful for the reviewer’s suggestions and have added a statement on future experiments.

      MINOR CORRECTIONS TO TEXT AND FIGURES

      (1) Figure 4E: Label the four genotypes explicitly, rather than A-D for the reader's ease.

      (2) Legend of Figure 4: "(F) EGFP intensity...". It should be (E).

      CITATIONS TO CORRECT

      (1) The citation for the Tg(kdrl:mcherry)is5 transgene needs to be corrected (reference 29 is from the Stainier lab). However, the "is" designation is for the Essner lab (https://zfin.org/action/feature/view/ZDB-ALT-110127-25)

      Response 15: Corrections were made as instructed.

    1. eLife Assessment

      This comprehensive scRNAseq atlas of the cranial region during neural induction, patterning, and morphogenesis provides a fundamental demonstration of how different cell fates are organized in specific spatial patterns along the anterior-posterior and medial-lateral axes within the developing neural tissue. The compelling data are analyzed with a rigorous computational approach, and the data revealed both known and novel genes differentially expressed along rostro-caudal and medio-lateral axes. This will be a helpful resource for researchers studying brain development.

    2. Reviewer #1 (Public review):

      Summary:

      This impressive study presents a comprehensive scRNAseq atlas of the cranial region during neural induction, patterning, and morphogenesis. The authors collected a robust scRNAseq dataset covering six distinct developmental stages. The analysis focused on the neural tissue, resulting in a highly detailed temporal map of neural plate development. The findings demonstrate how different cell fates are organized in specific spatial patterns along the anterior-posterior and medial-lateral axes within the developing neural tissue. Additionally, the research utilized high-density single-cell RNA sequencing (scRNAseq) to reveal intricate spatial and temporal patterns independent of traditional spatial techniques.

      The investigation utilized diffusion component analysis to spatially order cells based on their positioning along the anterior-posterior axis, corresponding to the forebrain, midbrain, hindbrain, and medial-lateral axis. By cross-referencing with MGI expression data, the identification of cell types was validated, affirming the expression patterns of numerous known genes and implicating others as differentially expressed along these axes. These findings significantly advance our understanding of the spatially regulated genes in neural tissues during early developmental stages. The emphasis on transcription factors, cell surface, and secreted proteins provides valuable insights into the intricate gene regulatory networks underpinning neural tissue patterning. Analysis of a second scRNAseq dataset where Shh signaling was activated by culturing embryos in SAG identified known and previously unknown transcripts regulated by Shh, including the Wnt pathway.

      The data includes the neural plate and captures all major cell types in the head, including the mesoderm, endoderm, non-neural ectoderm, neural crest, notochord, and blood. With further analyses, this high-quality data promises to significantly advance our understanding of how these tissues develop in conjunction with the neural tissue, paving the way for future breakthroughs in developmental biology and genomics.

      Strengths:

      The data is well presented in the figures and thoroughly described in the text. The quality of the scRNAseq data and bioinformatic analysis is exceptional.

      Weaknesses:

      None

    3. Reviewer #2 (Public review):

      Summary:

      Brooks et al. generate a compelling gene expression atlas of the early embryonic cranial neural plate. They generate single-cell transcriptome data from early cranial neural plate cells at 6 consecutive stages between E7.5 and E9. Utilizing computational analysis they infer temporal gene expression dynamics and spatial gene expression patterns along the anterior-posterior and mediolateral axes of the neural plate. Subsequent comparison with known gene expression patterns revealed a good agreement with their inferred patterns, thus validating their approach. They then focus on Sonic Hedgehog (Shh) signalling, a key morphogen signal, whose activities partition the neural plate into distinct gene expression domains along the mediolateral axis. Single-cell transcriptome analysis of embryos in which the Shh pathway was pharmacologically activated throughout the neural plate revealed characteristic changes in gene expression along the mediolateral axis and the induction of distinct Shh-regulated gene expression programs in the developing fore-, mid- and hindbrain.

      Strengths:

      This manuscript provides a comprehensive transcriptomic characterisation of the developing cranial neural plate, a part of the embryo that to my knowledge has not been extensively analysed by single-cell transcriptomic approaches. The single-cell sequencing data appears to be of high quality and will be a great resource for the wider scientific community. Moreover, the computational analysis is well executed and the validation of the sequencing data using published gene expression patterns is convincing. In my opinion the authors completely achieved their aim of generating a reliable sequencing atlas of the early cranial neural plate. Conceptually, the findings that gene expression patterns differ along the rostrocaudal, mediolateral and temporal axes of the neural plate and that Shh signalling induces distinct target genes along the anterior-posterior axis of the nervous system are not completely unexpected. However, the comprehensive characterization of the spatiotemporal gene expression patterns and how they change upon ectopic activation of the Shh pathway will definitely contribute to a better understanding of neural plate patterning. Taken together, this is a well-executed study that describes a relevant scientific resource that will likely be of great use for the wider scientific community.

      Weaknesses:

      No weaknesses were identified.

    4. Reviewer #3 (Public review):

      Summary:

      The authors performed a detailed single-cell analysis of the early embryonic cranial neural plate with unprecedented temporal resolution between embryonic days 7.5 and 8.75. They employed diffusion analysis to identify genes that correspond to different temporal and spatial locations within the embryo. Finally, they also examined the global response of cranial tissue to a Smoothened agonist.

      Strengths:

      Overall, this is an impressive resource, well-validated against sets of genes with known temporal and spatial patterns of expression. It will be of great value to investigators examining early stages of neural plate patterning, neural progenitor diversity, and the roles of signaling molecules and gene regulatory networks controlling regionalization and diversification of the neural plate.

      Weaknesses:

      The manuscript should be considered a resource. Experimental manipulation is limited to analysis of neural plate cells that were cultured in vitro for 12 hours with SAG. They have identified a significant set of previously unreported genes that are differentially expressed in the cranial neural plate. Some additional analyses might help to highlight novel hypotheses arising from this remarkable resource.

      Comments on revisions: I am satisfied with the responses of the authors and do not have any further concerns.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewing Editor Comment: Please note that all three reviewers suggested this manuscript would best fit as a resource paper at eLife.

      Reviewer #1 (Public review):

      Summary:

      This impressive study presents a comprehensive scRNAseq atlas of the cranial region during neural induction, patterning, and morphogenesis. The authors collected a robust scRNAseq dataset covering six distinct developmental stages. The analysis focused on the neural tissue, resulting in a highly detailed temporal map of neural plate development. The findings demonstrate how different cell fates are organized in specific spatial patterns along the anterior-posterior and medial-lateral axes within the developing neural tissue. Additionally, the research utilized high-density single-cell RNA sequencing (scRNAseq) to reveal intricate spatial and temporal patterns independent of traditional spatial techniques.

      The investigation utilized diffusion component analysis to spatially order cells based on their positioning along the anterior-posterior axis, corresponding to the forebrain, midbrain, hindbrain, and medial-lateral axis. By cross-referencing with MGI expression data, the identification of cell types was validated, affirming the expression patterns of numerous known genes and implicating others as differentially expressed along these axes. These findings significantly advance our understanding of the spatially regulated genes in neural tissues during early developmental stages. The emphasis on transcription factors, cell surface, and secreted proteins provides valuable insights into the intricate gene regulatory networks underpinning neural tissue patterning. Analysis of a second scRNAseq dataset where Shh signaling was activated by culturing embryos in SAG identified known and previously unknown transcripts regulated by Shh, including the Wnt pathway.

      The data includes the neural plate and captures all major cell types in the head, including the mesoderm, endoderm, non-neural ectoderm, neural crest, notochord, and blood. With further analyses, this high-quality data promises to significantly advance our understanding of how these tissues develop in conjunction with the neural tissue, paving the way for future breakthroughs in developmental biology and genomics.

      Strengths:

      The data is well presented in the figures and thoroughly described in the text. The quality of the scRNAseq data and bioinformatic analysis is exceptional.

      Weaknesses:

      No weaknesses were identified by this reviewer.

      Reviewer #2 (Public review):

      Summary:

      Brooks et al. generate a gene expression atlas of the early embryonic cranial neural plate. They generate single-cell transcriptome data from early cranial neural plate cells at 6 consecutive stages between E7.5 and E9. Utilizing computational analysis they infer temporal gene expression dynamics and spatial gene expression patterns along the anterior-posterior and mediolateral axes of the neural plate. Subsequent comparison with known gene expression patterns revealed a good agreement with their inferred patterns, thus validating their approach. They then focus on Sonic Hedgehog (Shh) signalling, a key morphogen signal, whose activities partition the neural plate into distinct gene expression domains along the mediolateral axis. Single-cell transcriptome analysis of embryos in which the Shh pathway was pharmacologically activated throughout the neural plate revealed characteristic changes in gene expression along the mediolateral axis and the induction of distinct Shh-regulated gene expression programs in the developing fore-, mid-, and hindbrain.

      Strengths:

      This manuscript provides a comprehensive transcriptomic characterisation of the developing cranial neural plate, a part of the embryo that to my knowledge has not been extensively analysed by single-cell transcriptomic approaches. The single-cell sequencing data appears to be of high quality and will be a great resource for the wider scientific community. Moreover, the computational analysis is well executed and the validation of the sequencing data using published gene expression patterns is convincing. Taken together, this is a well-executed study that describes a relevant scientific resource for the wider scientific community.

      Weaknesses:

      Conceptually, the findings that gene expression patterns differ along the rostrocaudal, mediolateral, and temporal axes of the neural plate and that Shh signalling induces distinct target genes along the anterior-posterior axis of the nervous system are more expected than surprising. However, the strength of this manuscript is again the comprehensive characterization of the spatiotemporal gene expression patterns and how they change upon ectopic activation of the Shh pathway.

      Reviewer #3 (Public review):

      Summary:

      The authors performed a detailed single-cell analysis of the early embryonic cranial neural plate with unprecedented temporal resolution between embryonic days 7.5 and 8.75. They employed diffusion analysis to identify genes that correspond to different temporal and spatial locations within the embryo. Finally, they also examined the global response of cranial tissue to a Smoothened agonist.

      Strengths:

      Overall, this is an impressive resource, well-validated against sets of genes with known temporal and spatial patterns of expression. It will be of great value to investigators examining the early stages of neural plate patterning, neural progenitor diversity, and the roles of signaling molecules and gene regulatory networks controlling the regionalization and diversification of the neural plate.

      Weaknesses:

      The manuscript should be considered a resource. Experimental manipulation is limited to the analysis of neural plate cells that were cultured in vitro for 12 hours with SAG. Besides the identification of a significant set of previously unreported genes that are differentially expressed in the cranial neural plate, there is little new biological insight emerging from this study. Some additional analyses might help to highlight novel hypotheses arising from this remarkable resource.

      We thank all three reviewers for their thoughtful and constructive public reviews and believe they nicely capture the contributions of our study. We agree that this article represents a valuable resource for the community and agree with its designation as a Tools and Resources article.

      We also thank the reviewers for their useful suggestions for improving the manuscript. In addition to addressing most of their comments, described below, we note that we have changed midbrain-hindbrain boundary (MHB) to rhombomere 1 (r1) throughout the paper and in Tables S4, S7, S10, and S11, as this designation is more closely aligned with the literature on this region. In addition, we added the anterior-posterior and mediolateral cluster identities from our wild-type analysis for the genes that were differentially expressed in SAG-treated embryos in Table S11. Lastly, we have added a new figure (Figure 5—figure supplement 2), as suggested by Reviewer 2, in which we compare our results with the published expression of genes in neural progenitor domains along the dorsal-ventral axis of the spinal cord.

      Reviewer #1 (Recommendations for the authors):

      I have a few small suggestions for improving the presentation of the data.

      (1) It would be helpful to show illustrations and embryo images of all the stages utilized in the analysis in Figures 1A and B.

      (2) It was difficult to distinguish all the different colors in Figures 3B and 4B. Could you label, as in Figure 4, supplements 1D, F?

      (3) I was confused by the position of the color code key for Figure 7D-J, thinking it belonged to panels B and C. Could you put it under the figure/heatmap key so that it is clearly linked to panels D-J?

      Thank you for these suggestions. We have incorporated the third suggestion to improve readability, but were not able to make the first two changes due to space limitations.

      Reviewer #2 (Recommendations for the authors):

      I only have a couple of minor additional suggestions/questions for the authors:

      (1) The authors state that nearly half of the transcripts they found as differentially regulated in SAG-treated embryos were also characterized as spatially regulated in the wild-type embryos. It would be great if the authors could provide more detail here. How many of the transcripts that are differentially regulated along the mediolateral axis of the wild-type are characterized as differentially regulated in the SAG-treated embryos? How does this further break down into where these genes are expressed along the mediolateral and the anterior-posterior axes? I am aware that the authors answer some of these questions already by providing examples, but a more systematic characterisation would be appreciated here.

      We have updated Table S11 to include the anterior-posterior and mediolateral cluster identities of differentially expressed genes in SAG-treated embryos, where applicable. In addition, we have added more discussion of the genes from our SAG analysis that were also found to be spatially patterned in wild-type embryos to the fourth paragraph of the last results section.

      (2) Related to the previous question, the authors nicely demonstrate that SAG treatment of embryos causes many transcriptional changes, including the expression/repression of several transcription factors well-known to mediate spatial patterning, raising the question of which of these effects are directly due to gene regulation by the Shh pathway and which effects are secondary consequences of transcriptional changes of other transcription factors. Similarly, the authors' results also suggest that some genes are only induced in specific parts along the neuraxis, raising the question of why. The authors could attempt some type of regulon-interference approaches to identify further candidates that may mediate these effects.

      This is an excellent suggestion for a future extension of this work, as we agree that validation of the predicted SHH targets, including which targets are direct, indirect, or region-specific, would be required to evaluate the predictions of this scRNA-seq analysis.

      (3) The authors report that they observed 'a previously unreported inhibition of Scube2' upon SAG treatment of the embryos. At least in the spinal cord Scube2 is well-known to be expressed at a distance from the source of Shh secretion (e.g. Kawakami et al. Curr. Biol. 2005), thus the direct or indirect repression by Shh signalling is strongly expected. Moreover, a recent preprint (Collins et al. bioRxiv, https://doi.org/10.1101/469239 ) suggests that the interaction between Shh and Scube2 can mediate the scale-invariance of Shh patterning. Of note, the authors of this preprint also state that 'upregulation of Shh represses scube2 expression while Shh downregulation increases scube2 expression thus establishing a negative feedback loop.'

      Thank you for this suggestion. We have added these references.

      (4) The authors partition genes based on different diffusion components as being differentially expressed along the mediolateral axis. However, starting from ~e8.5, neural progenitors in the neural tube can be partitioned based on the expression of well-characterised combinatorial sets of transcription factors into molecularly defined progenitor domains that subsequently give rise to functionally distinct types of neurons. How much of this patterning process can the authors capture with their diffusion component analysis and does their data also allow them to capture these finer-grained differences in gene expression along the mediolateral and prospective dorsal-ventral axis of the neural tube that are known to exist?

      This is a very interesting point. We have added a new figure showing UMAPs of the E8.5-9.0 cranial neural plate for a subset of 29 genes (described in Delile et al., 2019) that define distinct neural progenitor domains along the dorsal-ventral axis of the spinal cord (Figure 5—figure supplement 2). We observed that 18 of 20 genes that were detected in the midbrain/r1 region in our dataset were expressed in broad domains along the mediolateral axis of the cranial neural plate that were roughly consistent with their expression domains along the dorsal-ventral axis of the spinal cord. Of these 18 genes, 14 were patterned along both anterior-posterior and mediolateral axes, 2 were patterned only along the mediolateral axis, and 2 were patterned only along the anterior-posterior axis. These results suggest a general correspondence between mediolateral patterning in the cranial neural plate and dorsal-ventral patterning in the spinal cord. However, less refinement of these domains along the mediolateral axis was observed in the cranial neural plate, possibly because the relatively early, pre-closure stages captured by our dataset may be before the establishment of secondary feedback systems that lead to fine-scale patterning of mutually exclusive neural precursor domains. These results are described in the last paragraph of the results section titled “An integrated framework for analyzing cell identity in multiscale space.”

      (5) The authors state that they will not only make the raw sequencing data but also the processed intermediate data files available. This is greatly appreciated as it strongly facilitates the re-use of the data. However, it would be also appreciated if the authors made the computational code publicly available that was used to analyze the data and generate the figure panels in the manuscript.

      We have deposited the processed h5ad files in the GEO database, accession number GSE273804. Additionally, we have made interactive python notebooks available with the code used to analyze gene expression and generate the figures in this study, as well as code used to automatically generate customizable links to gene expression images in the Mouse Genome Informatics Gene Expression database, on our lab GitHub page (https://github.com/ZallenLab). We have updated the Data availability section to reflect these changes.

      Reviewer #3 (Recommendations for the authors):

      (1) Considering that individual progenitor domains in the developing neural tube are typically sharply delineated with few cells exhibiting mixed identities, it is interesting that clustering of single-cell data results in a largely continuous “cloud” of cells. Is this because the early neural plate cells have not yet crystallized their identity, or would clustering based on a smaller set of genes that exhibit high variance across only neural plate cells result in improved granularity, allowing for better characterization and quantification of distinct progenitor subtypes?

      Thank you for raising this interesting point. The apparent continuity of gene expression in the cranial neural plate could reflect a gene signature shared by cranial neural plate cells, and cells may not yet be extensively regionalized into unique populations at these early stages. We now discuss these possibilities in the third paragraph of the discussion.

      (2) Can the authors clarify how neural plate cells were identified and how they were distinguished from the anterior epiblast?

      Cell typing was performed by supervised clustering based on known markers of fate. Cranial neural plate cells were identified by their expression of pan-neural factors (Sox2 and Sox3), early or late neural plate markers (Cdh1 or Cdh2), and the lack of markers associated with non-neural ectodermal cell fates (Grhl2, Krt18, Tfap2a) or other cell types (Ets1, T, Tbx6). Full gene sets used to identify all cell types in our analysis are provided in Supplementary Table 13.
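      The marker logic described above can be sketched as a simple Boolean gate. This is an illustration only: the authors used supervised clustering, whereas the function below applies a per-cluster threshold to mean expression values. The function name, the threshold of 0.5, the requirement that both Sox2 and Sox3 be detected, and the example expression values are all assumptions made for the sketch; only the gene names come from the response.

```python
# Marker sets named in the author response.
PAN_NEURAL = ("Sox2", "Sox3")
NEURAL_PLATE = ("Cdh1", "Cdh2")  # early or late neural plate markers
EXCLUDED = ("Grhl2", "Krt18", "Tfap2a", "Ets1", "T", "Tbx6")


def is_cranial_neural_plate(mean_expr, threshold=0.5):
    """Return True if a cluster's mean expression matches the marker logic.

    mean_expr: dict mapping gene name to mean normalized expression.
    threshold: hypothetical cutoff above which a gene counts as expressed.
    """
    def expressed(gene):
        return mean_expr.get(gene, 0.0) > threshold

    has_pan_neural = all(expressed(g) for g in PAN_NEURAL)      # assumed: both required
    has_plate_marker = any(expressed(g) for g in NEURAL_PLATE)  # either marker suffices
    has_excluded = any(expressed(g) for g in EXCLUDED)
    return has_pan_neural and has_plate_marker and not has_excluded


# Hypothetical cluster means, for illustration only.
neural_like = {"Sox2": 2.1, "Sox3": 1.4, "Cdh2": 1.8, "Tfap2a": 0.1}
surface_like = {"Sox2": 0.9, "Sox3": 0.7, "Cdh1": 2.0, "Grhl2": 1.6, "Krt18": 2.2}

for name, cluster in (("neural_like", neural_like), ("surface_like", surface_like)):
    print(name, is_cranial_neural_plate(cluster))
```

      In practice the gene sets would be read from Supplementary Table 13 rather than hard-coded, and the cutoff would need to be tuned to the normalization used.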

      (3) Did the study identify cells with cranial placode identity? Cranial placodes emerge during the same period, and it would be useful to highlight them in Figure 1.

      Thank you for highlighting this point. Examination of the early placode markers Six1 and Eya1 indicates that cranial placode cells are a subset of the cells in PhenoGraph cluster 17 in our full dataset (Figure 1—figure supplement 1). We now mention this along with other cell types of interest in the last paragraph of the discussion.

      (4) It could be interesting to provide more information about the novel genes identified as differentially expressed along the AP or mediolateral axes. Do they belong to gene families that were not previously implicated in neural patterning, or do they point to novel biological mechanisms controlling neural patterning?

      Diverse gene families are represented by the genes that are patterned along the anterior-posterior and mediolateral axes of the cranial neural plate at these stages, likely due to the large number of genes that are spatially patterned in this tissue. Further investigation of the biological mechanisms suggested by these patterns is an important direction for future work, both in terms of molecularly classifying the genes identified as well as directly investigating their roles in neural patterning using genetic analysis.

      (5) It would be helpful to discuss how the data presented here compare to other relevant single-cell analyses, such as PMC10901739. This would help to highlight aspects that are unique to this study.

      We have added this reference as well as an earlier study from these authors and we discuss how our study complements this work in the introduction.

      (6) The inclusion of single-cell data from control embryos that were cultured for 12 hours is of great interest. The authors should identify the set of genes that are deregulated in cultured cells and, taking advantage of their detailed temporal series, examine whether the maturation of cultured embryos progresses normally or whether there are genes that fail to mature correctly in vitro.

      We agree that an analysis of the impact of ex vivo culture on gene expression would be useful. However, the large difference in the number of cells in our wild-type and cultured embryo datasets, as well as the lack of time-course data for the cultured embryos, could make a comparison between our current cultured and non-cultured embryo datasets difficult to interpret.

    1. eLife Assessment

      This fundamental work demonstrates the importance of considering overlapping modes of functional organization (i.e. gradients) in the hippocampus, showing associations with aging, dopaminergic receptor distribution and episodic memory. The evidence supporting the conclusions is convincing, although not all analyses were performed in a replication sample. The work will be of broad interest to basic and clinical neuroscientists.

    2. Reviewer #2 (Public review):

      Summary:

      This paper derives the first three functional gradients in the left and right hippocampus across two datasets. These gradient maps are then compared to dopamine receptor maps obtained with PET, associated with age, and linked to memory. Results reveal links between dopamine maps and gradient 2, age with gradients 1 and 2, and memory performance.

      Strengths:

      This paper investigates how hippocampal gradients relate to aging, memory, and dopamine receptors, which are interesting and important questions. A strength of the paper is that some of the findings were replicated in a separate sample.

      Assessment after revision:

      The authors addressed concerns about unclear multiple comparison correction in the revision. The replication sample was primarily used to replicate the topographic organization of functional hippocampal-neocortical connectivity within the hippocampus across the adult lifespan, which was the central goal of this paper. Not all other analyses replicated, which the authors nicely clarified in the revised manuscript. Overall, this work is a thorough and valuable contribution to the literature.

    3. Reviewer #3 (Public review):

      Summary:

      In this study, the authors analyzed the complex functional organization of the hippocampus using two separate adult lifespan datasets. They investigated how individual variations in the detailed connectivity patterns within the hippocampus relate to behavioral and molecular traits. The findings confirm three overlapping hippocampal gradients and reveal that each is linked to established functional patterns in the cortex, the arrangement of dopamine receptors within the hippocampus, and differences in memory abilities among individuals. By employing multivariate data analysis techniques, they identified older adults who display a hippocampal gradient pattern resembling that of younger individuals and exhibit better memory performance compared to their age-matched peers. This underscores the behavioral importance of maintaining a specific functional organization within the hippocampus as people age.

      Strengths:

      The evidence supporting the conclusions is compelling, based on a unique dataset, a rich set of carefully unpacked results, and a rigorous data analysis that is clearly explained and motivated. Possible confounds are carefully considered and ruled out.

      Assessment after revision:

      The authors improved the transparency of the statistical analyses by stating explicitly what tests and corrections were performed and clearly justifying the elected statistical approaches. They now also acknowledge and discuss the potential limitations of the presented PET analyses. Overall this is a rigorous and important contribution to the literature that will likely be of broad interest to basic and clinical neuroscience.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors studied hippocampal connectivity gradients across the lifespan, and how these relate to memory function and neurotransmitter distributions. They observed that older age was associated with less distinct transitions, and observed an association between gradient de-differentiation and cognitive decline.

      This is overall an innovative and interesting study to assess gradient alterations across the lifespan and its associations to cognition.

      The paper is well-written, and the methods appear sound and thoughtful. There are several strengths, including the inclusion of two independent cohorts, the use of gradient mapping and alignment techniques, and an overall sound statistical and analysis framework. There are several areas for potential improvements in the paper, and these are listed below:

      We thank the Reviewer for their positive assessment and summary of our work. We address each of the Reviewer’s comments below, and outline the revisions we have made to the manuscript based on the Reviewer’s suggestions.

      (1) The reported D1 associations appear a bit post-hoc in the current work and I was unclear why the authors specifically focussed on dopamine here, as other transmitter systems are similarly present at the level of the hippocampus and implicated in aging.

      Other neurotransmitter systems may indeed be relevant in the context of hippocampal function in aging. In this study, however, we included a specific research question about the DA D1 receptor (D1DR) based on previous research 1) emphasizing the role of DA neuromodulation in maintaining functional network segregation in aging to support cognition (Pedersen et al., 2023), 2) reporting heterogeneous distribution of DA markers across the hippocampus, supporting efficient modulation of distinct behaviors (Dubovyk & Manahan-Vaughan, 2019; Edelmann & Lessmann, 2018; Gasbarri et al., 1994; Kempadoo et al., 2016), and 3) demonstrating the spatial distribution of D1DRs as varying across neocortex along a unimodal-transmodal gradient (Pedersen et al., 2024). To what degree this variation might be reflected in cortico-hippocampal connectivity, however, remained to be investigated. As such, one of the study’s specific aims was to evaluate the spatial distribution of D1DRs as a molecular correlate of the hippocampus’ functional organization. Importantly, we were interested in mapping associations between individual differences in the organization of connectivity and D1DRs. This was uniquely enabled by utilizing the DyNAMiC sample, as it includes structural and functional MRI data in combination with D1DR PET in the same individuals across the adult lifespan (n=180). However, after observing significant spatial correspondence between functional organization and D1DR expressed by the second hippocampal gradient (G2), we did indeed perform complementary analyses with group-averaged data of additional dopamine markers (D2DR from a subsample of our participants, as well as DAT and FDOPA from open sources) to test the generalizability of the original finding. Taken together, the original analyses based on subject-level data and complementary group-level analyses provided support for the interpretation of G2 as a dopaminergic mode.

      We have updated the manuscript to clarify the focus on the D1 receptor and the contribution of including additional DA markers.

      Updated paragraph in the Introduction, pages 5-6:

      “Dopamine (DA) is one of the most important modulators of hippocampus-dependent function(47,48), and influences the brain’s functional architecture through enhancing specificity of neuronal signaling(49). Consistently, there is a DA-dependent aspect of maintained functional network segregation in aging which supports cognition(50). Animal models suggest heterogeneous patterns of DA innervation(51,52) and postsynaptic DA receptors(53), across both transverse and longitudinal hippocampal axes, likely allowing for separation between DA modulation of distinct hippocampus-dependent behaviors(47). Moreover, the human hippocampus has been linked to distinct DA circuits on the basis of long-axis variation in functional connectivity with midbrain and striatal regions(54,55). Taken together with recent findings revealing a unimodal-transmodal organization of the most abundantly expressed DA receptor subtype, D1 (D1DR), across cortex(56), we tested the hypothesis that the organization of hippocampal-neocortical connectivity partly reflects the underlying distribution of hippocampal DA receptors, predicting predominant spatial correspondence for any hippocampal gradient conveying a unimodal-transmodal pattern across cortex.”

      Updated sections in the Results, page 13-14:

      “Our next aim was to investigate to which extent the distribution of hippocampal DA D1 receptors (D1DRs), measured by [<sup>11</sup>C]SCH23390 PET in the DyNAMiC(58) sample, may serve as a molecular correlate of the hippocampus’ functional organization.”

      “Complementary analyses were then conducted to further evaluate G2 as a dopaminergic hippocampal mode by utilizing additional DA markers at group-level.”

      Moreover, the authors may be aware that multiple PET tracers are somewhat challenged in the mesiotemporal region. Is this the case for the D1 receptor as well? The hippocampus is a small and complex structure, and PET more of a low res technique so one would want to highlight and discuss the limitations of the correlations with PET maps here and/or evaluate whether the analysis adds necessary findings to the study.

      We thank the Reviewer for raising this point. The lower resolution of PET is indeed a relevant aspect to consider when quantifying D1DR availability in the hippocampus, even though previous research indicates high test-retest reliability of [<sup>11</sup>C]SCH23390 PET measurement in this region (Kaller et al., 2017). We have now elaborated on PET limitations in the Discussion of the revised manuscript.

      In our study, we made efforts to reduce potential partial volume effects (PVE) by correcting our PET data, and tested spatial associations between our functional gradients and D1DR maps using trend-surface modelling (TSM), rather than through voxel-wise comparisons. This allowed us to evaluate the spatial correspondence between functional connectivity and D1DRs at a level of spatial trends, estimated using TSM models computed at increasing levels of complexity. The results showed consistent spatial overlap between G2 and D1DRs across these models, that is, across spatial trends described at coarser-to-finer scales. Furthermore, this was replicated across several DA markers with PET and SPECT data from independent samples.
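      To make the idea of trend-surface modelling at increasing levels of complexity concrete, a minimal sketch follows. It assumes a plain least-squares fit of a polynomial basis in the x, y, and z voxel coordinates without cross terms, which may differ from the estimation procedure the authors used. The function names `tsm_design` and `fit_tsm` and the synthetic data are hypothetical.

```python
import numpy as np


def tsm_design(coords, order):
    """Build a polynomial basis: intercept plus x^k, y^k, z^k for k = 1..order."""
    coords = np.asarray(coords, dtype=float)
    # Standardize coordinates so that higher powers stay numerically well-behaved.
    coords = (coords - coords.mean(axis=0)) / coords.std(axis=0)
    columns = [np.ones(len(coords))]
    for k in range(1, order + 1):
        columns.extend(coords[:, axis] ** k for axis in range(3))
    return np.column_stack(columns)


def fit_tsm(coords, values, order):
    """Least-squares fit of the trend surface; returns coefficients and R^2."""
    values = np.asarray(values, dtype=float)
    design = tsm_design(coords, order)
    beta, *_ = np.linalg.lstsq(design, values, rcond=None)
    residual = values - design @ beta
    r_squared = 1.0 - residual.dot(residual) / np.sum((values - values.mean()) ** 2)
    return beta, r_squared


# Synthetic example: 200 voxels with a gradient running along the y axis.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(200, 3))
gradient = 2.0 * coords[:, 1] + rng.normal(0, 0.5, size=200)

for order in (1, 2, 3):
    beta, r_squared = fit_tsm(coords, gradient, order)
    print(order, beta.shape, round(r_squared, 3))
```

      Each model order yields one coefficient vector per map (here 1 + 3 × order values). Coefficient vectors of this kind, estimated per participant, are what would be compared across modalities, rather than the voxel values themselves.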

      Taken together, we agree with the Reviewer that the spatial correspondence observed between G2 and hippocampal D1DRs should be interpreted in the context of resolution-related limitations inherent to PET imaging. However, we strongly believe that our DA analyses offer valuable insight to the molecular underpinnings of hippocampal functional organization.

      Updated paragraph in the Discussion, pages 25-26:

      “We discovered that G2, specifically, manifested organizational principles shared among function, behavior, and neuromodulation. Meta-analytical decoding reproduced a unimodal-associative axis across G2 (Figure 3B), and analyses in relation to the distribution of D1DRs – which vary across cortex along a unimodal-transmodal axis(76,77) – demonstrated topographic correspondence both at the level of individual differences and across the group. It should, however, be acknowledged that PET imaging in the hippocampus is associated with resolution-related limitations, although previous research indicates high test-retest reliability of [<sup>11</sup>C]SCH23390 PET to quantify D1DR availability in this region(78). As such, mapping the distribution of hippocampal D1DRs at a fine spatial scale remains challenging, and replication of our results in terms of overlap with G2 is needed in independent samples. Here, we evaluated the observed spatial overlap between G2 topography and D1DRs across multiple TSM model orders, showing correspondence between modalities from simple to more complex parameterizations of their spatial properties. Topographic correspondence was additionally observed between G2 and other DA markers from independent datasets (Figure 3B), suggesting that G2 may constitute a mode reflecting a dopaminergic phenotype, which contributes to the currently limited understanding of its biological underpinnings.”

      From my (perhaps somewhat biased) perspective, it might be valuable to instead or in addition look at measures of hippocampal microstructure and how these relate to the functional aging effects. This could be done, if available, using data from the same subjects (eg based on quantitative MRI contrasts and/or structural MRI) and/or using contextualization findings as implemented in eg hippomaps.readthedocs.io

      We thank the Reviewer for this suggestion. We performed additional analyses investigating the spatial overlap between our connectivity gradients and estimates of hippocampal microstructure, computed as the ratio of T1- over T2-weighted (T1w/T2w) images (Glasser & Von Essen, 2011; vos de Wael et al., 2018). Analyses of spatial correspondence then followed the TSM-based method used to test the spatial overlap between functional connectivity gradients and D1DR distribution. Applying TSM to the T1w/T2w image computed for each participant yielded subject-level model parameters describing microstructure topography, which were then entered as predictors of connectivity topography in multivariate GLMs (separate models for each gradient and hemisphere, 6 models in total).

      Analyses revealed that microstructure of the right hippocampus significantly predicted gradient topography of right-hemisphere G1 (F = 1.325, p = 0.034), while no other links between connectivity gradients and microstructure emerged as significant (F = 0.930-1.184, ps = 0.706-0.079).

      These results, suggesting an association along the anteroposterior axis, deviate from previous findings linking hippocampal microstructure to G3-like, medial-lateral, connectivity organization (vos de Wael et al., 2018). As we believe that comprehensive analyses of our gradients in relation to microstructure across the lifespan would be best addressed in future work, we have not included these analyses of microstructure in the revised manuscript.

      (2) Can the authors clarify why they did not replicate based on cohorts that are more widely used in the community and open access, such as CamCAN and/or HCP-Aging? It might connect their results with other studies if an attempt was made to also show that findings persist in either of these repositories.

      We agree with the Reviewer that replication in samples such as CamCAN and/or HCP-Aging would provide valuable opportunities to connect our findings with those of other studies using those datasets. Here, we included the Betula dataset (Nilsson et al., 2004) as our replication sample, as it was immediately available to us, included a large sample of adults of comparable age, and included a word recall episodic memory task closely aligned with the one included in DyNAMiC. Importantly, leveraging the Betula dataset as our replication sample allows us to link our findings to a wide range of previous studies central to the understanding of neurocognitive aging in general, and hippocampal aging in particular (Nyberg, 2017; Nyberg et al., 2020). Betula is a large longitudinal project that has been tracking individuals since 1988, and is part of the National E-infrastructure for Aging Research (NEAR: www.near-aging.se), through which data from several Swedish studies are made available to both national and international researchers. While we acknowledge the value of extending replication efforts to datasets like CamCAN and HCP-Aging, we emphasize the significant contribution of having replicated our connectivity gradients in the Betula dataset.

      (3) The authors applied TSM and related these parameters to topographic changes in the gradients. I was wondering whether and how such an approach controls for autocorrelation present in both the PET map and gradients. Could the authors clarify?

      The Reviewer raises an important topic in spatial autocorrelation. The TSM approach used to parameterize the topography of the functional gradients and D1DR distribution, and to test the spatial correspondence between modalities, did not include any specific method to control for autocorrelation. Here, we highlight two aspects of our study in relation to this point. First, we demonstrated in the Supplementary information (S. Figure 4) that autocorrelation induced by spatial smoothing likely has limited effects on overall gradient topography and the ability of TSM parameters to capture meaningful inter-individual differences in terms of age. Second, in the case of spatial overlap effects being significantly impacted by autocorrelation, we would expect the association between right-hemisphere G2 and D1DR topography to similarly emerge for G2 in the left hemisphere. The absence of such an association may speak to a limited effect of spatial autocorrelation.

      (4) The TSM approach quantifies the gradients in terms of x/y/z direction in a cartesian coordinate system. Wouldn't a shape intrinsic coordinate system in the hippocampus also be interesting, and perhaps even be more efficient to look at here (see eg DeKraker 2022 eLife or Paquola et al 2020 eLife)?

      This is a very relevant question and we appreciate the Reviewer’s suggestion. We recognize that there may be several benefits associated with adopting a shape-intrinsic coordinate system when characterizing effects in the hippocampus, given its curved/folded anatomy. Approaches like the ones adopted in DeKraker et al., 2022 and Paquola et al., 2020, utilize geodesic coordinate frameworks to represent the hippocampus in surface space, enabling mapping of connectivity onto the hippocampal surface while respecting its inherent curvature and topology. We anticipate that quantifying gradients within such a framework would especially benefit identification of connectivity change across the hippocampal surface relative to reference points such as subfield boundaries, while minimizing effects of interindividual differences in hippocampal shape and folding. In our study, hippocampal gradients and their associated cortical patterns were computed in volumetric space, with TSM subsequently used to parameterize the change in connectivity along these gradients. This indeed yields a description of connectivity change within a coordinate system less specific to hippocampal anatomy, but may favor generalizability and integration with previous gradient findings within and beyond the hippocampus (e.g., Przeździk et al., 2019; Tian et al., 2020; Katsumi et al., 2023; Navarro-Schröder et al., 2015), as well as connections with broader neuroimaging frameworks through techniques such as meta-analytical decoding. In our view, the different coordinate frameworks offer complementary insight to hippocampal organization, and while we have opted to not undertake novel analyses to explore our gradients within a geodesic coordinate system for the purposes of this paper, we recognize the importance of such evaluation of our gradients in future analyses. We have made updates to the Discussion in the revised manuscript on this topic (pages 23-24):

      “Greater anatomical specificity, with more precise characterization of connectivity in relation to subfield boundaries while minimizing effects of inter-individual differences in hippocampal shape and folding, might be achieved by adopting techniques implementing a geodesic coordinate system to represent effects within the hippocampus(68,69).”

      Reviewer #2 (Public Review):

      Summary:

      This paper derives the first three functional gradients in the left and right hippocampus across two datasets. These gradient maps are then compared to dopamine receptor maps obtained with PET, associated with age, and linked to memory. Results reveal links between dopamine maps and gradient 2, age with gradients 1 and 2, and memory performance.

      Strengths:

      This paper investigates how hippocampal gradients relate to aging, memory, and dopamine receptors, which are interesting and important questions. A strength of the paper is that some of the findings were replicated in a separate sample.

      Weaknesses:

      The paper would benefit from added clarification on the number of models/comparisons for each test. Furthermore, it would be helpful to clarify whether or not multiple comparison correction was performed and - if so - what type or - if not - to provide a justification. The manuscript would furthermore benefit from code sharing and clarifying which results did/did not replicate.

      We thank the Reviewer for their positive assessment and suggestions regarding further clarifications. We have addressed the Reviewer’s comments in a point-by-point manner under the “Recommendations for the authors” section.

      Reviewer #3 (Public Review):

      Summary:

      In this study, the authors analyzed the complex functional organization of the hippocampus using two separate adult lifespan datasets. They investigated how individual variations in the detailed connectivity patterns within the hippocampus relate to behavioral and molecular traits. The findings confirm three overlapping hippocampal gradients and reveal that each is linked to established functional patterns in the cortex, the arrangement of dopamine receptors within the hippocampus, and differences in memory abilities among individuals. By employing multivariate data analysis techniques, they identified older adults who display a hippocampal gradient pattern resembling that of younger individuals and exhibit better memory performance compared to their age-matched peers. This underscores the behavioral importance of maintaining a specific functional organization within the hippocampus as people age.

      Strengths:

      The evidence supporting the conclusions is overall compelling, based on a unique dataset, rich set of carefully unpacked results, and an in-depth data analysis. Possible confounds are carefully considered and ruled out.

      Weaknesses:

      No major weaknesses. The transparency of the statistical analyses could be improved by explicitly (1) stating what tests and corrections (if any) were performed, and (2) justifying the elected statistical approaches. Further, some of the findings related to the DA markers are borderline statistically significant and therefore perhaps less compelling but they line up nicely with results obtained using experimental animals and I expect the small effect sizes to be largely related to the quality and specificity of the PET data rather than the derived functional connectivity gradients.

      We thank the Reviewer for the thoughtful summary and positive assessment of our work. To increase transparency of the statistical analyses, we have in the revised manuscript added information regarding statistical tests and corrections for multiple comparisons. In the Results, p-values were reported at an uncorrected statistical threshold, and we have in the revised manuscript included the corresponding p-values adjusted for multiple comparisons using the Benjamini-Hochberg method to control the false discovery rate (FDR). Finally, in the revised manuscript, we have now elaborated on the potential limitations of our PET analyses and we include the updated paragraph below.

      Addition made to the Results section, page 13:

      “Individual maps of D1DR binding potential (BP) were also submitted to TSM, yielding a set of spatial model parameters describing the topographic characteristics of hippocampal D1DR distribution for each participant. D1DR parameters were subsequently used as predictors of gradient parameters in one multivariate GLM per gradient (in total 6 GLMs, controlled for age, sex, and mean FD). Results are reported with p-values at an uncorrected statistical threshold and p-values after adjustment for multiple comparisons using the Benjamini-Hochberg method to control the false discovery rate (FDR).”
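      For readers unfamiliar with the Benjamini-Hochberg adjustment mentioned above, a minimal sketch of the step-up procedure follows. The function name and the six input p-values (one per hypothetical GLM) are illustrative assumptions and are not values from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical uncorrected p-values, one per model (illustration only).
uncorrected = [0.004, 0.030, 0.012, 0.200, 0.650, 0.045]
for p_raw, p_adj in zip(uncorrected, benjamini_hochberg(uncorrected)):
    print(f"uncorrected p = {p_raw:.3f}, FDR-adjusted p = {p_adj:.3f}")
```

      Reporting both columns, as the authors do, lets readers see which effects survive correction without losing the uncorrected values.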

      Addition made to the Results section, page 15:

      “Effects of age on gradient topography were assessed using multivariate GLMs including age as the predictor and gradient TSM parameters as dependent variables (controlling for sex and mean frame-wise displacement; FD). One model was fitted per gradient and hemisphere, each model including all TSM parameters belonging to a gradient (in total, 6 GLMs).”

      Addition made to the Results section, page 17:

      “Models were assessed separately for left and right hemispheres, across the full sample and within age groups, yielding eight hierarchical models in total. Results are reported with p-values at an uncorrected statistical threshold and p-values after FDR adjustment.”

      Updated paragraph in the Discussion, pages 25-26:

      “We discovered that G2, specifically, manifested organizational principles shared among function, behavior, and neuromodulation. Meta-analytical decoding reproduced a unimodal-associative axis across G2 (Figure 3B), and analyses in relation to the distribution of D1DRs – which vary across cortex along a unimodal-transmodal axis(76,77) – demonstrated topographic correspondence both at the level of individual differences and across the group. It should, however, be acknowledged that PET imaging in the hippocampus is associated with resolution-related limitations, although previous research indicates high test-retest reliability of [<sup>11</sup>C]SCH23390 PET to quantify D1DR availability in this region(78). As such, mapping the distribution of hippocampal D1DRs at a fine spatial scale remains challenging, and replication of our results in terms of overlap with G2 is needed in independent samples. Here, we evaluated the observed spatial overlap between G2 topography and D1DRs across multiple TSM model orders, showing correspondence between modalities from simple to more complex parameterizations of their spatial properties. Topographic correspondence was additionally observed between G2 and other DA markers from independent datasets (Figure 3B), suggesting that G2 may constitute a mode reflecting a dopaminergic phenotype, which contributes to the currently limited understanding of its biological underpinnings.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Please see the comments in the public review.

      We thank the Reviewer for their comments and recommendations, and have addressed them in the “Public review” section.

      Reviewer #2 (Recommendations For The Authors):

      (1) All statistical analyses are based on linear regressions using trend surface modeling (TSM) parameters that parameterize gradients at the subject level. These models resulted in 9 parameters for gradient 1 and 12 parameters each for gradients 2 and 3. The text states that 'Effects of age on gradient topography was assessed using multivariate GLMs including age as the predictor and gradient TSM parameters as dependent variables (controlling for sex and mean frame-wise displacement; FD)'. Please clarify whether these GLMs were fitted separately for each TSM parameter (i.e., 9+12+12=33 models for both left and right = 66 total models) or on the overall model?

      We appreciate the Reviewer’s request for clarification on this matter. These GLMs were fitted on the overall TSM model, that is, through one GLM per gradient (3) and hemisphere (2), each one including all TSM parameters belonging to a gradient (in total, 6 GLMs).

      In the revised manuscript, we have added more details to the Results section, page 15: “Effects of age on gradient topography were assessed using multivariate GLMs including age as the predictor and gradient TSM parameters as dependent variables (controlling for sex and mean frame-wise displacement; FD). One model was fitted per gradient and hemisphere, each model including all TSM parameters belonging to a gradient (in total, 6 GLMs).”
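
      The difference between the two model structures discussed in this point can be made concrete with a short enumeration. The sketch below is illustrative only: it uses the TSM parameter counts quoted by the Reviewer (9, 12, and 12), and the variable names are assumptions rather than anything taken from the authors' analysis code.

```python
from itertools import product

# TSM parameter counts per gradient, as quoted in the Reviewer's comment
tsm_params = {"G1": 9, "G2": 12, "G3": 12}
hemispheres = ["left", "right"]

# Structure described in the response: one multivariate GLM per gradient
# and hemisphere, with all TSM parameters of that gradient as dependent variables
multivariate_models = [
    {"gradient": g, "hemisphere": h, "n_dependent_variables": tsm_params[g]}
    for g, h in product(tsm_params, hemispheres)
]

# Alternative raised by the Reviewer: one univariate model per TSM parameter
univariate_model_count = sum(tsm_params.values()) * len(hemispheres)

print(len(multivariate_models))  # 6
print(univariate_model_count)    # 66
```

      Fitting six multivariate models rather than 66 univariate ones keeps the number of tests, and therefore the multiple-comparison burden, much smaller.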

      (2) Similarly, for memory it appears that multiple models were performed (left and right, young, middle-aged, old, whole groups). Please clarify whether and how multiple comparison correction was performed in this case.

      In the revised manuscript, we have now specified the number of analyses conducted in relation to memory performance. We have also clarified that p-values were reported at an uncorrected statistical threshold, and we have in the revised manuscript included the corresponding p-values adjusted for multiple comparisons using the Benjamini-Hochberg method to control the FDR.

      Updated section in the Results, page 17:

      “Models were assessed separately for left and right hemispheres, across the full sample and within age groups, yielding eight hierarchical models in total. Results are reported with p-values at an uncorrected statistical threshold and p-values after FDR adjustment.”
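
      For readers unfamiliar with the Benjamini-Hochberg adjustment mentioned above, a minimal sketch follows. It is a generic pure-Python implementation of the standard step-up procedure, not the authors' code, and the p-values are hypothetical values chosen for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical uncorrected p-values from eight models (not from the study)
raw = [0.001, 0.010, 0.030, 0.040, 0.20, 0.45, 0.62, 0.90]
adj = benjamini_hochberg(raw)
for p, q in zip(raw, adj):
    print(f"uncorrected p = {p:.3f}, FDR-adjusted p = {q:.3f}")
```

      In this hypothetical set, the tests with uncorrected p = 0.030 and p = 0.040 both receive an adjusted value of about 0.08, which shows how a nominally significant effect can fall above 0.05 after adjustment.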

      (3) Although I applaud the authors for their replication efforts, the results do not appear to replicate well. For example, memory was linked to gradient 2 in the whole group but to gradient 1 in the young group. Furthermore, dopamine was linked to gradient 2 in the right but not the left hemisphere. Although the overall group-level gradients were very stable between the two datasets, it is not clear whether the age findings replicated and the memory subgroup findings only replicated at trend level for memory and only partially replicated at the TSM parameter level.

      We thank the Reviewer for highlighting the inclusion of a replication dataset as a strength of our study, and we appreciate the recommendation to clarify to which extent results replicated. We provide a response to the Reviewer’s points below, and specify the revisions made to the manuscript in relation to this topic.

      The main aim of our study was to characterize the topographic organization of functional hippocampal-neocortical connectivity within the hippocampus across the adult lifespan, as previous studies have limited their focus to younger adults. Given the lack of previous studies for comparison, together with our identification of a novel secondary long-axis connectivity gradient (G2) taking precedence over the previously established medial-lateral G3, we included the Betula sample (Nilsson et al., 2004) for the purpose of replication. There was a high level of consistency between our main dataset and our replication dataset, with gradients 1-3 in left and right hemispheres identified in both samples.

      Further use of the replication dataset, beyond the identification of the connectivity gradients, was originally not planned. As such, not all subsequent analyses in the main dataset were conducted in the replication dataset. However, we found it critical to evaluate the observation that older individuals who maintained a youth-like gradient topography also exhibited higher levels of memory performance in an independent sample. This was possible given that the replication dataset included a comparable number of participants in similar ages and a word recall episodic memory task corresponding well to the one used in DyNAMiC. Overall, we conclude that these analyses replicated well across samples. Firstly, topography of left-hemisphere G1 informed the classification of older adults into youth-like and aged subgroups in both samples. Furthermore, in both samples, we observed that the older subgroups identified based on G1 topography also exhibited the youth-like vs. aged pattern in G2 topography. This pattern was, however, evident also in G3 only in the main sample, possibly suggesting a limited contribution of G3 topography in determining overall functional profiles in older age. In terms of the behavioral relevance of maintaining youth-like gradient topography in older age, we observed effects on word recall performance in both samples, although the Reviewer correctly points out that the difference between subgroups was significant only at trend level (p = 0.058) in the replication dataset. While this indeed underscores the importance of replication efforts in additional samples, we argue that the pattern observed in our replication dataset is overall consistent with, and conveys effects in the expected direction based on, the original observations in our main dataset.

      In revising the manuscript, we have performed additional analyses for replication purposes in terms of memory. Originally, we observed a significant association between G2 topography and episodic memory across the main sample. However, this effect did not remain significant after FDR adjustment for multiple comparisons. To evaluate this association further, we conducted a corresponding hierarchical multiple regression analysis in the replication dataset, which supported a role of G2 in memory (Adj. R<sup>2</sup> = 0.368, ΔR<sup>2</sup> = 0.081, F= 1.992, p = 0.028). Together, these analyses suggest that inter-individual differences in episodic memory performance may in part be explained by the spatial characteristics of G2 across the adult lifespan, although increased statistical power in relation to the large number of TSM parameters included in the hierarchical regression models may be needed to explore this association in smaller, age-stratified, groups. Relatedly, it is worth mentioning that higher levels of memory performance in older age were linked to the maintenance of youth-like G2 topography in both our main and replication datasets.

      In parallel, topographic parameters of G1 predicted memory performance in the younger adults, which successfully replicates TSM-based results previously reported in Przeździk et al., 2019. Although similar associations were not evident within the other age groups, a link between G1 topography and memory was demonstrated in older age based on a) the identification of individuals maintaining a youth-like G1 profile and higher levels of memory, within which b) memory performance was, as in young adults, significantly predicted by G1 topography.

      The spatial correspondence between G2 topography and distribution of hippocampal D1DRs was lateralized to the right, and as the Reviewer points out, as such did not replicate across hemispheres. To what extent replication across hemispheres should be expected in this case is, however, difficult to determine. Lateralization and/or hemispheric asymmetry is commonly observed in numerous hippocampal features, from the molecular level to its functional involvement in behavior (Nemati et al., 2023; Persson & Söderlund, 2015), including various dopaminergic markers tested in the animal literature (Afonso et al., 1993; Sadeghi et al., 2017). Yet, potential differences between hemispheres in D1DR availability and the spatial distribution of receptors along hippocampal axes remain less studied in humans. More data is therefore needed to determine the nature of this right-hemisphere lateralization.

      In sum, we argue that our results show a good level of replication across independent datasets and across analyses in our main dataset. Whereas this study did not attempt replication of all analyses conducted in the main dataset, it has through replication across independent samples provided support for its main findings – the organization of hippocampal-neocortical connectivity along three main hippocampal gradients across the adult lifespan, and the gradient topography-based identification of older individuals maintaining a youth-like hippocampal organization in older age.

      The revised manuscript includes edits made to incorporate the new analyses and clarifications of observations in relation to memory.

      In the Results, page 17:

      “Observing that the association between G2 and memory did not remain significant after FDR adjustment, we performed the same analysis in our replication dataset, which also included episodic memory testing. Consistent with the observation in our main dataset, G2 significantly predicted memory performance (Adj. R<sup>2</sup> = 0.368, ΔR<sup>2</sup> = 0.081, F= 1.992, p = 0.028) over and above covariates and topography of G1. Here, the analysis also showed that G1 topography predicted performance across the sample (Adj. R<sup>2</sup> = 0.325, ΔR<sup>2</sup> = 0.112, F= 3.431, p < 0.001).”
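
      The ΔR<sup>2</sup> and F values reported for the hierarchical regressions correspond to an F-change test between nested models. The sketch below shows the standard formula under ordinary least squares assumptions. The function name, sample size, and predictor counts are hypothetical, the inputs are unadjusted R<sup>2</sup> values, and nothing here is meant to reproduce the statistics reported by the authors.

```python
def f_change(r2_full, delta_r2, n_added, n_obs, n_predictors_full):
    """F statistic for the R^2 gained by adding a block of predictors.

    r2_full            unadjusted R^2 of the model including the added block
    delta_r2           R^2 of the full model minus R^2 of the reduced model
    n_added            number of predictors in the added block
    n_obs              number of observations
    n_predictors_full  total predictors in the full model (intercept excluded)
    """
    df_num = n_added
    df_den = n_obs - n_predictors_full - 1
    if df_den <= 0:
        raise ValueError("not enough observations for the full model")
    f_value = (delta_r2 / df_num) / ((1.0 - r2_full) / df_den)
    return f_value, df_num, df_den


# Hypothetical example: a block of 4 predictors adds 0.10 to R^2
f_value, df_num, df_den = f_change(
    r2_full=0.40, delta_r2=0.10, n_added=4, n_obs=105, n_predictors_full=8
)
print(f_value, df_num, df_den)
```

      The formula makes explicit why a large block of TSM parameters needs a sizeable sample: every added predictor raises the numerator degrees of freedom and lowers the denominator degrees of freedom.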

      In the Discussion, page 26:

      “Results linked both G1 and G2 to episodic memory, suggesting complementary contributions of these two overlapping long-axis modes. Considered together, analyses in the main and replication datasets indicated a role of G2 topography in memory across the adult lifespan, independent of age. A similar association with G1 was only evident across the entire sample in the replication dataset, whereas results in the main sample seemed to emphasize a role of youth-like G1 topography in memory performance. In line with previous research, memory was successfully predicted by G1 topography in young adults(30), and similarly predicted by G1 in older adults exhibiting a youth-like functional profile.”

      (4) Please share the data and code and add a description of data and code availability in the manuscript.

      We have now made our code available, and added a statement on data and code availability in the revised manuscript.

      On page 37: “Data from the DyNAMiC study are not publicly available. Access to the original data may be shared upon request from the Principal investigator, Dr. Alireza Salami. The Matlab, R, and FSL codes used for analyses included in this study are openly available at https://github.com/kristinnordin/hcgradients. Computation of gradients was done using the freely available toolbox ConGrads: https://github.com/koenhaak/congrads.”

      Reviewer #3 (Recommendations For The Authors):

      Please see the comments in the public review.

      We thank the Reviewer for their comments and recommendations, and have addressed them in the “Public review” section.

      References

      Afonso, D., Santana, C., & Rodriguez, M. (1993). Neonatal lateralization of behavior and brain dopaminergic asymmetry. Brain Research Bulletin, 32(1), 11–16. https://doi.org/10.1016/0361-9230(93)90312-Y

      DeKraker, J., Haast, R. A., Yousif, M. D., Karat, B., Lau, J. C., Köhler, S., & Khan, A. R. (2022). Automated hippocampal unfolding for morphometry and subfield segmentation with HippUnfold. eLife, 11, e77945. https://doi.org/10.7554/eLife.77945

      Dubovyk, V., & Manahan-Vaughan, D. (2019). Gradient of expression of dopamine D2 receptors along the dorso-ventral axis of the hippocampus. Frontiers in Synaptic Neuroscience, 11. https://doi.org/10.3389/fnsyn.2019.00028

      Edelmann, E., & Lessmann, V. (2018). Dopaminergic innervation and modulation of hippocampal networks. Cell and Tissue Research, 373(3), 711–727. https://doi.org/10.1007/s00441-018-2800-7

      Gasbarri, A., Verney, C., Innocenzi, R., Campana, E., & Pacitti, C. (1994). Mesolimbic dopaminergic neurons innervating the hippocampal formation in the rat: A combined retrograde tracing and immunohistochemical study. Brain Research, 668(1), 71–79. https://doi.org/10.1016/0006-8993(94)90512-6

      Glasser, M. F., & Essen, D. C. V. (2011). Mapping Human Cortical Areas In Vivo Based on Myelin Content as Revealed by T1- and T2-Weighted MRI. Journal of Neuroscience, 31(32), 11597–11616. https://doi.org/10.1523/JNEUROSCI.2180-11.2011

      Kaller, S., Rullmann, M., Patt, M., Becker, G.-A., Luthardt, J., Girbardt, J., Meyer, P. M., Werner, P., Barthel, H., Bresch, A., Fritz, T. H., Hesse, S., & Sabri, O. (2017). Test–retest measurements of dopamine D1-type receptors using simultaneous PET/MRI imaging. European Journal of Nuclear Medicine and Molecular Imaging, 44(6), 1025–1032. https://doi.org/10.1007/s00259-017-3645-0

      Katsumi, Y., Zhang, J., Chen, D., Kamona, N., Bunce, J. G., Hutchinson, J. B., Yarossi, M., Tunik, E., Dickerson, B. C., Quigley, K. S., & Barrett, L. F. (2023). Correspondence of functional connectivity gradients across human isocortex, cerebellum, and hippocampus. Communications Biology, 6(1), Article 1. https://doi.org/10.1038/s42003-023-04796-0

      Kempadoo, K. A., Mosharov, E. V., Choi, S. J., Sulzer, D., & Kandel, E. R. (2016). Dopamine release from the locus coeruleus to the dorsal hippocampus promotes spatial learning and memory. Proceedings of the National Academy of Sciences, 113(51), 14835–14840. https://doi.org/10.1073/pnas.1616515114

      Navarro Schröder, T., Haak, K. V., Zaragoza Jimenez, N. I., Beckmann, C. F., & Doeller, C. F. (2015). Functional topography of the human entorhinal cortex. eLife, 4, e06738. https://doi.org/10.7554/eLife.06738

      Nemati, S. S., Sadeghi, L., Dehghan, G., & Sheibani, N. (2023). Lateralization of the hippocampus: A review of molecular, functional, and physiological properties in health and disease. Behavioural Brain Research, 454, 114657. https://doi.org/10.1016/j.bbr.2023.114657

      Nilsson, L.-G., Adolfsson, R., Bäckman, L., Frias, C. M. de, Molander, B., & Nyberg, L. (2004). Betula: A Prospective Cohort Study on Memory, Health and Aging. Aging, Neuropsychology, and Cognition, 11(2–3), 134–148. https://doi.org/10.1080/13825580490511026

      Nyberg, L. (2017). Functional brain imaging of episodic memory decline in ageing. Journal of Internal Medicine, 281(1), 65–74. https://doi.org/10.1111/joim.12533

      Nyberg, L., Boraxbekk, C.-J., Sörman, D. E., Hansson, P., Herlitz, A., Kauppi, K., Ljungberg, J. K., Lövheim, H., Lundquist, A., Adolfsson, A. N., Oudin, A., Pudas, S., Rönnlund, M., Stiernstedt, M., Sundström, A., & Adolfsson, R. (2020). Biological and environmental predictors of heterogeneity in neurocognitive ageing: Evidence from Betula and other longitudinal studies. Ageing Research Reviews, 64, 101184. https://doi.org/10.1016/j.arr.2020.101184

      Paquola, C., Benkarim, O., DeKraker, J., Larivière, S., Frässle, S., Royer, J., Tavakol, S., Valk, S., Bernasconi, A., Bernasconi, N., Khan, A., Evans, A. C., Razi, A., Smallwood, J., & Bernhardt, B. C. (2020). Convergence of cortical types and functional motifs in the human mesiotemporal lobe. eLife, 9, e60673. https://doi.org/10.7554/eLife.60673

      Pedersen, R., Johansson, J., Nordin, K., Rieckmann, A., Wåhlin, A., Nyberg, L., Bäckman, L., & Salami, A. (2024). Dopamine D1-Receptor Organization Contributes to Functional Brain Architecture. Journal of Neuroscience, 44(11). https://doi.org/10.1523/JNEUROSCI.0621-23.2024

      Pedersen, R., Johansson, J., & Salami, A. (2023). Dopamine D1-signaling modulates maintenance of functional network segregation in aging. Aging Brain, 3, 100079. https://doi.org/10.1016/j.nbas.2023.100079

      Persson, J., & Söderlund, H. (2015). Hippocampal hemispheric and long-axis differentiation of stimulus content during episodic memory encoding and retrieval: An activation likelihood estimation meta-analysis. Hippocampus, 25(12), 1614–1631. https://doi.org/10.1002/hipo.22482

      Przeździk, I., Faber, M., Fernández, G., Beckmann, C. F., & Haak, K. V. (2019). The functional organisation of the hippocampus along its long axis is gradual and predicts recollection. Cortex, 119, 324–335. https://doi.org/10.1016/j.cortex.2019.04.015

      Sadeghi, L., Rizvanov, A. A., Salafutdinov, I. I., Dabirmanesh, B., Sayyah, M., Fathollahi, Y., & Khajeh, K. (2017). Hippocampal asymmetry: Differences in the left and right hippocampus proteome in the rat model of temporal lobe epilepsy. Journal of Proteomics, 154, 22–29. https://doi.org/10.1016/j.jprot.2016.11.023

      Tian, Y., Margulies, D. S., Breakspear, M., & Zalesky, A. (2020). Topographic organization of the human subcortex unveiled with functional connectivity gradients. Nature Neuroscience, 1–12. https://doi.org/10.1038/s41593-020-00711-6

      vos de Wael, R., Larivière, S., Caldairou, B., Hong, S.-J., Margulies, D. S., Jefferies, E., Bernasconi, A., Smallwood, J., Bernasconi, N., & Bernhardt, B. C. (2018). Anatomical and microstructural determinants of hippocampal subfield functional connectome embedding. Proceedings of the National Academy of Sciences, 115(40), 10154–10159. https://doi.org/10.1073/pnas.1803667115

    1. eLife Assessment

      The authors present a valuable and intriguing observation challenging current views on DNA methylation dynamics, revealing earlier-than-expected de novo methylation with significant implications for gene regulation in early embryonic development. However, the study's significance is difficult to ascertain due to incomplete evidence supporting the conclusions. Moreover, the observed changes in DNA methylation across promoter regions are modest, leaving their relevance open to alternative interpretations.

    2. Reviewer #1 (Public review):

      Based on the reviewers' comments, the authors conducted additional analyses to enhance their study. By utilizing publicly available datasets (Guo et al., 2017) capable of distinguishing the sex of embryos, they examined DNA methylation in male embryos and identified minor de novo DNA methylation events initiating at the 8-cell stage, predominantly on the X chromosome. However, this finding introduces confusion, as the authors had previously suggested that such minor de novo DNA methylation regulates imprinted X chromosome inactivation, a process specific to female embryos.

      The key unresolved issue is whether minor de novo DNA methylation in female embryos occurs exclusively on the "inactive" X chromosome or on both the active and inactive X chromosomes. The authors did not provide direct evidence supporting de novo DNA methylation specifically at the inactive X chromosome. Furthermore, it remains unclear whether this methylation influences embryonic development independent of sex or is specific to female embryos undergoing imprinted X chromosome inactivation. While the authors present data on decreased live birth rates in Figure 2F, they did not address whether there is a sex bias among the live pups, such as male-biased survival. Clarifying this point would strengthen their conclusions.

      In summary, the critical issue with the revised manuscript is that it does not adequately resolve whether minor de novo DNA methylation regulates embryonic development irrespective of sex or specifically impacts female embryos where imprinted X chromosome inactivation occurs. This distinction is essential for understanding the broader implications of their findings.

    3. Reviewer #2 (Public review):

      Summary:

      Yue et al. set out to determine if the low but measurable level of DNMT3B expression that is observed prior to the major wave of de novo DNA methylation (i.e., before the epiblast stage) has a function. Re-analyzing existing DNA methylation data from Smith et al. (2012), they find a very modest DNA methylation gain over a subset of promoters, on the order of 1%, occurring between the 8-cell and blastocyst stages, and refer to this as "minor de novo DNA methylation". They attempt to assess the relevance/functionality of this minor DNA methylation gain, and intriguingly report reduced H3K27me3 in Dnmt3b knockdown (KD) trophoblast cells that normally undergo imprinted X-chromosome inactivation (iXCI) before the blastocyst stage. In addition, they assess proliferation, differentiation, metabolic function, implantation rate and live birth rate of Dnmt3b KD blastocysts, and assign specific phenotypes to the loss of DNA methylation at this early stage.

      Strengths:

      Working with early embryos is technically demanding, and as such the relevance of disrupting epigenetic factors specifically at this stage in development is less well studied. The detailed analyses of published data as well as the DNMT3B depletion experiments presented in this manuscript provide food for thought for the epigenetics community.

      Weaknesses:

      - Throughout the manuscript, please represent DNA methylation changes as delta DNA methylation instead of fold change. In many figures, it is not clear what the unit of DNA methylation presented actually is. Readers should be made aware that the changes in DNA methylation observed are very modest and that the threshold applied to the delta in DNA methylation is just 1% (Δ DNA methylation > 0.01).
      - The minimum coverage threshold and the threshold applied for DNA methylation should be presented in each relevant figure. Currently, for example, the latter is mentioned not in the Methods section but only once, in "Figure 2, figure supplement 1".
      - Indirect effects of disrupting DNMT3B at the earlier stages in development, when de novo DNAme levels are very low in the promoter regions of interest, should be considered. For example, de novo DNA methylation in repetitive regions/pericentric heterochromatin at this stage (not studied here) could be much higher than 1%. Disruption of such methylation could result in a "sink effect", with loss of H3K27me3 at promoter regions (including on the inactive X-chromosome) due to aberrant repositioning of Polycomb complexes/PRC2 to such ectopic sites from which they are normally excluded, rather than a direct positive effect of the very low DNA methylation gain observed on Polycomb recruitment.
        The impact of depletion of DNMT3B on the major wave of de novo DNA methylation that takes place at the peri-implantation stage of embryonic development may also play a role in some of the later phenotypes observed. In other words, the failure of de novo methylation is more profound at these later stages, when DNMT3B activity normally produces much higher levels of DNA methylation.
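
      The first point above, on reporting delta rather than fold change, can be illustrated numerically. The sketch below uses hypothetical promoter methylation fractions (on a 0 to 1 scale) that are not taken from the manuscript; the function name is likewise an assumption.

```python
def delta_and_fold(before, after):
    """Return (absolute change, fold change) for two methylation fractions."""
    delta = after - before
    fold = after / before if before > 0 else float("inf")
    return delta, fold


# Hypothetical values: a lowly methylated promoter and a highly methylated region
low = delta_and_fold(0.01, 0.03)
high = delta_and_fold(0.50, 0.52)

print(f"low baseline:  delta = {low[0]:.2f}, fold change = {low[1]:.2f}")
print(f"high baseline: delta = {high[0]:.2f}, fold change = {high[1]:.2f}")
```

      Both regions gain the same two percentage points, yet the lowly methylated promoter shows a threefold change. At low baselines a fold change can therefore look dramatic while the absolute gain stays very small.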

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, Yue et al. re-processed publicly available DNA methylation data (published in 2012 and 2017 from the Meissner lab) from pre- and post-implantation mouse embryos. Against the global wave of genome-wide reduction of DNA methylation occurring during pre-implantation development, they detected a slight increase (~1% on average) of DNA methylation at gene promoter regions during the transition from 8-cell to blastocyst stage. They claim that many such promoters are located in the X chromosome. Subsequently, they knocked down Dnmt3b (presumably because of its upregulation during the transition from the 8-cell to blastocyst stage) and detected the aberrant patterning of H3K27me3 in the mutant female embryos. Based on this observation, they claim that imprinted X-chromosome inactivation is impaired in the Dnmt3b-Kd pre-implantation embryos. Finally, they propose a model where such an increase of DNA methylation together with H3K27me3 regulates imprinted X-chromosome inactivation in the pre-implantation embryos. While their observation is of potential interest, the current version of the work fails to provide enough evidence to support their conclusions. Below are suggestions and comments on the manuscript.

      Major issues:

      (1) Sex of the embryos of the genome-wide bisulfite-sequencing data

      The authors re-analyzed publicly available genome-wide DNA methylation data from the Meissner lab published in 2012 and 2017. The former used reduced representation bisulfite sequencing (RRBS) and the latter used whole-genome bisulfite sequencing (WGBS). Based mainly on the RRBS data, Yue et al. detected de novo DNA methylated promoters during the transition from 8-cell to blastocyst against the global wave of genome-wide DNA demethylation. They claim that such promoter regions are enriched at the "inactive" X chromosome. However, it would be difficult to discuss DNA methylation at inactive X-chromosomes as the RRBS data were derived from a mixture of male and female embryos. It would also be notable that the increase of DNA methylation at these promoter regions is ~1% on average. Such a slight increase in DNA methylation during pre-implantation development could also be due to the developmental variations between the embryos or between the sexes of embryos.

      Thanks so much for your insightful comments. Whether de novo DNA methylation occurs in a sex-dimorphic manner would be of significance for our study. Based on your comments, we have added a reanalysis based on a publicly available single cell multi-omics sequencing (COOL-seq) data of mouse early embryos (Guo et al., 2017). The results showed that both male and female embryonic cells gain DNA methylation during the transition from the 8-cell to ICM (Figure 1—figure supplement 1C-D; Lines 112-115 in the revised manuscript).

      With regard to the increase in the promoter region, many previous studies have revealed that promoters and overlapping CGI regions, especially high-CpG promoters, consistently show low levels of DNA methylation (Auclair et al., 2014; Borgel et al., 2010; Dahlet et al., 2020). The relatively low basal levels make the increase seem relatively slight. Thus, we have added relevant statements to clarify this information and rewritten the sentences in the revised manuscript (Lines 116-118, 125-127 in the revised manuscript).

      In addition, using the single cell COOL-seq data, we also specifically reanalyzed the DNA methylation changes on the X chromosome in female embryos. The X chromosome showed a more notable increase than that on autosomes, and the female X chromosome showed a higher DNA methylation level than that of the male (Figure 3—figure supplement 2A-B; Lines 203-206 in the revised manuscript).

      Thanks again for your insightful and constructive comments that significantly strengthen our evidence. We have added these results in the revised manuscript.

      (2) Imprinted X-chromosome inactivation and evaluation of H3K27me3 (related to Figures 2C, D; 3F; Figure2-supplement 2 F, G; Figure3-supplement 3G)

      Based on the slight change in the H3K27me3 signals in the Dnmt3b-Kd blastocysts, the authors claim that imprinted X-chromosome inactivation is impaired in the mutant embryo. It would be not easy to reach this conclusion from such a rough analysis of H3K27me3 presented in Figure 2C, D. Rigorous quantification/evaluation of the H3K27me3 signals in the Dnmt3b-Kd embryos should be considered. Additional evidence for the impairment of H3K27me3 in the mutant embryos should also be provided (expression of a subset of X-linked genes by RNA-FISH or RT-PCR etc.). Though technically challenging, high-resolution genome-wide approach such as ChIP-seq of H3K27me3 in the Dnmt3b-kd female embryos (with traceable SNPs between maternal and paternal X chromosome to distinguish inactive and active X-chromosome) could more precisely evaluate regions that lose H3K27me3 in the X-chromosome (de novo DNA methylated promoters from 8-cell to blastocyst, for example).

      Thanks so much for your insightful comments that make our results more convincing. The H3K27me3 domain is a classic marker for establishment of XCI by achieving X chromosome-wide heterochromatinization and transcriptional repression (Chow and Heard, 2009; Heard et al., 2004; Huynh and Lee, 2005). Thus, in the present study, we have performed immunostaining for H3K27me3 domains to evaluate the iXCI status in the blastocysts, as previously reported (Fukuda et al., 2014; Gontan et al., 2018; Inoue et al., 2010; Tan et al., 2016). Based on your comments, we have added another statistical method to quantify the establishment of iXCI, i.e., the percentage of H3K27me3-positive and -negative cells among total trophoblast cells in female blastocysts with or without Dnmt3b knockdown. The result also indicated that Dnmt3b knockdown led to a significant loss of H3K27me3 domains from total trophoblast cells. Similarly, new data based on statistical analyses of total trophoblast cells have also been added to the results for Dnmt3b knockout and 5-aza-dC (Figure 3F; Figure 3—figure supplement 3D, H in the revised manuscript).

      To clarify the significance and reliability of detecting H3K27me3 domains, we have added a schematic diagram depicting the process of iXCI initiation and establishment, as well as the experimental design and workflows, to make our results easier to understand (Figure 3C in the revised manuscript).

      In addition, we agree with your comments that additional evidence will benefit the conclusion. Thus, we have reanalyzed the RNA-seq and H3K27me3 CHIP-seq data in extraembryonic ectoderm (ExE) of E6.5 single embryos that underwent Dnm3a/3b knockout because preimplantation iXCI status maintains extraembryonic cells (Chen et al., 2019; Galupa and Heard, 2015; Schulz and Heard, 2013). The results showed that Dnmt knockout-induced chromosome-wide loss of DNA methylation led to a nearly complete loss of H3k27me3 on paternal X chromosome (specifically inactivated in iXCI), along with a notable transcriptional upregulation cross the chromosome. By contrast, these changes cannot be not observed on maternal X chromosome.

      We have added this result in the revised manuscript (Lines 253-261; Figure 3—figure supplement 4A in the revised manuscript).

      (3) Analysis of the developmental potential of Dnmt3b-kd embryos

      While the authors claim that Dnmt3b-mediated de novo DNA methylation plays an important role in imprinted X-chromosome inactivation, it remains unclear whether the analysis presented in Figure 4 is derived from "female" embryos. This analysis seemed confusing as the authors claim that de novo DNA methylation in the promoter regions during the transition from 8-cell to blastocyst regulates imprinted X-chromosome inactivation, but this should not happen in the male embryos. Was the impairment of embryonic proliferation and differentiation observed in both male and female embryos? Or is this specific to the female embryos? We think that the sex of the embryos would be critical for the analysis presented in Figure 4.

      Thank you for your constructive comments, which helped make our results clearer. Figure 4 mainly presents the developmental role of minor de novo methylation, based on the integrated analysis of DNA methylation and gene expression dynamics from the 8-cell stage to the ICM. Because our data indicated that both male and female embryos undergo minor de novo methylation (Figure 1—figure supplement 1C-D in the revised manuscript), this section focused on genome-wide and general changes rather than on sexually dimorphic consequences.

      To avoid the possible confusion, we have reorganized the RESULTS AND DISCUSSION section and presented this section as Figure 2 in the revised manuscript, before the chromosomal distribution analysis and subsequent detection relevant to iXCI.

      Reviewer #2 (Public Review):

      Summary:

      Here, Yue et al. set out to determine if the low DNMT3B expression that is observed prior to de novo DNA methylation (before the blastocyst stage) has a function. Re-analyzing existing DNA methylation data from Smith et al. (2012) they find a small DNA methylation gain over a subset of promoters and gene bodies, occurring between the 8-cell and blastocyst stages, and refer to this as "minor de novo DNA methylation". They attempt to assess the relevance/functionality of this minor DNA methylation gain, and report reduced H3K27me3 in Dnmt3b knockdown (KD) trophoblast cells that normally undergo imprinted X-chromosome inactivation (iXCI) before the blastocyst stage. In addition, they assess the proliferation, differentiation, metabolic function, implantation rate, and live birth rate of Dnmt3b KD blastocysts.

      Strengths:

      Working with early embryos is technically demanding, making the well-designed experiments from this manuscript useful to the epigenetics community. Particularly, the DNMT3B expression and 5-mC staining at different embryonic stages.

      Thank you for your positive evaluation. We have revised the manuscript based on your comments, and the items that need to be addressed in detail are explained in the point-by-point response to each comment.

      Weaknesses:

      - Throughout the manuscript, please represent DNA methylation changes as delta DNA methylation instead of fold change.

      Thanks so much for your constructive comments. We have represented DNA methylation changes as “ΔDNA methylation” (Figure 2—figure supplement 1A; Figure 3—figure supplement 1A; Figure 3—figure supplement 3I in the revised manuscript).
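
      To illustrate the distinction the reviewer draws, the following minimal sketch contrasts the absolute difference (ΔDNA methylation) with fold change. It is an illustration only, not the code used in the study; the function names are hypothetical, and the 3% and 6% inputs are illustrative values.

```python
def delta_methylation(level_before: float, level_after: float) -> float:
    """Absolute change in methylation level (levels are fractions in [0, 1])."""
    return level_after - level_before


def fold_change(level_before: float, level_after: float) -> float:
    """Ratio of methylation levels; undefined when the starting level is zero."""
    if level_before == 0:
        raise ZeroDivisionError("fold change is undefined for a starting level of 0")
    return level_after / level_before


# Illustrative values only: a promoter going from 3% to 6% methylation.
before, after = 0.03, 0.06
delta = delta_methylation(before, after)   # absolute gain of about 0.03 (3 percentage points)
ratio = fold_change(before, after)         # about a 2-fold increase
```

      At low methylation levels, a small absolute gain appears as a large fold change, which is why the absolute difference is the more conservative representation.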

      - Detailed methods on the re-analysis of the DNA methylation data from Smith et al. 2012 are missing from the materials and methods section. Was a minimum coverage threshold used?

      Thank you for your reminder. We have added relevant statements and provided the details of the coverage criteria in the Bioinformatics analysis subsection of the Materials and methods section, as follows: RRBS data of mouse embryos (2-cell embryos, 4-cell embryos, 8-cell embryos, ICM, and E6.5 embryos) were downloaded from the published article by Smith et al. (Smith et al., 2012) (accession number: GSE34864). The methylation level was calculated as the number of “methylated” reads (reported as C) divided by the total number of “methylated” and “unmethylated” reads (reported as C or T). The genomic region information was downloaded from the mm9 Repeat Masker. As described in the published article, promoters were defined as 1 kb up- and downstream of the TSS and classified into high-density CpG promoters (HCP), intermediate-density CpG promoters (ICP), and low-density CpG promoters (LCP). Only CpG sites with at least fivefold coverage were included in the methylation analysis. We have added the relevant information to the revised manuscript (Lines 462-470 in the revised manuscript).
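
      The calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the function names are hypothetical, and averaging per-CpG levels to summarize a region is an assumption made here for the example.

```python
MIN_COVERAGE = 5  # CpG sites need at least fivefold coverage to be included


def methylation_level(methylated_reads, unmethylated_reads, min_coverage=MIN_COVERAGE):
    """Methylation level of one CpG: C reads / (C + T reads).

    Returns None when the site does not reach the coverage threshold.
    """
    total = methylated_reads + unmethylated_reads
    if total < min_coverage:
        return None
    return methylated_reads / total


def mean_region_methylation(cpg_counts, min_coverage=MIN_COVERAGE):
    """Mean level over the covered CpGs of a region (assumed summary, see lead-in).

    cpg_counts is an iterable of (methylated_reads, unmethylated_reads) pairs.
    Returns None when no CpG in the region passes the coverage filter.
    """
    levels = [methylation_level(m, u, min_coverage) for m, u in cpg_counts]
    covered = [level for level in levels if level is not None]
    if not covered:
        return None
    return sum(covered) / len(covered)


# Hypothetical read counts for three CpGs in one promoter; the second is filtered out.
example_promoter = [(3, 7), (1, 2), (5, 5)]
promoter_level = mean_region_methylation(example_promoter)
```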

      - Detailed methods on the establishment and validation of Dnmt3b KO blastocysts and 5-aza-dC treated blastocysts are missing (related to Figure 2).

      Thank you for your detailed reminder. In the present study, we used a well-established Dnmt3b-deficient mouse model (Okano et al., 1999) to validate the role of minor de novo DNA methylation in iXCI establishment. Heterozygous Dnmt3b<sup>+/-</sup> mice, which carry one mutant Dnmt3b allele, were obtained from the Mutant Mouse Resource & Research Centers (MMRRC, NIH). Homozygous embryos were obtained by intercrossing Dnmt3b<sup>+/-</sup> male and female mice. Genotyping of collected embryos was performed by PCR using primers designed according to the gene targeting strategy, following the MMRRC genotyping protocol (https://www.med.unc.edu/mmrrc/genotyping-protocols/mmrrc-center-protocol-29886/). We have provided the detailed methods in the revised manuscript (Lines 350-354; 391-393 in the revised manuscript). In addition, we have added a schematic diagram depicting the processes of embryo collection and detection (Figure 3—figure supplement 3A in the revised manuscript).

      Similarly, we have provided relevant details of 5-aza-dC supplementation in the revised manuscript (Lines 412-415 in the revised manuscript) and added a schematic diagram depicting the details of experimental design and processes (Figure 3—figure supplement 3E in the revised manuscript).

      - Detailed methods on the re-analysis of the ChIPseq data from Liu et al. 2016 are missing from the materials and methods section.

      Thank you for pointing this out. The bigWig files of H3K27me3 ChIP-seq data were downloaded from the published article by Liu et al. (Liu et al., 2016) (accession number: GSE73952). These signal tracks were generated using the MACS2 (v2.0.10.20131216) pileup function and normalized to 1 million reads for visualization, as described in the original publication. We have added the relevant information to the MATERIALS AND METHODS section in the revised manuscript (Lines 474-479 in the revised manuscript).
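
      For readers unfamiliar with the normalization mentioned above, scaling a signal track to 1 million reads can be sketched as below. This is a generic reads-per-million illustration with a hypothetical function name and made-up values; it is not MACS2 code and not the processing script of the original publication.

```python
def normalize_per_million(pileup, total_reads):
    """Scale raw pileup values so that tracks from libraries of different
    sequencing depth are comparable (signal per 1 million reads)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    scale = 1_000_000 / total_reads
    return [value * scale for value in pileup]


# Hypothetical example: raw pileup at three positions from a 2-million-read library.
raw_signal = [10, 20, 0]
normalized_signal = normalize_per_million(raw_signal, total_reads=2_000_000)
```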

      - Some of the data represented in bar graphs does not look convincing/significant. Maybe this data can be better represented differently, such as in box plots or violin plots, which would better represent the data.

      Thank you for your comments, which improved the presentation of our results. The relevant results are now shown as box plots in the revised manuscript (Figure 3E; Figure 3—figure supplement 3C; Figure 3—figure supplement 3G in the revised manuscript). In addition, to strengthen our evidence, we have added an alternative statistical method to quantify the establishment of iXCI, i.e., the percentages of H3K27me3-positive and -negative cells among total trophoblast cells in female blastocysts with or without Dnmt3b knockdown (Figure 3F; Figure 3—figure supplement 3D, H in the revised manuscript).
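
      The cell-level quantification described above amounts to a simple proportion, sketched here for clarity. The function name and the cell counts are hypothetical and serve only as an illustration; they are not data from the study.

```python
def percent_positive(positive_cells, negative_cells):
    """Percentage of H3K27me3-positive cells among all scored trophoblast cells."""
    total = positive_cells + negative_cells
    if total == 0:
        raise ValueError("no cells were scored")
    return 100.0 * positive_cells / total


# Hypothetical counts pooled from female blastocysts of one group.
control_percent = percent_positive(positive_cells=30, negative_cells=10)
```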

      - The relevance and rationale for experiments using 5-aza-dC treatment is unclear.

      Thank you for prompting us to make our results more informative and convincing. 5-aza-dC is a well-established global DNA hypomethylating agent that efficiently inhibits the activity of all DNMTs, and has therefore been frequently used to study both the maintenance of DNA methylation and de novo DNA methylation (Maslov et al., 2012; Oka et al., 2005).

      In our study, to validate the function of minor de novo DNA methylation in iXCI, we took advantage of 5-aza-dC-induced DNMT inhibition, which, despite its inhibitory effect being common to various DNMTs, allowed us to transiently treat embryos specifically during the window of minor de novo DNA methylation (from the 8-cell to the blastocyst stage). We have added these statements, as well as a schematic diagram depicting the experimental design, to the revised manuscript to clarify the rationale of our experiments and make them easier to understand (Lines 183-188; Figure 3—figure supplement 3E in the revised manuscript).

      References

      Auclair, G., Guibert, S., Bender, A. and Weber, M. (2014). Ontogeny of CpG island methylation and specificity of DNMT3 methyltransferases during embryonic development in the mouse. Genome Biol. 15, 545.

      Borgel, J., Guibert, S., Li, Y., Chiba, H., Schubeler, D., Sasaki, H., Forne, T. and Weber, M. (2010). Targets and dynamics of promoter DNA methylation during early mouse development. Nat. Genet. 42, 1093-1100.

      Chen, Z., Yin, Q., Inoue, A., Zhang, C. and Zhang, Y. (2019). Allelic H3K27me3 to allelic DNA methylation switch maintains noncanonical imprinting in extraembryonic cells. Sci Adv 5, eaay7246.

      Chow, J. and Heard, E. (2009). X inactivation and the complexities of silencing a sex chromosome. Curr. Opin. Cell Biol. 21, 359-366.

      Dahlet, T., Argueso Lleida, A., Al Adhami, H., Dumas, M., Bender, A., Ngondo, R. P., Tanguy, M., Vallet, J., Auclair, G., Bardet, A. F., et al. (2020). Genome-wide analysis in the mouse embryo reveals the importance of DNA methylation for transcription integrity. Nat Commun 11, 3153.

      Fukuda, A., Tomikawa, J., Miura, T., Hata, K., Nakabayashi, K., Eggan, K., Akutsu, H. and Umezawa, A. (2014). The role of maternal-specific H3K9me3 modification in establishing imprinted X-chromosome inactivation and embryogenesis in mice. Nat Commun 5, 5464.

      Galupa, R. and Heard, E. (2015). X-chromosome inactivation: new insights into cis and trans regulation. Curr. Opin. Genet. Dev. 31, 57-66.

      Gontan, C., Mira-Bontenbal, H., Magaraki, A., Dupont, C., Barakat, T. S., Rentmeester, E., Demmers, J. and Gribnau, J. (2018). REX1 is the critical target of RNF12 in imprinted X chromosome inactivation in mice. Nat Commun 9, 4752.

      Guo, F., Li, L., Li, J., Wu, X., Hu, B., Zhu, P., Wen, L. and Tang, F. (2017). Single-cell multi-omics sequencing of mouse early embryos and embryonic stem cells. Cell Res. 27, 967-988.

      Heard, E., Chaumeil, J., Masui, O. and Okamoto, I. (2004). Mammalian X-chromosome inactivation: an epigenetics paradigm. Cold Spring Harb. Symp. Quant. Biol. 69, 89-102.

      Huynh, K. D. and Lee, J. T. (2005). X-chromosome inactivation: a hypothesis linking ontogeny and phylogeny. Nat. Rev. Genet. 6, 410-418.

      Inoue, K., Kohda, T., Sugimoto, M., Sado, T., Ogonuki, N., Matoba, S., Shiura, H., Ikeda, R., Mochida, K., Fujii, T., et al. (2010). Impeding Xist expression from the active X chromosome improves mouse somatic cell nuclear transfer. Science 330, 496-499.

      Liu, X. Y., Wang, C. F., Liu, W. Q., Li, J. Y., Li, C., Kou, X. C., Chen, J. Y., Zhao, Y. H., Gao, H. B., Wang, H., et al. (2016). Distinct features of H3K4me3 and H3K27me3 chromatin domains in pre-implantation embryos. Nature 537, 558-562.

      Maslov, A. Y., Lee, M., Gundry, M., Gravina, S., Strogonova, N., Tazearslan, C., Bendebury, A., Suh, Y. and Vijg, J. (2012). 5-aza-2'-deoxycytidine-induced genome rearrangements are mediated by DNMT1. Oncogene 31, 5172-5179.

      Oka, M., Meacham, A. M., Hamazaki, T., Rodic, N., Chang, L. J. and Terada, N. (2005). De novo DNA methyltransferases Dnmt3a and Dnmt3b primarily mediate the cytotoxic effect of 5-aza-2'-deoxycytidine. Oncogene 24, 3091-3099.

      Okano, M., Bell, D. W., Haber, D. A. and Li, E. (1999). DNA methyltransferases Dnmt3a and Dnmt3b are essential for de novo methylation and mammalian development. Cell 99, 247-257.

      Schulz, E. G. and Heard, E. (2013). Role and control of X chromosome dosage in mammalian development. Curr. Opin. Genet. Dev. 23, 109-115.

      Smith, Z. D., Chan, M. M., Mikkelsen, T. S., Gu, H. C., Gnirke, A., Regev, A. and Meissner, A. (2012). A unique regulatory phase of DNA methylation in the early mammalian embryo. Nature 484, 339-344.

      Tan, K., An, L., Miao, K., Ren, L., Hou, Z., Tao, L., Zhang, Z., Wang, X., Xia, W., Liu, J., et al. (2016). Impaired imprinted X chromosome inactivation is responsible for the skewed sex ratio following in vitro fertilization. Proc. Natl. Acad. Sci. U. S. A. 113, 3197-3202.

      Reviewer #1 (Recommendations For The Authors):

      Title

      It would be hard to understand what "co"-regulates means. Does this mean DNA methylation and H3K27me3 co-regulate imprinted X- X-chromosome inactivation? If so, the title can be reworded.

      Thank you for your insightful comments. The title has been changed to “A wave of minor de novo DNA methylation initiates in mouse 8-cell embryos and co-regulates imprinted X-chromosome inactivation with H3K27me3” (Line 2 in the revised manuscript).

      Text

      (1) As DNA methylation analysis is a primary part of this study, how they processed DNA methylation data can be added to the "Bioinformatics analysis" in the MATERIALS AND METHODS section.

      Thanks for your kind reminder. We have added relevant information in the Materials and methods section in the revised manuscript (Lines 462-474 in the revised manuscript).

      (2) It seems that recent literature has not been cited in the manuscript. Specifically, none of the papers after 2018 were cited. Recent relevant papers should also be cited throughout the manuscript.

      Thanks so much for your reminder. We have added more recent literature to update the relevant information, such as the evidence supporting the causal role between DNA methylation and XCI (Lines 225-228, 264-265 in the revised manuscript); the concurrent enrichment of DNA methylation and H3K27me3 in genes subject to XCI (Lines 301-303 in the revised manuscript); the dominant role of de novo methylation in X chromosome (Lines 253-256 in the revised manuscript), etc.

      (3) Line 56: The first report that describes the dynamics of DNMT3B expression in pre-implantation embryonic development (Hirasawa et al., 2007) is missing. This paper should be cited.

      We apologize for this oversight. We have added the relevant references and rewritten the sentence in the revised manuscript (Lines 56-57 in the revised manuscript). We believe you are referring to the report by Hirasawa et al. published in 2008, which presented the expression and subcellular localization of Dnmt3a and Dnmt3b in mouse oocytes and preimplantation embryos.

      (4) Line 98: It would be good to mention that the data were derived from reduced representation bisulfite sequencing as the authors used whole-genome bisulfite sequencing data from the same research group as well.

      Thank you for your kind reminder. As you suggested, we have added a description to the revised manuscript to emphasize that these data were derived from reduced representation bisulfite sequencing, whereas the other data were derived from whole-genome bisulfite sequencing (Lines 98-99, 111 in the revised manuscript).

      (5) Line 101: We first... "the preferential target of DNMT3B (Auclair et al., 2014; Borgel et al., 2010)". More recent literature (Baubec et al., 2016, Duymich et al., 2016, for example) showed that the preferential target of DNMT3B is not a promoter but a gene body. This sentence should be reworded.

      Thank you for your detailed reminder. As you have pointed out, “preferential target” is an inaccurate statement. Besides promoters, gene bodies and other elements also undergo de novo DNA methylation (Auclair et al., 2014; Dahlet et al., 2020; Duymich et al., 2016).

      We have rewritten the sentence as follows in the revised manuscript: “Promoter regions are important target sites of DNMT3B (Choi et al., 2011). The acquisition of DNA methylation in promoters, especially in intermediate and low CpG promoters, during implantation is largely dependent on DNMT3B and plays an important role in regulating developmental genes (Auclair et al., 2014; Borgel et al., 2010; Dahlet et al., 2020). Thus, among genomic regions that may undergo de novo DNA methylation, we initially focused our analysis on DNA methylation dynamics of promoters...” (Lines 100-106 in the revised manuscript)

      (6) Lines 108-109: It would be good to mention that these data were derived from whole-genome bisulfite sequencing.

      Thanks for your kind reminder. As aforementioned, we have added a description in the revised manuscript to distinguish between data derived from reduced representation bisulfite sequencing and whole-genome bisulfite sequencing (Lines 98-99, 111 in the revised manuscript).

      (7) Line 141: rXCI should be defined.

      Thank you for your kind reminder. We have added full descriptions and more of the necessary information about iXCI and rXCI to make our statements clearer and easier to understand (Lines 210-213 in the revised manuscript). In addition, we carefully checked the relevant descriptions throughout the manuscript, and each abbreviation (such as “ICM”) is now defined at its first occurrence. We have also replaced abbreviations that appear only once in the manuscript with their full terms (Lines 122, 212 in the revised manuscript).

      (8) Lines 145-149: The role of DNA methylation for imprinted X-inactivation has already been reported (Chiba et al., 2008). The relevant sentences should be reworded.

      Thank you for reminding us of this important earlier literature exploring the relationship between DNA methylation and XCI. However, the primary aim and hypothesis of the study by Chiba et al. differ from those of our study. Chiba et al. focused on whether DNA methylation is the imprinting mark responsible for monoallelic expression of Xist (the initiation event of iXCI), whereas our study focused on the role of DNA methylation in achieving X chromosomal heterochromatinization (the late event of iXCI).

      In detail, the study by Chiba et al. mainly explored why Xist is specifically expressed from the paternal allele and why iXCI occurs specifically on the paternal X chromosome in mouse preimplantation embryos. Because previous studies had suggested that genomic imprinting of Xist is established during oogenesis (Oikawa et al., 2014; Tada et al., 2000), Chiba et al. tested whether the DNA methylation imprint established during oogenesis is responsible for the monoallelic expression of Xist in preimplantation embryos. Analyses of DNA methyltransferase maternal knockout embryos revealed that oocyte DNA methylation is dispensable for Xist imprinting (Chiba et al., 2008). A follow-up study by Inoue et al. identified a broad H3K27me3 enrichment within the Xist 5’ region, which is established during oocyte growth and persists through preimplantation development, as the imprinting mark of Xist (Inoue et al., 2017). This series of studies is very important and allows us to understand the mechanism underlying paternal allele-specific iXCI in mouse preimplantation embryos and extraembryonic tissues.

      However, the hypothesis of our study is different. Based on the finding of minor de novo DNA methylation and its preferential distribution on the X chromosome, we speculated that the minor de novo methylation, which occurs from the 8-cell to the blastocyst stage, may participate in achieving X chromosomal heterochromatinization. Although DNA methylation is essential for maintaining X chromosome-wide transcriptional silencing in rXCI, its role in iXCI remains controversial, and it has even been considered plausible that DNA methylation is not required for achieving iXCI because preimplantation embryos undergo global and massive DNA demethylation.

      We have reorganized this paragraph and added relevant statements to make the background and discussion clearer and easier to understand (Lines 217-234 in the revised manuscript).

      (9) Lines 164-165: Information regarding Dnmt3b KO is missing. Did the authors generate an original KO line or use an already published one? It should be explicitly stated.

      Thank you so much for your kind reminder. The Dnmt3b heterozygous mice were obtained from the Mutant Mouse Resource & Research Centers (MMRRC), and Dnmt3b knockout (KO) embryos were generated by mating Dnmt3b heterozygous females with heterozygous males. The genotyping of Dnmt3b KO embryos was performed by PCR following the MMRRC genotyping protocol (https://www.med.unc.edu/mmrrc/genotyping-protocols/mmrrc-center-protocol-29886/). The relevant information has been added to the MATERIALS AND METHODS section in the revised manuscript (Lines 350-354; 391-393 in the revised manuscript).

      (10) Line 165: chemical-induced inhibition of DNMT3B. As 5-aza-dC also blocks DNMT3A and DNMT1, this sentence should be reworded.

      Thank you for your valuable comments. 5-aza-dC is a well-established global DNA hypomethylating agent that efficiently inhibits the activity of all DNMTs, and has been frequently used to study both the maintenance of DNA methylation and de novo DNA methylation (Maslov et al., 2012; Oka et al., 2005). Thus, despite its inhibitory effect being common to various DNMTs, chemical-induced inhibition of DNMTs has the advantage of allowing us to transiently treat embryos specifically during the window of minor de novo DNA methylation (the 8-cell to blastocyst stage). We have rewritten the relevant sentences in the revised manuscript (Lines 183-188 in the revised manuscript).

      (11) Lines 171-174: "The role of de novo methylation in iXCI...". This possibility was already tested in the previous study from the Sasaki lab (Chiba et al., 2008).

      As mentioned above, the primary aim and hypothesis of the study by Chiba et al. are different from those of our study. Chiba et al. mainly focused on exploring why Xist is specifically expressed from paternal allele and iXCI occurs specifically on the paternal X chromosome in mouse preimplantation embryos, so they tested whether the DNA methylation imprinting established during oogenesis is responsible for this monoallelic expression of Xist in preimplantation embryos (the initiation event of iXCI).

      By contrast, based on the finding of minor de novo DNA methylation and its preferential distribution on X chromosome, our study has speculated that the minor de novo DNA methylation, which occurs from the 8-cell to blastocyst stage, may participate in achieving X chromosomal heterochromatinization (the late event of iXCI).

      Thank you for reminding us of this important literature, which makes our discussion more informative. We have reorganized this paragraph by rewriting or adding relevant statements to make the background and discussion clearer and easier to understand (Lines 217-231 in the revised manuscript). In addition, to avoid repetition and make our discussion more concise, we have removed the similar sentences at the end of this paragraph.

      (12) Lines 198-200: "Given DNA methylation...". These citations mention a general relationship between DNA methylation and H3K27me3 in cells in culture. As I believe the authors focus on X-chromosome inactivation in the female embryos, more relevant papers that discuss the order of the events for the establishment of H3K27me3 and DNA methylation in the inactive X-chromosome can be cited.

      Thanks so much for your comment to improve our discussion. It has been thought that during the late phase of rXCI in fully differentiated cells, gene silencing is achieved by PRC2 complex-induced H3K27me3, and then is further stably maintained by the redundant action of multiple layers of epigenetic modifications, including DNA methylation, to reach the maximum level of chromatin compaction (Chow and Heard, 2009; Heard et al., 2004; Pintacuda and Cerase, 2015). In line with this, a recent multifaceted analysis showed that DNA methylation and H3K27me3 are concurrently enriched in genes subject to XCI (Balaton and Brown, 2021). We have added these statements in the revised manuscript (Lines 295-303 in the revised manuscript).

      (13) Line 241: As 5-aza-dC blocks both de novo and maintenance DNA methylation, this sentence should be reworded.

      Thank you for your kind reminder. As you have noted, 5-aza-dC is a well-established global DNA hypomethylating agent that efficiently inhibits the activity of all DNMTs, and has been frequently used to study both the maintenance of DNA methylation and de novo DNA methylation (Maslov et al., 2012; Oka et al., 2005). Thus, despite its inhibitory effect being common to various DNMTs, chemical-induced inhibition of DNMTs has the advantage of allowing us to transiently treat embryos specifically during the window of minor de novo DNA methylation (the 8-cell to blastocyst stage). We have rewritten the relevant sentences in the revised manuscript (Lines 183-188 in the revised manuscript).

      Figures

      (1) Figure 1C, D: Do the rows in C and D show the corresponding genes?

      Figure 1C and D represent the DNA methylation changes of promoters (C) and gene bodies (D), respectively, during the transition from the 8-cell to the blastocyst stage. The two datasets were analyzed independently, and the rows do not show corresponding genes. Because we have focused on minor de novo methylation in promoter regions, the gene body results have been removed from the revised manuscript to avoid confusion.

      (2) Figure 1G: Yy2 promoter gained DNA methylation during the transition from 8-cell to the blastocyst stage. Is this a representative locus for the de novo methylated promoters that are shown in Figure 1F where an increase of DNA methylation is about ~1% on average? Another representative locus could be shown instead of this gene promoter.

      Thank you for your detailed reminder. The inconsistency between the global methylation change and the bisulfite sequencing analysis of Yy2 may be due to methodological details, such as C-T conversion efficiency, the number of picked colonies, etc. Because we have confirmed the presence of minor de novo DNA methylation using different publicly available data, we have removed this result from the revised manuscript to avoid ambiguity.

      (3) Figures 2C and 3A: It would be helpful to mention what the arrowheads mean.

      Thank you for your detailed reminder. In Figure 2C, the arrowhead indicates the H3K27me3 domain and the blank arrowhead indicates a blastomere without the H3K27me3 domain. In Figure 3A, the arrowhead indicates the Xist RNA domain and the blank arrowhead indicates a blastomere without the Xist RNA domain. We have added this information to the revised manuscript (Lines 736-738, 747-749 in the revised manuscript).

      (4) Figure 3-figure supplement 2B: It would be hard to see whether H3K27me3 is enriched at the promoter regions of presented genes. It would be helpful to show the values for the Y-axis as in panel A.

      Thanks for your helpful reminder. We have added the scales to the figure to improve the result presentation (Figure 4—figure supplement 2B in the revised manuscript).

      (5) Figure 4-figure supplement 2: 5-aza-dC blocks not only the activity of DNMT3B but also DNMT1, and DNMT3A (all these DNMTs are expressed during pre-implantation embryos, see Hirasawa et al., 2007). This part can be omitted from the manuscript.

      Thank you for your insightful comments. As you have noted, the relevance and rationale for the experiments using 5-aza-dC treatment should be clarified. 5-aza-dC is a well-established global DNA hypomethylating agent that efficiently inhibits the activity of all DNMTs, and has therefore been frequently used to study both the maintenance of DNA methylation and de novo DNA methylation (Maslov et al., 2012; Oka et al., 2005).

      In our study, to validate the function of minor de novo DNA methylation in iXCI and blastocyst development, we took advantage of 5-aza-dC-induced DNMT inhibition, which allowed us to transiently treat embryos specifically during the window of minor de novo DNA methylation (the 8-cell to blastocyst stage), despite its lack of specificity among DNMTs.

      Based on these considerations, we would like to retain this result and hope for your understanding.

      We have added these statements to the revised manuscript to clarify the rationale of our experiments and make them easier to understand (Lines 183-188 in the revised manuscript), and have added a schematic diagram depicting the experimental design (Figure 3—figure supplement 3E in the revised manuscript).

      Reviewer #2 (Recommendations For The Authors):

      Recommendations/concerns in the text:

      - Line 106, it is unclear what is meant by "in line with this"? Gene body DNA methylation is a characteristic of active transcription, so why would a gain in DNA methylation at promoters be in line with a gain in DNA methylation over gene bodies?

      Thank you for your comments pointing out our ambiguous statement. We meant that both promoter and gene body regions, albeit in small proportions, gain DNA methylation during the transition from the 8-cell to the blastocyst stage. Following the comment by Reviewer #1, and because we have focused on minor de novo methylation in promoter regions, the gene body results have been removed from the revised manuscript to avoid confusion.

      - Line 111 & 114, can 6% DNA methylation really be considered "relatively hypermethylated" compared to 3% DNA methylation that is referred to as "more hypomethylated"?

      We apologize for our unclear and ambiguous statements. Here we focused on the promoter regions. Many previous studies have revealed that compared with gene bodies and other genome elements, promoter and overlapping CGI regions, especially high CpG promoters, always showed low levels of DNA methylation. We have added relevant statements to clarify this information, and rewritten the sentences in the revised manuscript (Lines 100-106, 116-118, 121, 124 in the revised manuscript).

      - Line 124, there are a number of processes identified, why only mention one in the text? Suggest changing writing to be more accurate, indicating what was included for the GO analysis and using the words "enriched for ... processes". Saying it may be linked to a process is an overstatement and not supported by further experiments/data.

      Thank you for your detailed comments, which made our results more informative. We have checked the relevant description and addressed your suggestions as follows: By performing gene ontology enrichment analysis of genes that undergo minor or major de novo DNA methylation, respectively, we noticed that, besides many important basic processes common to the two waves of de novo DNA methylation, genes subject to minor de novo DNA methylation were enriched in processes such as organic substance transport, chromosome organization, and cell fate specification (Lines 129-134 in the revised manuscript).

      - Lines 149 - 152: sentence/message unclear.

      We apologize for the ambiguous description. We have corrected the relevant descriptions as follows: To identify the biological function of minor de novo DNA methylation in iXCI, we knocked down Dnmt3b in preimplantation embryos by microinjecting Dnmt3b siRNA into zygotes (Lines 234-236 in the revised manuscript).

      - Lines 162-164: the data in Figure 2C/D does not support this statement, as it does not show H3K27me3 loss specifically at the inactive X-chromosome.

      Thanks so much for your insightful comments. Despite the global enrichment of H3K27me3, the H3K27me3 domain detected by immunostaining is a classic marker for the establishment of XCI, which achieves X chromosome-wide heterochromatinization and transcriptional repression (Chow and Heard, 2009; Heard et al., 2004; Huynh and Lee, 2005). Thus, we have used immunostaining for H3K27me3 domains to evaluate iXCI establishment in blastocysts, as previously reported (Fukuda et al., 2014; Gontan et al., 2018; Inoue et al., 2010; Tan et al., 2016). To make our results more convincing, we have added another statistical method to quantify the establishment of iXCI, i.e., the percentage of H3K27me3-positive and -negative trophoblast cells relative to total trophoblast cells in female blastocysts with or without Dnmt3b knockdown.

      In addition, we have added a schematic diagram depicting the process of iXCI initiation and establishment, as well as the experimental design and workflows, to make the results easier to understand.

      In addition, we agree with your comments that additional evidence will benefit the conclusion. To strengthen the evidence, and to test whether DNA methylation loss has a prolonged effect on iXCI, we have reanalyzed the RNA-seq and H3K27me3 ChIP-seq data from the extraembryonic ectoderm (ExE) of single E6.5 embryos that underwent Dnmt3a/3b knockout, because the preimplantation iXCI status is maintained in extraembryonic cells (Chen et al., 2019; Galupa and Heard, 2015; Schulz and Heard, 2013). The results showed that chromosome-wide loss of DNA methylation led to a nearly complete loss of H3K27me3 on the paternal X chromosome (which is specifically inactivated in iXCI), along with a notable transcriptional upregulation across the chromosome. By contrast, these changes were not observed on the maternal X chromosome (Lines 253-261; Figure 3—figure supplement 4A in the revised manuscript).

      - Lines 169-174: sentence/message unclear.

      As aforementioned, we have reorganized this paragraph by rewriting or adding statements relevant to DNA methylation and XCI, to make the background and discussion clearer and easier to understand (Lines 217-234 in the revised manuscript). In addition, to avoid repetition and make our discussion more concise, we have removed the similar sentences at the end of this paragraph.

      - Lines 177-179: this statement is too bold. The data does not support "direct evidence".

      Thank you for your detailed reminder. We have rewritten the sentence to avoid confusion and overstatement (Lines 262-268 in the revised manuscript).

      - Line 198: these are not all enzymes, but could be referred to as chromatin modifiers.

      We apologize for the ambiguous description. As you suggested, we have corrected “enzymes” to “chromatin modifiers” (Lines 284, 287 in the revised manuscript).

      - Line 199: this statement is not correct in all contexts. There are many studies showing antagonism between DNA methylation and H3K27me3.

      Thanks so much for your careful reviewing. As you have pointed out, the relationship between DNA methylation and H3K27me3 is divergent and largely controversial among studies. Under certain circumstances, DNA methylation shows an antagonistic effect on H3K27me3 at promoters by excluding the binding of PRC2 (the main complex responsible for H3K27me3 deposition) components to their targets (Bartke et al., 2010; Jermann et al., 2014), while other studies have presented alternative evidence that PRC2 and DNA methylation cooperate to achieve silencing (Hagarman et al., 2013; Vire et al., 2006). Thus, it has been thought that the relationship between DNA methylation and histone modifications is complex, possibly in a cell-type and/or genomic region-specific manner. Both antagonism and coordination can be observed in different regulatory elements in mouse ES cells (King et al., 2016).

      We apologize for our incomplete statement, which arose because we mainly focused on their synergistic relationship. We have refined this section by rewriting relevant sentences and adding necessary statements (Lines 288-303 in the revised manuscript).

      - Lines 228-230: the developmental significance of DNA methylation homeostasis is already well-established. Please reference relevant papers showing this here.

      Thank you for this helpful suggestion. We have reorganized this section. Relevant references that highlight the developmental significance of DNA methylation homeostasis have been added. The sentence has been rewritten and moved to the end of this paragraph in the revised manuscript (Lines 159-161 in the revised manuscript).

      - Line 238: an explanation/rationale for looking at energy metabolism is lacking.

      Thank you for your comments, which make our results easier to understand. The examination of energy metabolism is mainly based on the integrated analysis of DNA methylation and gene expression from 8-cell embryos to ICM, which aimed to test the potential short- and long-term developmental consequences of minor de novo DNA methylation. Bioinformatic analysis suggested that many basic processes, such as cell differentiation, cell cycle and metabolic regulation, may be regulated by minor de novo DNA methylation. Among the enriched genes, several are related to energy metabolism. In addition, because energy metabolism is crucial for supporting embryo differentiation and development, and oxidative phosphorylation (OXPHOS) metabolism is highly activated during the blastocyst stage (Zhao et al., 2021), we next examined the energy metabolism, particularly OXPHOS activity, of Dnmt3b-KD embryos. We have refined the section by rewriting the relevant sentence and adding necessary statements (Lines 175-179 in the revised manuscript).

      - Lines 246-248: Looking at the data in Figure 2 figure supplement 2, this statement is simply not true with regards to DNMT3B protein, and also global DNA methylation level is reduced in the Dnmt3b KD blastocyst, which could lead to defective major de novo DNA methylation.

      Thanks for your careful reviewing. We have rewritten the sentence to make our statement more accurate and avoid overstatement (Lines 188-190 in the revised manuscript).

      Recommendations/concerns relating to figures:

      Figure 1:

      - Of all genic promoters, how many were included in the analysis (contained sufficient coverage)? What cut-off/thresholds were used to consider DNA methylation gain at a promoter?

      Thanks for your comments. In total, 11662 promoters were analyzed. Promoter methylation is generally low, particularly at the 8-cell stage, at which minor de novo methylation is just initiated, and these relatively low basal levels make the increase before the blastocyst stage appear slight. To capture these slight changes, we used a relaxed threshold based on ΔDNA methylation. Only CpG sites with at least fivefold coverage were included in the methylation analysis, based on data from Smith et al. (Smith et al., 2012). A ΔDNA methylation greater or less than 0 was defined as gain or loss of DNA methylation, respectively. We have added this information in the revised manuscript (Lines 462-470 in the revised manuscript).
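
      The filtering and sign-based classification described above can be illustrated with a minimal sketch. It assumes that per-CpG methylation is averaged over each promoter before the two stages are compared; the response does not state how sites are aggregated, so the function names, the data layout, and the example values are hypothetical and are not taken from the authors' pipeline.

```python
from statistics import mean

MIN_COVERAGE = 5  # "at least fivefold coverage", as stated in the response


def promoter_level(cpgs, min_coverage=MIN_COVERAGE):
    """Average methylation over the CpGs that pass the coverage filter.

    cpgs is a list of (coverage, methylation_fraction) tuples (assumed layout).
    Returns None when no CpG in the promoter passes the filter.
    """
    kept = [frac for cov, frac in cpgs if cov >= min_coverage]
    return mean(kept) if kept else None


def classify_promoter(cpgs_8cell, cpgs_icm):
    """Label a promoter by the sign of delta methylation (ICM minus 8-cell)."""
    before = promoter_level(cpgs_8cell)
    after = promoter_level(cpgs_icm)
    if before is None or after is None:
        return "excluded"  # insufficient coverage at one of the stages
    delta = after - before
    if delta > 0:
        return "gain"
    if delta < 0:
        return "loss"
    return "unchanged"


# Hypothetical promoter: the CpG with threefold coverage is ignored.
example_8cell = [(8, 0.00), (3, 0.50), (12, 0.04)]
example_icm = [(9, 0.02), (10, 0.05)]
print(classify_promoter(example_8cell, example_icm))
```

      With a cut-off of zero, any nonzero difference counts as gain or loss, which is why the reviewer's question about a significance estimate (p-value or FDR) is relevant to this design.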

      - Does an average methylation level of 0.02 represent 2% DNA methylation? Presuming yes, is the average 1.5% DNA methylation gain at promoters real? And meaningful? Especially compared to the gain in DNA methylation that takes place between ICM and E6.5 (Figure 1 Figure Supplement 1 D)

      As you have pointed out, an average methylation level of 0.02 represents 2% DNA methylation. As aforementioned, promoters exhibited an average DNA methylation gain of 1.5% during the transition from the 8-cell stage to ICM. The slight increase may be mainly due to the relatively low basal levels. As you expected, compared with the comprehensive de novo DNA methylation during implantation, preimplantation de novo methylation is slighter and occurs at a small proportion of promoter regions, so we designated it as minor de novo DNA methylation. It should also be mentioned that a proportion of these promoters continue to gain massive DNA methylation during implantation. We have refined the relevant sentences to provide more detailed information on our results (Lines 125-127 in the revised manuscript).

      - Why is there a focus on promoters (which are not the preferential target of DNMT3B)?

      Thanks so much for your detailed reminder. As you have pointed out, “preferential target” seems to be an inaccurate statement. Besides promoters, gene bodies and other elements also undergo de novo DNA methylation (Auclair et al., 2014; Dahlet et al., 2020; Duymich et al., 2016). We have focused on the promoter regions based on the following considerations: (1) Promoter regions are important target sites of DNMT3B (Choi et al., 2011); (2) The acquisition of DNA methylation in promoters, especially in intermediate and low CpG promoters, during implantation is largely dependent on DNMT3B and plays an important role in regulating developmental genes (Auclair et al., 2014; Borgel et al., 2010; Dahlet et al., 2020). We have rewritten the relevant sentence in the revised manuscript (Lines 100-106 in the revised manuscript).

      - Figure 1H shows that promoters that gain DNA methylation during the "minor de novo DNA methylation" continue to gain DNA methylation during "de novo DNA methylation". Is the ~1.5% DNA methylation gain just the slow start of the main de novo DNA methylation wave?

      Your comments are very helpful for improving the description of our results. In the present study, our analysis indicated that a small proportion of promoters initially gain methylation during the transition from the 8-cell stage to ICM. The finding challenges current knowledge: (1) de novo DNA methylation occurs during implantation, by which globally hypomethylated blastocysts acquire genome-wide DNA methylation (Borgel et al., 2010; Dahlet et al., 2020; Smith et al., 2012); (2) during preimplantation development, embryos undergo massive and global DNA demethylation.

      To distinguish our finding from the current knowledge of the timing and dynamics of DNA methylation during early development, we have designated the methylation gain during the transition from the 8-cell to the blastocyst stage as minor de novo DNA methylation.

      We agree with your notion that most of the promoters undergoing minor de novo methylation continue to gain DNA methylation during implantation, as revealed in Fig. 1F. We have refined the relevant statement in the revised manuscript (Lines 125-127 in the revised manuscript).

      - The GO analysis performed for Figure 1H, what was used as input? Promoters of genes that gain DNA methylation as identified in 1C?

      Thank you for your comments. For the GO analysis shown in Figure 1H, we used as input the genes with promoter regions that gained or lost DNA methylation, respectively, during the transition from the 8-cell stage to ICM (identified in Figure 1C). This information has been clarified in the revised manuscript to ensure accuracy (Lines 129-134 in the revised manuscript).

      - Figure 1 figure supplement 1, is there only a fold change as threshold or also a calculated significance (eg. p-value/FDR)?

      Thanks for your valuable comments. Considering the relatively low DNA methylation levels at promoter regions, and the slight changes occurring during preimplantation embryo development, we used a relaxed threshold based on ΔDNA methylation. Only CpG sites with at least fivefold coverage were included in the methylation analysis, based on data from Smith et al. (Smith et al., 2012). A ΔDNA methylation greater or less than 0 was defined as gain or loss of DNA methylation, respectively. We have replaced relevant figures and added this information in the revised manuscript (Figure 1—figure supplement 1D-E; Lines 125-127 in the revised manuscript).

      - To confirm DNMT3B is responsible for the DNA methylation gain: DNMT3B KD/KO followed by promoter DNA methylation analysis to confirm the promoters that gain DNA methylation between 8 cell and ICM don't gain DNA methylation in the absence of DNMT3B.

      We agree with your comments that additional evidence will benefit the conclusion. To strengthen the evidence, we have reanalyzed the RNA-seq and H3K27me3 ChIP-seq data from the extraembryonic ectoderm (ExE) of single E6.5 embryos that underwent Dnmt3a/3b knockout, because the preimplantation iXCI status is maintained in extraembryonic cells (Chen et al., 2019; Galupa and Heard, 2015; Schulz and Heard, 2013). The results showed that chromosome-wide loss of DNA methylation led to a nearly complete loss of H3K27me3 on the paternal X chromosome (which is specifically inactivated in iXCI), which also showed a notable transcriptional upregulation across the chromosome. By contrast, these changes were not observed on the maternal X chromosome. We have added this result in the revised manuscript (Lines 253-261; Figure 3—figure supplement 4A in the revised manuscript).

      Figure 2:

      - Figure 2A: label missing for what the numbers on the y-axis represent.

      Thank you for pointing this out. We apologize for the oversight. We have added the y-axis label in Figure 2A to clarify what the numbers represent, making it easier to understand (Figure 3A in the revised manuscript).

      - Figure 2B: y-axis is % of methylated promoters compared to all promoters?

      Thank you for your suggestion. The y-axis in Figure 2B indeed represents the percentage of de novo methylated promoters relative to all promoters. As you have suggested, we have clarified this labeling in the revised manuscript (Figure 3B in the revised manuscript).

      - What is the delta DNA methylation gain specifically for X-linked promoters?

      Thanks so much for your reminder. To provide more convincing evidence, we have reanalyzed single-cell COOL-seq data and specifically examined the DNA methylation changes at X chromosomal promoters in female embryos. The X chromosome showed a more notable increase in de novo methylated promoters than autosomes, and the female X chromosome showed higher DNA methylation levels than the male X chromosome (Figure 3—figure supplement 2A-B; Lines 203-206 in the revised manuscript).

      - Figure 2C: include representative images of separate channels to better see the signal of CDX2 and H3K27me3. Quantification would be better represented with box plots.

      Thank you for your helpful suggestions. We have added separate channel images in the revised manuscript. Additionally, we have adjusted the quantification to be represented as box plots, as you have suggested, to improve the accuracy and interpretability of the data presentation (Figure 3D-F in the revised manuscript).

      - Figure 2C: Does the H3K27me3 signal overlap with the location of the inactive X-chromosome (is there maybe denser DAPI or do IF combined with Xist RNA-FISH)?

      Thanks so much for your insightful comments. Despite the global enrichment of H3K27me3, the H3K27me3 domain detected by immunostaining is a classic marker for the establishment of XCI, which achieves X chromosome-wide heterochromatinization and transcriptional repression (Chow and Heard, 2009; Heard et al., 2004; Huynh and Lee, 2005). Thus, we have used immunostaining for H3K27me3 domains to evaluate iXCI establishment in blastocysts, as previously reported (Fukuda et al., 2014; Gontan et al., 2018; Inoue et al., 2010; Tan et al., 2016). We made efforts to perform co-staining of H3K27me3 IF and Xist FISH but were hindered by technical challenges, and we hope for your understanding. However, as aforementioned, H3K27me3 is a well-accepted marker for clarifying the XCI status.

      In addition, to make our results more convincing, we have added an alternative statistical method to quantify the establishment of iXCI, i.e., the percentage of H3K27me3-positive and -negative trophoblast cells relative to total trophoblast cells in female blastocysts with or without Dnmt3b knockdown (Figure 3F; Lines 243-244 in the revised manuscript).
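
      As a rough illustration of this quantification, the percentage can be computed per embryo from cell counts. The sketch below assumes each embryo is scored as a pair of counts (H3K27me3-positive trophoblast cells, total trophoblast cells); the group labels and all numbers are hypothetical and are not data from the study.

```python
def h3k27me3_positive_percentage(positive_cells, total_cells):
    """Percentage of trophoblast cells that carry an H3K27me3 domain."""
    if total_cells <= 0:
        raise ValueError("total_cells must be positive")
    if not 0 <= positive_cells <= total_cells:
        raise ValueError("positive_cells must lie between 0 and total_cells")
    return 100.0 * positive_cells / total_cells


# Hypothetical per-embryo counts: (H3K27me3-positive, total trophoblast cells).
embryos = {
    "NC": [(45, 50), (40, 50)],
    "Dnmt3b-KD": [(20, 50), (25, 50)],
}

percentages = {
    group: [h3k27me3_positive_percentage(pos, total) for pos, total in counts]
    for group, counts in embryos.items()
}
print(percentages)
```

      The H3K27me3-negative percentage is simply the complement, so reporting either value per embryo carries the same information.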

      - Figure 2 figure supplement 2A: relative expression of Dnmt3b?

      Thanks for your detailed reminder. The data represent the relative expression level of Dnmt3b, as noted in the original figure legend. Based on your comments, we have added the gene name to the Y-axis label. Similarly, the protein name has also been added to make the results more informative (Figure 2 figure supplement 2A, C, E in the revised manuscript).

      - Figure 2 figure supplement 2B/C: in the text, line 153, it is stated that "Dnmt3b mRNA and protein levels were significantly reduced in morulae, but not in blastocysts compared to those of negative control (NC) group". These figures do not support that statement. The IF images show a loss of DNMT3B in the Dnmt3b KD blastocysts. The IF quantification seems to have fewer datapoints for the blastocyst, and looking at the bar graphs, there seems to be a trend towards reduced DNMT3B in both the morula and blastocyst, which would also explain the reduction in DNA methylation in both stages as shown in Figure 2 figure supplement 2D/E.

      Thanks so much for your careful reviewing that makes our statements more accurate. We have rewritten the sentence in the revised manuscript as follows: Dnmt3b mRNA and protein levels were significantly reduced in morulae, and tended to be lower in blastocysts compared to those of the negative control (NC) group. In addition, we have removed “transient” from the original statement “The transient inhibition of Dnmt3b” (Lines 168-170 in the revised manuscript).

      - Figure 2 figure supplement 2F/G: include representative IF images with separation of all channels and the merged image.

      Thank you for your suggestion. We have added the representative immunofluorescence (IF) images with separate channels and merged image in the revised manuscript (Figure 3—figure supplement 3B, F in the revised manuscript).

      - Figure 2 figure supplement 2H: Instead of showing log2FC in methylation levels, delta methylation would be more informative. Are these genes already inactivated at the 8-cell stage? Or are they active and become inactivated by the gain in DNA methylation? Doing qPCR for these genes, or looking at published RNAseq data would be informative. What happens to the expression of these genes in the Dnmt3b KD?

      Thanks for your suggestions. We have represented DNA methylation changes as “ΔDNA methylation”. During mouse preimplantation development, iXCI is initiated in early cleavage-stage female embryos, dependent on Xist upregulation around the 4-8-cell stage; Xist then specifically coats the paternal X chromosome and finally leads to chromosome-wide silencing via heterochromatinization in early blastocysts. Thus, these non-escaping genes, which are subject to XCI, would not be inactivated at the 8-cell stage.

      Author response image 1.

      The processes of iXCI initiation and establishment (left panel), and dynamics of total expression levels of X chromosome in male and female preimplantation embryos (right panel, note that X-dosage is balanced between sexes until the early blastocyst stage).

      As you expected, most of these representative non-escaping genes are downregulated during the transition from the 8-cell to the blastocyst stage, consistent with their gain of DNA methylation. Additionally, since the preimplantation iXCI status is maintained in extraembryonic cells (Galupa and Heard, 2015; Schulz and Heard, 2013), we further reanalyzed the published RNA-seq data from the extraembryonic ectoderm (ExE) of single E6.5 embryos that underwent DNA methyltransferase knockout (Chen et al., 2019). The results showed that chromosome-wide loss of DNA methylation led to chromosome-wide transcriptional upregulation on the paternal X chromosome, including at the loci of these non-escaping genes. We have added this result in the revised manuscript (Figure 3—figure supplement 3J; Figure 3—figure supplement 4A-B; Lines 253-261 in the revised manuscript).

      Figure 3:

      - Figure 3 figure supplement 1: representative IF image missing.

      Thanks for your kind reminder. We have added the representative IF images in the revised manuscript to provide a clearer illustration of the data (Figure 4—figure supplement 1A in the revised manuscript).

      - Figure 3 figure supplement 2B: scales are missing for the H3K27me3 ChIP-seq data (are the 8-cell and ICM tracks set to the same scale?). It looks like the ICM track is cut off at the top (peaks not fully displayed) and the data looks very sparse. A more informative analysis would be to do peak calling over promoters and compare 8-cell with ICM.

      Thanks for your detailed reminder. We apologize for the missing scales in the H3K27me3 ChIP-seq data. The 8-cell and ICM tracks were set to the same scale, and we have now added scales to the figure in the revised manuscript to improve the presentation of the results. Regarding the appearance you noted, the flattened peaks are not caused by the track being cut off, but rather by zooming into a specific region in the extended IGV files.

      These results are based on the reanalysis of publicly available data from pooled embryos, which provide only suggestive, not direct, evidence to support the role of DNA methylation in promoting X-linked H3K27me3 enrichment in iXCI.

      To provide more convincing evidence, we have reanalyzed the RNA-seq and H3K27me3 ChIP-seq data from the extraembryonic ectoderm (ExE) of E6.5 female embryos that underwent Dnmt3a/3b knockout, because the preimplantation iXCI status is maintained in extraembryonic cells (Chen et al., 2019; Galupa and Heard, 2015; Schulz and Heard, 2013). The results showed that Dnmt knockout led to a nearly complete loss of H3K27me3 on the paternal X chromosome (which is specifically inactivated in iXCI), which also showed a notable transcriptional upregulation across the chromosome. By contrast, these changes were not observed on the maternal X chromosome (Figure 3—figure supplement 4 in the revised manuscript). We have added these results in the revised manuscript.

      - Figure 3E: Given all tested proteins give a positive signal, it would have been good to include a negative control chromatin protein that is known to not interact with DNMT3B. Given both PRC2 and DNMT3B are chromatin-binding proteins, can the signal be a result of close proximity instead of a direct interaction?

      In the present study, to test the interaction between DNMT3B and PRC2 core components, we have used in situ proximity ligation assay (PLA), an increasingly popular technique for detecting the close proximity of two proteins in fixed samples using two primary antibodies (Alsemarz et al., 2018).

      Author response image 2.

      Schematic diagram of the principle of the in situ PLA.

      Compared with classical co-Immunoprecipitation (Co-IP) method, in situ PLA has advantages in (1) detecting low input samples or proteins expressed at low levels, which is extremely difficult using Co-IP; (2) providing in situ or subcellular information of protein-protein interaction. However, it should be noted that the maximal distance allowing this reaction is 40 nm, which is not quite small enough to demonstrate a physical interaction between the two antigens, but sufficient to support a very close “proximity”.

      In our study, in situ PLA, including the experimental design of the negative controls, was performed in accordance with the manufacturer’s instructions for the Duolink® In Situ Red Starter Kit (MilliporeSigma): “Technical negative controls included incubation with each primary antibody separately and no primary antibody”. We have refined the relevant sentence in the revised manuscript (Lines 308-310 in the revised manuscript).

      - Figure 3G: It would have been good to include a negative control, and DNase/benzonase to exclude DNA/RNA-mediated protein interaction.

      - (Of note, there have been previous studies reporting an interaction between PRC2 and DNMT3B in other cell types, such as in Weigert et al. 2023, but unfortunately, they don't seem to use DNase/benzonase either).

      The Co-IP analysis of DNMT3B and PRC2 core components in differentiated female ES cells was presented as additional supportive evidence. Because the Co-IP analysis is extremely difficult for preimplantation embryos, we have used in situ PLA to detect their interaction. However, the maximal distance allowing in situ PLA reaction is 40 nm, which is not quite small enough to demonstrate a physical interaction (Alsemarz et al., 2018). Thus, we have added a Co-IP analysis using differentiated female ES cells, in which rXCI occurs upon the differentiation.

      Based on this consideration of the importance and contribution of this result, we have moved this result from the main figure, to the supplemental figure (Figure 4—figure supplement 3H in the revised manuscript).

      - Figure 3 figure supplement 3G: what were the ESCs differentiated into? Did the Dnmt3b KO or Dnmt3a/b DKO show any differentiation defect?

      The mouse ESC line PGK12.1 is a well-established ex vivo model of rXCI. Under the standard culture condition, PGK12.1 is normally fated to neuroectodermal commitment.

      Author response image 3.

      Immunostaining of NESTIN, a neuroectodermal stem cell marker molecule, and NANOG in undifferentiated and differentiated PGK12.1 ESCs respectively.

      No differentiation defects have been observed in either Dnmt3b KO or Dnmt3a/3b DKO ESCs in our study. Dnmt KO/DKO/TKO ES cell lines have been successfully used as the model of interaction of DNA methylation and H3K27me3 deposition (King et al., 2016).

      Figure 4:

      - Figure 4B: Is there an explanation for seeing similar total cell numbers in Figure 4B, but showing decreased proliferation in Figure 4A?

      Thank you for your insightful comments. The EdU cell proliferation assay labels cells during the S phase of the cell cycle, as 5-ethynyl-2´-deoxyuridine (EdU) is incorporated into newly synthesized DNA. This labeling identifies cells undergoing DNA synthesis, but these cells may not have completed mitosis at the time of detection. As a result, the total cell number may not immediately reflect the decrease in proliferation observed in the treated group. To address this point, we have rewritten the sentences in the revised manuscript (Lines 174-175 in the revised manuscript).

      References

      Alsemarz, A., Lasko, P. and Fagotto, F. (2018). Limited significance of the in situ proximity ligation assay. bioRxiv, 411355.

      Auclair, G., Guibert, S., Bender, A. and Weber, M. (2014). Ontogeny of CpG island methylation and specificity of DNMT3 methyltransferases during embryonic development in the mouse. Genome Biol. 15, 545.

      Balaton, B. P. and Brown, C. J. (2021). Contribution of genetic and epigenetic changes to escape from X-chromosome inactivation. Epigenetics Chromatin 14, 30.

      Bartke, T., Vermeulen, M., Xhemalce, B., Robson, S. C., Mann, M. and Kouzarides, T. (2010). Nucleosome-interacting proteins regulated by DNA and histone methylation. Cell 143, 470-484.

      Borgel, J., Guibert, S., Li, Y., Chiba, H., Schubeler, D., Sasaki, H., Forne, T. and Weber, M. (2010). Targets and dynamics of promoter DNA methylation during early mouse development. Nat. Genet. 42, 1093-1100.

      Chen, Z., Yin, Q., Inoue, A., Zhang, C. and Zhang, Y. (2019). Allelic H3K27me3 to allelic DNA methylation switch maintains noncanonical imprinting in extraembryonic cells. Sci Adv 5, eaay7246.

      Chiba, H., Hirasawa, R., Kaneda, M., Amakawa, Y., Li, E., Sado, T. and Sasaki, H. (2008). De novo DNA methylation independent establishment of maternal imprint on X chromosome in mouse oocytes. Genesis 46, 768-774.

      Choi, S. H., Heo, K., Byun, H. M., An, W., Lu, W. and Yang, A. S. (2011). Identification of preferential target sites for human DNA methyltransferases. Nucleic Acids Res. 39, 104-118.

      Chow, J. and Heard, E. (2009). X inactivation and the complexities of silencing a sex chromosome. Curr. Opin. Cell Biol. 21, 359-366.

      Dahlet, T., Argueso Lleida, A., Al Adhami, H., Dumas, M., Bender, A., Ngondo, R. P., Tanguy, M., Vallet, J., Auclair, G., Bardet, A. F., et al. (2020). Genome-wide analysis in the mouse embryo reveals the importance of DNA methylation for transcription integrity. Nat Commun 11, 3153.

      Duymich, C. E., Charlet, J., Yang, X. J., Jones, P. A. and Liang, G. N. (2016). DNMT3B isoforms without catalytic activity stimulate gene body methylation as accessory proteins in somatic cells. Nat Commun 7, 11453.

      Fukuda, A., Tomikawa, J., Miura, T., Hata, K., Nakabayashi, K., Eggan, K., Akutsu, H. and Umezawa, A. (2014). The role of maternal-specific H3K9me3 modification in establishing imprinted X-chromosome inactivation and embryogenesis in mice. Nat Commun 5, 5464.

      Galupa, R. and Heard, E. (2015). X-chromosome inactivation: new insights into cis and trans regulation. Curr. Opin. Genet. Dev. 31, 57-66.

      Gontan, C., Mira-Bontenbal, H., Magaraki, A., Dupont, C., Barakat, T. S., Rentmeester, E., Demmers, J. and Gribnau, J. (2018). REX1 is the critical target of RNF12 in imprinted X chromosome inactivation in mice. Nat Commun 9, 4752.

      Hagarman, J. A., Motley, M. P., Kristjansdottir, K. and Soloway, P. D. (2013). Coordinate regulation of DNA methylation and H3K27me3 in mouse embryonic stem cells. PLoS One 8, e53880.

      Heard, E., Chaumeil, J., Masui, O. and Okamoto, I. (2004). Mammalian X-chromosome inactivation: an epigenetics paradigm. Cold Spring Harb. Symp. Quant. Biol. 69, 89-102.

      Huynh, K. D. and Lee, J. T. (2005). X-chromosome inactivation: a hypothesis linking ontogeny and phylogeny. Nat. Rev. Genet. 6, 410-418.

      Inoue, A., Jiang, L., Lu, F. and Zhang, Y. (2017). Genomic imprinting of Xist by maternal H3K27me3. Genes Dev. 31, 1927-1932.

      Inoue, K., Kohda, T., Sugimoto, M., Sado, T., Ogonuki, N., Matoba, S., Shiura, H., Ikeda, R., Mochida, K., Fujii, T., et al. (2010). Impeding Xist expression from the active X chromosome improves mouse somatic cell nuclear transfer. Science 330, 496-499.

      Jermann, P., Hoerner, L., Burger, L. and Schubeler, D. (2014). Short sequences can efficiently recruit histone H3 lysine 27 trimethylation in the absence of enhancer activity and DNA methylation. Proc. Natl. Acad. Sci. U. S. A. 111, E3415-3421.

      King, A. D., Huang, K., Rubbi, L., Liu, S., Wang, C. Y., Wang, Y., Pellegrini, M. and Fan, G. (2016). Reversible Regulation of Promoter and Enhancer Histone Landscape by DNA Methylation in Mouse Embryonic Stem Cells. Cell Rep. 17, 289-302.

      Maslov, A. Y., Lee, M., Gundry, M., Gravina, S., Strogonova, N., Tazearslan, C., Bendebury, A., Suh, Y. and Vijg, J. (2012). 5-aza-2'-deoxycytidine-induced genome rearrangements are mediated by DNMT1. Oncogene 31, 5172-5179.

      Oikawa, M., Inoue, K., Shiura, H., Matoba, S., Kamimura, S., Hirose, M., Mekada, K., Yoshiki, A., Tanaka, S., Abe, K., et al. (2014). Understanding the X chromosome inactivation cycle in mice: a comprehensive view provided by nuclear transfer. Epigenetics 9, 204-211.

      Oka, M., Meacham, A. M., Hamazaki, T., Rodic, N., Chang, L. J. and Terada, N. (2005). De novo DNA methyltransferases Dnmt3a and Dnmt3b primarily mediate the cytotoxic effect of 5-aza-2'-deoxycytidine. Oncogene 24, 3091-3099.

      Pintacuda, G. and Cerase, A. (2015). X Inactivation Lessons from Differentiating Mouse Embryonic Stem Cells. Stem Cell Rev Rep 11, 699-705.

      Schulz, E. G. and Heard, E. (2013). Role and control of X chromosome dosage in mammalian development. Curr. Opin. Genet. Dev. 23, 109-115.

      Smith, Z. D., Chan, M. M., Mikkelsen, T. S., Gu, H. C., Gnirke, A., Regev, A. and Meissner, A. (2012). A unique regulatory phase of DNA methylation in the early mammalian embryo. Nature 484, 339-344.

      Tada, T., Obata, Y., Tada, M., Goto, Y., Nakatsuji, N., Tan, S., Kono, T. and Takagi, N. (2000). Imprint switching for non-random X-chromosome inactivation during mouse oocyte growth. Development 127, 3101-3105.

      Tan, K., An, L., Miao, K., Ren, L., Hou, Z., Tao, L., Zhang, Z., Wang, X., Xia, W., Liu, J., et al. (2016). Impaired imprinted X chromosome inactivation is responsible for the skewed sex ratio following in vitro fertilization. Proc. Natl. Acad. Sci. U. S. A. 113, 3197-3202.

      Vire, E., Brenner, C., Deplus, R., Blanchon, L., Fraga, M., Didelot, C., Morey, L., Van Eynde, A., Bernard, D., Vanderwinden, J. M., et al. (2006). The Polycomb group protein EZH2 directly controls DNA methylation. Nature 439, 871-874.

      Zhao, J., Yao, K., Yu, H., Zhang, L., Xu, Y., Chen, L., Sun, Z., Zhu, Y., Zhang, C., Qian, Y., et al. (2021). Metabolic remodelling during early mouse embryo development. Nat Metab 3, 1372-1384.

    1. eLife Assessment

      This is a valuable study of the role that life history differences might play in determining population size and demography. While concerns about generation times and population structure leave the evidence for the claims in parts incomplete, the work is of considerable interest to anyone who tries to understand evolutionary consequences of life history changes.

    2. Joint Public Review:

      Summary:

      This interesting study applies the PSMC model to a set of new genome sequences for migratory and nonmigratory thrushes and seeks to describe differences in the population size history among these groups. The authors create a set of summary statistics describing the PSMC traces - mean and standard deviation of Ne, plus a set of metrics describing the shape of the oldest Ne peak - and use these to compare across migratory and resident species (taking single samples sequenced here as representative of the species). The analyses are framed as supporting or refuting aspects of a biogeographic model describing colonization dynamics from tropical to temperate North and South America.

      Strengths:

      * This is a creative use of PSMC to test explicit a priori hypotheses about seasonal migration and Ne. The PSMC analyses seem well done and the authors acknowledge much of the complexity of interpretation in the discussion.

      * We appreciate the test-of-hypothesis design of the study and the explicit formulation of three main expectations to test. The data analysis has been done with appropriate available tools.

      Key weaknesses from the original round of review:

      * Short of developing some novel theory deep in the PSMC model, I think readers would need to see simulations showing that the analyses employed in this paper are capable of supporting or refuting their biogeographic hypothesis before viewing them as strongly supporting a specific biogeographic model. Tools like msprime and stdpopsim can be used to simulate genome-scale data with fairly complex biogeographic models. Running simulations of a thrush-like population under different biogeographic scenarios and then using PSMC to differentiate those patterns would be a more convincing argument for the biogeographic aspects of this paper. The other benefit of this approach would be to nail down a specific quantitative version of the taxon cycles model referenced in the abstract, and it would allow the authors to better study and explain the motivation behind the specific summary statistics they develop for PSMC posthoc analysis.

      * The authors hypothesized that the wider realized breeding and ecological range characterising migrants versus resident lineages could be a major driver for increased effective population size and population expansion in migrants versus residents. I understand that this pattern (wider range in migrants) is a common characteristic across bird lineages and that it is viewed as a result of adapting to migration. A problem that I see in their dataset is that the breeding grounds range of the two groups are located in very different geographic areas (mainly South versus North America). The authors could have expanded their dataset to include species whose breeding grounds are from the two areas, regardless of their migratory behaviour, as a comparison to disentangle whether ecological differences of these two areas can affect the population sizes or growth rates.

      * As I understand from previous literature, the time-scale to population growth and estimates of effective population sizes considered in the present paper for the resident versus migratory clades seem to widely predate the times to speciation for the same lineages, which were reported in previous work of the same authors (Everson et al 2019) and others (Termignoni-Garcia et al 2022). This piece of information makes the calculation of species-specific population size changes difficult to interpret in the light of lineages' comparison. It is unclear what the authors consider to be lineage-specific in these estimates, as the clades were likely undergoing substantial admixture during the time predating full isolation.

      * Regarding the methodological difficulties in interpreting the impact of population structure on the estimates of effective population sizes with the PSMC approach, I would think that performing simulations to compare different scenarios of different degrees of structured populations would have helped substantially understand some of the outcomes.

      * The authors use an average generation time for all taxa, but the citations imply generation time is known for at least some of them. Are there differences in generation time associated with migration? I am not a bird biologist, but quick googling suggests maybe this is the case? (https://doi.org/10.1111/1365-2656.13983). I think it important the authors address this, as differences in generation time I believe should affect estimates of Ne and growth.

      [Editors' note: the original reviews in full are here: https://elifesciences.org/reviewed-preprints/90848/reviews. The reviewers were not available to comment on the latest version of the submission.]

    3. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment 

      This study presents a valuable finding regarding the role of life history differences in determining population size and demography. The evidence for the claims is still partially incomplete, with concerns about generation times and population structure. Nonetheless, the work will be of considerable interest to biologists thinking about the evolutionary consequences of life history changes.

      Thank you. We have addressed the generation time and population structure issues in detail in our revision and hope that you, like us, find them to be of sufficiently low concern (i.e., they are not driving the results) that they do not overshadow the main findings and conclusions.

      The opportunity to make in-depth revisions also helped the manuscript in two ways unanticipated by both us and the reviewers. First, KW made a mistake in the original analysis of phylogenetic signal, and catching that error simplifies that aspect of the study (there is none in our measured variables). Second, in June 2024 Hilgers et al. (2024; https://doi.org/10.1101/2024.06.17.599025) posted an important manuscript to bioRxiv noting the possibility of false population size peaks in PSMC analyses using the standard default settings. Our results had three of those, which we have eliminated. Neither of these issues affects the overall conclusions, but their resolution improves the work.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      This interesting study applies the PSMC model to a set of new genome sequences for migratory and nonmigratory thrushes and seeks to describe differences in the population size history among these groups. The authors create a set of summary statistics describing the PSMC traces - mean and standard deviation of N<sub>e</sub>, plus a set of metrics describing the shape of the oldest N<sub>e</sub> peak - and use these to compare across migratory and resident species (taking single samples sequenced here as representative of the species). The analyses are framed as supporting or refuting aspects of a biogeographic model describing colonization dynamics from tropical to temperate North and South America. 

      Strengths: 

      At a technical level, the sequencing and analysis up through PSMC looks good and the paper is engaging and interesting to read as an introduction to some verbal biogeographic models of avian evolution in the Pleistocene.

      The core findings - higher and more variable N<sub>e</sub> in migratory species - seem robust, and the biogeographic explanation is plausible.  

      Thanks. We thought so as well. Our analyses go beyond being simply descriptive and test some simple hypotheses, including a biogeographic+ecological expansion opportunity gained in some lineages through the adoption of a seasonal migration life-history strategy.  

      Weaknesses: 

      I did not find the analyses particularly persuasive in linking specific aspects of clade-level PSMC patterns causally to evolutionary driving forces. To their credit, the authors have anticipated my main criticism in the discussion. This is that variation in population size inferred by methods like PSMC is in "effective" terms, and the link between effective and census population size is a morass of bias introduced by population structure and selection so robustly connecting specific aspects of PSMC traces to causal evolutionary forces is somewhere between extremely difficult and impossible.  

      As R1 notes, we do not attempt to link effective population sizes and census sizes (though we do discuss this), and we are also careful to discuss correlated rather than causative factors when going beyond the overarching hypotheses regarding life-history strategy.

      Population structure is the most obvious force that can generate large N<sub>e</sub> changes mimicking the census-size-focused patterns the authors discuss. The authors argue in the discussion that since they focus on relatively deep time (>50kya at least, with most analyses focusing on the 5mya - 500kya range) population structure is "likely to become less important", and the resident species are usually more structured today (true) which might bias the findings against the observed higher N<sub>e</sub> in migrants.

      To clarify, the patterns we discuss are entirely related to effective population size, not census size. But, yes, this is why we’ve given population structure its own section in the Discussion.

      But is structure really unimportant in driving PSMC results at these specific timescales? There is no numerical analysis presented to support the claim in this paper. The biogeographic model of increased temperate-latitude land area supporting higher populations could yield high N<sub>e</sub> via high census size, but shifts in population structure (for example, from one large panmictic population to a series of isolated refugial populations as a result of glaciation-linked climate changes) could plausibly create elevated and more variable N<sub>e</sub>. Is it more land area and ecological release leading to a bigger and faster initial N<sub>e</sub> bump, or is it changes in population connectivity over time at expanding range edges, or is the whole single-bump PSMC trace an artifact of the dataset size, or what? The authors have convinced me that the N<sub>e</sub> history of migratory thrushes is on average very different from nonmigrant thrushes, but beyond that it's unclear what exactly we've learned here about the underlying process.  

      We do not argue that population structure is unimportant, only that it is less important as one goes into deeper time. Further, we agree with the reviewer’s observation above that structure is more likely to bias nonmigrant estimates of N<sub>e</sub>. In other words, following Li & Durbin’s (2011) simulations, we interpret that an inflated N<sub>e</sub> due to structure should occur more often among residents. We have clarified this in the revision. We also agree that what we’ve learned about the underlying process is not entirely clear, but as we stated, population structure does not seem to be the main driver, and there is evidence that both biogeographic and ecological factors are involved. With this being the first time that these questions have been asked, we think we’ve made an important advance and that we’ve opened a number of avenues for future study.

      It is also important to consider the time scales involved and the sampling regime. Glacial-interglacial cycles averaged ~100 Kyr back to 0.74 Mya and then averaged ~41 Kyr from then back to 2.47 Mya; about 50-60 of these cycles occurred (Lisiecki & Raymo 2005: fig. 4). This probably caused a lot of population structuring and mixing in these lineages. In addition, in the PSMC output from one of our lineages, C. ustulatus swainsonii, we find that there are 54 time segments sampled for the Pleistocene, indicating the inadequacy of this method to reflect fine-scale changes and suggesting that each estimate is capturing a lot of both phenomena, structuring and mixing. We have added this to the revision.
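
      As a rough check on the cycle count quoted above, the arithmetic can be sketched in a few lines. This is only a back-of-envelope illustration: the period lengths and boundary dates are the approximate values given in the response, the function name is hypothetical, and PSMC time segments are uneven in length, so the per-segment figure is an average rather than a property of any one segment.

```python
# Back-of-envelope count of glacial-interglacial cycles, using the approximate
# figures quoted in the response (after Lisiecki & Raymo 2005).

def count_cycles(start_mya, end_mya, period_kyr):
    """Number of cycles of a given period that fit between two dates (in Mya)."""
    span_kyr = (end_mya - start_mya) * 1000.0
    return span_kyr / period_kyr

late = count_cycles(0.0, 0.74, 100.0)    # ~100 kyr cycles back to 0.74 Mya
early = count_cycles(0.74, 2.47, 41.0)   # ~41 kyr cycles from 0.74 to 2.47 Mya
total = late + early

# Pleistocene time segments reported for C. ustulatus swainsonii.
psmc_segments = 54

print(f"late cycles: {late:.1f}, early cycles: {early:.1f}, total: {total:.1f}")
print(f"average cycles per PSMC segment: {total / psmc_segments:.2f}")
```

      Under these figures the total comes out on the order of 50 cycles, so each PSMC time segment spans roughly one full glacial-interglacial cycle on average, which is consistent with the point that a single estimate blends periods of structuring and mixing.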

      I generally agree with the authors that "at present there is no way to fully disentangle the effects of population structure and geographic space on our results". But given that, I think there are two options - either we can fully acknowledge that oversimplified demographic models like PSMC cannot be interpreted as supporting evidence of any particular mechanistic or biogeographic hypothesis and stop trying to use them to do that, or we have to do our best to understand specifically which models can be distinguished by the analyses we're employing. 

      Short of developing some novel theory deep in the PSMC model, I think readers would need to see simulations showing that the analyses employed in this paper are capable of supporting or refuting their biogeographic hypothesis before viewing them as strongly supporting a specific biogeographic model. Tools like msprime and stdpopsim can be used to simulate genome-scale data with fairly complex biogeographic models. Running simulations of a thrush-like population under different biogeographic scenarios and then using PSMC to differentiate those patterns would be a more convincing argument for the biogeographic aspects of this paper. The other benefit of this approach would be to nail down a specific quantitative version of the taxon cycles model referenced in the abstract, and it would allow the authors to better study and explain the motivation behind the specific summary statistics they develop for PSMC posthoc analysis.  

      These could very well be fruitful pursuits for future work, but they are beyond the scope of this paper. The impossibility of reconstructing ranges through deep time makes anything other than the very general biogeographic hypothesis we’ve posed an uncertain pursuit. Also, a purely biogeographic approach neglects the likelihood of ecological expansion also being involved. We get at the importance of the latter in the “Geography and evolutionary ecology” section of the Discussion. Below, the editor states that discussions among reviewers indicate that simulations are not warranted at this time. We agree that the complexities involved are substantial, to the point of making direct relevance to this empirical study uncertain (especially in such an among-lineage context). Regarding taxon cycles, we merely point out that that conceptual framework seems relevant given our findings. This was not even remotely anticipated at the outset of the study, so we are reluctant to do anything more than point out its possible relevance in several aspects of the results. Finally, the motivation for the study’s summary statistics was entirely driven by the hypotheses, as given in Methods, and due to an earlier error (noted above), there are no post-hoc analyses in the revision. Sorry for the needless confusion.

      Reviewer #2 (Public Review): 

      Summary: 

      Winker and Delmore present a study on the demographic consequences of migratory versus resident behavior by contrasting the evolutionary history of lineages within the same songbird group (thrushes of the genus Catharus). 

      Strengths: 

      I appreciate the test-of-hypothesis design of the study and the explicit formulation of three main expectations to test. The data analysis has been done with appropriate available tools. 

      Weaknesses: 

      The current version of the paper, with the case study chosen, the results, and the relative discussion, is not satisfying enough to support or reject the hypotheses here considered.  

      Given the stated strengths, the weaknesses noted seem a little incongruous, but we understand from the comments below that the reviewer would like to see the study redesigned and expanded.  

      The authors hypothesized that the wider realized breeding and ecological range characterising migrants versus resident lineages could be a major driver for increased effective population size and population expansion in migrants versus residents. I understand that this pattern (wider range in migrants) is a common characteristic across bird lineages and that it is viewed as a result of adapting to migration. A problem that I see in their dataset is that the breeding grounds range of the two groups are located in very different geographic areas (mainly South versus North America). The authors could have expanded their dataset to include species whose breeding grounds are from the two areas, regardless of their migratory behaviour, as a comparison to disentangle whether ecological differences of these two areas can affect the population sizes or growth rates.

      Because the questions are about the migratory life history strategy and the best way to get at this is in a phylogenetic framework, we’re not sure how we could effectively add species “regardless of their migratory behavior.” Further, we know that migration causes lineages to experience variable ecological conditions that include breeding, migration, and wintering conditions. Obligate migrants are going to have different breeding ranges from their close relatives, and the more distantly related species are, the less likely it is that they respond to particular ecological conditions the same way. So we do not think that an approach that included miscellaneous species from northern and southern regions would strengthen this study. Here, the comparative framework of closely related lineages that possess or lack the trait of interest is a study design strength. We do agree, however, that future work is needed that does encompass more lineages (we would argue in a phylogenetic context), and that disentangling the effects of geography and ecology will also be an important future endeavor. 

      As I understand from previous literature, the time-scale to population growth and estimates of effective population sizes considered in the present paper for the resident versus migratory clades seem to widely predate the times to speciation for the same lineages, which were reported in previous work of the same authors (Everson et al 2019) and others (Termignoni-Garcia et al 2022). This piece of information makes the calculation of species-specific population size changes difficult to interpret in the light of lineages' comparison. It is unclear what the authors consider to be lineage-specific in these estimates, as the clades were likely undergoing substantial admixture during the time predating full isolation.  

      We do recognize that timing estimates vary among studies. Differences among studies in important variables like markers, methods, generation time, and mutation or substitution rates create much of this uncertainty. Also, we are not confident in prior dating efforts in this group, largely because of gene flow and its effects on bringing estimates closer to the present. As we point out (line 485), differences among studies on these issues do not detract from the strengths here for within-study, among-lineage contrasts. In short, the timing could be off in an among-study context (and likely is with prior work, given gene flow), but relative performance of among-lineage N<sub>e</sub> differences is less susceptible to these factors. This was shown fairly well in Li & Durbin’s initial use of the method among human populations. Regarding substantial admixture, PSMC curves often unite at their origins with sister lineages (when they were the same lineage). A good example is with the two C. guttatus E & W curves in Fig. S3, which still have substantial gene flow today (they are subspecies and in contact), yet they show remarkably different N<sub>e</sub> curves through their history. It is not possible to mark a cutoff point for each lineage that represents the cessation of admixture with another lineage (e.g., Everson et al. 2019 showed substantial admixture between three full species in this group); that period can be very long (Price et al. 2008), varies among lineages, and will not be available for deeper lineage divergences in the phylogeny. We therefore chose to use all of the time intervals retrievable from the genomic data in each lineage, considering that this uniform treatment is the best approach for our among-lineage comparison. And note that we were careful to label these as “the lineages’ PSMC inception” (line 190).  

      Regarding the methodological difficulties in interpreting the impact of population structure on the estimates of effective population sizes with the PSMC approach, I would think that performing simulations to compare different scenarios of different degrees of structured populations would have helped substantially understand some of the outcomes.  

      The complexities of such modeling in a system like this are daunting. The different degrees of structuring among all of these lineages across just a single glacial-interglacial cycle would necessitate a lot of guesswork; projecting that back across 50-60 such cycles just in the Pleistocene would probably end up being fiction. Disentangling the effects of structure versus changes in N<sub>e</sub> in a system like this would probably not be possible with that approach and these data. As noted above and below, there was agreement among reviewers and the editor that simulations in this case are not warranted for revision. We have added the nature of the glacial-interglacial cycles and the PSMC sampling time segments to help readers understand this better (see above in response to R1, and lines 272-278).

      Additionally, I have struggled to understand if migratory behaviour in birds is considered to be acquired to relieve species competition, or as a consequence of expanded range (i.e., birds expand their range but their feeding ground is kept where speciation occurred as to exploit a ground with higher quality and abundance of seasonal local resources).  

      The origins of migration have been a struggle for researchers since the subject was taken up. But how the trait was acquired among these species does not really matter for our study. Here, migratory lineages possess different biogeographic+ecological attributes than their close relatives that are sedentary. Our focus is on the presence and absence of this life-history trait.

      The points raised above could be considered to improve the current version of the paper. 

      Thank you. We appreciate the opportunity to guide our revision using your comments.  

      Reviewer #3 (Public Review): 

      Summary: 

      This paper applies PSMC and genomic data to test interesting questions about how life history changes impact long-term population sizes. 

      Strengths: 

      This is a creative use of PSMC to test explicit a priori hypotheses about seasonal migration and N<sub>e</sub>. The PSMC analyses seem well done and the authors acknowledge much of the complexity of interpretation in the discussion.

      Weaknesses: 

      The authors use an average generation time for all taxa, but the citations imply generation time is known for at least some of them. Are there differences in generation time associated with migration? I am not a bird biologist, but quick googling suggests maybe this is the case (https://doi.org/10.1111/1365-2656.13983). I think it important the authors address this, as differences in generation time I believe should affect estimates of N<sub>e</sub> and growth.  

      Good point. The study cited by the reviewer encompasses a much higher degree of variation in body size and thus generation time. Differences in generation time in similarly sized close relatives, as in our study, should be small, and our approach has been to average those that are known. Unfortunately, generation times are not known for all of these species, but given their similarity in size we can have reasonable confidence in their being similar. We used data from the life-history research available (as cited) to obtain our average; there are not appropriate data for the residents, though. However, there is thought to be a generation time cost to seasonal migration in birds, and Bird et al. (2020) included this in their estimates to provide modeled values for all of the lineages we studied. We’re leery of using modeled values where good data for the nonmigrants in this group don’t exist (and the basis for quantifying this cost is tiny), but we recognize that this second approach is available and could leave some doubt in our results if not pursued. So we re-did everything with the modeled generation times of Bird et al. (2020). As expected, most of the differences are time-related. Importantly, our overall results are not different. We present them as Table S2 and have added the details on this to the Methods.
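
      Why a change in generation time shows up mostly on the time axis can be sketched with the standard PSMC rescaling, in which time is reported in units of 2N<sub>0</sub> generations and population size in units of N<sub>0</sub>, with θ<sub>0</sub> = 4N<sub>0</sub>μs. The sketch below assumes the per-generation mutation rate μ is held fixed while generation time changes; all numerical values are hypothetical and are not taken from the study, and the function name is invented for illustration.

```python
# Sketch of how PSMC output is rescaled to years and to effective population size.
# PSMC reports time t_k in units of 2*N0 generations and size lambda_k in units
# of N0, where theta0 = 4 * N0 * mu * s (mu: per-generation mutation rate,
# s: bin size in bp used when building the PSMC input).

def scale_psmc(theta0, t_k, lambda_k, mu, gen_time, bin_size=100):
    """Return (years, Ne) lists, one entry per PSMC time segment."""
    n0 = theta0 / (4.0 * mu * bin_size)
    years = [2.0 * n0 * t * gen_time for t in t_k]
    ne = [n0 * lam for lam in lambda_k]
    return years, ne

# Hypothetical values for illustration only (not from the study).
theta0 = 0.02
t_k = [0.01, 0.1, 1.0]
lambda_k = [1.5, 3.0, 0.8]
mu = 4.6e-9  # assumed per-generation mutation rate

years_a, ne_a = scale_psmc(theta0, t_k, lambda_k, mu, gen_time=2.0)
years_b, ne_b = scale_psmc(theta0, t_k, lambda_k, mu, gen_time=3.0)

for ya, yb, n in zip(years_a, years_b, ne_a):
    print(f"Ne = {n:,.0f}: {ya:,.0f} years (g = 2.0) vs {yb:,.0f} years (g = 3.0)")
```

      Under this assumption the N<sub>e</sub> values are identical between the two runs and only the dates stretch in proportion to generation time. If the mutation rate were instead specified per year and converted using the generation time, N<sub>e</sub> would shift as well.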

      The writing could be improved, both in the introduction for readers not familiar with the system and in the clarity and focus of the discussion.  

      We have added a phylogeny (new Fig. 1) to help readers better understand the system, and we’ve re-worked the Discussion to make it clearer what is clarified by our results and what remains unclear.  

      Recommendations for the authors:

      Reviewing Editor comment: 

      I note that discussion among the reviewers made clear that simulations are probably not the right answer given the complexity of the modeling required.  

      We appreciate this conclusion, with which we agree.  

      Reviewer #2 (Recommendations For The Authors): 

      Apologies for the delay with the review, which came at a very busy time. I hope you will find my comments helpful.

      Thanks. Your comments are helpful, and we fully understand how reviews (and our revisions!) have to wait until more pressing needs are addressed.

      I enjoyed reading the manuscript but I believe that the discussion sections could be heavily rewritten for better clarity. The discussion is sometimes redundant and lacks some flow/clarity. In a nutshell, I had the feeling that a bit of everything is thrown in the discussion but clear conclusions are not made.  

      Yes, the Discussion has been difficult to write, because more issues arose in the Results than we anticipated at the outset. We feel that discussing them is relevant, but we agree that much remains unclear. This coupling of paleodemographics with geography and ecology is a new area, which opens some important new (and relevant) areas to consider. So clarity is not possible in some areas. We’ve revised to point out where we do have clarity (e.g., in migrant lineages having different paleodemographic attributes than nonmigrants) and where only further study can provide clarity (e.g., in the roles of geography versus ecology). The journal format does not seem to have secondary subheaders, but we’ve used bold in one place to highlight ‘ecological mechanisms’ to offset that section, one of the more complex. We’ve also added a paragraph in the conclusions to clarify where we have clear takeaways and where uncertainties remain. 

      Reviewer #3 (Recommendations For The Authors): 

      The introduction should engage the reader with biology, not the use of demographic methods or genomics (both of which have been around for more than a decade). I would drop the first paragraph and considerably expand the second. What has previous research on ecology/behavior/genetics found regarding the demographic effects of seasonal migration?

      There are two important aspects to our study: 1) using paleodemographic methods to test hypotheses about adoption of a major life-history trait—an important biological question regardless of system, and so far (surprisingly) unaddressed; and 2) using this novel approach to study the effects of one such trait, seasonal migration. At these timescales, nothing exists on this subject, so there is really nothing to expand with. If there is relevant literature that we’ve missed, we’d be happy to add it.

      What is the missing bit of information or angle the current study addresses (other than just doing it larger and fancier with genomics)?  

      The effects of major life-history traits on paleodemographics have not been addressed before, to our knowledge. The whole context is new, so we’re not doing something “larger and fancier” with genomics. We are doing something that has not been done before: testing hypotheses about the effects of a major life-history trait on population sizes in evolutionary time. We’re not sure how this can be made clearer. To us this seems like a very engaging biological question with wide applicability. We hope that this study is just the first of many to come, in a diversity of biological systems.

      A figure showing the phylogenetic relationships of these taxa which are migratory would help the reader immensely. Although this is shown in Fig S3 I think it might be nice to have a map of the species and their ranges alongside a phylogeny as a main figure early on.  

      Thank you. This is a good suggestion. We can’t fit a phylogeny and all the distribution maps (Fig. S1) onto a page, but we can include a phylogeny as one of the main figures with nonmigrants highlighted. We’ve inserted this as a new Fig. 1. 

      If I understand correctly, the authors' arguments for why migratory species should show more growth hinge on large range size and geographic expansion. Yet they argue in the discussion that these forces are unlikely to be important (L226). I found the discussion on this confusing (e.g. L231 then says maybe it does matter). I think more clarity here would be helpful.

      Our argument and predictions are based both on geographic and ecological expansion. This was clearly stated as our third prediction “3) early population growth would be higher as seasonal migration opens novel ecological and geographic space…” We have gone back through and reiterated the coupling of these two factors. The line mentioned concludes the first paragraph in the section ‘Geography and evolutionary ecology,’ which focuses on the difficulty of decoupling these in this system. As the paragraph relates, geography alone does not seem to be driving our results (we do not argue that it is unimportant). 

      I also would have liked more time in the discussion addressing why variation in N<sub>e</sub> may be higher in migratory lineages.

      In addition to re-clarifying this in the Introduction, we have touched back on this now at line 221: “We attribute the higher variation in N<sub>e</sub> among migrants to be the result of the relative instability of northern biomes compared with tropical ones through glacial-interglacial cycles (e.g., Colinvaux et al., 2000; Pielou, 1991).”

      Minor comments: 

      L 62: Presumably PSMC is limited by the coalescent depth of the genealogy, which may be younger or older than population "origins" depending on the history of colonization, lineage splitting, gene flow, etc.  

      We were careful to phrase these as “the lineages’ PSMC inception” (line 190), and responded to this issue in more detail above in response to R2’s public review. 

      L 338: I think a few more details on PSMC would be helpful. Was no maskfile used?  

      We did not use a maskfile, choosing instead to generate data of decent coverage and aligning reads to a single closely related relative. 

      Did the consensus fasta include all species?  

      No, we used a single high-quality reference fasta of Catharus ustulatus, as reported (lines 434-37). We have added that “Identical treatment of all lineages in these respects should provide a strong foundation for a comparative study like this among close relatives.” 

      L 361: Fair to assume the authors used a weighted average of N<sub>e</sub> from the output, rather than just averaging the N<sub>e</sub> values from each time segment?  

      No – we used all the values of N<sub>e</sub> produced by PSMC output. The PSMC method uses nonoverlapping portions of the genome in its analyses (which we’ve added to make that clear), and portions in juxtaposition will often provide data for very different periods in the time segments. Further, time segments are uneven within and among taxa, so it is not clear how a uniform and comparable weighting scheme could be implemented. We consider a uniform approach to be of primary importance, including for future comparisons among studies. 
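      To make the distinction in this exchange concrete, the sketch below contrasts an unweighted mean of per-segment N<sub>e</sub> values with a mean weighted by segment duration. It is only an illustration under stated assumptions: the segment boundaries, the N<sub>e</sub> values, and both function names are hypothetical, and none of this is taken from the authors' pipeline or from PSMC itself.

```python
from statistics import mean


def unweighted_mean_ne(ne_values):
    """Plain average: every time segment counts equally."""
    return mean(ne_values)


def time_weighted_mean_ne(segment_starts, ne_values, end_time):
    """Average in which each segment is weighted by its duration.

    segment_starts: start time of each segment (ascending).
    end_time: hypothetical end of the last segment; it must be chosen
    by the analyst, which is one reason a weighting scheme is hard to
    make uniform across taxa with uneven segments.
    """
    bounds = list(segment_starts) + [end_time]
    widths = [bounds[i + 1] - bounds[i] for i in range(len(ne_values))]
    weighted_sum = sum(w * n for w, n in zip(widths, ne_values))
    return weighted_sum / sum(widths)


# Hypothetical, made-up values for illustration only.
segment_starts = [0.0, 1000.0, 5000.0, 20000.0]   # years before present
ne_values = [10000.0, 20000.0, 40000.0, 30000.0]  # N_e per segment
end_time = 100000.0

plain = unweighted_mean_ne(ne_values)
weighted = time_weighted_mean_ne(segment_starts, ne_values, end_time)
```

      With uneven segments the two summaries can differ substantially, and the weighted value depends on where the final segment is assumed to end.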

      L 383 "delta" typo

      Thank you for catching this.

      L 93: I'd be tempted to present the questions (how does seasonal migration affect population size trajectory, means, and variation) and rationale before presenting the hypotheses. I found myself reading the hypotheses and wondering "why?"  

      We’ve tried this change in the revision. It makes the hypotheses a little harder to pull out (they are no longer numbered in a short sequence), but it is shorter and solves this concern.  

      L 337 read depth is usually expressed as X (e.g. "23X") rather than bp.

      Changed.

    1. eLife Assessment

      This fundamental study further validates DNAH12 as a causative gene for asthenoteratozoospermia and male infertility in both humans and mice. Compelling evidence supports the notion that DNAH12 is essential for proper axonemal development. This work will be of interest to reproductive biologists studying spermatogenesis and sperm biology, as well as andrologists focusing on male fertility.

    2. Reviewer #1 (Public review):

      Summary:

      Even though this is not the first report that the mutation in the DNAH12 gene causes asthenoteratozoospermia, the current study explores the sperm phenotype in depth. The authors show experimentally that the said mutation disrupts the proper axonemal arrangement and recruitment of DNALI1 and DNAH1, proteins of the inner dynein arms. Based on these results, the authors propose a functional model of DNAH12 in proper axonemal development. Lastly, the authors demonstrate that the male infertility caused by the studied mutation can be rescued by ICSI treatment, at least in the mouse. This study furthers our understanding of male infertility caused by a mutation of axonemal protein DNAH12, and how this type of infertility can be overcome using assisted reproductive therapy.

      Strengths:

      This is an in-depth functional study, employing multiple, complementary methodologies to support the proposed working model.

      Weaknesses:

      The structure and interaction model between DNAH12, DNALI1, and DNAH1 relies on in silico methodologies, and further studies are required to validate these predictions.

    3. Reviewer #2 (Public review):

      Summary:

      The authors first conducted whole exome sequencing for infertile male patients and families, in which biallelic mutations in the Dynein Axonemal Heavy Chain 12 (DNAH12) gene co-segregated with the phenotype. Sperm from patients with biallelic DNAH12 mutations exhibited a wide range of morphological abnormalities in both tails and heads, reminiscent of a prevalent cause of male infertility, asthenoteratozoospermia. To deepen the mechanistic understanding of DNAH12 in axonemal assembly, the authors generated two distinct DNAH12 knockout mouse lines via CRISPR/Cas9, both of which showed more severe phenotypes than observed in patients. Ultrastructural observations and biochemical studies revealed the requirement of DNAH12 in recruiting other axonemal proteins and that the lack of DNAH12 leads to aberrant stretching of the manchette structure as early as stage XI-XII. Lastly, the authors proposed intracytoplasmic sperm injection as a potential measure to rescue patients with DNAH12 mutations, as the knockout sperm supported blastocyst formation at a ratio comparable to that in WT.

      Strengths:

      The authors convincingly showed the importance of DNAH12 in assembling cilia and flagella in both human and mouse sperm. This study is not a mere enumeration of the phenotypes, but a strong substantiation of DNAH12's essentiality in spermiogenesis, especially in axonemal assembly.

      The analyses conducted include basic sperm characterizations (concentration, motility), detailed morphological observations in both testes and sperm (electron microscopy, immunostaining, histology), and biochemical studies (co-immunoprecipitation, mass-spec, computational prediction). Molecular characterizations employing knockout animals and recombinant proteins beautifully proved the interactions with other axonemal proteins.

      Many proteins participate in properly organizing flagella, but the exact understanding of their coordination is still far from conclusive. The present study provides a starting point for untangling the direct relationships and order of manifestation of the players underpinning spermatogenesis. Furthermore, comparing flagella and trachea provides a unique angle that invites evolutionary perspectives.

      Weaknesses:

      Although seemingly minor, the discrepancies between patients and genetically modified animals were not fully explained. For example, both knockout mouse lines showed vastly reduced epididymal sperm counts and motility, whereas the phenotypes in patients were milder. Addressing the differences in the roles that the orthologs play in spermatogenesis would deepen the comprehensive understanding of axonemal assembly.

      Comments on revisions:

      The reviewer is satisfied with the authors' response.

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This important study further validates DNAH12 as a causative gene for asthenoteratozoospermia and male infertility in humans and mice. The data supporting the notion that DNAH12 is required for proper axonemal development are generally convincing, although more experiments would solidify the conclusions. This work will interest reproductive biologists working on spermatogenesis and sperm biology, as well as andrologists working on male fertility.

      We thank the editor and the two reviewers for their time and careful evaluation of our manuscript. We sincerely appreciate their encouraging feedback and insightful guidance on improving our study. In the revised manuscript, we have performed additional experiments and provided quantitative data regarding the reviewers' comments.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Even though this is not the first report that the mutation in the DNAH12 gene causes asthenoteratozoospermia, the current study explores the sperm phenotype in depth. The authors show experimentally that the said mutation disrupts the proper axonemal arrangement and recruitment of DNALI1 and DNAH1, proteins of the inner dynein arms. Based on these results, the authors propose a functional model of DNAH12 in proper axonemal development. Lastly, the authors demonstrate that the male infertility caused by the studied mutation can be rescued by ICSI treatment, at least in the mouse. This study furthers our understanding of male infertility caused by a mutation of axonemal protein DNAH12, and how this type of infertility can be overcome using assisted reproductive therapy.

      Strengths:

      This is an in-depth functional study, employing multiple, complementary methodologies to support the proposed working model.

      Thank you for your recognition of the strength of this study. Your positive feedback motivates us to continue refining our research and methodological rigor in future studies.

      Weaknesses:

      The strength of the study could be increased by including more controls, such as peptide blocking of the in-house-raised mouse and rat DNAH12 antibodies, and mass spectrometry of a control IP with beads/IgG only to exclude non-specific binding. Objective quantifications of immunofluorescence images and WB seem to be missing. At least three technical replicates of western blotting of sperm and testis extracts could have been performed to demonstrate that the decrease in signal intensity between WT and mutant was not caused by a methodological artifact.

      Thank you for your comments. For an in-depth study, we analyzed the sequence features of the DNAH12 protein and selected amino acids 1-200 as the antigen on the basis of (1) high immunogenicity; (2) high hydrophilicity; (3) good surface leakage groups; and (4) sequence homology analysis to avoid non-specific recognition of other proteins. The two anti-DNAH12 antibodies were developed with the help of Dia-An Biotech in 2022. We tried to obtain the polypeptide fragments of the target protein for peptide blocking, but the material had been discarded after the service. Nevertheless, western blotting detected the target DNAH12 band, which was absent in the knockout mouse group; likewise, the DNAH12 immunofluorescence signals were strong but absent in the knockout mouse group. In addition, we tested whether the in-house-raised rabbit antibody was suitable for IP: it immunoprecipitated the DNAH12 band from Dnah12<sup>+/+</sup> mice but not from Dnah12<sup>-/-</sup> mice. Collectively, these data support the specificity of the raised DNAH12 antibodies. For the IP assay, we included an IgG group in the IP-mass spectrometry to exclude non-specific binding; the experimental design is described in Figure 6B. The raw data were deposited in the iProX partner repository (accession number: PXD051681), and we have coordinated with the repository manager to make the data publicly accessible (https://www.iprox.cn/page/subproject.html?id=IPX0008674001).  

      In addition, we repeated the western blotting of sperm and testis extracts at least three times and added objective quantifications of the immunofluorescence signals and WB images. The blot quantifications are shown in the figures to help readers interpret these results.

      Reviewer #2 (Public Review):

      Summary:

      The authors first conducted whole exome sequencing for infertile male patients and families where they co-segregated the biallelic mutations in the Dynein Axonemal Heavy Chain 12 (DNAH12) gene.

      Sperm from patients with biallelic DNAH12 mutations exhibited a wide range of morphological abnormalities in both tails and heads, reminiscent of a prevalent cause of male infertility, asthenoteratozoospermia. To deepen the mechanistic understanding of DNAH12 in axonemal assembly, the authors generated two distinct DNAH12 knockout mouse lines via CRISPR/Cas9, both of which showed more severe phenotypes than observed in patients. Ultrastructural observations and biochemical studies revealed the requirement of DNAH12 in recruiting other axonemal proteins and that the lack of DNAH12 leads to aberrant stretching of the manchette structure as early as stage XI-XII. Lastly, the authors proposed intracytoplasmic sperm injection as a potential measure to rescue patients with DNAH12 mutations, as the knockout sperm supported blastocyst formation at a ratio comparable to that in WT.

      Strengths:

      The authors convincingly showed the importance of DNAH12 in assembling cilia and flagella in both human and mouse sperm. This study is not a mere enumeration of the phenotypes, but a strong substantiation of DNAH12's essentiality in spermiogenesis, especially in axonemal assembly.

      The analyses conducted include basic sperm characterizations (concentration, motility), detailed morphological observations in both testes and sperm (electron microscopy, immunostaining, histology), and biochemical studies (co-immunoprecipitation, mass-spec, computational prediction). Molecular characterizations employing knockout animals and recombinant proteins beautifully proved the interactions with other axonemal proteins.

      Many proteins participate in properly organizing flagella, but the exact understanding of their coordination is still far from conclusive. The present study provides a starting point for untangling the direct relationships and order of manifestation of the players underpinning spermatogenesis. Furthermore, comparing flagella and trachea provides a unique angle that invites evolutionary perspectives.

      Thank you for your thoughtful and positive feedback. We are delighted that you found our study to be a strong substantiation of DNAH12's essential role in spermiogenesis, particularly in axonemal assembly. We believe that this study represents a meaningful step toward unraveling the intricate coordination of axonemal proteins during spermatogenesis, and your comments further inspire us to continue exploring these complex mechanisms in future work. Thank you once again for your valuable insights and summary of this work.

      Weaknesses:

      Although seemingly minor, the discrepancies between patients and genetically modified animals were not fully explained. For example, both knockout mouse lines showed vastly reduced epididymal sperm counts and motility, whereas the phenotypes in patients were milder. Addressing the differences in the roles that the orthologs play in spermatogenesis would deepen the comprehensive understanding of axonemal assembly.

      This is an interesting question. Although humans and mice both show male infertility when dynein proteins essential for sperm flagellar development are deficient, the phenotypes differ in some respects. For instance, deficiency in DNAH17 (Clin Genet. 2021. PMID: 33070343) or DNAH8 (Am J Hum Genet. 2020. PMID: 32619401; PMCID: PMC7413861), two other members of the dynein axonemal heavy chain family, has been reported to cause a more severe phenotype in mice than in human patients carrying bi-allelic DNAH17 or DNAH8 loss-of-function mutations. In knockout mice, sperm counts are lower and the proportion of sperm with abnormal morphology is higher, whereas the phenotypes in human patients tend to be milder. These observations suggest that orthologs may influence spermatogenesis to slightly different extents in humans and mice. We plan to investigate the mechanisms underlying these discrepancies in future studies, which will provide deeper insights into axonemal assembly and the evolutionary aspects of spermatogenesis. Thank you again for bringing up this important issue.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      This reviewer is impressed by the study's depth and the extent of the methodology used in the study. The study is well-designed, and the results are very interesting. The reviewer's enthusiasm was reduced by the lack of some controls (provided that the reviewer did not miss them). Further are point-to-point suggestions that this reviewer believes will increase the merit of the present study.

      Title:

      (1) Why a "special" dynein? What makes it special when compared to other dyneins? I suggest removing the word special.

      Through phylogenetic and protein domain analyses of the DNAH family, we found that DNAH12 is the shortest member and the only one that lacks a typical microtubule-binding domain (MTBD), which is why we described it as a “special” dynein. We have considered your valuable suggestion and removed the word from the title.

      Abstract:

      (2) L23: same as above, why special?

      We identified DNAH12 as the shortest member of the DNAH family and uniquely lacking the typical microtubule-binding domain (MTBD). This distinct characteristic prompted us to describe it as a 'special' dynein in the abstract part.

      (3) L37: the reviewer did not find a figure (neither main nor supplementary) that would demonstrate the proper organization of microtubules in cilia. Figure S11 only shows the presence of cilia in DNAH12-/- mouse. A TEM image of cilia is required to confirm or reject the claim that DNAH12 does not play a crucial role in proper microtubule organization in cilia.

      We have now added TEM images of cilia in wild-type and Dnah12<sup>-/-</sup> mice. The ultrastructure of the ciliary axonemes was comparable between the wild-type and Dnah12<sup>-/-</sup> groups, suggesting that DNAH12 may not play a crucial role in proper microtubule organization in cilia. The results have now been added to Supplemental Figure 11F.

      (4) L122-6: Did the authors also confirm these structures by cryo-EM? If not, this needs to be pointed out as a shortcoming in the discussion, that the structures and interactions are predicted in silico only.

      Thank you for your comment. Owing to limited resources, we did not perform cryo-EM to confirm these structures. We will pursue the structural details at atomic resolution in a future study. We understand this point and have now addressed it as a shortcoming in the Discussion.

      (5) L134: Be more specific about what characteristics of DNAH12 were analyzed.

      Thank you for your comment. We have now updated the Methods accordingly. The characteristics of DNAH12 that were analyzed include regional immunogenicity, hydrophilicity, surface leakage groups, and sequence homology.

      (6) L137: Be more specific about how the antibodies were validated. Were the antibodies validated for both immunofluorescence and western blotting? I suggest doing peptide blocking of the antibody, for instance for ICC, preincubation of ab with immunizing peptide followed by primary ab incubation with studied cells/tissues.

      Thank you for your comments and suggestions. We validated the antibodies for both immunofluorescence and western blotting to ensure their effectiveness in our experiments. The two anti-DNAH12 antibodies were developed with the help of Dia-An Biotech in 2022. We attempted to obtain the polypeptide fragments of the target protein for peptide blocking, but the material had been disposed of after the service. Nevertheless, western blotting detected a strong target DNAH12 band, which was absent in the knockout mouse group; likewise, the DNAH12 immunofluorescence signals were strong but absent in the knockout mouse group. In addition, the IP experiment showed that the raised rabbit antibody immunoprecipitated the DNAH12 band from Dnah12<sup>+/+</sup> mice but not from Dnah12<sup>-/-</sup> mice. Collectively, these data support the specificity of the raised DNAH12 antibodies. We sincerely appreciate your suggestion and will request the peptide material if we develop new antibodies.

      (7) L142: This reviewer is unfamiliar with using TRIzol for sperm protein extraction. Is there a specific reason for not using PAGE loading buffer for human sperm protein extraction?

      Thanks for your suggestions. TRIzol reagent can be used for small samples (5×10<sup>6</sup> cells) as well as large samples (>10<sup>7</sup> cells), and it allows RNA and protein to be extracted at the same time. Our lab has used this method in previous work (Hum Reprod Open. 2023; PMID: 37325547; PMCID: PMC10266965). It is very useful for processing valuable, small samples. SDS sample buffer (PAGE loading buffer) was added to the extracted human sperm protein before SDS-PAGE separation. We have added this detail to the Methods. We apologize for the confusion.

      (8) L144: Were these the final concentrations of the SDS loading buffer? 1 × Laemmli buffer contains 62.5 mM TRIS, 2% (w/w) SDS, 10 % (w/v) glycerol, and 5% 2-mercaptoethanol. Please, amend accordingly.

      Thanks for your suggestions. We apologize for the incorrect labelling of concentrations (the previous values were for 3× SDS loading buffer). We have now amended the SDS loading buffer to 1× Laemmli buffer as suggested.
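      As a worked check of the arithmetic behind this correction, the sketch below computes final concentrations after mixing a concentrated loading buffer with sample. It assumes, purely for illustration, a 3× stock that is exactly three times the 1× Laemmli recipe quoted by the reviewer; the stock values, volumes, and function name are hypothetical and do not come from the manuscript.

```python
def final_concentration(stock_conc, stock_volume, sample_volume):
    """Concentration of a buffer component after mixing stock with sample.

    Assumes the sample itself contributes none of the component.
    """
    return stock_conc * stock_volume / (stock_volume + sample_volume)


# Hypothetical 3x stock: three times the 1x values quoted by the reviewer
# (62.5 mM Tris, 2% SDS, 10% glycerol, 5% 2-mercaptoethanol).
stock_3x = {
    "Tris (mM)": 187.5,
    "SDS (%)": 6.0,
    "glycerol (%)": 30.0,
    "2-mercaptoethanol (%)": 15.0,
}

# One volume of 3x buffer plus two volumes of sample gives a 1x final mix.
final_1x = {
    name: final_concentration(conc, stock_volume=10.0, sample_volume=20.0)
    for name, conc in stock_3x.items()
}
```

      Reporting the final (1×) values, as the reviewer requested, avoids ambiguity about whether the listed concentrations describe the stock or the loaded sample.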

      (9) L151: Table S2 contains other homemade antibodies than DNAH12. Please, include references to the studies where the generation and validation of these antibodies is described.

      Thank you for your suggestions. We have developed a DNAH1 antibody for use in Western blot assays, with its generation and validation detailed in Frontiers in Endocrinology (Lausanne), 2021 (PMID: 34867808; PMCID: PMC8635859). Additionally, we have produced a DNAH17 antibody for both immunofluorescence (IF) and Western blot, as described in Journal of Experimental Medicine, 2020 (PMID: 31658987; PMCID: PMC7041708). These references have now been included.

      (10) L167: Please, spell out ICR at its first appearance.

      Done as suggested. Thank you. ICR stands for Institute of Cancer Research.

      (11) L169: This reviewer is confused. It seems that the mouse encodes DNAH12 on exons 5 and 18 simultaneously. Each mouse model has only one exon targeted for a knockout. Would not this mean that the expression of DNAH12 in both models is not completely knocked down? Please, give more background in this paragraph for those less familiar with CRISPR/Cas9.

      Thank you for your insightful comment. We appreciate your attention to detail. To clarify, while the mouse model does indeed encode DNAH12 on exons 5 and 18 simultaneously, we specifically targeted the key exon 5 or exon 18 in each model to achieve different knockout strategies. This approach allows us to assess the functional implications of any remaining DNAH12 expression in both models. We checked DNAH12 expression in both models, and DNAH12 protein was undetectable in both, indicating complete knockout of DNAH12 in each model. Additionally, we will revise the manuscript to include further details on the CRISPR/Cas9 methodology, ensuring accessibility for readers less familiar with this technique. Thank you again for your valuable feedback, which we believe will greatly enhance our manuscript.

      (12) L201: 50 % PBS? As in 0.5 x concentrated PBS? Please, rewrite for clarity.

      The term "50% PBS" refers to a 1:1 dilution of phosphate-buffered saline (PBS) with an appropriate diluent, resulting in a final concentration of 0.5x PBS. We will revise the text to explicitly clarify this, ensuring it is clear to all readers. Thank you for highlighting this point.

      (13) L224: Please, state what beads those were (magnetic/agarose, conjugated to protein A/G...) Include catalog # and manufacturer.

      Thank you for your suggestion. We have updated the manuscript to include this information. The beads used were Protein A/G Magnetic Beads (Catalog #B23202, Bimake, Texas, USA).

      (14) L227: What was the reason for adding a proteasomal inhibitor? What concentration was used? Please, add this information to the text.

      We added MG132 in the cell immunoprecipitation (IP) experiments to inhibit proteasomal activity, thereby preventing degradation of the target protein. This helps maintain the stability of the target protein during the experiment (Sci Adv. 2022. PMID: 35020426; PMCID: PMC8754306), enhancing its detectability in subsequent analyses. MG132 was used at 5 μM. We have added this information to the revised manuscript.

      (15) L233: in vivo IP of mouse testis lysate? This does not make sense. I suggest removing "in vivo".

      Thank you for your careful review and comments on our manuscript. We have modified as suggested.

      (16) L317: Supplemental Figure 6 precedes Supplemental Figure 5 in the text, which is neither logical nor orderly.

      Thank you for your suggestion. Since the N-terminal DNAH12 antibody is already described in the Methods section (L317), we propose removing Supplemental Figure 6 from the content to improve the logical flow and maintain an orderly presentation.

      (17) L345 and elsewhere: how did the authors quantify the decrement of the signal? This needs to be measured objectively.

      Thank you for your valuable suggestion. We quantified the signal intensity using Fiji (Nat Methods. 2012. PMID: 22743772; PMCID: PMC3855844), which allows for precise analysis of pixel intensity. The results are presented in the figures to effectively illustrate the decrement in signal intensity. We appreciate your suggestion, and we have provided a description of the method in our methodology section.

      (18) L371: I recommend: ...and elongated spermatids; the abnormal...

      Done as suggested. Thank you.

      (19) L412-4: Cilia in both Dnah12<sup>mut/mut</sup> and Dnah12<sup>-/-</sup> are developed, but are they motile or immotile? This needs to be investigated. Is the DNAH12 in cilia truncated while still fulfilling its function?

      Thanks for your comment. We checked ciliary motility using an inverted microscope, and no significant difference in ciliary motility was observed between the knockout group and the control group. These results indicate that ciliary motility was not affected by DNAH12 deficiency. The N-terminal DNAH12 antibody was developed to detect whether a truncated protein is present in mouse tissues; however, we did not detect DNAH12 signals by immunofluorescence on trachea sections of Dnah12<sup>-/-</sup> mice. These results indicate that DNAH12 may exert little influence on cilia, compared with its important function in flagella.

      (20) L414-6: The results do not support this claim as the authors do not show that cilia are motile.

      Thanks for your comment. Supplemental Videos 3-4, live recordings of trachea from Dnah12<sup>+/+</sup> and Dnah12<sup>-/-</sup> mice, have been uploaded to support this conclusion.

      (21) L421-3: Did the authors perform a negative test, where they let the testis lysate interact with beads/IgG only and performed the MS to identify non-specific binding? This is a crucial specificity test for this approach.

      We have performed a negative test. In the IP assay, we included an IgG group in the IP-mass spectrometry to exclude non-specific binding; the experimental design is described in Figure 6B. The raw data were deposited in the iProX partner repository (PXD051681), and we have asked the repository manager to update the status to public so that the data will be visible to readers. 

      (22) L462: same as #18 the authors need to show that cilia are also motile. The mere presence of cilia in DNAH12-/- as shown in Fig S11C&D is not sufficient to conclude that the mice do not manifest PCD symptoms.

      Thanks for your comment. We did not observe obvious differences between the cilia of Dnah12<sup>+/+</sup> and Dnah12<sup>-/-</sup> mice. Supplemental Videos 3-4, live recordings of trachea from Dnah12<sup>+/+</sup> and Dnah12<sup>-/-</sup> mice, have been uploaded to show ciliary motility in the trachea.

      (23) L529: MTBD region instead of domain, as "domain" is already part of the abbreviation.

      Done as suggested

      (24) L875: Sperm is both the singular and plural form. Spermatozoon vs spermatozoa can be used where the distinction between singular and plural needs to be made.

      Thanks for your suggestion. We have checked and changed this usage.

      (25) Figure 3H: Is there a specific reason why P11 is not shown?

      Because only a limited number of smear slides from P11 were available, P11 was not previously stained with the DNAH17 antibody. We have now performed this experiment, which showed that DNAH17 expression was not affected in patient P11. This result has been added to Figure 3H.

      (26) Figure 8H: The authors in their MS do not describe what is happening to N-DRC proteins, yet they suggest in their model that it's unaffected in the mutant mouse/human. Please, address this in the MS and clearly state in the model that N-DRC needs further exploration in future studies.

      Thanks for your suggestion. We checked the MS data but did not observe enrichment of nexin-dynein regulatory complex (N-DRC) proteins; only one known N-DRC protein, DRC1, was present, with a single unique peptide. Instead, enrichment of inner dynein arm proteins and radial spoke proteins was observed. However, we cannot determine whether the N-DRC structures are affected. We have stated this in the Discussion and will pursue it with high-resolution techniques such as cryo-EM in the future.

      (27) Figure 5F: Is it possible to choose a different Dnah12<sup>-/-</sup> spermatozoon to see a reduced level of DNALI1 so that it corresponds with the WB detection in Fig 5B?

      Thanks for your suggestion. We have chosen a Dnah12<sup>-/-</sup> spermatozoon with faint remnants of the DNALI1 signal as the representative image.

      (28) Figure S2 and elsewhere: How were the authors able to resolve and calibrate a 356 kDa protein using SDS-PAGE? Agarose gel electrophoresis is more suitable for resolving high-molecular-weight proteins. Most protein standards go only as high as 250 kDa.

      We found that high-molecular-weight proteins (such as the 356 kDa DNAH12) can be resolved on 4-12% gradient polyacrylamide gels by using appropriate voltages and longer run times during electrophoresis. The DNAH12 band was calibrated using the HiMark™ Pre-Stained High Molecular Weight Protein Standard (30-460 kDa). We have now updated the blot images to show the size of the DNAH12 protein (Fig S6B). The target band is clearly visible between 268 kDa and 460 kDa, which makes it easy to identify the DNAH12 band in the other blots. Thanks for your suggestion.

      (29) Figure S5: similar to #24: Why P10 and P11 are not shown?

      Because only a limited number of smear slides from P10 and P11 were available, we did not previously stain them with the ODF2 antibody. We have now performed these experiments, which showed that ODF2 expression was not affected in patient P10 or P11. This result has been added to Figure S5.

      (30) Figure S6B: The specificity of the anti-DNAH12 antibody against mouse DNAH12 seems to be questionable since the authors detect multiple bands on WB. I recommend doing peptide blocking to show that these are non-specific binding as opposed to off-target binding.

      Thank you for your comments. To design the antibodies, we analyzed the sequence features of the DNAH12 protein and selected amino acids 1-200 as the antigen because of its favorable properties (1. high immunogenicity; 2. high hydrophilicity; 3. good surface leakage groups; 4. sequence homology analysis to avoid nonspecific recognition of other proteins). The two anti-DNAH12 antibodies were developed with the help of Dia-An Biotech in 2022. We attempted to obtain the polypeptide fragments of the target protein for peptide blocking, but the material had been disposed of after the service. Fortunately, the DNAH12 target band showed a strong signal in western blotting and was not detected in the knockout mice, and the DNAH12 immunofluorescence signal was strong but absent in the knockout mice. In addition, we confirmed that the in-house rabbit antibody is suitable for IP experiments: it immunoprecipitated the DNAH12 band from Dnah12<sup>+/+</sup> mice but not from Dnah12<sup>-/-</sup> mice. Collectively, these data support the specificity of the DNAH12 antibodies. We appreciate your suggestion and will request the peptide material if we develop new antibodies.

      Reviewer #2 (Recommendations For The Authors):

      Recruitment of DNAH1 and DNALI1 to the flagella is dependent on DNAH12 expression, according to the data. What would be the mechanism that locates DNAH12 which lacks MTBD to the flagella?

      Thank you for your insightful question. We are currently investigating the mechanisms that facilitate the loading of DNAH12 to the flagella. Based on existing data, we hypothesize that CCDC39 and/or CCDC40 may play a critical role in the recruitment of DNAH12 to sperm flagella during spermiogenesis (Nat Genet. 2011, PMID: 21131972; PMCID: PMC3509786; Nat Genet. 2011, PMID: 21131974; PMCID: PMC3132183). Furthermore, a structural study by Walton et al. showed that DNAH12 associates with CCDC39/CCDC40 proteins (Nature. 2023, PMID: 37258679; PMCID: PMC10266980). These findings suggest that CCDC39 and/or CCDC40 may play a role in facilitating the localization of DNAH12 to the flagella. Additional studies are needed to identify other potential factors involved in this process and to further elucidate the mechanisms underlying this complex biological phenomenon.

    1. eLife Assessment

      This important study reports the transcriptomic and proteomic landscapes of the oviducts at four different preimplantation stages during natural fertilization, pseudopregnancy, and superovulation. The supporting data are convincing. This work will be of interest to reproductive biologists and clinicians practicing reproductive medicine.

    2. Reviewer #1 (Public review):

      Summary:

      The paper demonstrated through a comprehensive multi-omics study of the oviduct that the transcriptomic and proteomic landscape of the oviduct at 4 different preimplantation periods was dynamic during natural fertilization, pseudopregnancy, and superovulation using three independent cell/tissue isolation and analytical techniques. This work is very important for understanding oviductal biology and physiology. In addition, the authors have made all the results available in a web search format, which will maximize the public's access and foster and accelerate research in the field.

      Strengths:

      (1) The manuscript addresses an important and interesting question in the field of reproduction: how does the oviduct at different regions adapt to the sperm and embryos for facilitating fertilization and preimplantation embryo development and transport?

      (2) Authors used cutting-edge techniques: Integrated multi-modal datasets followed by in vivo confirmation and machine learning prediction.

      (3) RNA-seq, scRNA-seq, and proteomic results are immediately available to the scientific community in a web search format.

      (4) Substantiated results indicate the source of inflammatory responses was the secretory cell population in the IU region when compared to other cell types; sperm modulate inflammatory responses in the oviduct; the oviduct displays immuno-dynamism.

      In addition, the revised version has addressed weaknesses adequately.

      (1) The revised version provided a clear explanation and the rationale for using the superovulation model.

      (2) The revised version generated a graphic abstract/summary of their major findings.

    3. Reviewer #2 (Public review):

      The manuscript investigates oviductal responses to the presence of gametes and embryos using a multi-omics and machine learning-based approach. By applying RNA sequencing (RNA-seq), single-cell RNA sequencing (sc-RNA-seq), and proteomics, the authors identified distinct molecular signatures in different regions of the oviduct, proximal versus distal. The study revealed that sperm presence triggers an inflammatory response in the proximal oviduct, while embryo presence activates metabolic genes that provide nutrients to the developing embryos. Overall, this study offers valuable insights and will likely be of great interest to reproductive biologists and researchers in oviduct biology.

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This important study reports the transcriptomic and proteomic landscape of the oviducts at four different preimplantation periods during natural fertilization, pseudopregnancy, and superovulation. The data presented convincingly supported the conclusion in general, although more analyses would strengthen the conclusions drawn. This work will interest reproductive biologists and clinicians practicing reproductive medicine. 

      We appreciate the concise summary and agree that additional experiments can reinforce the fidelity of predictions made by our robust bioinformatic characterization of the oviduct. Our robust bioinformatic model appears reproducible as similar pathway trends have been produced in all three datasets, lending confidence for future researchers to establish testable hypotheses more effectively.  

      Reviewer #1 (Public review):

      The paper demonstrated through a comprehensive multi-omics study of the oviduct that the transcriptomic and proteomic landscape of the oviduct at 4 different preimplantation periods was dynamic during natural fertilization, pseudopregnancy, and superovulation using three independent cell/tissue isolation and analytical techniques. This work is very important for understanding oviductal biology and physiology. In addition, the authors have made all the results available in a web search format, which will maximize the public's access and foster and accelerate research in the field.

      Strengths:

      (1) The manuscript addresses an important and interesting question in the field of reproduction: how does the oviduct at different regions adapt to the sperm and embryos for facilitating fertilization and preimplantation embryo development and transport?

      (2) Authors used cutting-edge techniques: Integrated multi-modal datasets followed by in vivo confirmation and machine learning prediction.

      (3) RNA-seq, scRNA-seq, and proteomic results are immediately available to the scientific community in a web search format.

      (4) Substantiated results indicate the source of inflammatory responses was the secretory cell population in the IU region when compared to other cell types; sperm modulate inflammatory responses in the oviduct; the oviduct displays immuno-dynamism.

      We sincerely thank you for your thorough and insightful review of our manuscript. Your comprehensive summary accurately captures the essence of our multi-omics study on oviductal biology, highlighting its importance in understanding reproductive physiology. We are particularly grateful for your recognition of our study's strengths. In the revised manuscript, we have added another searchable scRNA-seq dataset to our public website: https://genesearch.org/winuthayanon/Oviduct_pregnancy/. We have also addressed the weaknesses, as described in the responses below, in our revised manuscript.

      Weaknesses:

      (1) The rationale for using the superovulation model is not clear. The oviductal response to sperm and embryos can be studied by comparing mating with normal and vasectomized mice and comparing pregnancy vs pseudopregnancy (induced by mating with vasectomized males). Superovulation causes supraphysiological hormone levels and other confounding conditions.

      We agree with this assessment that superovulation changes the hormonal levels and could have a confounding impact on the oviduct function. As such, for all experiments involving pseudopregnant datasets, pseudopregnancy was induced by mating females with vasectomized males without superovulation. Our oviductal luminal protein content analysis was collected from oviductal fluid from pregnant females with and without superovulation. This allowed us to directly compare the impact of superovulation on protein abundance and profile. In the revised manuscript, we have provided clarifying statements on using superovulation in our Method section, which reads 

      “Datasets from the natural cycle and SO allowed us to directly compare the impact of exogenous hormone treatments on protein abundance and profile distinct from the physiological levels of hormones”.

      One exception for using superovulation in the absence of a “natural mating” group for comparison is the scRNA-seq dataset. As single-cell libraries should be performed in a single run to avoid batch effects, we need to ensure that a sufficient number of females were pregnant for single-cell isolation (we used ~4 mice/time point). Therefore, superovulation was used to synchronize and ensure that the females were receptive to mating. At the time of our sample collection, single nuclei isolation methods (freeze tissue now, isolate nuclei later) had not been reliable or standardized. We tried to synchronize females using the male bedding without superovulation. However, we would still need to set up at least 12-15 females per pregnancy timepoint to mate with male mice, totaling ~48-60 mice each night. Due to budget constraints and vivarium space limitations, we were not able to do so. We have included a similar statement to clarify the justifications in the revised Methods, which reads,

      “Mating and tissue collection protocols were similar to bulk RNA isolation described above, with the exception that female mice were superovulated using the protocol described previously (73) to ensure sufficient numbers of female mice at each timepoint could be harvested for single cell isolation and library preparation within the same day (n= 3-4 mice/group)”.

      (2) This study involves a very complex dataset with three different models at four time points. If possible, it would be very informative to generate a graphic abstract/summary of their major findings in oviductal responses in different models and time points

      Thank you for this suggestion. We have now included the graphical abstract to accompany our final version of the manuscript.

      (3) The resolution of Figures 3A-3C in the submitted file was not high enough to assess the authors' conclusion.

      We have now used a higher magnification of images in Figures 3A-C in the revised version.

      (4) The authors need to double-check influential transcription factors identified by machine learning. Apparently, some of them (such as Anxa2, Ift88, Ccdc40) are not transcription factors at all.

      We appreciate the recognition of this oversight. In the revised manuscript, we have clarified and stated the distinction between ‘influential transcripts’ and ‘influenced proteins’, which now reads,

      “The top 25 “influential” transcripts (ITs) with the highest attention scores from all the transcription factors present in bulk RNA-seq data were extracted for every potentially influenced protein (IP) in the empirical proteomics datasets”. 
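
      As an illustration of the selection step quoted above, the following is a minimal sketch of how the highest-scoring transcripts could be picked for each protein. It is not the authors' implementation: the function name, the nested-dictionary layout of the attention scores, and the toy transcript names and values are all assumptions made for the example.

```python
import heapq


def top_influential_transcripts(attention, k=25):
    """Return, for each protein, the k transcripts with the highest attention scores.

    attention: dict mapping protein name -> dict of transcript name -> score
               (hypothetical layout, chosen only for this sketch).
    """
    ranked = {}
    for protein, scores in attention.items():
        # nlargest returns at most k (transcript, score) pairs, highest score first
        best = heapq.nlargest(k, scores.items(), key=lambda item: item[1])
        ranked[protein] = [transcript for transcript, _ in best]
    return ranked


# Toy scores invented for illustration; real values would come from the trained model.
toy_attention = {
    "OVGP1": {"Tf_a": 0.10, "Tf_b": 0.70, "Tf_c": 0.20},
    "ANXA1": {"Tf_a": 0.55, "Tf_b": 0.05, "Tf_c": 0.40},
}
top2 = top_influential_transcripts(toy_attention, k=2)
```

      With k set to 25, the same call would return the 25 highest-scoring transcripts for every protein in the input.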

      Recommendations for the authors:

      (1) What are the stained debris/nuclei surrounding oocytes/fertilized eggs in Figure 1A? Please indicate in figure legends. 

      We have edited Figure 1A with black arrows that highlight the stained cumulus cells surrounding the ovulated eggs/fertilized eggs, together with a revised Figure legend, which now reads, “Arrows indicate cumulus cells surrounding the eggs/fertilized eggs called cumulus-oocyte complexes”.

      (2) "Then, oviducts were sectioned into IA and IU regions" The Ampulla region is quite a long tube. Could authors provide details about the cutting border between IA and IU regions? 

      We have now included a literature reference defining the number of turns in the coiled mouse oviduct and described how we cut between the IA and IU regions in the Methods section, which reads,

      “We defined the IA region by including the infundibulum and cutting at turn three of the oviductal coil (the end of ampulla) (5). Turn four to eleven was considered the IU region, which was stripped of uterine tissue enveloping the colliculus tubaris of the UTJ region (5)”. 

      (3) "In this experiment, superovulation (SO) using exogenous gonadotropins was used due to technical limitations of sample collection for single-cell processing." It was not clear. What was the technical limitation of sample collections? 

      As indicated in response to the public review above, we have now clarified that we used superovulation for scRNA-seq analysis to ensure that a sufficient number of females were pregnant for single-cell isolation (we used ~4 mice/time point). Superovulation was therefore used to synchronize the females and make sure they were receptive to mating, thereby providing enough cells for the experiment.

      (4) Ephx2+ cluster (only present at SO 0.5 dpc and SO estrus) was very interesting. Could the author provide more information about this gene and the potential cell type this cluster represents? 

      We appreciate the reviewer’s interest in this cell-type cluster. We have now included the discussion regarding this gene, which reads, “Interestingly, the Ephx2+ cluster was mainly present in the SO 0.5 dpc and SO estrus samples. Ephx2 encodes epoxide hydrolase 2, which converts epoxides to dihydrodiols. Recent findings suggest that EPHX2 may play a role in primary hypertension in humans (52). However, the reproductive-related functions of EPHX2 have not yet been investigated. Therefore, we believe this presents an opportunity for future research to define the role of Ephx2 in the oviduct in response to SO during preimplantation embryo development.” However, as it is beyond the scope of the research provided in this manuscript, we did not further investigate the roles of Ephx2 in our current study. 

      (5) "we elucidated whether exogenous hormone treatment impacts protein secretion in the oviduct. There were 298, 354, and 163 differentially abundant proteins when compared between SO estrus vs. SO 0.5 dpc". Which hormone? FSH/LH? Or high estrogens due to more mature follicles, or more embryos instead of hormones? Again, the rationale for using the superovulation model needs to be better explained, with consideration of other possibilities. 

      Thank you for pointing this out. We have clarified that “exogenous hormone treatment” was the superovulation (SO), which is now corrected in the statement, which reads, “we elucidated whether SO treatment impacts protein secretion in the oviduct”. 

      The justification for superovulation has now been included in the revised manuscript, as indicated in the responses to reviewers above. A detailed description of the gonadotropin treatment is included in the Materials and Methods section. As the reviewer suggested, we have revised the Discussion to include the caveat that other factors could lead to the biological changes we observed following SO, which reads, 

      “As SO increases the number of mature follicles (therefore, estrogen levels), ovulated eggs, and follicular fluid, it is also likely that these biological alterations could lead to changes in the protein abundance in the oviduct”.

      (6) "we used RNAScope in situ hybridization staining of Tlr2, Ly6g (leukocytes), and Ptprc (common immune cell marker)." Please indicate what cell types Tlr2 marker was for. 

      We have now corrected the statement to include the cell types with Tlr2+ staining, which reads, “we used RNAScope in situ hybridization staining of Tlr2 (epithelium, stroma, and myosalpinx), ”.

      (7) In which cell types are P38 and p-P38 expressed?  

      Based on our scRNA-seq searchable dataset, which has been included in the revised manuscript (https://genesearch.org/winuthayanon/Oviduct_pregnancy/), we found that Mapk14 (encoding P38) was highly expressed in the immune cells in mice (red arrows in the UMAPs below).

      Author response image 1.

      In humans, scRNA-seq data published by Ulrich et al. (PMID: 35320732) showed that MAPK14 was present in most cell types in the Fallopian tubes at low levels (see violin plot below).

      Author response image 2.

      (8) "Our findings showed an influx of Ptprc+ cells to the stromal layer, and subsequently penetration into the epithelial layer in the presence of sperm at 0.5 dpc in the UTJ." The authors didn't have results tracking the influx of Ptprc+ cells to the stromal layer. 

      Thank you for pointing this out. We agree with the reviewer’s assessment, as we did not have results tracking the influx of Ptprc+ cells. We have corrected the statement and removed “influx”; it now reads, “Our findings showed that Ptprc+ cells were present in the stromal and epithelial layers in the presence of sperm at 0.5 dpc in the UTJ.”

      Reviewer #2 (Public review):

      The manuscript investigates oviductal responses to the presence of gametes and embryos using a multi-omics and machine learning-based approach. By applying RNA sequencing (RNA-seq), single-cell RNA sequencing (sc-RNA-seq), and proteomics, the authors identified distinct molecular signatures in different regions of the oviduct, proximal versus distal. The study revealed that sperm presence triggers an inflammatory response in the proximal oviduct, while embryo presence activates metabolic genes essential for providing nutrients to the developing embryos. Overall, this study offers valuable insights and is likely to be of great interest to reproductive biologists and researchers in the field of oviduct biology. However, further investigation into the impact of sperm on the immune cell population in the oviduct is necessary to strengthen the overall findings.

      We appreciate the concise summary, strengths, and weaknesses highlighted. We have addressed all comments made by the reviewer concerning superovulation, figure recommendations, and additional analysis in our revised manuscript. We have included a new analysis of scRNA-seq datasets from human Fallopian tube tissues collected from hydrosalpinx patients and healthy subjects by Ulrich et al. (PMID: 35320732). The evaluation of this human data helped distinguish between different inflammatory pathways stimulated by sperm vs. general inflammation, as well as species differences (more details in responses below). In future studies, we will follow up on a detailed description of immune cell types present at 0.5 dpc using FACS analysis. This is mainly due to a lack of expertise and technical limitations in our lab on immune cell investigation. Nevertheless, we have already recruited two immunologists to facilitate our future immune cell studies. We have also provided a clear justification for superovulation, especially in the scRNA-seq analysis in the revised manuscript (please see response to Reviewer 1 above). 

      Recommendations for the authors:

      (1) In Figure 3A and 3B, the authors should provide higher contrast and high-resolution images for the expression of the selected immune cell markers at 0.5 dpc and 0.5 dpp. For better clarity and flow, 0.5 dpc & 0.5 dpp, as well as 1.5 dpc & 1.5 dpp, should be merged into a single panel.  

      Thank you for this suggestion. As shown in the response to Reviewer 1 above, we have now used a higher-magnification image for Fig. 3. We have also changed the panel in the quantification graphs to better reflect the immunofluorescent images and improve clarity and flow.

      (2) The authors demonstrated that sperm induces an inflammatory response in the oviduct by presenting IF for selected immune cells. However, FACS analysis should be included to dissect the various immune cell populations further. 

      We appreciate the recommendation and agree that FACS analysis should provide a more detailed description of the immune cell types present at 0.5 dpc. However, our current work primarily offers initial investigations, confirming that three bioinformatic models (bulk RNA-seq, scRNA-seq, and proteomic analyses) can be validated by IF staining. Our future research using FACS analysis should provide additional characterization of immune cell types at 0.5 dpc in the oviduct.

      (3) In Figure 2, the authors performed proteomic analysis at different stages of implantation. They observed similar alterations in the pro-inflammatory Reactome, as seen with RNA-seq and sc-RNA-seq analyses. It would be interesting to examine the types of proteins induced by embryo presence and how their expression changes at 1.5 and 2.5 dpc. Similarly, are sperm-interacting proteins induced in response to sperm presence at 0.5 dpc? Are these proteins uniquely present in the isthmus compared to the ampulla? 

      We sincerely appreciate the reviewer’s insightful comments regarding the findings in Figure 2 and the potential avenues for further exploration of the proteomic analysis during different stages of embryo preimplantation. We found that during 1.5 dpc, enriched Reactome included Innate Immune System and RHO GTPase (Fig. S4A). In comparison, Reactome at 2.5 dpc were enriched for Keratinization, Metabolism of Protein, and Post-translational Protein Modification (Fig. S4B). Therefore, the pro-inflammatory Reactome profile appeared to have completely subsided at 2.5 dpc. This statement has now been included in the results section, which reads, “Lastly, differential protein abundance at 1.5 dpc and 2.5 dpc indicated the enrichment for Ras Homolog (RHO) GTPase signaling pathway and changes in epithelial remodeling (keratinization) (Fig. S4A and B), respectively. Therefore, the pro-inflammatory Reactome profile appeared to have completely subsided at 2.5 dpc”.

      And yes, we detected sperm-interacting proteins (such as OVGP1, ANXA1, HSPA5, and PDIA6) in our 0.5 dpc proteomic datasets (see the example images below, taken from our dataset: https://genes.winuthayanon.com/winuthayanon/oviduct_proteins/). We noticed that the levels of all of these sperm-interacting proteins were lower at 0.5 dpc compared to other timepoints. We speculated that these proteins bind to the sperm and were washed out together with the sperm during the pre-processing centrifugation prior to mass spectrometry analysis. However, we could not distinguish the original location (ampulla vs. isthmus) of the proteins, as the luminal fluid was flushed from the entire oviduct.

      Author response image 3.

      (4) Given that salpingitis is associated with inflammation of the fallopian tubes, the authors should consider comparing the gene signatures from this study with publicly available salpingitis datasets. 

      Thank you for this insightful suggestion. We have reanalyzed the human data from scRNA-seq of Fallopian tube tissues collected from hydrosalpinx (inflamed and dilated tube) and healthy patients by Ulrich et al. (PMID: 35320732). From this published human dataset, we have evaluated GO biological pathways enriched in the differentially expressed genes (DEGs) in hydrosalpinx compared to healthy Fallopian tubes. We have added these new data in the revised Results, Fig. 5 and Supplementary Dataset S5. The new data now read,  

      “Evaluation of human hydrosalpinx Fallopian tubes compared to sperm-induced inflammation genes

      To determine whether sperm-induced inflammatory responses in the mouse oviduct are similar to or different from human inflammation conditions, we reanalyzed publicly available scRNA-seq data from hydrosalpinx samples by Ulrich et al. (50). We found that some of the sperm-induced inflammatory genes identified from our mouse study were present and upregulated in hydrosalpinx samples compared to healthy subjects (Fig. 5A). However, the differentially expressed levels, for example the CCL2 gene, appeared to be marginal between healthy vs. hydrosalpinx samples (Fig. 5B-C and Supplemental Datasets S5). Nevertheless, the top five most enriched GOBPs related to inflammatory responses were Regulation of Complement Activation, Positive Regulation of Macrophage Migration Inhibitory Factor Signaling Pathway, MHC Class II Protein Complex Assembly, Positive Regulation of NK Cell Chemotaxis, and Negative Regulation of Metallopeptidase Activity (Fig. 5D). These GOBPs differed from those identified in sperm-exposed mouse oviducts at 0.5 dpc, which were enriched for neutrophil-related pathways, unlike macrophages or NK cells in hydrosalpinx samples”.

      We have also added a revised Discussion, which now reads, 

      “Lastly, we found that sperm-induced inflammatory conditions in the oviduct were potentially different than those of chronic inflammatory conditions in human Fallopian tubes. The inflammatory responses observed in mice and humans exhibited significant distinction based on immune cell involvement, mechanisms, and context. In mice, acute inflammation after sperm exposure could be primarily characterized by the activation of neutrophils, which serve as the first responders to injury or foreign bodies. In contrast, human Fallopian tubes with hydrosalpinx conditions displayed chronic inflammatory conditions predominantly involving macrophages and NK cells, suggesting a more complex and sustained immune response. It is also possible that inflammation in the oviduct differs between mice and humans. Understanding these species-specific variations is crucial for developing effective therapeutic strategies, as findings from murine models may not accurately translate to human inflammatory conditions due to the distinct immune dynamics at play”.

      (5) In Line 259, the authors should clarify why SO females were chosen for luminal fluid collection at different points. 

      Thank you for pointing this out. We want to clarify that the proteomic analysis of the luminal fluid was performed on naturally mated females both with and without SO. We have revised the statement in the Results section, which now reads,

      “To validate our transcriptomics data at a translational level, LC-MS/MS proteomic analysis was performed on secreted proteins in the oviductal luminal fluid at estrus, 0.5, 1.5, and 2.5 dpc with or without SO. As we also aim to address whether changes in proteomic profiles in the oviduct are governed by hormonal fluctuations, the SO was performed using exogenous gonadotropins. Therefore, the comparison was assessed in the following groups: estrus, 0.5 dpc, 1.5 dpc, 2.5 dpc, SO estrus, SO 0.5 dpc, SO 1.5 dpc, and SO 2.5 dpc”.

      In addition, we have now provided additional clarification in the Method section, which reads,

      “In this context, our SO approach facilitates multi-dimensional analysis comparisons among naturally cycling bulk RNA-seq, SO scRNA-seq, and natural luminal proteomic biological replicates, enhancing confidence between different methods. This experimental design also reflects adaptive responses in the oviduct during natural fertilization and preimplantation development, influenced by PMSG and hCG treatments at both RNA and protein levels. Furthermore, SO is commonly used in female reproduction to synchronize estrus cycles in animals, thus reducing variables at each collection timepoint.”.

      (6) The authors should include scale bars in all fluorescent images. 

      We apologize for this oversight. In all applicable figures, we have provided a scale bar for all immunofluorescent images.

    1. eLife Assessment

      The manuscript presents valuable data suggesting that enalapril elevates pSmad1/5/9 signaling, reduces cellular senescence, and enhances physiological functions in aged mice. However, the in vivo evidence remains incomplete, as studies blocking pSmad1/5/9 or using NAC to negate enalapril's purported benefits are lacking, and no lifespan extension data are shown. While in vitro findings support Smad1/5/9 as a key mediator, additional experiments - including BMP receptor inhibition and comprehensive senescence markers - are necessary to validate its essential role in vivo. Overall, the study provides promising insights into enalapril's anti-senescence potential but requires further rigorous investigation to fully substantiate its mechanism and therapeutic impact.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, the authors showed that enalapril was able to reduce cellular senescence and improve health status in aged mice. The authors further showed that phosphorylated Smad1/5/9 was significantly elevated and blocking this pathway attenuated the protection of cells from senescence. When middle-aged mice were treated with enalapril, the physiological performance in several tissues, including memory capacity, renal function, and muscle strength, exhibited significant improvement.

      Strengths:

      The strength of the study lies in the identification of the pSMAD1/5/9 pathway as the underlying mechanism mediating the anti-senescence effects of enalapril with comprehensive evaluation both in vitro and in vivo.

      Weaknesses:

      The major weakness of the study is the in vivo data. Despite the evidence shown in the in vitro study, there is no data to show that blocking the pSmad1/5/9 pathway is able to attenuate the anti-aging effects of enalapril in the mice. In addition, the mitigation of aging phenotypes by enalapril is not supported by evidence of lifespan extension. It is also necessary to show that NAC is able to attenuate the effects of enalapril in the aging mice. Finally, it would be beneficial to test whether enalapril is able to achieve a similar rescue in a premature aging mouse model.

    3. Reviewer #2 (Public review):

      This manuscript presents an interesting study of enalapril for its potential impact on senescence through the activation of Smad1/5/9 signaling with a focus on antioxidative gene expression. Repurposing enalapril in this context provides a fresh perspective on its effects beyond blood pressure regulation. The authors make a strong case for the importance of Smad1/5/9 in this process, and the inclusion of both in vitro and in vivo models adds value to the findings. Below, I have a few comments and suggestions which may help improve the manuscript.

      A major finding in the study is that phosphorylated Smad1/5/9 mediates the effects of enalapril. However, the manuscript focused on the Smad pathway relatively abruptly, and the rationale behind targeting this specific pathway is not fully explained. What makes Smad1/5/9 particularly relevant to the context of this study?

      Furthermore, their finding that activation of Smad1/5/9 leads to a reduction of senescence appears somewhat contradictory to the established literature on Smad1/5/9 in senescence. For instance, studies have shown that BMP4-induced senescence involves the activation of Smad1/5/8 (Smad1/5/9), leading to the upregulation of senescence markers like p16 and p21 (JBC, 2009, 284, 12153). Similarly, phosphorylated Smad1/5/8 has been shown to promote and maintain senescence in Ras-activated cells (PLOS Genetics, 2011, 7, e1002359). Could the authors provide more detailed mechanistic insights into why enalapril seems to reverse the typical pro-senescent role of Smad1/5/9 in their study?

      While the authors showed that enalapril increases Smad1/5/9 phosphorylation, what are the expression levels of other key and related factors like Smad4, pSmad2, pSmad3, BMP2, and BMP4 in both senescent and non-senescent cells? These data will help clarify the broader signaling effects.

      They used BMP receptor inhibitor LDN193189 to pharmacologically inhibit BMP signaling, but it would be more convincing to also include genetic validation (e.g., knockdown or knockout of BMP2 or BMP4). This will help confirm that the observed effects are truly due to BMP-Smad signaling and not off-target effects of the pharmacological inhibitor LDN.

      I don't see the results on the changes in senescence markers p16 and p21 in the mouse models treated with enalapril. Similarly, the effects of enalapril treatment on some key SASP factors, such as TNF-α, MCP-1, IL-1β, and IL-1α, are missing, particularly in serum and tissues. These are important data to evaluate the effect of enalapril on senescence.

      Given that enalapril is primarily known as an antihypertensive, it would be helpful to include data on how it affects blood pressure in the aged mouse models, such as systolic and diastolic blood pressure. This will clarify whether the observed effects are independent of or influenced by changes in blood pressure.

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, the authors showed that enalapril was able to reduce cellular senescence and improve health status in aged mice. The authors further showed that phosphorylated Smad1/5/9 was significantly elevated and blocking this pathway attenuated the protection of cells from senescence. When middle-aged mice were treated with enalapril, the physiological performance in several tissues, including memory capacity, renal function, and muscle strength, exhibited significant improvement.

      Strengths:

      The strength of the study lies in the identification of the pSMAD1/5/9 pathway as the underlying mechanism mediating the anti-senescence effects of enalapril with comprehensive evaluation both in vitro and in vivo.

      Thank you very much for your insightful evaluation and constructive suggestions. We have studied the comments thoroughly, and a provisional point-by-point response follows.

      Weaknesses:

      The major weakness of the study is the in vivo data. Despite the evidence shown in the in vitro study, there are no data showing that blocking the pSmad1/5/9 pathway attenuates the anti-aging effects of enalapril in the mice. In addition, the mitigation of aging phenotypes by enalapril is not supported by an extension of lifespan.

      Thanks for your comment. As suggested, we will administer LDN193189 to enalapril-treated mice to block pSmad1/5/9, and will assess age-related phenotypes in these mice to demonstrate that the anti-aging effect of enalapril in mice is mediated through pSmad1/5/9.

      We assessed only the improvement in the health status of the aging mice, which indicates that enalapril can extend the healthy lifespan of aging mice. We did so because we believe that lifespan is controlled by genetics. Therefore, this study focuses solely on the improvement of health phenotypes in aging mice by enalapril.

      It is necessary to show that NAC is able to attenuate the effects of enalapril in the aging mice. In addition, it would be beneficial to test whether enalapril achieves a similar rescue in a premature aging mouse model.

      Thanks for your suggestion. To our knowledge, NAC is an inhibitor of ROS, which is consistent with the antioxidant effect of enalapril. Therefore, we believe that NAC will not diminish the effect of enalapril.

      For the premature aging mouse models, we examined the effect of enalapril on Lmna<sup>G609G</sup> mice and other premature aging models and found that the effect was relatively modest. This may be due to differences in the genetic background of premature aging mice, leading to a less pronounced effect of enalapril compared to its impact on naturally aged mice.

      Reviewer #2 (Public review):

      This manuscript presents an interesting study of enalapril for its potential impact on senescence through the activation of Smad1/5/9 signaling with a focus on antioxidative gene expression. Repurposing enalapril in this context provides a fresh perspective on its effects beyond blood pressure regulation. The authors make a strong case for the importance of Smad1/5/9 in this process, and the inclusion of both in vitro and in vivo models adds value to the findings. Below, I have a few comments and suggestions which may help improve the manuscript.

      Thank you very much for your insightful evaluation and constructive suggestions. We have studied the comments thoroughly, and a provisional point-by-point response follows.

      A major finding in the study is that phosphorylated Smad1/5/9 mediates the effects of enalapril. However, the manuscript focused on the Smad pathway relatively abruptly, and the rationale behind targeting this specific pathway is not fully explained. What makes Smad1/5/9 particularly relevant to the context of this study?

      Thanks for your comment. As stated in the manuscript, after we found that enalapril could improve the cellular senescence phenotype, we screened and examined key targets in important aging-related signaling pathways, such as AKT, mTOR, ERK (Fig. S2A), Smad2/3 and Smad1/5/9 (Fig. 2A). We found that only the phosphorylation levels of Smad1/5/9 significantly increased after enalapril treatment. Therefore, the subsequent focus of this study is on pSmad1/5/9.

      Furthermore, their finding that activation of Smad1/5/9 leads to a reduction of senescence appears somewhat contradictory to the established literature on Smad1/5/9 in senescence. For instance, studies have shown that BMP4-induced senescence involves the activation of Smad1/5/8 (Smad1/5/9), leading to the upregulation of senescence markers like p16 and p21 (JBC, 2009, 284, 12153). Similarly, phosphorylated Smad1/5/8 has been shown to promote and maintain senescence in Ras-activated cells (PLOS Genetics, 2011, 7, e1002359). Could the authors provide more detailed mechanistic insights into why enalapril seems to reverse the typical pro-senescent role of Smad1/5/9 in their study?

      Thanks for your comment. The downstream regulatory network of BMP-pSmad1/5/9 is highly complex. The BMP-SMAD-ID axis has been mentioned in many studies, and its downstream signaling inhibits the expression of p16 and p21 (PNAS, 2016, 113(46), 13057-13062; Cell, 2003, 115(3), 281-292). Additionally, studies have also found that the Smad1-Stat1-P21 axis inhibits osteoblast senescence (Cell Death Discovery, 2022, 8:254). In our study, enalapril was found to increase the expression of ID1, which is a classic downstream target of pSmad1/5/9 (Cell Stem Cell, 2014, 15(5), 619-633). Therefore, pSmad1/5/9 inhibits cellular senescence markers such as p16, p21 and SASP through ID1, thereby promoting cell proliferation (Fig. 3). Furthermore, we also found that pSmad1/5/9 increases the expression of antioxidant genes and reduces ROS levels, exerting antioxidant effects (Fig. 4). Together, ID1 and antioxidant genes enable pSmad1/5/9 to exert its anti-aging effects.

      While the authors showed that enalapril increases Smad1/5/9 phosphorylation, what are the expression levels of other key and related factors like Smad4, pSmad2, pSmad3, BMP2, and BMP4 in both senescent and non-senescent cells? These data will help clarify the broader signaling effects.

      Thanks for your suggestion. We observed an increase in Smad4 expression, while the levels of pSmad2 and pSmad3 remained unchanged after enalapril treatment (Fig. 2A). We will supplement data on the expression changes of these key factors in both senescent and non-senescent cells.

      They used BMP receptor inhibitor LDN193189 to pharmacologically inhibit BMP signaling, but it would be more convincing to also include genetic validation (e.g., knockdown or knockout of BMP2 or BMP4). This will help confirm that the observed effects are truly due to BMP-Smad signaling and not off-target effects of the pharmacological inhibitor LDN.

      Thanks for your suggestion. We will use shRNA or siRNA to knock down BMP and examine the related changes to clarify the role of BMP-Smad signaling.

      I don't see the results on the changes in senescence markers p16 and p21 in the mouse models treated with enalapril. Similarly, the effects of enalapril treatment on some key SASP factors, such as TNF-α, MCP-1, IL-1β, and IL-1α, are missing, particularly in serum and tissues. These are important data to evaluate the effect of enalapril on senescence.

      Thanks for your comment. As for the markers p16 and p21, we observed no change in p16, while the changes in p21 varied across different organs and tissues (Author response image 1). Nevertheless, behavioral experiments and physiological and biochemical indicators at the individual level consistently demonstrated the significant anti-aging effects of enalapril (Fig. 6).

      Author response image 1.

      p21(Cdkn1a) expression levels in organs of mice after enalapril feeding.

      We also examined the changes in SASP factors in the serum of mice after enalapril treatment. Notably, SASP factors such as CCL (MCP), CXCL and TNFRS11B showed significant decreases (Fig. 5C). The expression changes of SASP factors varied across different organs. In the liver, kidneys and spleen, the expression of IL1a and IL1b decreased, while TNFRS11B expression decreased in both the liver and muscles (Fig. 5B). Additionally, CCL (MCP) levels decreased in all organs (Fig. 5B).

      Given that enalapril is primarily known as an antihypertensive, it would be helpful to include data on how it affects blood pressure in the aged mouse models, such as systolic and diastolic blood pressure. This will clarify whether the observed effects are independent of or influenced by changes in blood pressure.

      Thanks for your comment. We measured the blood pressure in mice, and found no significant change in blood pressure after enalapril treatment, which has also been validated in other studies (J Gerontol A Biol Sci Med Sci, 2019, 74(8), 1149–1157). Therefore, our results are independent of changes in blood pressure.

    1. eLife Assessment

      This manuscript provides convincing evidence derived from diverse state-of-the-art approaches to suggest that non-dopaminergic projection neurons in the ventral tegmental area (VTA) make local synapses. These important findings challenge the prevailing wisdom that VTA interneurons exclusively form local synaptic contacts and instead reveal that VTA neurons expressing interneuron markers also form long-range projections to forebrain targets such as the cortex, ventral pallidum, and nucleus accumbens. Given the importance of VTA interneurons to many models of VTA-linked behavioral functions, these findings have significant implications for our understanding of the neural circuits underlying reward, motivation, and addiction.

    2. Reviewer #1 (Public review):

      The manuscript by Lucie Oriol et al. revisits the understanding of interneurons in the ventral tegmental area (VTA). The study challenges the traditional notion that VTA interneurons exclusively form local synapses within the VTA. Key findings of the study indicate that VTA GABA and glutamate projection neurons also make local synapses within the VTA. This evidence suggests that functions previously attributed to VTA interneurons could be mediated by these projection neurons.

      The study tested four genetic markers, Parvalbumin (PV), Somatostatin (SST), Mu-opioid receptor (MOR), and Neurotensin (NTS), to determine if they selectively label VTA interneurons. The findings indicate that these markers label VTA projection neurons rather than selectively identifying interneurons. Using a combination of anatomical tracing and brain slice physiological recordings, the study demonstrates that VTA projection neurons make functional inhibitory or excitatory synapses locally within the VTA. These data challenge the conventional view that VTA GABA neurons are purely interneurons and suggest that inhibitory projection neurons can serve functions previously attributed to VTA interneurons. Thus, some functions traditionally ascribed to interneurons may be carried out by projection neurons with local synapses. This has significant implications for understanding the neural circuits underlying reward, motivation, and addiction.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors use a combination of transgenic animals, intersectional viruses, retrograde tracing, and ex-vivo slice electrophysiology to show that VTA projection neurons synapse locally. First, the authors injected a Cre-dependent channelrhodopsin into the VTA of PV, SST, MOR, and NTS-Cre mice. Importantly, PV, SST, MOR, and NTS are molecular markers previously used to describe VTA interneurons. Imaging of known VTA target regions identified that these neurons are not localized to the VTA and instead project to the PFC, NAc, VP, and LHb. Next, the authors used an intersectional viral strategy to label projection neurons with both GFP (membrane localized) and Syn:Ruby (release sites). These experiments identified that VTA projection neurons also make intra-VTA synapses. Finally, the authors use a combination of optogenetics and ex-vivo slice electrophysiology to show that neurons projecting from the VTA to the NAc/VP/PFC also synapse locally. Overall, the conclusions are well supported by the data.

      Strengths:

      Previous literature has described Pvalb, Sst, Oprm1, and Nts as selective markers of VTA interneurons. Here, the authors make use of Cre driver lines to show that neurons defined by these genes are not classically defined interneurons and project to known VTA target regions. Additionally, the authors convincingly use intersectional viral approaches and slice electrophysiology to show that projection neurons synapse onto neighboring cells within the VTA.

    4. Reviewer #3 (Public review):

      Summary:

      This study from Oriol et al. first uses transgenic animals to examine projection targets of specific subtypes of VTA GABA neurons (expressing PV, SST, MOR, or NTS). They follow this with a set of optogenetic experiments showing that VTA projection neurons (regardless of genetic subtype) make local functional connections within the VTA itself. Both of these findings are important advances in the field. Notably, both GABAergic and glutamatergic neurons in the VTA likely exhibit these combined long/short-range projections.

      Strengths:

      The main strength of this study is the series of optogenetic/electrophysiological experiments that provide detailed circuit connectivity of VTA neurons. The long-range projections to the VP (but not other targets) are also verified to have functional excitatory and inhibitory components. Overall, the experiments are well executed and the results are very relevant in light of the rapidly growing knowledge about the complexity and heterogeneity of VTA circuitry.

      Another strength of this study is the well-written and thoughtful discussion regarding the current findings in the context of the long-standing question of whether the VTA does or does not have true interneurons.

      Comments on revisions:

      The authors have addressed all of my questions admirably, and the final result is considerably improved and remains a valuable contribution to the field.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Regarding the manuscript's clarity, the sentence on page 5, "We also stained VTA sections for Tyrosine hydroxylase (TH) to estimate the rate of ChR2 colocalization with DA neurons," reads awkwardly. Removing the word "rate" could improve clarity.

      We have made the recommended clarifying edit (page 5, lines 30-31).

      Additionally, the anatomical data and findings are largely non-quantitative in nature. However, solid microscopy images are presented to support each claim. Additional quantification would strengthen the paper, specifically the quantification of projection density for each population and the proportion of each subpopulation that projects to their regions of interest.

      To rigorously quantify the projection density of each subpopulation would require a level of exhaustivity our study was not designed for. This is because during microscopy we focused efforts on imaging regions containing dense signals but did not exhaustively image regions receiving apparently weak or no input. While we considered including a semi-quantitative table of projection density, based on the data available we could not discriminate with confidence between, e.g., regions recipient of minimal input versus no input from VTA populations. Thus, while we stand by our descriptive statements we do not expand on those further.

      The authors should consider discussing the possibility that subpopulations of these cells could still be true interneurons especially if cells were looked at the single neuron level of resolution.

      We agree that some of the VTA populations we studied could include subpopulations that are bona fide interneurons. The identification of alternate markers or combinations of markers, or use of single-cell imaging approaches may indeed support this possibility in future. This is discussed in the context of currently available evidence on page 5 lines 32-34, page 11 lines 2-4, page 12 lines 2-11, and page 12 lines 15-16.

      Overall, the paper is well-written and important for the field and beyond.

      Thank you!

      Reviewer #2:

      Weaknesses:

      While the authors use several Cre driver lines to identify GABAergic projection neurons, they then use wild-type mice to show that projection neurons synapse onto neighboring cells within the VTA. This does not seem to lend evidence to the idea that previously described "interneurons" are projection neurons that collateralize within the VTA.

      We think the use of WT mice is a strength because it allows us to measure both GABA and non-GABA synapses made by VTA projections onto the same cells within VTA. However, we have also done this experiment targeting NAc-projecting VTA VGAT-Cre neurons, and VP-projecting VTA MOR-Cre neurons. Consistent with the WT dataset, we find that these defined projection neurons also make intra-VTA synapses. These data are now included as Figure 7.

      More broadly, our review of the literature finds very little evidence to support the notion of a VTA interneuron as we define it: VTA neurons that make only local connections. But the absence of evidence need not imply evidence of absence, thus we do not claim that all VTA neurons previously presumed to be interneurons must be projection neurons. We do express confidence in our findings that VTA projection neurons (that include GABA-releasing neurons) make local synapses in VTA. We argue that in the absence of compelling positive evidence for the existence of VTA interneurons, such as a selective marker, “we”, “the field”, should not presume their existence.

      Other suggestions:

      (1) While the authors present evidence that some projection neurons also synapse locally, there is no quantification as to the proportion of each neuronal subtype that collateralizes within the VTA. This would be a useful analysis.

      We agree this would be useful information. But our experiments were not designed to answer this question. Indeed, we have not conceived of a feasible method to discriminate between collateralizing and non-collateralizing VTA projection neurons at the single-cell level, thus we do not know how we would calculate such proportions.

      (2) There is significant interest in the molecular heterogeneity and spatial topography of the VTA. Additional analyses of the spatial topography of labeled projectors would be useful. For example, knowing if Pvalb+ projection neurons are distributed throughout the VTA or located along the midline would be a useful analysis.

      Prior studies and public databases (e.g., Allen brain atlas, GENSAT) allow one to visualize the location of VTA neurons positive for Pvalb and the other markers we investigated (Olson & Nestler, 2007). However, these label the entire population of neurons and thereby include those that project to any of the various projection targets. There are also studies that have used retrograde labeling approaches to map the distribution of labeled VTA cells projecting to one or another target (Beier et al., 2015; Lammel et al., 2008; Margolis et al., 2006). One example is the finding that LHb-projecting neurons (LHb being a major target of Pvalb+ VTA neurons) are enriched in medial VTA (Root et al., 2014). From this evidence we might infer that Pvalb+ VTA neurons that project to LHb are likely to be medially biased. Future studies may more carefully map the intersection of specific projection targets for each VTA subpopulation.

      Reviewer #3 (Recommendations For The Authors):

      Weaknesses:

      This study has a few modest shortcomings, of which the first is likely addressable with the authors' existing data, while the latter items will likely need to be deferred to future studies:

      (1) Some key anatomical details are difficult to discern from the images shown. In Figure 1, the low-magnification images of the VTA in the first column, while essential for seeing what overall section is being shown, are not of sufficient resolution to distinguish soma from processes. A supplemental figure with higher-resolution images could be helpful.

      We uploaded a higher-resolution file for Figure 1.

      Also, where are the insets shown in the second column obtained from? There is not a corresponding marked region on the low-magnification images. Is this an oversight, or are these insets obtained from other sections that are not shown?

      This was an oversight; we have added the corresponding marked region to the low-magnification images.

      Lastly, there is a supplemental figure showing the NAc injection sites corresponding to Figure 5, but not one showing VP or PFC injection sites in Figure 6. Why not?

      We added a figure with histology examples for the VP and the PFC injection sites as done for Figure 5, included as Supplemental Figure 3.

      (2) Because multiple ChR2 neurons are activated in the optogenetic experiments, it is not clear how common it is for any specific projection neuron to make local connections. Are the observed synaptic effects driven by just a few neurons making extensive local collateralizations (while other projection neurons do not), or do most VTA projection neurons have local collaterals? I realize this is a complex question that may not have an easy answer.

      This is a great question but, indeed, we don’t know the answer. As mentioned in response to Reviewer #2, we are not convinced there is a currently feasible way to discriminate between collateralizing and non-collateralizing cells at the single cell level.

      (3) There is something of a conceptual disconnect between the early and later portions of this paper. Whereas Figures 1-4 examine forebrain projections of genetic subtypes of VTA neurons, the optogenetic studies do not address genetic subtypes at all. I do realize that is outside of the scope of the authors' intent, but it does give the impression of somewhat different (but related) studies being stitched together. For example, the MOR-expressing neurons seem to project strongly to the VP, but it is not addressed whether these are also the ones making local projections. Also, after showing that PV neurons project to the LHb, the opto experiments do not examine the LHb projection target at all.

      This too was raised by Reviewer #2. While addressing this question for all the populations we investigated feels redundant, we now include optogenetic data showing that NAc-projecting VTA VGAT-Cre and VP-projecting VTA MOR-Cre neurons also make local collaterals (Figure 7). We think this allows us to connect the two approaches to a greater degree. Based on our findings using a dual virus approach to express Syn:Ruby in each population of VTA projection neuron, we think it very likely that we’d continue to find similar results using optogenetics-assisted slice electrophysiology for each population.

      Other suggestions:

      (1) I appreciated the extensive and high-quality anatomical figures shown in Figures 2-4. However, the layout was sometimes left-to-right, and sometimes right-to-left, which felt distracting. At some point, the text refers to "Fig. 3KJ", i.e. with the letters being in backward alphabetical order, and Figures 3I and 3L do not appear mentioned anywhere in the main text, leading me to wonder if that text was intended to read "Fig. 3I-L".

      Thank you for noting this. We have harmonized the layout of Figures 2-4 and adjusted the in-text Figure call-outs.

      Also, the inset in Figure 3J appears to show local collaterals of NTS neurons in the VTA, since there is no soma in that inset. This is interesting, and worth reporting, but is not explained in either the main text or Figure legend.

      We added a more complete description in the Results section (page 6, lines 25-30).

      (2) Perhaps I missed it, but I could not find any mention of the intensity of the LED light delivered during the optogenetic experiments. While acknowledging that this can be variable, do the authors have at least a rough range?

      We have added this information to the methods, page 17 line 8.

      Editor's Note:

      Should you choose to revise your manuscript, please double check that you have fully reported all statistics including exact p-values wherever possible alongside the summary statistics (test statistic and df) and 95% confidence intervals.

      We confirm that we have fully reported all statistics including exact p-values wherever possible alongside the summary statistics (test statistic and df) and 95% confidence intervals.

      Note to Editor and Readers

      While reanalyzing our data for resubmission, we discovered that some of the short-latency optogenetically evoked postsynaptic currents (oPSCs) we detected were erroneously categorized. Specifically, some VTA cells that showed large outward currents (oIPSCs) when held at 0 mV also had small inward currents when held at -60 mV. These small inward currents were initially categorized as oEPSCs, suggesting these VTA cells received input from populations of VTA projection neurons that released GABA and/or glutamate. However, the kinetics of these small inward currents were slow and aligned with the within-cell kinetics of the oIPSCs, indicating that these were very likely mediated by GABA<SUB>A</SUB> receptors. In one case the opposite was apparent, with a small PSC initially miscategorized as an oIPSC. These miscategorized oEPSCs and oIPSC were presumably detected because our holding potentials were not precisely identical to the reversal potentials for GABA<SUB>A</SUB> and AMPA receptors, respectively. For this reason, we removed these 14 oEPSCs and 1 oIPSC from our analyses in the revised version. The revised dataset suggests that VTA glutamate projection neurons may be less likely to collateralize widely within VTA compared to GABA projection neurons. But, importantly, this correction does not affect any of our conclusions.
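
      The reasoning in this note can be illustrated with a minimal sketch of an ohmic synaptic current, I = g (V_hold - E_rev). The conductance and the GABA<SUB>A</SUB> reversal potential below are hypothetical values chosen only for illustration; they are not taken from the recordings. The sign convention (negative = inward) and the function name are likewise assumptions of this sketch.

```python
def synaptic_current_pa(conductance_ns, holding_mv, reversal_mv):
    """Ohmic current in pA (nS * mV = pA); negative values are inward."""
    return conductance_ns * (holding_mv - reversal_mv)


# Hypothetical values for illustration only (not from the study).
E_GABA_MV = -55.0   # assumed GABA-A reversal, slightly depolarized from -60 mV
G_GABA_NS = 2.0     # assumed peak synaptic conductance

# Held at 0 mV, far above the reversal potential: large outward current.
i_at_0_mv = synaptic_current_pa(G_GABA_NS, 0.0, E_GABA_MV)

# Held at -60 mV, just below the reversal potential: small inward current
# that could be mistaken for an EPSC if judged by polarity alone.
i_at_minus_60_mv = synaptic_current_pa(G_GABA_NS, -60.0, E_GABA_MV)

print(i_at_0_mv, i_at_minus_60_mv)
```

      Under these assumed values, the same GABA<SUB>A</SUB> conductance yields a large outward current at 0 mV and a small inward current at -60 mV, which is why polarity alone does not identify the receptor type when the holding potential is offset from the reversal potential.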

      Citations:

      Beier, K. T., Steinberg, E. E., DeLoach, K. E., Xie, S., Miyamichi, K., Schwarz, L., Gao, X. J., Kremer, E. J., Malenka, R. C., & Luo, L. (2015). Circuit Architecture of VTA Dopamine Neurons Revealed by Systematic Input-Output Mapping. Cell, 162(3), 622-634. https://doi.org/10.1016/j.cell.2015.07.015

      Lammel, S., Hetzel, A., Hackel, O., Jones, I., Liss, B., & Roeper, J. (2008). Unique properties of mesoprefrontal neurons within a dual mesocorticolimbic dopamine system. Neuron, 57(5), 760-773. https://doi.org/10.1016/j.neuron.2008.01.022

      Margolis, E. B., Lock, H., Chefer, V. I., Shippenberg, T. S., Hjelmstad, G. O., & Fields, H. L. (2006). Kappa opioids selectively control dopaminergic neurons projecting to the prefrontal cortex. Proc Natl Acad Sci U S A, 103(8), 2938-2942. https://doi.org/10.1073/pnas.0511159103

      Olson, V. G., & Nestler, E. J. (2007). Topographical organization of GABAergic neurons within the ventral tegmental area of the rat. Synapse, 61(2), 87-95. https://doi.org/10.1002/syn.20345

      Root, D. H., Mejias-Aponte, C. A., Zhang, S., Wang, H. L., Hoffman, A. F., Lupica, C. R., & Morales, M. (2014). Single rodent mesohabenular axons release glutamate and GABA. Nat Neurosci, 17(11), 1543-1551. https://doi.org/10.1038/nn.3823

    1. eLife Assessment

      This work uses enhanced sampling molecular dynamics methods to generate potentially useful information about a conformational change (the DFG flip) that plays a key role in regulating kinase function and inhibitor binding. The focus of the work is on the mechanism of conformational change and how mutations affect the transition. The evidence supporting the conclusions is incomplete.

    2. Reviewer #1 (Public review):

      Summary:

      The authors used weighted ensemble enhanced sampling molecular dynamics (MD) to test the hypothesis that a double mutant of Abl favors the DFG-in state relative to the WT and therefore causes resistance to imatinib.

      Strengths:

      The authors employed state-of-the-art weighted ensemble MD simulations with three novel progress coordinates to explore the conformational changes of the DFG motif of Abl kinase. The hypothesis regarding the double mutant's drug resistance is novel.

      Weaknesses:

      The study contains many uncertain aspects. A major revision is needed to strengthen the support for the conclusions.

      (1) Specifically, the authors need to define the DFG conformation using criteria accepted in the field, for example, see https://klifs.net/index.php.

      (2) Convergence needs to be demonstrated for estimating the population difference between different conformational states.

      (3) The DFG flip needs to be sampled several times to establish free energy difference.

      (4) The free energy plots do not appear to show an intermediate state as claimed.

      (5) The trajectory length of 7 ns in both Figure 2 and Figure 4 needs to be verified, as it is extremely short for a DFG flip that has a high free energy barrier.

      (6) The free energy scale (100 kT) appears to be one order of magnitude too large.

      (7) Setting the DFG-Asp to the protonated state is not justified, because in the DFG-in state, the DFG-Asp is clearly deprotonated.

      (8) Finally, the authors should discuss their work in the context of the enormous progress made in theoretical studies and mechanistic understanding of the conformational landscape of protein kinases in the last two decades, particularly with regard to the DFG flip.

    3. Reviewer #2 (Public review):

      Summary:

      This is a well-written manuscript on the mechanism of the DFG flip in kinases. This conformational change is important for the toggling of kinases between active (DFG-in) and inactive (DFG-out) states. The relative probabilities of these two states are also an important determinant of the affinity of inhibitors for a kinase. However, it is an extremely slow/rare conformational change, making it difficult to capture in simulations. The authors show that weighted ensemble simulations can capture the DFG flip and then delve into the mechanism of this conformational change and the effects of mutations.

      Strengths:

      The DFG flip is very hard to capture in simulations. Showing that this can be done with relatively little simulation by using enhanced sampling is a valuable contribution. The manuscript gives a nice description of the background for non-experts.

      Weaknesses:

      I was disappointed by the anecdotal approach to presenting the results. Molecular processes are stochastic and the authors have expertise in describing such processes. However, they chose to put most statistical analysis in the SI. The main text instead describes the order of events in single "representative" trajectories. The main text makes it sound like these were mostly selected because they were continuous trajectories from the weighted ensemble simulations. I would much rather hear a description of the highest probability pathway(s) with some quantification of how probable they are. That would give the reader a clear sense of how representative the events described are.

      I appreciated the discussion of the strengths/weaknesses of weighted ensemble simulations. Am I correct that this method doesn't do anything to explicitly enhance sampling along orthogonal degrees of freedom? Maybe a point worth mentioning if so.

      I don't understand Figure 3C. Could the authors instead show structures corresponding to each of the states in 3B, and maybe also a representative structure for pathways 1 and 2?

      Why introduce S1 and DFG-inter? And why suppose that DFG-inter is what corresponds to the excited state seen by NMR?

      It would be nice to have error bars on the populations reported in Figure 3.

      I'm confused by the attempt to relate the relative probabilities of states to the 32 kcal/mol barrier previously reported between the states. The barrier height should be related to the probability of a transition. The DFG-out state could be equiprobable with the DFG-in state and still have a 32 kcal/mol barrier separating them.
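      The distinction drawn here can be made concrete with a two-state toy calculation. This is a minimal sketch, not the authors' analysis: it assumes 300 K, a Boltzmann population ratio set only by the free energy difference between the states, and an Eyring-type rate with a transmission coefficient of 1. All function names are hypothetical.

```python
import math

R = 1.987204e-3             # gas constant, kcal/(mol*K)
KB_OVER_H = 2.083661912e10  # Boltzmann constant / Planck constant, 1/(s*K)


def population_ratio(delta_g_kcal: float, temperature_k: float = 300.0) -> float:
    """Equilibrium ratio p_out / p_in for G_out - G_in = delta_g_kcal.

    The barrier between the states does not enter this expression.
    """
    return math.exp(-delta_g_kcal / (R * temperature_k))


def eyring_rate(barrier_kcal: float, temperature_k: float = 300.0) -> float:
    """Transition-state-theory rate in 1/s (transmission coefficient assumed 1)."""
    return KB_OVER_H * temperature_k * math.exp(-barrier_kcal / (R * temperature_k))


if __name__ == "__main__":
    # Equal free energies give equal populations, whatever the barrier is.
    print("p_out / p_in at dG = 0:", population_ratio(0.0))
    # The barrier controls how often the transition happens, not the populations.
    print("rate over a 32 kcal/mol barrier (1/s):", eyring_rate(32.0))
    print("rate over a 10 kcal/mol barrier (1/s):", eyring_rate(10.0))
```

      In this toy model the two states are equally populated at zero free energy difference even though a 32 kcal/mol barrier makes individual transitions extremely rare.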

      How do the relative probabilities of the DFG-in/out states compare to experiments, like NMR?

      Do the staggered and concerted DFG flip pathways mentioned correspond to pathways 1 and 2 in Figure 3B, or is that a concept from previous literature?

    1. eLife Assessment

      This useful study provides a novel perspective on assessing the generalizability of meta-analytic findings by introducing prediction intervals (and distributions) as tools to evaluate whether future studies will likely yield non-zero effects. The methodology is generally solid, with a thorough exploration of a large set of published meta-analyses that broadens our understanding of between-study heterogeneity. However, some critical details are incomplete, requiring refinement to ensure statistical rigor.

    2. Joint Public Review:

      Summary:

      This study used a simulation approach with a large-scale compilation of published meta-analytic data sets to address the generalizability of meta-analyses. The authors used prediction interval/distribution as a central tool to evaluate whether future meta-analysis is likely to generate a non-zero effect.

      Strengths:

      Although the concept of prediction intervals is commonly taught in statistics courses, its application in meta-analysis remains relatively rare. The authors' creative use of this concept, combined with the decomposition of heterogeneity, provides a new perspective for meta-analysts to evaluate the generalizability of their findings. As such, I consider this to be a timely and practically valuable development.

      Weaknesses:

      First, in their re-analysis of the compiled meta-analytic data to assess generalizability, the authors used a hierarchical model with only the intercept as a fixed effect. In practice, many meta-analyses include moderators in their models. Ignoring these moderators could result in attributing heterogeneity to unexplained variation at the study or paper level, depending on whether the moderators vary across studies or papers. As a consequence, the prediction interval may be inaccurately wide or narrow, leading to an erroneous assessment of the generalizability of results derived from large meta-analytic data sets. A more accurate approach would be to include the same moderators as in the original meta-analyses and generate prediction intervals that reflect the effects of these moderators.

      Second, the authors used a t-distribution to generate the prediction intervals and distributions for the hierarchical meta-analysis model. While the t-distribution is exact for prediction intervals in linear models, it is not strictly appropriate for models with random effects. This discrepancy arises because the variances of random effects must be estimated from the data, and using a t-distribution for prediction intervals does not account for the uncertainty in estimating these variance components. Unless the data is perfectly balanced (i.e., all random effects are nested and sample sizes within each level of the random factor are equal), it is well established that t-distribution (or equivalently, F-distribution) based hypothesis testing and confidence/prediction intervals are typically anti-conservative. As recommended in the linear mixed models literature, bootstrapping methods or some form of degrees-of-freedom correction would be more appropriate for generating prediction intervals in this context.
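      The bootstrap alternative mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the authors' hierarchical model: it uses a simple two-level random-effects model, the DerSimonian-Laird estimator for the between-study variance, and a parametric bootstrap in which the variance components are re-estimated in every replicate so that their uncertainty propagates into the interval. The effect sizes and sampling variances are made-up numbers.

```python
import random


def dersimonian_laird(effects, variances):
    """Return (pooled mean, between-study variance tau^2) via DerSimonian-Laird."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    mu_fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - mu_fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)
    w_re = [1.0 / (v + tau2) for v in variances]
    mu = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return mu, tau2


def bootstrap_prediction_interval(effects, variances, n_boot=2000, level=0.95, seed=0):
    """Percentile interval for the true effect of a new study (parametric bootstrap)."""
    rng = random.Random(seed)
    mu_hat, tau2_hat = dersimonian_laird(effects, variances)
    draws = []
    for _ in range(n_boot):
        # Simulate a replicate data set from the fitted model.
        sim = [rng.gauss(mu_hat, (tau2_hat + v) ** 0.5) for v in variances]
        # Re-estimate mean and heterogeneity, so their uncertainty is carried along.
        mu_b, tau2_b = dersimonian_laird(sim, variances)
        # Draw the true effect of a hypothetical new study from the refitted model.
        draws.append(rng.gauss(mu_b, tau2_b ** 0.5))
    draws.sort()
    alpha = (1.0 - level) / 2.0
    lower = draws[int(alpha * n_boot)]
    upper = draws[int((1.0 - alpha) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    effects = [0.2, 0.5, -0.1, 0.4, 0.3]         # hypothetical effect sizes
    variances = [0.04, 0.05, 0.03, 0.06, 0.04]   # hypothetical sampling variances
    print(dersimonian_laird(effects, variances))
    print(bootstrap_prediction_interval(effects, variances))
```

      Including moderators, as suggested in the first point, would mean replacing the intercept-only mean with a regression on those moderators before bootstrapping.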

      Finally, the authors define generalizability as the likelihood that a future study will yield a significantly non-zero effect. While this is certainly useful information, it is not necessarily the primary concern for many meta-analyses or individual studies. In fact, many studies aim to understand the mean response or effect within a specific context, rather than focusing on whether a future study will produce a significant result. For many research questions, the concern is not whether a future study will generate a significant finding, but whether the true mean response is different from zero. In this regard, the authors may have overstated the importance of knowing the outcome of a single future study, and framing this as the sole goal of research seems somewhat misguided.

    1. eLife Assessment

      This important study demonstrates the potential of synthetic gene circuits to detect and target aberrant RAS activity in cancer cell lines. The circuit design is novel and the evidence supporting the claims is convincing. As a proof-of-concept, this will be of broad interest. Testing the system with other KRAS mutations and clinically relevant output proteins, as well as gaining a better understanding of the underlying molecular mechanism, will both strengthen the study and help translate the technology toward clinical applications in cancer therapeutics.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript by Senn and colleagues presents a comprehensive study on developing synthetic gene circuits targeting mutant RAS-expressing cells. This study aims to exploit these RAS-targeting circuits as cancer cell classifiers, enabling the selective expression of an output protein in correlation with RAS activity. The system is based on the bacterial two-component system NarX/NarL. A RAS-binding domain, the RBDCRD domain of the RAS effector protein CRAF, is fused to the histidine kinase domain, which carries an inactivating amino acid exchange either in its ATP-binding site (N509A) or in its phosphorylation site (H399Q). Dimerization or nanocluster formation of RAS-GTP reconstitutes an active histidine kinase sensor dimer that phosphorylates the response regulator NarL. The phosphorylated DNA-binding protein NarL, fused to the transcription activator domain VP48, binds its responsive element and induces the expression of the output protein. In comparison to mutated RAS, the effect of the RAS activator SOS-1 and the RAS inhibitor NF1 on the sensing ability as well as the tunability of the RAS sensor were examined. A RAS targeting circuit with an AND gate was designed by expressing the RAS sensor proteins under the control of defined MAPK response elements, resulting in a large increase in the dynamic range between mutant and wild-type RAS. Finally, the RAS targeting circuits were evaluated in detail in a set of twelve cancer cell lines expressing endogenous levels of mutant or wild-type RAS or oncogenes affecting RAS signaling upstream or downstream.

      Strengths:

      This proof-of-concept study convincingly demonstrates the potential of synthetic gene circuits to target oncogenic RAS in tumor cell lines and to function, at least in part, as an RAS mutant cell classifier.

      Weaknesses:

      The use of an appropriate "therapeutic gene" might revert the oncogenic properties of RAS mutant cell lines. However, a therapeutic strategy based on this four-plasmid-based system might be difficult to implement in RAS-driven solid cancers.

    3. Reviewer #2 (Public review):

      The manuscript describes an interesting approach towards designing genetic circuits to sense different RAS mutants in the context of cancer therapeutics. The authors created sensors for mutant RAS and incorporated feed-forward control that leverages endogenous RAS/MAPK signaling pathways in order to dramatically increase the circuits' dynamic range. The modularity of the system is explored through the individual screening of several RAS binding domains, transmembrane domains, and MAPK response elements, and the authors further extensively screened different combinations of circuit components. This is an impressive synthetic biology demonstration that took it all the way to cancer cell lines. However, given the sole demonstrated output in the form of fluorescent proteins, the authors' claims related to therapeutic implications require additional empirical evidence or, otherwise, expository revision.

      Major comments:

      "These therapies are limited to cancers with KRASG12C mutations" is technically accurate. However, in this fast-moving field, there are examples such as MRTX1133 which holds the promise to target the very G12D mutation that is the focus of this paper. There are broader efforts too. It would help the readers better appreciate the background if the authors could update the intro to reflect the most recent landscape of RAS-targeting drugs.

      Only KRASG12D was used as a model in the design and optimization work of the genetic circuits. Other mutations should be quite experimentally feasible and comparisons of the circuits' performances across different KRAS mutations would allow for stronger claims on the circuits' generalizability. Particularly, the cancer cell line used for circuit validation harbored a KRASG13D mutation. While the data presented do indeed support the circuit's "generalizability," the model systems would not have been consistent in the current set of data presented.

      In Figure 2a, the text claims that "inactivation of endogenous RAS with NF1 resulted in a lower YFP/RBDCRD-NarX expression," but Figure 2a does not show a statistically significant reduction in expression of SYFP (measured by "membrane-to-total signal ratio [RU]").

      The therapeutic index of the authors' systems would be better characterized by a functional payload other than fluorescent proteins, one that, for example, induces cell death or immune responses.

      Regarding data presented in "Mechanism of action" (Figure 2), the observations are interesting and consistent across different fluorescent reporters. However, with regard to interpretations of the underlying molecular mechanisms, it is not clear whether the different output levels in 2b, 2c, and 2d are due to the pathway as described by the authors or simply from varied expression levels of RBDCRD-NarX itself (2a) that is nonlinearly amplified by the rest of the circuit. From a practical standpoint, this caveat is not critical with respect to the signal-to-noise ratios in later parts of the paper. From a mechanistic interpretation standpoint, claims put forth in this section are not clearly substantiated. Some additional controls would be nice. For example, if the authors express NarXs that constitutively dimerize on the membrane, what would the RasG12D-responsiveness look like? Does RasG12D alter the input-output curve of NarL-RE? How would Figure 4f compare to a constitutively dimerized NarX control that only relies on transcriptional amplification of the Ras-dependent promoters? It's also possible that these Ras variants could affect protein production at the post-transcriptional or even post-translational levels, which were not adequately considered.

      The text claims that "in contrast to what we saw in HEK293 overexpressing RAS (Figure 5d), the "AND-gate" RAS-targeting circuits do not generate higher output than the EF1a-driven, binding-triggered RAS sensor in HCT-116. Instead, the improved dynamic range results from decreased leakiness in HCT-116k.o." Comparing the experiment from Figure 5d, which looks at activation in KRASG12D and KRASWT, to the experiments in Figure 6b-d, which look at activation in HCT-116WT and HCT-116KO, is misleading. In Figure 5d, cells are transfected with KRASG12D and KRASWT to emulate high levels of mutant RAS and high levels of wild-type RAS. In Figures 6b-d, HCT-116WT has endogenous levels of mutant RAS, while HCT-116KO is a knock-out cell line and does not have mutant or WT RAS. Therefore, the improved dynamic range or "decreased leakiness in HCT-116KO" is more comparable to the NF1 condition from Figure 2, which deactivates endogenous RAS, than to Figure 5d. While this may not be feasible, the most accurate comparison would have been an HCT-116KO line with KRASWT stably integrated.

      We couldn't locate the citation or discussion of Figure 4d in the text. Conversely, based on the text description, Figure 6g would contain exciting results. But we couldn't find Figure 6g anywhere ... unless it was a typo and the authors meant Figure 6f, in which case the cool results in Figure S8 could use more elaboration in the main text.

    4. Reviewer #3 (Public review):

      Summary:

      Mutations that result in consistent RAS activation constitute a major driver of cancer. Therefore, RAS is a favorable target for cancer therapy. However, since normal RAS activity is essential for the function of normal cells, a mechanism that differentiates aberrant RAS activity from normal activity is required to avoid severe adverse effects. To this end, the authors designed and optimized a synthetic gene circuit that is induced by active RAS-GTP, including its components, such as RAS-GTP sensors, dimerization domains, and linkers. To enhance the circuit selectivity and dynamic range, the authors designed a synthetic promoter comprised of MAPK-responsive elements to regulate the expression of the RAS sensors, thus generating a feed-forward loop regulating the circuit components. Circuit outputs with respect to circuit design modification were characterized in standard model cell lines using basal RAS activity, active RAS mutants, and RAS inactivation.

      This approach is interesting. The design is novel and could be implemented for other RAS-mediated applications. The data support the claims, and while this circuit may require further optimization for clinical application, it is an interesting proof of concept for targeting aberrant RAS activity.

      Strengths:

      Novel circuit design, thorough optimization and characterization of the circuit components, solid data.

      Weaknesses:

      This manuscript could significantly benefit from testing the circuit performance in more realistic cell lines, such as patient-derived cells driven by RAS mutations, as well as in corresponding non-cancer cell lines with normal RAS activity. Furthermore, testing with therapeutic output proteins in vitro, and especially in vivo, would significantly strengthen the findings and claims.

    1. eLife Assessment

      This study presents a valuable nonlinear mathematical model that addresses how cell shape transitions in response to ECM stiffness. The evidence supporting the claims of the authors is solid, although additional work is required to improve the manuscript. For instance, the authors should improve the overall readability of the text and amend the experimental validation section. The work will be of interest to scientists working on a spectrum of fields including cell mechanics, cell behaviors, and cancer research.

    2. Reviewer #1 (Public review):

      The manuscript presents a novel nonlinear mathematical model that addresses a critical gap in our understanding of how cell shape transitions in response to ECM stiffness. The focus on the interplay between actomyosin contractility and ECM stiffening is highly relevant, especially in the context of cancer invasion and tissue morphogenesis. The originality of the proposed trizonal model is commendable, as it offers a comprehensive framework that could significantly advance the field.

      More specifically, the paper makes a significant contribution by providing a model that can predict multimodal cell shapes based on motility levels, which is a substantial improvement over current constitutive models. The potential to calibrate the model against experimental cell shape data is a strong point, as it ensures that the model's predictions are grounded in empirical evidence. The methodology appears to be rigorous and should provide reliable results when applied. This advancement could lead to a better understanding of the complex dynamics involved in cell-matrix interactions, particularly at intermediate ranges of collagen density. The potential applications of this research are vast and span across various medical and biological fields. The ability to predict cancer-induced tissue impairment, cachexia, and muscle injury, as well as to assess therapeutic methods, is particularly noteworthy. The mention of specific treatments like Blebbistatin and HAPLN1 treatments further adds depth to the discussion and highlights the practical relevance of the model.

      I'm curious if the authors could further elaborate on the use of this model to examine cellular unjamming transition or the cell shape changes during cancer invasion in various scenarios. Some discussions on that aspect will be helpful. It will also be useful to provide some perspectives on how this model could be integrated with others in a multi-scale modeling framework for understanding cell shape transitions during collective cell migration in various physiologically relevant scenarios.

      I recommend some minor revisions but overall, this is a very nice paper.

    3. Reviewer #2 (Public review):

      In this work the authors develop a mathematical model that incorporates three contributions to cellular force generation in 3D matrices: (1) actively generated contractile forces via myosin motors and consumption of ATP; (2) the energy stored in the extracellular matrix as it is deformed by the contractile cell; and (3), the energy associated with the interactions at the interface between the matrix and the cell, e.g. at focal adhesions. The authors make predictions about the dependence of cell shape on these three contributions.

      The authors succeed in making a number of predictions of how cell shapes will depend on these contributions to force generation. However, these predictions seem to be largely buried in the supplemental material and come in a form that will be accessible to a certain type of physicist and modeler but will likely not be accessible to many experimentalists who may want to test the predictions of the model. The authors show a comparison between their expected cell shape distributions and those predicted by the model, under multiple regimes: cells in two different concentrations of collagen (Figure 4c), cells with inhibited myosin and therefore reduced contractility (Figure 4d), cells with impaired interactions with the ECM (Figure 4e), and for cells with both contractility and ECM interactions impaired. They find a strong agreement between the experiments and their predictions. However, it should be noted that there are multiple "tuning parameters" in their model, so the ability to match experiment and theory may not be ultimately so surprising.

      While the authors do achieve their aim of building this model and testing it in comparison to experimental data, the text is frequently unclear and doesn't seem to have the right information at the right place and time to allow the reader to most clearly understand the motivation, the approach, or the results. A number of elements of this manuscript were confusing to this reviewer, and I discuss these below in the hopes that raising these points here can bring more clarity in future revisions, and/or that readers will be able to provide additional insight or attention to these questions.

      There are certain elements of the writing that obscure, rather than clarify, the model and the results. For example, the authors frequently refer to "matrix stiffening" and "strain stiffening", which are typically used in the literature to describe the phenomenon whereby an applied force changes the mechanical properties of the substrate; here, for example in regard to the discussion of Figure 4C, these terms instead seem to be simply referring to the experimental intervention of exposing different cells to different concentrations of the collagen matrix. While there may be some element of classically understood strain stiffening, incorporated into the model as the function f(λ_i), this doesn't seem to match the experimental validation - which, as described above, is not about strain stiffening but instead simply uses softer vs. stiffer gels. Therefore, it is unclear what exactly is meant throughout the manuscript by strain stiffening - does it mean "difference in stiffness between two conditions" or does it mean "change in substrate stiffness upon application of force"?

      Furthermore, while the introductory text emphasizes collective migration, the model itself focuses on the interactions between single cells and their environments. The emphasis on collective migration and cell shape in the introduction invokes previous literature focusing on collective phase transitions, but that is misleading. This paper is all about individual cell mechanics, not about collective migration or unjamming.

      The experimental validation seems to have a significant flaw. The mechanics and interactions of the cellular extensions seem to be completely ignored. We see, in Figure 4, that cell bodies are outlined to determine cell shape, but that the extremely long extensions are simply ignored. We know from previous studies that these extensions are generating quite a bit of traction and are contractile, and yet they've been excluded from the analysis. This doesn't make physical sense or fit with previous literature, and would seem to indicate that the regimes predicted by the model are missing an essential component of force generation and cell-matrix interaction.

    1. eLife Assessment

      This is an important and solid study that examines the role of TFAM, a protein that helps maintain mtDNA, in mtDNA mutator mice. The authors have demonstrated that TFAM's counteractive role in mtDNA mutator mice is tissue-specific. Minor revisions will enhance the clarity and discuss some of the findings for the broader audience.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript by Kremer et al. characterizes the tissue-specific responses to changes in TFAM levels and mtDNA copy number in prematurely aging mice (polg mutator model). The authors find that overexpression of TFAM can have beneficial or detrimental effects depending on the tissue type. For instance, increased TFAM levels increase mtDNA copy number in the spleen and improve spleen homeostasis but do not elevate mtDNA copy number in the liver and impair mtDNA expression. Similarly, the consequences of reduced TFAM expression are tissue-specific. Reduced TFAM levels improve brown adipocyte tissue function while other tissues are unaffected. The authors conclude that these tissue-specific responses to altered TFAM levels demonstrate that there are tissue-specific endogenous compensatory mechanisms in response to the continuous mutagenesis produced in the prematurely aging mouse model, including upregulation of TFAM expression, elevated mtDNA copy number, and altered mtDNA gene expression. Thus, the impact of genetically manipulating global TFAM expression is limited and there must be other determinants of mtDNA copy number under pathological conditions beyond TFAM.

      Strengths:

      Overall, this is an interesting study. It does a good job of demonstrating that given the multi-functional role of TFAM, the outcome of manipulating its activity is complex.

      Weaknesses:

      No major weaknesses were noted. We have minor suggestions for improving the clarity of the manuscript that are detailed in the "recommendations for the authors" section.

    3. Reviewer #2 (Public review):

      Summary:

      This study by Kremer et al. investigates the impact of modulation of expression of TFAM, a key protein involved in mitochondrial DNA (mtDNA) packaging and expression, in mtDNA mutator mice, which carry random mtDNA mutations. While previous research suggested that increasing TFAM could counteract the pathological effects of mtDNA mutations, this study reveals that the effects of TFAM modulation are tissue-specific. These findings highlight the complexity of mtDNA copy number regulation and gene expression, emphasizing that TFAM alone is not the sole determinant of mtDNA levels in contexts where oxidative phosphorylation is impaired. Other factors likely play a significant role, underscoring the need for nuanced approaches when targeting TFAM for therapeutic interventions.

      Strengths:

      The data presented in the manuscript is of high quality and supports major conclusions.

      Weaknesses:

      The statistical methods used are not clearly described, and some marked non-significant results appear visually significant, which raises concerns about data analysis.

      Data presentation requires improvement.

    1. eLife Assessment

      The manuscript by Qi et al. provides valuable insights into the structural basis of RNA methylation by the METTL3-METTL14 complex, revealing a novel cryptic pocket critical for m6A recognition. The solid experimental approach, integrating crystallography, molecular simulations, and functional assays, supports a proposed two-step mechanism for enzymatic activity. Refining structural data and addressing binding kinetic inconsistencies would further enhance clarity and impact. This manuscript will interest researchers in RNA modification, cancer biology, and therapeutic drug development.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript submitted by Qi et al., the authors study the RNA methylation mechanism by the METTL3-METTL14 complex. This complex catalyzes the major epitranscriptome methylation mark of nuclear RNA, including mRNA and lncRNAs. They catalyze the transfer of methyl group from SAM to convert the N6 of adenosine in RNA to m6A. Mutations in this complex have been associated with several diseases, such as type 2 diabetes and several types of cancer. The primary focus of this study was to understand the post-catalytic state of the METTL3-14 bound to a structural mimic of a reaction product known as N6-methyladenosine monophosphate (m6A) using X-ray crystallography. The authors show that the m6A occupies a novel pocket at the interface of the METTL3-14 complex and identified that residues interacting with m6A are mutated in several cancers. Furthermore, the authors demonstrate that the mutations lead to a significant loss in catalytic activity, alter RNA binding, and hinder the proper positioning of the substrate adenine in the active site. Lastly, the authors perform supervised molecular dynamics simulations to understand the effect of the mutations on the interaction network with m6A. The evidence for this study is good, with the combination of X-ray, functional assays, and molecular dynamics justifying their overall conclusions. This structure is significant as it provides new insights into the structural determinants of known cancer-associated mutations of this important class of enzymes. However, some issues need to be addressed.

      Strengths:

      (1) The X-ray structure is well determined, and the density map has the quality to observe all the interactions of the METTL3-14 complex with m6A.

      (2) The structure reveals a novel 'cryptic pocket' in the complex that is 16 Å away from the SAM binding site. It is a functional m6A-sensor, illustrating a mechanism where the complex switches its functionality from an m6A writer to a reader.

      (3) The structure illustrates that the residues forming the cryptic pocket are found in multiple cancer-associated mutations and are well conserved across several organisms.

      (4) The functional assays (methyl transferase, RNA binding, kinetic, and SPR assays) provide a complete picture of the effect of the mutations on the activity of the METTL3-14 complex.

      (5) Molecular dynamics simulations were done to understand the impact of the mutations on the pocket structure and its dynamics and support the X-ray structure findings.

      Weaknesses:

      (1) Although the X-ray structure is well determined, the statistics are a bit troubling, particularly the Ramachandran, sidechain and RSRZ outliers. They are well above the average for structures at that resolution. Alternative software such as ISOLDE might help to improve those parameters.

      (2) The authors should expand their discussion as to why the affinity for the product is higher than the substrate and the implications on the mechanism.

      (3) The SPR profiles of the association kinetics look to have several minor association-dissociation events occurring. Multiple binding sites? Authors should provide an explanation for such behavior. Also, what is the structural explanation of the difference in binding modes between the wt vs. mutant (one vs. two-state binding modes)?

      (4) The materials and methods state that the data in Figure 2a were fitted to a Michaelis-Menten equation; however, the Y axis shows normalized methylation and not initial rates. The authors should elaborate on their approach. In addition, more than three initial velocity rate points per protein are needed to fit a Michaelis-Menten curve confidently. Additionally, where can the Michaelis-Menten parameters be found?
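      To illustrate what point (4) is asking for, the sketch below fits the two Michaelis-Menten parameters to initial rates measured at several substrate concentrations. It is a generic illustration, not the authors' procedure: it assumes the standard rate law v = Vmax*[S] / (Km + [S]), uses the Hanes-Woolf linearization with ordinary least squares instead of a nonlinear fit, and runs on synthetic, noise-free data. The function names are hypothetical.

```python
def michaelis_menten(s: float, vmax: float, km: float) -> float:
    """Initial rate predicted by the Michaelis-Menten rate law."""
    return vmax * s / (km + s)


def fit_hanes_woolf(substrate, rates):
    """Estimate (vmax, km) from the Hanes-Woolf line s/v = s/vmax + km/vmax.

    Requires initial rates (not normalized end-point signals) at distinct
    substrate concentrations; with only two or three points the two
    parameters are barely constrained.
    """
    x = list(substrate)
    y = [s / v for s, v in zip(substrate, rates)]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                 # equals 1 / vmax
    intercept = mean_y - slope * mean_x  # equals km / vmax
    vmax = 1.0 / slope
    km = intercept * vmax
    return vmax, km


if __name__ == "__main__":
    # Synthetic data generated from assumed parameters vmax = 2.0, km = 3.0.
    substrate = [0.5, 1.0, 2.0, 5.0, 10.0, 20.0]
    rates = [michaelis_menten(s, 2.0, 3.0) for s in substrate]
    print(fit_hanes_woolf(substrate, rates))
```

      With real, noisy data a direct nonlinear least-squares fit is generally preferred, since linearizations distort the error structure.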

    3. Reviewer #2 (Public review):

      Summary:

      Qi et al. determined the X-ray crystallographic structure of the methyltransferase core of the obligate heterodimeric complex METTL3-METTL14 in complex with methyladenosine monophosphate (m6A), a product mimic for the methylation of adenosine, to a resolution of 2.5 Å. Their structure appears to reveal a cryptic binding pocket for m6A that had not previously been identified. Using full-length protein produced in insect cells, Qi et al. determined the methyltransferase activity of wildtype METTL3-METTL14 and compared it to that of mutant forms of the protein that have been implicated in cancer. In addition to methyltransferase activity, the authors used both fluorescence polarization assays and surface plasmon resonance to investigate the affinities and kinetics of RNA binding to wildtype and mutant forms of the full-length complex. The results indicate that mutations in the methyltransferase core of two separate arginine residues alter the dynamics of RNA binding and enzyme specificity of METTL3-METTL14. The authors go on to use a combination of supervised molecular dynamics simulations and comparisons to recently published structures to propose a "swivelling" mechanism for the transfer of the methylated substrate from the catalytic site of the complex to the novel cryptic pocket.

      Strengths:

      I appreciated the inclusion of supplementary data showing the purity and monodispersity of the protein used for crystallization as well as the omit map and other electron density maps to support the placement of the product mimic in the cryptic site. The authors use a combination of complementary biophysical techniques to test the effects of mutations that were identified in the literature as being clinically important and to develop a hypothesis for the large-scale translocation required for the enzymatic product to move from the catalytic site to the cryptic pocket. The use of molecular dynamics simulations to attempt to indirectly visualize how this translocation might occur in vivo was well done.

      Weaknesses:

      Even taking into account the 2.5 Å resolution of the structure, the model is not refined to the point that it could be. Some waters seem to be built into blobs of density that aren't particularly convincing, and other seemingly obvious waters aren't built at all. The structure validation report supports this and shows that overall, and in the context of 2.5 Å resolution, this is not a great model. A good many parts of the structural analysis don't seem consistent with what I see when I look at the model and density in terms of proposed interactions in the cryptic pocket. Much of the language used in the manuscript is too strong when the model is quite speculative.

    1. eLife Assessment

      This study presents a valuable finding on the signaling mechanisms underlying Treg cell homeostasis by identifying the simultaneous requirement of diacylglycerol (DAG) kinases (DGK) alpha and zeta for Foxp3+ Treg cell function and follicular responses, with implications for the pathogenesis of some autoimmune diseases. Whereas data based on the characterization of double knock-out mice (for DGK alpha and zeta) is solid, showing the emergence of autoimmune manifestations, the study has gaps in its experimental approaches since it is not clear what can be attributed to the simultaneous DGKα and ζ deficiency, versus the individual deficiency of either one. Experiments on the pathogenic potential of the DKO Tregs in the absence of other T-cells were not presented and results on the role of CD25 downregulation and CD28-independent activation of Treg cells were not properly discussed. Nonetheless, the reported data would be of interest to immunologists working on T-cell intracellular signaling and autoimmunity.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript by Li and colleagues describes the impact of DGKα and ζ deficiency on Treg cells and follicular responses. The experimental approach is based on the characterization of double KO mice that show the emergence of autoimmune manifestations, including the production of autoantibodies. Additionally, both Tfh and Tfr cells are increased in these mice deficient in both DGKα and ζ. Although the observations are interesting, they are difficult to interpret in the absence of data on the single mutations. While a supplementary figure shows that the autoimmune manifestations are more severe in the DGKα and ζ deficient mice, prior observations show that a single DGKα deficiency has an impact on Treg homeostasis. As such, the contribution of each of the two kinases to the overall phenotype is hard to establish.

      Strengths:

      Well-conducted experiments with informative mouse models with defined genetic defects.

      Weaknesses:

      The major weakness is the lack of clarity concerning what can be attributed to simultaneous DGKα and ζ deficiency versus deficiency of DGKα or ζ alone.

      Some interpretations are also not conclusively supported by data.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Li et al. investigate the combined role of diacylglycerol (DAG) kinases (DGK) α and ζ in the function of Foxp3+ Treg cells, which prevent autoimmunity. The authors generated DGK α and ζ Treg-specific double knockout mice (DKO) by crossing Dgkalpha-/- mice to DgKzf and Foxp3YFPCre/+ mice. The resulting "DKO" mice thus lack DGK α in all cells and DGK ζ in Foxp3+ Treg cells. The authors show that the DKO mice spontaneously develop autoimmunity, characterized by multiorgan inflammatory infiltration and elevated anti-double-strand DNA (dsDNA), -single-strand DNA (ssDNA), and -nuclear autoantibodies. The authors attribute the DKO mice phenotype to Foxp3+ Treg dysfunction, including accelerated conversion into "exTreg" cells with pathogenic activity. Interestingly, the combined deficiency of DGK α and ζ seems to release Treg cell dependence on CD28-mediated costimulatory signals, which the authors show by crossing their DKO mice to CD28-/- mice (TKO mice), which also develop autoimmunity.

      Strengths:

      The phenotypes of the mutant mice described in the manuscript are striking, and the authors provide a comprehensive analysis of the functional processes altered by the lack of DGKs.

      Weaknesses:

      One aspect that could be better explored is the direct role of "ex-Tregs" in causing pathogenesis in the models utilized.

      However, overall, this is an important report that makes a significant addition to the understanding of DAG kinases in Treg cell biology.

    1. eLife Assessment

      This study proposes a useful assay to identify relative social ranks in mice incorporating the competitive drive for two basic resources - food and living space. Using this new protocol, the authors provide solid evidence of stable ranking among male and female pairs, while reporting more fluctuant hierarchies among triads of males. The evidence is, however, incomplete in providing ethologically based validation, assessment of the influence of competitor recognition, rigorous analysis of training data, and proof of concept of application to neuroscience. With these concerns addressed, this manuscript will be of interest to those interested in social behavior and related neuroscience.

    2. Reviewer #1 (Public review):

      Summary:

      The authors present a new protocol to assess social dominance in pairs and triads of C57BL/6j mice, based on a competition to access a hidden food pellet. Using this new protocol, the authors have been able to identify stable ranking among male and female pairs, while reporting more fluctuant hierarchies among triads of males. Ranking readouts identified with this new apparatus were compared to the outcomes obtained with the same animals competing in the tube and in the warm spot tests, which have been both commonly used during the last decade to identify social ranks in rodents under laboratory conditions.

      Strengths:

      FPCT allows for easy and fast identification of a winner and a loser in the context of food competition. The apparatus and the protocol are relatively easy and quick to implement in the lab and free from any complex post-processing/analysis, which qualifies it for wide distribution, particularly within laboratories that do not have the resources to implement more sophisticated protocols. Hierarchical readouts identified through the FPCT correlate with social ranks identified with the tube and the warm spot tests, which have been widely adopted during the last decade and allow for study comparison.

      Weaknesses:

      While the FPCT is validated by the tube and the warm spot test, this paper would have gained strength by providing a more ethologically based validation. Tube and warm spot tests have been shown to provide conflicting results and might not be a sufficient measurement for social ranking (see Varholik et al., Scientific Reports, 2019; Battivelli et al., Biological Psychiatry, 2024). Instead, a general consensus pushing toward more ethological approaches for neuroscience studies is emerging.

      Other papers have already successfully identified social ranks using dyadic food competition and a relatively simple scoring protocol (see for example Merlot et al., 2006), within a more naturalistic set-up that allows the two opponents to interact directly while competing for the food. A potential issue with the FPCT is that, because the opponents are isolated from each other, the inhibition normally expected in subordinates when accessing food in the presence of a dominant could be diminished, and usually avoidant subordinates could be more motivated to push for access to the food pellet.

      There are issues with use of the English language throughout the text. Some sentences are difficult to understand and should be clarified and/or synthesized.

      Open question:

      Is food restriction mandatory? Is a palatable food pellet not sufficient to trigger competition? Food restriction has numerous behavioral and physiological consequences that would be better avoided in order to clearly interpret behavioral outcomes in the FPCT (see for example Tucci et al., 2006).

      Concluding remarks:

      Although this protocol attempts to provide a novel approach to evaluate social ranks in mice, it is not clear how it really brings a significant advance in neuroscience research. The FPCT dynamic is very similar to the one observed in the tube test, where mice compete to navigate forward in a narrow space, constraining the opponent to go backward. The main difference between the FPCT and the tube test is the presence of food between the opponents. In the tube test, a food reward was initially used to increase motivation to cross the tube and push the opponent on the testing day. This component has been progressively abandoned, precisely because it was not necessary for the mice to compete in the tube.

      This paper would really bring a significant contribution to the field by providing a neuronal imaging or manipulation correlate to the behavioral outcome obtained by the application of the FPCT.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, the authors have devised a novel assay to measure relative social rank in mice that is aimed at incorporating multiple aspects of social competition while minimizing direct contact between animals. Forming a hierarchy often involves complex social dynamics related to competitive drives for different fundamental resources including access to food, water, territory, and sexual mates. This makes the study of social dominance and its neural underpinnings hard, warranting the development of new tools and methods that can help understand both social functions as well as dysfunction.

      Strengths:

      This study showcases an assay called the Food Pellet Competition Test where cagemate mice compete for food, without direct contact, by pushing a block in a tube from opposite directions. The authors have attempted to quantify motivation to obtain the food independent of other factors such as age, weight, sex, etc. by running the assay under two conditions: one where the food is accessible and one where it isn't. This assay results in an impressive outcome consistency across days for pair-housed females and males and for male groups of three. Further, the determined social ranks correlate strongly with two common assays: the tube test and the warm spot test.

      Weaknesses:

      This new assay has limited ethological validity since mice do not compete for food without touching each other with a block in the middle. In addition, the assay may only be valid for a single trial per day, limiting its utility for neural recordings and manipulations to a single sample per mouse. Although the authors attempt to measure motivation as a factor driving who wins the social competition, the data is limited. This novel assay requires training across days with some mice reaching criteria before others. From the data reported, it is unclear what effects training can have on the outcome of social competition. Beyond the data shown, the language used throughout the manuscript and the rationale for the design of this novel assay are difficult to understand.

    4. Reviewer #3 (Public review):

      Summary:

      The laboratory mouse is an ideal animal to study the neural and psychological underpinnings of social dominance behavior because of its economic cost and the animals' readiness to display dominant and subordinate behaviors in simple and testable environments. Here, a new and novel method for measuring dominance and the individual social status of mice is presented using a food competition assay. Historically, food competition assays have been avoided because they occur in an open arena or the home cage, and it can be difficult to assess who gets priority access to the resource and to avoid aggressive interactions such as bite wounding. Now, the authors have designed a narrow rectangular arena separated in half by a sliding floor-to-ceiling obstacle, where the mice placed at opposite sides of the obstacle compete by pushing the obstacle to gain priority access to a food pellet resting on the arena floor under the obstacle. One can also place the food pellet within the obstacle to restrict priority access to the food and measure the time or effort spent pushing the obstacle back and forth. As hypothesized, the outcomes in the food competition test were significantly consistent with those of the more common tube test (space competition) and warm spot competition test. This suggests that these animals have a stereotypic dominance organization that exists across multiple resource domains (i.e., food, space, and temperature). Only male and female C57 mice in same-sex pairs or triads were tested.

      Strengths:

      The design of the apparatus and the inclusion of females are significant strengths within the study.

      Weaknesses:

      There are at least two major weaknesses of the study: neglecting the value of test inconsistency and not providing the mice time to recognize who they are competing with.

      Several studies have demonstrated that although inbred mice in laboratory housing share similar genetics and environment, they can form diverse types of hierarchical organizations (e.g., loose, stable, despotic, linear, etc.) and there are multiple resource domains in the home cage that mice compete over (e.g., space, food, water, temperature, etc.). The advantage of using multiple dominance assays is to understand the nuances of hierarchical organizations better. For example, some groups may have clear dominant and subordinate individuals when competing for food, but the individuals may "change or switch" social status when competing for space. Indeed, social relationships are dynamic, not static. Here, the authors have provided another test to measure another dimension of dominance: food competition. Rather than highlight this advantage, the authors highlight that the test is in agreement with the standard tube test and warm spot test and that C57 mice have stereotypic dominance across multiple domains. While some may find this great, it will leave many to continue using the tube test only (which measures the dimension of space competition) and avoid measuring food competition. If the reader looks at Figures 6E, F, and G they will see examples of inconsistency across the food competition test, tube test, and warm spot test in triads of mice. These groups are quite interesting and demonstrate the diversity of social dynamics in groups of inbred mice in highly standardized environmental conditions. Scientists interested in dominance should study groups that are consistent and inconsistent across multiple dimensions of dominance (e.g., space, food, mates, etc.).

      Unlike the tube test and warm spot test, the food competition test presented here provides no opportunity for the animals to identify their opponent. That is, they cannot sniff their opponent's fur or anogenital region, which would allow them an opportunity to identify them individually. Thus, as the authors state, the test only measures psychological motivation to get a food reward. Notably, the outcome in the direct and indirect testing of food competition is in agreement, leaving many to wonder whether they are measuring the social relationship or the effort an individual puts forth in attaining a food reward regardless of the social opponent. Specifically, in the direct test, an individual can retrieve the food reward by pushing the obstacle out of the way first. In the indirect test, the animals cannot retrieve the reward and can only push the obstacle back and forth, which contains the reward inside. In Figure 4E, you can see that winners spent more time pushing the block in the indirect test. Thus, whether the test measures a social relationship or just the likelihood of gaining priority access to food is unclear. To rectify this issue, the authors could provide an opportunity for the animals to interact before lowering the obstacle and raising(?) a food reward. They may also create a very long one-sided apparatus to measure the amount of effort an individual mouse puts forth in the indirect test with only one individual - or any situation with just one mouse where the moving obstacle is not pushed back, and the animal can just keep pushing until they stop. This would require another experiment. It also may not tell us much more since it remains unclear whether inbred mice can individually identify one another (see https://doi.org/10.1098/rspb.2000.1057 for more details).

      A minor issue is that the write-up of the history of food competition assays and female dominance research is inaccurate. Food competition assays have a long history since at least the 1950s and many people study female dominance now.

      Food competition: https://doi.org/10.1080/00223980.1950.9712776, https://psycnet.apa.org/fulltext/1953-03267-001.pdf, https://doi.org/10.1016/j.bbi.2003.11.007, https://doi.org/10.1038/s41586-022-04507-5

      Female dominance: https://doi.org/10.1016/0031-9384(87)90269-1, https://doi.org/10.1016/j.cub.2023.03.020, https://doi.org/10.1016/S0031-9384(01)00494-2, https://doi.org/10.1037/0735-7036.99.4.411

    1. eLife Assessment

      Veiga et al demonstrate the importance of incorporating RNAseq and machine learning approaches for neoantigen prediction. The evidence is convincing, and these findings contribute important information towards the selection of neoantigens for personalized antitumor vaccination.

    2. Reviewer #1 (Public review):

      Summary:

      The authors of the study are trying to show that RNAseq can be used for neoantigen prediction and the machine learning approach to the prediction can reveal very useful information for the selection of neoantigens for personalized antitumor vaccination.

      Strengths:

      The authors demonstrated that RNA expression of a neoantigen is a very important factor in the selection of peptides for the creation of personalized vaccines. They proved in vivo that in silico-predicted neoantigens can trigger an antitumor response in mice.

      Weaknesses:

      The authors replied to my previous comment about the selection of the peptides for vaccination in the responses to reviewers, but didn't include that in the revised manuscript. I think all that information should be in the manuscript.

      Here is the original comment: "The selection of the peptides for vaccination is not clear. Some peptides were selected before and some after processing. What processing is also not clear. The authors didn't provide the full list of peptides before and after processing, please add those. And it wasn't clear that these peptides were previously published. Looking at the previously published table with peptide from B16 F10 (https://www.nature.com/articles/s41598-021-89927-5/tables/3), there are other genes with high expression, e.g. Tab2, Tm9sf3 that have higher expression than Herc6, please clarify the choice."

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reply to Reviewer #1 (Public Review):

      The post-processing increases the number of putative neoantigens. As shown in Author response image 1, this is done through data augmentation, or "mutation", of individual amino acids in a sequence, replacing each with its most similar amino acid in the BLOSUM62 embedding. If most of the mutations result in a positive prediction (which we binarize through a >0.5 score), the sequence changes its prediction.

      Author response image 1.

      Post-processing pipeline to increase the number of putative neoantigens. Sequences can either be predicted using the forward method, for which a raw score is produced, or it can be introduced to a majority-vote prediction of the ensemble prediction of similar protein sequences.
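      The majority-vote step described above can be sketched as follows. This is a hedged illustration, not the NAP-CNB implementation: the `NEAREST_RESIDUE` table is a hand-written placeholder rather than one derived from the BLOSUM62 embedding, the `score` callable stands in for the trained predictor, and the fallback to the direct score when no variant can be generated is an assumption.

```python
from typing import Callable, Dict, List

# Placeholder substitution table (assumption): each residue maps to a
# residue treated as its "most similar" one. A real pipeline would derive
# this mapping from the BLOSUM62 embedding instead of hard-coding it.
NEAREST_RESIDUE: Dict[str, str] = {
    "I": "V", "V": "I", "L": "M", "M": "L",
    "K": "R", "R": "K", "D": "E", "E": "D",
    "F": "Y", "Y": "F", "S": "T", "T": "S",
}


def augment(sequence: str,
            nearest: Dict[str, str] = NEAREST_RESIDUE) -> List[str]:
    """Return one variant per position, swapping in the nearest residue."""
    variants = []
    for i, residue in enumerate(sequence):
        substitute = nearest.get(residue)
        if substitute is not None and substitute != residue:
            variants.append(sequence[:i] + substitute + sequence[i + 1:])
    return variants


def majority_vote_label(sequence: str,
                        score: Callable[[str], float],
                        threshold: float = 0.5) -> bool:
    """Binary label from a majority vote over single-residue variants.

    Each variant is scored and binarized with `score > threshold`; the
    sequence is labelled positive when more than half of the variants are.
    If no variant exists, fall back to the direct score (assumption).
    """
    variants = augment(sequence)
    if not variants:
        return score(sequence) > threshold
    positives = sum(1 for v in variants if score(v) > threshold)
    return positives > len(variants) / 2
```

      Voting over near-identical sequences makes the label less sensitive to a single borderline raw score, which is consistent with the response's observation that post-processing can change a sequence's prediction.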

      In this article, we obtain the following candidates after post-processing.

      Author response table 1.

      Sequence Symbol Gene Prediction FPKM

      As mentioned, the prediction column shows a binary label. The full list, which contained 402 sequences, did not include any other sequences that met the majority-vote criteria.

      As noted by the reviewer, Table 3 of our original paper includes the scores of the direct prediction, which has four sequences in common with the post-processing criteria (*Pnp, *Adar, *Lrrc28 and *Nr1h2). The * indicates the mutated form of the peptide, i.e., the neoantigen.

      We selected the top 4 predicted antigens (present both by direct prediction and after post-processing: *Pnp, *Adar, *Lrrc28 and *Nr1h2) (Wert-Carvajal et al. 2021), but we encountered difficulty in synthesizing *Nr1h2 (mutated Nr1h2), and thus it could not be included in the study.

      We also decided to evaluate the immunogenicity of *Wiz, which was identified as a potential TNA only after post-processing. *Wiz exhibited lower levels of immunogenicity compared to *Pnp, *Adar, and *Lrrc28. However, unlike these, *Wiz is highly expressed in the tumor, and vaccination with *Wiz provided the strongest protection levels. These findings led us to incorporate post-processing into the NAP-CNB platform.

      We chose *Herc6 as a mutated antigen predicted not to be a TNA over other candidates because its expression in the tumor was similar to that of *Wiz.

      Depending on the experiment we used 4 or 5 animals per group (this is now clarified in the revised version).

      The software used for statistical analysis was GraphPad Prism.

      Reply to Reviewer #2 (Public Review):

      This is true: binding affinity does not always predict immune responses, but in most cases high-affinity peptides are immunogenic. There are of course other parameters that drive the effective priming of tumor-reactive CD8+ T cells through antigen cross-presentation, but the mechanisms of antigen presentation are not yet completely understood. High-affinity peptides are desirable as good candidates in neoantigen-based vaccines.

      Other comments of the reviewers

      Reviewer #1 (Recommendations For The Authors):

      - Please decipher all abbreviations when they appear for the first time, e.g. NAP-CNB, PBS, CFA, FIA, and so on.  

      Done in the revised version.

      - Please be consistent with the capitalization of gene names (WIZ vs Wiz, TRP2 vs Trp2, and so on), and why there is an asterisk.

      Done in the revised version.

      - Please be clear about where you use cell lines or mice as a model. It's not clear.

      All work is done in mice, or cells isolated from vaccinated mice.

      - Why there is an asterisk in front of gene names?

      Explained in the revised version; the * indicates the peptides that are the mutated version.

      - Please add a reference for the following statement in the Introduction: "However, the response rates of these therapies remain low and relapses are common."

      Done in the revised version.

      - Also please add a reference for the use of TRP2 as a positive control.

      Done in the revised version.

      Reviewer #2 (Recommendations For The Authors):

      - It may be helpful to validate a larger pool of antigens. This is not necessary however and could be done in a follow-up study.

      We are doing it for other studies with excellent results.

      - The negative PBS control should be included in Figure 1.

      Done in the modified figure 1C in the revised version.

      - Stats should be clearly indicated in Figure 2.

      Done in the revised version.

      - Some nuances should be discussed. Is a threshold of neoantigen expression required or is there a correlation with tumor control? On the flip side, these neoantigens that are not likely to elicit immune responses but are highly expressed are also not likely to mediate tumor control.

      These points have been discussed. Based on our data, strategies for designing antitumor therapies should prioritize antigens that are highly expressed in tumors, even if they are not the most immunogenic. However, it is worth noting that even low-expressed antigens can still elicit an antitumor immune response. Whenever possible, strategies should be developed to target multiple antigens simultaneously, aiming to minimize tumor escape.

      - This study focuses on CD8 T cell responses but CD4s are also important in tumor control. This could be mentioned in the discussion.

      This is true, but this article focuses on validating a platform that predicts the antigenicity of antigens presented in the context of MHC-I.

      - Ideally, we would want to see that these responses are not elicited with adjuvant alone as an additional control.

      The non-vaccinated control animals received PBS and adjuvant. This clarification has now been included in the text.

    1. eLife Assessment

      By using molecular tools, electrophysiology, and ultrastructural reconstructions, this manuscript investigates the role of the Nogo/RTN4 receptor homolog RTN4RL2 at the afferent synapses between the sensory inner hair cells and spiral ganglion neurons and proposes that this regulates key aspects of hearing. The study is important because it provides insights into potential therapeutic targets for hearing loss related to synaptic dysfunction. The experimental data, based on the use of excellent tools, is solid and could be further improved with additional experiments that strengthen the validity of the findings and their interpretation, described in detail in the reviewers' comments.

    2. Reviewer #1 (Public review):

      Hearing and balance rely on specialized ribbon synapses that transmit sensory stimuli between hair cells and afferent neurons. Synaptic adhesion molecules that form and regulate transsynaptic interactions between inner hair cells (IHCs) and spiral ganglion neurons (SGNs) are crucial for maintaining auditory synaptic integrity and, consequently, for auditory signaling. Synaptic adhesion molecules such as neurexin-3 and neuroligin-1 and -3 have recently been shown to play vital roles in establishing and maintaining these synaptic connections (doi: 10.1242/dev.202723 and doi: 10.1016/j.isci.2022.104803). However, the full set of molecules required for synapse assembly remains unclear.

      Karagulyan et al. highlight the critical role of the synaptic adhesion molecule RTN4RL2 in the development and function of auditory afferent synapses between IHCs and SGNs, particularly regarding how RTN4RL2 may influence synaptic integrity and receptor localization. Their study shows that deletion of RTN4RL2 in mice leads to enlarged presynaptic ribbons and smaller postsynaptic densities (PSDs) in SGNs, indicating that RTN4RL2 is vital for synaptic structure. Additionally, the presence of "orphan" PSDs (those not directly associated with IHCs) in RTN4RL2 knockout mice suggests a developmental defect in which some SGN neurites fail to form appropriate synaptic contacts, highlighting potential issues in synaptic pruning or guidance. The study also observed a depolarized shift in the activation of CaV1.3 calcium channels in IHCs, indicating altered presynaptic functionality that may lead to impaired neurotransmitter release. Furthermore, postsynaptic SGNs exhibited a deficiency in GluA2/3 AMPA receptor subunits, despite normal Gria2 mRNA levels, pointing to a disruption in receptor localization that could compromise synaptic transmission. Auditory brainstem responses showed increased sound thresholds in RTN4RL2 knockout mice, indicating impaired hearing related to these synaptic dysfunctions.

      The findings reported here significantly enhance our understanding of synaptic organization in the auditory system, particularly concerning the molecular mechanisms underlying IHC-SGN connectivity. The implications are far-reaching, as they not only inform auditory neuroscience but also provide insights into potential therapeutic targets for hearing loss related to synaptic dysfunction.

    3. Reviewer #2 (Public review):

      Summary:

      Karagulyan et al. investigate the function of the transsynaptic adhesion molecule RTN4RL2 in the formation and function of ribbon synapses between type I spiral ganglion neurons (SGNs) and inner hair cells. For this purpose, they study constitutive RTN4RL2 knock-out mice. Using immunohistochemistry, they reveal defects in the recruitment of protein to ribbon synapses in the knockouts. Serial block-face EM reveals defects in SGN projections in mutants. Electrophysiological recordings suggest a small but statistically significant depolarized shift in the activation of Cav1.3 Ca2+ channels. Auditory thresholds are also elevated in the mutant mice. The authors conclude that RTN4RL2 contributes to the formation and function of auditory afferent synapses to regulate auditory function.

      Strengths:

      The authors have excellent tools to analyze ribbon synapses.

      Weaknesses:

      However, there are several concerns that substantially reduce my enthusiasm for the study.

      (1) The analysis of the expression pattern of RTN4RL2 in Figure 1 is incomplete. The authors should show a developmental time course of expression up into maturity to correlate gene expression with major developmental milestones such as axon outgrowth, innervation, and refinement. This would allow the development of models supporting roles in axon outgrowth versus innervation or both.

      (2) It would be important to improve the RNAscope data. Controls should be provided for Figure 1B to show that no signal is observed in hair cells from knockouts. The authors apparently already have the sections because they analyzed gene expression in SGNs of the knock-outs (Figure 1C).

      (3) It is unclear from the immunolocalization data in Figure 1D if all type I SGNs express RTN4RL2. Quantification would be important to properly document the presence of RTN4RL2 in all or a subset of type I SGNs. If only a subset of SGNs express RTN4RL2, it could significantly affect the interpretation of the data. For example, SGNs selectively projecting to the pillar or modiolar side of hair cells could be affected. These synapses significantly differ in their properties.

      (4) It is important to show proper controls for the RTN4RL2 immunolocalization data to show that no staining is observed in knockouts.

      (5) The authors state in the discussion that no staining for RTN4RL2 was observed at synaptic sites. This is surprising. Did the authors stain multiple ages? Was there perhaps transient expression during development? Or in axons indicative of a role in outgrowth, not synapse formation?

      (6) In Figure 2 it seems that images in mutants are brighter compared to wildtypes. Are exposure times equivalent? Is this a consistent result?

      (7) The number of synaptic ribbons for wildtype in Figure 2 is at 10/IHCs, and in Figure 2 Supplementary Figure 2 at 20/IHCs (20 is more like what is normally reported in the literature). The value for mutant similarly drastically varies between the two figures. This is a significant concern, especially because most differences that are reported in synaptic parameters between wild-type and mutants are far below a 2-fold difference.

      (8) The authors report differences in ribbon volume between wild-type and mutant. Was there a difference between the modiolar/pillar region of hair cells? It is known that synaptic size varies across the modiolar-pillar axis. Maybe smaller synapses are preferentially lost?

      (9) The authors show in Figure 2 - Supplement 3 that GluA2/3 staining is absent in the mutants. Are GluA4 receptors upregulated? Otherwise, synaptic transmission should be abolished, which would be a dramatic phenotype. Antibodies are available to analyze GluA4 expression, the experiment is thus feasible. Did the authors carry out recordings from SGNs?

      (10) The authors use SBEM to analyze SGN projections and synapses. The data suggest that a significant number of SGNs are not connected to IHCs. A reconstruction in Figure 3 shows hair cells and axons. It is not clear how the outline of hair cells was derived, but this should be indicated. Also, is this a defect in the formation of synapses and subsequent retraction of SGN projections? Or could RTN4RL2 mutants have a defect in axonal outgrowth and guidance that secondarily affects synapses? To address this question, it would be useful to sparsely label SGNs in mutants, for example with AAV vectors expressing GFP, and to trace the axons during development. This would allow us to distinguish between models of RTN4RL2 function. As it stands, it is not clear that RTN4RL2 acts directly at synapses.

      (11) The authors observe a tiny shift in the operation range of Ca2+ channels that has no effect on synaptic vesicle exocytosis. It seems very unlikely that this difference can explain the auditory phenotype of the mutant mice.

      (12) ABR recordings were conducted in whole-body knockouts. Effects on auditory thresholds could be a secondary consequence of perturbation along the auditory pathway. Conditional knockouts or precisely designed rescue experiments would go a long way to support the authors' hypothesis. I realize that this is a big ask and floxed mice might not be available to conduct the study.

    4. Reviewer #3 (Public review):

      In this study, the authors used RNAscope and immunostaining to confirm the expression of RTN4RL2 RNA and protein in hair cells and spiral ganglia. Through RTN4RL2 gene knockout mice, they demonstrated that the absence of RTN4RL2 leads to an increase in the size of presynaptic ribbons and a depolarized shift in the activation of calcium channels in inner hair cells. Additionally, they observed a reduction in GluA2/3 AMPA receptors in postsynaptic neurons and identified additional "orphan PSDs" not paired with presynaptic ribbons. These synaptic alterations ultimately resulted in an increased hearing threshold in mice, confirming that the RTN4RL2 gene is essential for normal hearing. These data are intriguing as they suggest that RTN4RL2 contributes to the proper formation and function of auditory afferent synapses and is critical for normal hearing. However, a thorough understanding of the known or postulated roles of RTN4RL2 is lacking.

      While the conclusions of this paper are generally well supported by the data, several aspects of the data analysis warrant further clarification and expansion.

      (1) A quantitative assessment is necessary in Figure 1 when discussing RNA and protein expression. It would be beneficial to show that expression levels are quantitatively reduced in KO mice compared to wild-type mice. This suggestion also applies to Figure 2-supplement 3.D, which examines expression levels.

      (2) In Figure 2, the authors present a morphological analysis of synapses and discuss the presence of "orphan PSDs." I agree that Homer1 not juxtaposed with Ctbp2 is increased in KO mice compared to the control group. However, in quantifying this, they opted to measure the number of Homer1 juxtaposed with Ctbp2 rather than directly quantifying the number of Homer1 not juxtaposed with Ctbp2. Quantifying the number of Homer1 not juxtaposed with Ctbp2 would more clearly represent "orphan PSDs" and provide stronger support for the discussion surrounding their presence.

      (3) In Figure 2, Supplementary 3, the authors discuss GluA2/3 puncta reduction and note that Gria2 RNA expression remains unchanged. However, there is an issue with the lack of quantification for Gria2 RNA expression. Additionally, it is noted that RNA expression was measured at P4. While the timing for GluA2/3 puncta assessment is not specified, if it was assessed at 3 weeks old as in Figure 2's synaptic puncta analysis, it would be inappropriate to link Gria2 RNA expression with GluA2/3 protein expression at P4. If RNA and protein expression were assessed at P4, please indicate this timing for clarity.

      (4) In Figure 3, the authors indicate that RTN4RL2 deficiency reduces the number of type 1 SGNs connected to ribbons. Given that the number of ribbons remains unchanged (Figure 2), it is important to clearly explain the implications of this finding. It is already known that each type I SGN forms a single synaptic contact with a single IHC. The fact that the number of ribbons remains constant while additional "orphan PSDs" are present suggests that the overall number of SGNs might need to increase to account for these findings. An explanation addressing this would be helpful.

      (5) In Figure 4F and 5Cii, could you clarify how voltage sensitivity (k) was calculated? Additionally, please provide an explanation for the values presented in millivolts (mV).

      (6) In Figure 6, the author measured the threshold of ABR at 2-4 months old. Since previous figures confirming synaptic morphology and function were all conducted on 3-week-old mice, it would be better to measure ABR at 3 weeks of age if possible.

    1. eLife Assessment

      In this important study, the authors reconstruct the evolutionary history of a large and widespread group of freshwater fishes (Nemacheilidae) across Eurasia since the early Eocene, based on molecular phylogenetic analysis with very comprehensive samplings including 471 specimens belonging to 250 living species. The authors convincingly infer that range expansions of the family were facilitated by tectonic connections, favorable climatic conditions, and orogenic processes, adding to our understanding of the effects of climatic change on biodiversity during the Cenozoic. This work is of interest to evolutionary biologists, ichthyologists, paleontologists, and general readers.

    2. Reviewer #2 (Public review):

      Summary:

      The authors present the results of molecular phylogenetic analysis with very comprehensive samplings including 471 specimens belonging to 250 species, trying to give a holistic reconstruction of the evolutionary history of freshwater fishes (Nemacheilidae) across Eurasia since the early Eocene.

      Strengths:

      They provide very vast data and conduct comprehensive analysis. They suggested that Nemacheilidae contain 6 major clades, and the earliest differentiation can be dated to early Eocene.

      Weaknesses:

      They did not discuss the widely existing systematic problems and did not use the conventional way to discuss the evolutionary process of branches or clades, but just chronologically described the overall history.

      Comments on revisions:

      The authors are aware that there are some taxonomic problems that cannot be solved at present, and they have mentioned this in the revised manuscript. I cannot provide other suggestions at the moment.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This is by far the phylogenetic analysis with the most comprehensive coverage for the Nemacheilidae family in Cobitoidea. It is a much-lauded effort. The conclusions derived using phylogenetic tools coincide with geological events, though not without difficulties (Africa pathway).

      Strengths:

      Comprehensive use of genetic tools

      Weaknesses:

      Lack of more fossil records

      Thank you for appreciating the comprehensiveness of our study.

      We agree that additional nemacheilid fossils would have provided valuable support for reconstructing the evolutionary history of the family. However, the nemacheilid fossil used in our study is currently the only fossil species of the family, which precludes the possibility of including more. To address this limitation, we incorporated fossils from closely related fish families, as well as a geological event, to calibrate the time tree. We have added further details on this point in the “Divergence time estimations and ancestral range reconstruction” section of the Methods. The reconstruction of the pathway by which loaches reached northeast Africa is further complicated by the extensive aridification of the Arabian Peninsula and the Nile valley, leaving no fossil or extant species of Nemacheilidae to provide insights into the distribution of the family during the late Miocene.

      Reviewer #2 (Public review):

      Summary:

      The authors present the results of molecular phylogenetic analysis with very comprehensive samplings including 471 specimens belonging to 250 species, trying to give a holistic reconstruction of the evolutionary history of freshwater fishes (Nemacheilidae) across Eurasia since the early Eocene. This is of great interest to general readers.

      Strengths:

      They provide very vast data and conduct comprehensive analyses. They suggested that Nemacheilidae contain 6 major clades, and the earliest differentiation can be dated to the early Eocene.

      Weaknesses:

      The analysis is incomplete, and the manuscript discussion is not well organized. The authors did not discuss the systematic problems that widely exist. They also did not use the conventional way to discuss the evolutionary process of branches or clades, but just chronologically described the overall history.

      In the revised version, we address the systematic issues within Nemacheilidae in a new paragraph. The polyphyly of the genus Schistura and the polyphyly or paraphyly of many other nemacheilid genera are well-known challenges in ichthyology. However, the large size of the family Nemacheilidae and the absence of a clear basal classification system have made systematic work difficult.

      The chronological concept in the description of events is in accordance with the sequence in which the events occurred over time and corresponds with Figure 8. Additionally, a clade-by-clade description would make it challenging to capture the periods before all clades were formed. As a compromise, the revised version includes a new table where each clade is represented by a column, allowing readers to trace the history of each clade in a clear overview. With this table, we provide both the chronological and clade-by-clade perspectives to enhance reader understanding.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I have no major comments, except for Figure 8, where the colour code for Sunda is not consistent, appearing as light purple and then dark purple. I was trying to locate the colour legend, maybe include this for all figures or refer to it.

      Figure 8 has been revised to improve matching of the colours.

      Reviewer #2 (Recommendations for the authors):

      (1) It is better to discuss the evolutionary history of the major inner groups. For example, why did Branches A and B differentiate? How are the 6 major clades differentiated?

      As mentioned above, the new table provides an overview of the evolutionary history of the major clades and, where known, the mechanism that led to their differentiation. For branches A and B, the underlying causes of differentiation remain unknown. Currently, the extensive morphological variability within each clade prevents a definitive morphological diagnosis, but such a study is planned for the future.

      (2) In this study, there are still some phylogenetic or systematic problems unresolved. For example, the Genus Schistura remains polyphyletic even in different major clades. The situation is similar for the Genus Triplophysa though not so serious. These need to be discussed or at least partially solved before discussing the evolutionary history.

      We discuss these topics now in a new paragraph ‘Taxonomic implications’.

      (3) In Table S1, what is the meaning of "-". Does this mean no data available? If so, how do the authors treat this in their phylogenetic analysis?

      Indeed, in Table S1, a ‘-’ indicates that no sequence was available for the given species and gene. In the phylogenetic analyses, these cases were treated as missing data.

      (4) What is the source of Figure 8? There are different opinions on the geological events. The authors need to indicate the source of their information.

      The sources of Fig. 8 are now provided in the figure caption.

      (5) The Eastern Clade forms continuous distribution in Figure 6, but discontinuous in Figure 8. Is this correct?

      Figure 6 does not display the distribution areas for the clades, but illustrates the biogeographic regions used in the biogeographic analysis.

    1. eLife Assessment

      This important study investigates the signaling pathways regulating retinal regeneration. Convincing evidence shows that the sphingosine-1-phosphate (S1P) signaling pathway is inhibited following retinal injury. Small-molecule activators and inhibitors support a model in which S1P signaling must be inhibited to generate Müller glial progenitor cells, a key step in retinal regeneration. The presented results support the major conclusions. However, whether the drug treatments directly or indirectly affect the Müller cells remains unclear.

    2. Reviewer #1 (Public review):

      Summary:

      This study shows that the pro-inflammatory S1P signaling regulates the responses of Müller glial cells to damage. The authors describe the expression of S1P signaling components. Using agonists and antagonists of the pathway, they also investigate their effect on the de-differentiation and proliferation of Müller glial cells in the damaged retina of postnatal chicks. They show that S1PR1 is highly expressed in resting MG and non-neurogenic MGPCs. This receptor suppresses the proliferation and neuronal activity promotes MGPC cell cycle re-entry and enhanced the number of regenerated amacrine-like cells after retinal damage. The formation of MGPCs in damaged retinas is impaired in the absence of microglial cells. This study further shows that ablation of microglial cells from the retina increases the expression of S1P-related genes in MG, whereas inhibition of S1PR1 and SPHK1 partially rescues the formation of MGPCs in damaged retinas depleted of microglia. The studies also show that expression of S1P-related genes is conserved in fish and human retinas.

      Strengths:

      This is a well-conducted study, with convincing images and statistically relevant data.

      Weaknesses:

      In previous studies, the authors have shown that S1P is upstream of NF-κB signaling (Palazzo et al., 2020, 2022, 2023). Although S1P and NF-κB signaling have overlapping effects, the authors here provide evidence for S1P-specific effects, adding some new information to the field.

    3. Reviewer #2 (Public review):

      Summary:

      Sphingosine-1-phosphate (S1P) metabolic and signaling genes are expressed highly in retinal Müller glia (MG) cells. This study tested how S1P signaling regulates glial phenotype, dedifferentiation of, reprogramming into proliferating MG-derived progenitor cells (MGPCs), and neuronal differentiation of the progeny of MGPCs using in vivo chick retina. Major techniques used are Sc-RNASeq and immunohistochemistry to determine the gene expression and proliferation of MG cells that co-label with signaling antibodies or mRNA FISH following treating the in vivo eyes with various S1P signaling antagonists, agonists, and signal modulators. The major conclusions drawn are supported by the results presented. However, the methodology they have used to modulate the S1P pathway using various chemical drugs raises questions about the outcomes and whether those are the real effects of S1P receptor modulation or S1P synthesis inhibition.

      Strengths:

      - Use of elaborated single-cell RNAseq expression data.
      - Use of FISH for S1P receptors and kinase as a good quality antibody is not available.
      - Use of EdU assay in combination with IHC
      - Comparison with human and Zebrafish Sc-RNA data

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #2 (Recommendations for the authors):

      A good number of sentences in the introduction, page two, refer to a figure, 'Fig. 2a', which appears to be the copy-paste effect of these sentences from another location (please see below):

      "Notably, SPHK2 does not directly contribute to levels of secreted S1P (Thuy et al., 2022), nor is it annotated in the chick genome. S1P can be exported from cells by a transporter (MFSD2A and SPNS2) or converted to sphingosine by a phosphatase (SGPP1) (Fig. 2a). Levels of sphingosine are increased by ASAH1 by conversion of ceramide or decreased by CERS2/5/6 by conversion to ceramide (Fig. 2a). S1P is known to activate G-protein coupled receptors, S1PR1 through S1PR5 (Fig. 2a). S1PRs are known to activate different cell signaling pathways including MAPK and PI3K/mTor, and crosstalk with pro-inflammatory pathways such as NFκB (Fig. 2a) (Hu et al., 2020)."

      We have removed references to Fig. 2a, which was from a previous draft of this manuscript.

      Please correct the typo in the following sentence (Fid.)

      "S1PR1 was most prominently expressed by resting MG and MG returning to a resting state, whereas S1PR3 was detected in relatively few scattered cells in clusters of MG, ganglion cells, horizontal cells, bipolar cells, amacrine cells, photoreceptors, oligodendrocytes, microglia and NIRG cells (Fid. 1d).

      We have corrected this typo.

    1. eLife Assessment

      This important study reports the developmental dynamics and molecular markers of the rete ovarii during ovarian development. The data supporting the main conclusions are convincing. This study will be of interest to developmental and reproductive biologists.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript by Anbarci et al. re-evaluates the function of the enigmatic Rete Ovarii (RO), a structure that forms in close association with the mammalian ovary. The RO has generally been considered a functionless structure in the adult ovary. This manuscript follows up on a previous study from the lab that analyzed ovarian morphogenesis using high-resolution microscopy (McKey et al., 2022). The present study adds finer details to RO development and possible function by 1) identifying new markers for RO sub-regions (e.g. GFR1a labels the connecting rete), suggesting that the sub-regions are functionally distinct, 2) showing that the RO sub-regions are connected by a luminal system that allows transport of material from the extra-ovarian rete (EOR) to the intra-ovarian rete (IOR), 3) identifying proteins that are secreted into the RO lumen and that may regulate ovarian homeostasis, and finally, 4) better defining how the vasculature, nervous, and immune systems integrate with the RO.

      Strengths:

      The data are beautifully presented and convincing. They show that the RO is composed of three distinct domains that have unique gene expression signatures and thus likely are functionally distinct.

    3. Reviewer #2 (Public review):

      A large number of ovarian experiments have been conducted - especially in morphological and molecular biology studies - specifically removing the ovarian membrane. This experiment is a good supplement to existing knowledge and plays an important role in early ovarian development and the regulation of ovarian homeostasis during the estrous cycle. There are also innovations in research ideas and methods, which will meet the requirements of experimental design and provide inspiration for other researchers.

      Comments on revisions: I have no further comments and suggest acceptance.

    4. Reviewer #3 (Public review):

      Summary:

      The rete ovarii (RO) has long been disregarded as a non-functional structure within the ovary. In their study, Anbarci and colleagues have delineated the markers and developmental dynamics of three distinct regions of the RO - the intraovarian rete (IOR), the extraovarian rete (EOR), and the connecting rete (CR). Notably focusing on the EOR, the authors presented evidence illustrating that the EOR forms a convoluted tubular structure culminating in a dilated tip. Intriguingly, microinjections into this tip revealed luminal flow towards the ovary containing potentially secreted functional proteins. Additionally, the EOR cells exhibit associations with vasculature, macrophages, and neuronal projections, proposing the notion that the RO may play a functional role in ovarian development during critical ovariogenesis stages. By identifying marker genes within the RO, the authors have also suggested that the RO could serve as a potential structure linking the ovary with the neuronal system.

      Strengths:

      Overall, the reviewer commends the authors for their systematic research on the RO, shedding light on this overlooked structure in developing ovaries. Furthermore, the authors have proposed a series of hypotheses that are both captivating and scientifically significant, with the potential to reshape our understanding of ovarian development through future investigations.

      Weaknesses:

      Although the manuscript lacks conclusive data to support many of its conclusions, the authors provide highly constructive discussions that offer valuable insights for future research on the rete ovarii in the field.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Weaknesses: 

      It is not always clear what the novel findings are that this manuscript is presenting. It appears to be largely similar to the analysis done by McKey et al. (2022) but with more time points and molecular markers. The novelty of the present study's findings needs to be better articulated. 

      The previous study focused on placing the Rete Ovarii in the context of ovarian development. The current study focuses on the novel finding that the EOR is an active structure that sends fluid/information to the ovary. We show this by characterizing the presence of secretory proteins in the RO epithelial cells, by dye injections into the EOR and observing transport of the dye to the ovary, and by collection of EOR fluid followed by proteomic analysis. We also show that the RO is embedded in an elaborate vascular network and contacted by neurons. None of these data were discussed in the McKey 2022 paper.

      Reviewer #2 (Public Review):

      Clarifications: 

      (1) Is there any comparative data on the proteomics of RO and rete testis in early development? With some molecular markers also derived from rete testis, it would be better to provide the data or references.

      To the best of our knowledge, there are no available proteomic datasets of the embryonic or early postnatal mouse Rete Testis or Epididymis. The authors agree that having this information would be very useful. 

      (2) Although the size of RO and its components is quite small and difficult to operate, the researchers in this article had already been able to perform intracavitary injection of EOR and extract EOR or CR for mass spectrometry analysis. Therefore, can EOR, CR, or IOR be damaged or removed, providing further strong evidence of ovarian development function?

      We attempted to genetically ablate the RO by expressing the diphtheria toxin receptor (DTR) in RO cells and adding DT. This approach was not successful in ablating the RO. We also tried to use Pax2/8 homo- and heterozygous mutants for ablation (as used in the McKey 2022 paper), but so far, we cannot find a genetic combination that ablates the RO, but not the oviduct, uterus and/or kidneys. We have also embarked on a study to surgically remove the RO. This assay is taking some time to optimize. The goal of the current study was to characterize the cells along the length of the RO and to present evidence that it is a secretory appendage of the ovary.

      (3) Although IOR is shown on the schematic diagram, it cannot be observed in the immunohistochemistry pictures in Figure 1 and Figure 3. The authors should provide a detailed explanation.

      An annotation has been added to Figure 1 to indicate the IOR. As the images within the panels are of maximum intensity projections, it is often difficult to clearly see the IOR as it is deeper within the ovary. In Figure 3, the view of the ovary is from the ventral side:  this view does not allow for clear visualization of the IOR.

      Reviewer #3 (Public Review):

      Weaknesses: 

      There is a lack of conclusive data supporting many conclusions in the manuscript. Therefore, the paper's overall conclusions should be moderated until functional validations are conducted.

      We have moderated the conclusions where appropriate.

      Reviewer #1 (Recommendations For The Authors):

      (1) The introduction is relatively brief and does not mention some historical data/hypotheses on the role of the RO in ovarian function (e.g. regulation of meiotic entry) or development (e.g. Mayère et al., 2022).

      Mayère 2022 was cited in line 57. Stein's hypothesis about entry into meiosis has been added at line 58.

      (2) L82-84: It is stated that KRT8 was first identified as a potential RO marker by sc/snRNAseq (Anbarci et al., 2023) and then validated in this manuscript. However, KRT8 was used by McKey et al. (2022) as a RO marker, and they noted there that KRT8 was enriched in the EOR. It is not clear why McKey et al. is not cited as the primary reference validating KRT8 as an EOR marker.

      The embryonic and neonatal time course of KRT8 expression is first described in this paper. McKey 2022 only highlights KRT8 at E18.5. A reference has been added to address this (line 85).

      (3) Figure 1: Can the IOR be seen in these images? If so, please label. 

      The label has been added.

      (4) L107: It is hypothesized that "the RO may respond to or interpret homeostatic cues." Can transcriptomics data shed light on what signals the RO may be capable of responding to? E.g. what receptors are expressed by cells of the RO (e.g. ER, LHCGR, FSHR)?

      The RO expresses ESR1, PGR, INSR, IGF1R. The IOR exclusively expresses LHCGR and FSHR. This has been added to the manuscript (line 309).

      (5) L152: Mass spec was used to identify proteins secreted into the lumen of the RO. These proteins were then compared to the mammalian secretome to filter out possible nonsecreted protein contaminants. Finally, the candidates were compared to the RO scRNAseq data from Anbarci et al., (2023). This method gives a very conservative candidate list. However, it may also be informative to compare the sc/snRNA-seq gene list directly to the secretome to ID other possible candidate-secreted proteins that may not have been detected in the mass spec data set. 

      There are quite a number of secreted proteins that are also not actively secreted. This is a good suggestion for future analysis. For the current study we wanted to take a more conservative approach, and chose to do proteomics to determine proteins that are actively secreted. 

      (6) L195: It is not clear if IGFBP2 is expressed by both OR and granulosa cells or only granulosa cells. It would be informative to know what ovarian cell types express both IGFBP2 and IGF1R (e.g. from sc/snRNA-seq)? This information is referenced in the discussion (L285-287) but would be better to reference it in the results section for clarity.

      Both RO and granulosa cells express IGFBP2 and IGF1R. A sentence has been added to results for clarity. (Line 197)

      (7) L295: "...the RO participates in endocrine signaling..." might be more accurate to say "...the RO responds to endocrine signaling...".

      The authors agreed that this statement is more accurate and the changes have been made. 

      Reviewer #3 (Recommendations For The Authors): 

      Several issues significantly affect the paper's quality in the current version. Firstly, there is a lack of conclusive data supporting many conclusions in the manuscript. For instance, the assertion in line 105 that "EOR was directly innervated by neurons" lacks substantial evidence beyond basic immunofluorescent staining. 

      We agree that the term “innervated” might be a step too far since we rely on IF evidence.  We changed the wording of this sentence to say, “The EOR was directly contacted by neurons”.

      In another pivotal experiment illustrated in Figure 3, the provided images lack temporal continuity and quantitative analysis, suggesting the incorporation of time-lapse imaging for improved sequential presentation in Figure 3.

      The microscope where we can perform injections cannot record movies.  We have tried moving the rete to another microscope after injection, but so far, we have been unable to capture dextran moving through the RO. We therefore believe that transport is rapid, but future experiments will be needed to optimize this imaging.

      Moreover, relying solely on proteomics analysis, as seen in lines 188-189, makes it challenging to assert conclusions such as "EOR actively secretes proteins." Therefore, the paper's overall conclusions should be moderated until functional validations are conducted. 

      The findings that (1) the cells of the EOR express SNARE complex proteins at their apical surfaces and (2) luminal fluid expelled from the EOR contains abundant secreted proteins strongly suggest that the RO is involved in active secretion. We use the word “suggest” in this sentence (lines 188-189), as we realize that further experiments should be done to validate this conclusion.

      Furthermore, the predominant methods in this study involve immunostaining and imaging. However, the current images exhibit a notable inconsistency in color definitions for different markers by the authors. For instance, in Figure 2.A/C, PAX8 is portrayed as cyan, while in D, it is represented in yellow. Similarly, in Figure 4, E-CAD is depicted using both cyan and yellow. Utilizing different colors for the same protein within a figure can significantly confuse readers' interpretation of the experiments. Rectifying these inconsistencies is essential to enhance the clarity and comprehension of the experimental results.

      These colors were chosen to be visible to those with color vision impairments. We typically used cyan and magenta to emphasize the most important markers in the image. E-Cad and KRT8 were often used to emphasize or landmark a structure through the localization of these proteins. When KRT8 and E-Cad were highlighted, they were represented in cyan and magenta for visibility. When these proteins were used as a landmark to orient the reader and not as the main point, they were labeled in yellow.

      At last, many markers in this study are derived from bulk and single-cell sequencing of developing RO. However, it seems that these important data were separated into another paper as a preprint. If this data were incorporated into the current manuscript, the manuscript would become more comprehensive for guiding future research on the RO.

      Since we have single cell and single nuclei data from fetal and adult estrus and metestrus stages, we found that incorporating all this data into the present manuscript was overwhelming. Instead, we devoted another manuscript to presenting and validating that data. We believe a quick look at the sequencing manuscript will make this clear.

    1. eLife Assessment

      This important study focused on characterizing clonally derived MSC populations from the synovium of normal and osteoarthritis (OA) patients, demonstrating their potential to regenerate cartilage in vivo. Although the strength of evidence is solid, further work is needed to fill the gaps in the CD47hi cell characterization and the in vivo response assessment. The study will be of interest to scientists advancing MSC-based regenerative medicine approaches.

    2. Reviewer #1 (Public review):

      Summary:

      This work by Al-Jezani et al. focused on characterizing clonally derived MSC populations from the synovium of normal and osteoarthritis (OA) patients. This included characterizing cell surface marker expression in situ (at the time of isolation), as well as after in vitro expansion. The group also tried to correlate marker expression with trilineage differentiation potential, and tested the efficacy of the different sub-populations in repairing cartilage in a rat model of OA. The main finding of the study is that CD47hi MSCs may have a greater capacity to repair cartilage than CD47lo MSCs, suggesting that CD47 may be a novel marker of human MSCs that have enhanced chondrogenic potential.

      Strengths:

      Characterization of the different clonal populations isolated indicates that the MSCs are heterogeneous and that traditional cell surface markers for MSCs do not accurately predict their differentiation potential. While this has been previously established in the field of MSC therapy, the authors did attempt to characterize clones derived from single cells, as well as evaluate the marker profile at the time of isolation. While the finding of heterogeneity is not surprising, the methods used to isolate and characterize the cells were well developed. The interesting finding of the study is the identification of CD47 as a potential MSC marker that could be related to chondrogenic potential. The authors suggest that MSCs with high CD47 repaired cartilage more effectively than MSCs with low CD47 in a rat OA model.

      Weaknesses:

      While the identification of CD47 as a novel MSC marker could be important to the field of cell therapy and cartilage regeneration, there was a lack of robust data to support the correlation of CD47 expression with chondrogenesis. The authors indicated that the proteomics suggested that the MSC subtype expressed significantly more CD47 than the non-MSC subtype. However, it was difficult to appreciate where this was shown. It would be helpful to clearly identify where in the figure this is shown, especially since it is the key result of the study. The authors were able to isolate CD47hi and CD47lo cells. While this is exciting, it was unclear how many cells could be isolated and whether they needed to be expanded before being used in vivo. Additional details for the CD47 studies would have strengthened the paper. Furthermore, the CD47hi cells were not thoroughly characterized in vitro, particularly for in vitro chondrogenesis. More importantly, the in vivo study in which the CD47hi and CD47lo MSCs were injected into a rat model of OA lacked experimental details regarding how many cells were injected and how they were labeled. No representative histology was presented, and there did not seem to be a statistically significant difference between the OARSI scores of the saline-injected and MSC-injected groups. The repair tissue was stained for Sox9 expression, which is an important marker of chondrogenesis but does not show production of cartilage. Expression of Collagen Type II would be needed to more robustly claim that CD47 is a marker of MSCs with enhanced repair potential.

    3. Reviewer #2 (Public review):

      Summary:

      This is a compelling study that systematically characterized and identified clonal MSC populations derived from normal and osteoarthritis human synovium. There is immense growth in the focus on synovial-derived progenitors in the context of both disease mechanisms and potential treatment approaches, and the authors sought to understand the regenerative potential of synovial-derived MSCs.

      Strengths:

      This study has multiple strengths. MSC cultures were established from an impressive number of human subjects, and rigorous cell surface protein analyses were conducted, at both pre-culture and post-culture timepoints. In vivo experiments using a rat DMM model showed beneficial therapeutic effects of MSCs vs non-MSCs, with compelling data demonstrating that only "real" MSC clones incorporate into cartilage repair tissue and express Prg4. Proteomics analysis was performed to characterize non-MSC vs MSC cultures, and high CD47 expression was identified as a marker for MSC. Injection of CD47-Hi vs CD47-Low cells in the same rat DMM model also demonstrated beneficial effects, albeit only based on histology. A major strength of these studies is the direct translational opportunity for novel MSC-based therapeutic interventions, with high potential for a "personalized medicine" approach.

      Weaknesses:

      Weaknesses of this study include the rather cursory assessment of the OA phenotype in the rat model, confined entirely to histology (i.e. no microCT, no pain/behavioral assessments, no molecular readouts). It is somewhat unclear how the authors converged on CD47 rather than the other factors identified in the proteomics screen, and additional information is needed to understand whether true MSCs only engraft in articular cartilage or also in ectopic cartilage (in the context of osteophyte/chondrophyte formation). Some additional discussion and potential follow-up analyses focused on other cell surface markers recently described to identify synovial progenitors are also warranted. A conceptual weakness is the lack of discussion or consideration of the multiple recent studies demonstrating that DPP4+ PI16+ CD34+ stromal cells (i.e. the "universal fibroblasts") act as progenitors in all mesenchymal tissues, and their involvement in the joint is actively being investigated. Thus, it seems important to understand how the MSCs of the present study are related to these DPP4+ progenitors. Despite these areas for improvement, this is a strong paper with a high degree of rigor, and the results are compelling, timely, and important.

      Overall, the authors achieved their aims, and the results support not just the therapeutic value of clonally-isolated synovial MSCs but also the immense heterogeneity in stromal cell populations (containing true MSCs and non-MSCs) that must be investigated further. Of note, the authors employed the ISCT criteria to characterize MSCs, with mixed results in pre-culture and post-culture assessments. This work is likely to have a long-term impact on methodologies used to culture and study MSCs, in addition to advancing the field's knowledge about how synovial-derived progenitors contribute to cartilage repair in vivo.

    4. Author response:

      We appreciate the reviewers’ thoughtful and constructive feedback, which has provided valuable insights to refine our manuscript. Below, we outline the planned revisions in response to the public reviews.

      Response to Reviewer #1

      We are grateful for the reviewer’s recognition of our methodological approach and the potential significance of CD47 as a novel MSC marker for cartilage repair. To address the concerns raised:

      (1) Clarifying the proteomics data supporting CD47 as an MSC marker

      · The manuscript will be revised to clearly indicate where the proteomics data demonstrate elevated CD47 expression in MSCs compared to non-MSCs.

      · Additional figure annotations or a supplemental figure may be included to enhance clarity.

      (2) Providing further details on CD47hi and CD47lo MSC populations

      · Information on the number of isolated CD47hi and CD47lo cells, along with any necessary expansion steps before in vivo use, will be explicitly detailed.

      (3) Expanding the characterization of CD47hi MSCs in vitro

      · A more comprehensive analysis of the chondrogenic differentiation capacity of CD47hi MSCs will be incorporated to strengthen the findings.

      (4) Clarifying experimental details of the in vivo rat OA model

      · The methodology section will be updated to specify the number of injected cells and their labeling strategies.

      · Representative histological images will be added to support the results.

      · To further substantiate the cartilage repair potential of CD47hi MSCs, additional staining for Collagen Type II will be included alongside Sox9 expression.

      Response to Reviewer #2

      We appreciate the reviewer’s enthusiasm for the study and recognition of its rigor and translational significance. The following revisions are planned to address the feedback:

      (1) Addressing additional assessments for OA phenotype in the rat model

      · While this study primarily relied on histology, the limitations of this approach will be acknowledged in the discussion.

      · The absence of microCT and behavioral assessments will be explained, with suggestions for incorporating these methods in future studies.

      (2) Justifying the focus on CD47

      · The rationale behind prioritizing CD47 over other proteomics-identified markers will be expanded to provide better context for this choice.

      (3) Clarifying MSC engraftment patterns

      · The manuscript will include a discussion on whether CD47hi MSCs specifically engraft in articular cartilage or contribute to ectopic cartilage formation (e.g., osteophytes).

      (4) Contextualizing findings within recent research on synovial progenitors

      · Additional discussion will highlight recent studies on DPP4+ PI16+ CD34+ stromal cells and how the identified MSC populations may relate to these universal fibroblasts.

      We are confident that these revisions will strengthen the manuscript and enhance its clarity and impact. The reviewers’ insights have been invaluable, and we look forward to refining the study accordingly.

    1. eLife Assessment

      This valuable study presents a mouse gastruloid model that can be used to generate hematopoietic progenitors as well as leukemic cells. However, in its current form, the manuscript is inadequate because the primary claims are not supported. Overall, the hematopoietic progenitor cells generated in this system need to be better defined.

    2. Reviewer #1 (Public review):

      Summary

      The authors describe a method for gastruloid formation using mouse embryonic stem cells (mESCs) to study YS and AGM-like hematopoietic differentiation. They characterise the gastruloids during nine days of differentiation using a number of techniques including flow cytometry and single-cell RNA sequencing. They compare their findings to a published data set derived from E10-11.5 mouse AGM. At d9, gastruloids were transplanted under the adrenal gland capsule of immunocompromised mice to look for the development of cells capable of engrafting the mouse bone marrow. The authors then applied the gastruloid protocol to study overexpression of Mnx1 which causes infant AML in humans.

      In the introduction, the authors define their interpretation of the different waves of hematopoiesis that occur during development. 'The subsequent wave, known as definitive, produces: first, oligopotent erythro-myeloid progenitors (EMPs) in the YS (E8-E8.5); and later myelo-lymphoid progenitors (MLPs - E9.5-E10), multipotent progenitors (MPPs - E10-E11.5), and hematopoietic stem cells (HSCs - E10.5-E11.5), in the aorta-gonad-mesonephros (AGM) region of the embryo proper.' Herein they designate the yolk sac-derived wave of EMP hematopoiesis as definitive, according to convention, although paradoxically it does not develop from intra-embryonic mesoderm or give rise to HSCs.

      General comments

      The authors make the following claims in the paper:

      (1) The development of a protocol for hemogenic gastruloids (hGx) that recapitulates YS and AGM-like waves of blood from HE.

      (2) The protocol recapitulates both YS and EMP-MPP embryonic blood development 'with spatial and temporal accuracy'.

      (3) The protocol generates HSC precursors capable of short-term engraftment in an adrenal niche.

      (4) Overexpression of MNX1 in hGx transforms YS EMP to 'recapitulate patient transcriptional signatures'.

      (5) hGx is a model to study normal and leukaemic embryonic hematopoiesis.

      There are major concerns with the manuscript. The statements and claims made by the authors are not supported by the data presented, data is overinterpreted, and the conclusions cannot be justified. Furthermore, the data is presented in a way that makes it difficult for the reader to follow the narrative, causing confusion. The authors have not discussed how their hGx compares to the previously published mouse embryoid body protocols used to model early development and hematopoiesis.

      Specific points

      (1) It is claimed that hGx capture the cellularity and topography of developmental blood formation. The hGx protocol described in the manuscript is a modification of a previously published gastruloid protocol (Rossi et al 2022). The rationale for the protocol modifications is not fully explained or justified. There is a lack of novelty in the presented protocol, as the only modifications appear to be the inclusion of Activin A and an extension of the differentiation period from 7 to 9 days of culture. No direct comparison has been made between the two versions of gastruloid differentiation to justify the changes.

      The inclusion of Activin A at high concentration at the beginning of differentiation would be expected to pattern endoderm rather than mesoderm. BMP signaling is required to induce Flk1+ mesoderm, even in the presence of Wnt. FACS analysis of the hGx during differentiation is needed to demonstrate the co-expression of Flk1-GFP and lineage markers such as CD34 to indicate patterning of endothelium from Flk1+ mesoderm. The FACS plots in Figure 1 show c-Kit expression but very little VE-cadherin which suggests that CD34 is not induced. Early endoderm expresses c-Kit, CXCR4, and Epcam but not CD34 which could account for the lack of vascular structures within the hGx as shown in Figure 1E.

      (2) The protocol has been incompletely characterised, and the authors have not shown how they can distinguish between either wave of Yolk Sac (YS) hematopoiesis (primitive erythroid/macrophage and erythro-myeloid EMP) or between YS and intraembryonic Aorta-Gonad-Mesonephros (AGM) hematopoiesis. No evidence of germ layer specification has been presented to confirm gastruloid formation, organisation, and functional ability to mimic early development. Furthermore, differentiation of YS primitive and YS EMP stages of development in vitro should result in the efficient generation of CD34+ endothelial and hematopoietic cells. There is no flow cytometry analysis showing the kinetics of CD34 cell generation during differentiation. Benchmarking the hGx against developing mouse YS and embryo data sets would be an important verification.

      Single-cell RNA sequencing was used to compare hGx with mouse AGM. The authors incorrectly conclude that '...specification of endothelial and HE cells in hGx follows with time-dependent developmental progression into putative AGM-like HE...' and '...HE-projected hGx cells...expressed Gata2 but not Runx1, Myb, or Gfi1b...' Hemogenic endothelium is defined by the expression of Runx1, and Gfi1b is downstream of Runx1.

      (3) The claim that the hGx protocol 'generates hematopoietic SC precursors capable of short-term engraftment' is not supported by the data presented. Short-term engraftment would be confirmed by flow cytometric detection of hematopoietic cells expressing the BFP transgene within the recipient bone marrow, spleen, thymus, and peripheral blood. This analysis was not provided. PCR detection of transcripts, following an unspecified number of amplification cycles, as shown in Figure 3G (incorrectly referred to as Figure 3F in the legend) is not acceptable evidence for engraftment. Transplanted hGx formed teratoma-like structures, and hematopoietic cells at the site of transplant were analysed only histologically. Indeed, the quality of the images provided is too low to convincingly validate that donor-derived hematopoietic cells were present in the grafts.

      There is no justification for the authors' conclusion that '... the data suggest that 216h hGx generate AGM-like pre-HSC capable of at least short-term multilineage engraftment upon maturation...'. Indeed, this statement is in conflict with previous studies demonstrating that pre-HSCs in the dorsal aorta of the mouse embryo are immature and actually incapable of engraftment.

      The statement '...low-level production of engrafting cells recapitulates their rarity in vivo, in agreement with the embryo-like qualities of the gastruloid system....' is incorrect. Firstly, no evidence has been provided to show the hGx has formed a dorsal aorta facsimile capable of generating cells with engrafting capacity. Secondly, although engrafting cells are rare in the AGM, approximately one per embryo, they are capable of robust and extensive engraftment upon transplantation.

      (4) Expression of MNX1 transcript and protein in hematopoietic cells in MNX1-rearranged acute myeloid leukaemia (AML) is one cause of AML in infants. In the hGx model of this disease, Mnx1 is overexpressed in the mESCs that are used to form gastruloids. Mnx1 overexpression seems to confer an overall growth advantage on the hGx and increase the serial replating capacity of the small number of hematopoietic cells that are generated. The inefficiency with which the hGx model generates hematopoietic cells makes it difficult to model this disease. The poor quality of the cytospin images prevents accurate identification of cells. The statement that the Kit-expressing cells represent leukemic blast cells is not sufficiently validated. What other stem cell genes are expressed? Surface Kit expression also marks mast cells, frequently seen in clonogenic assays of blood cells. Flow cytometric and gene expression analyses using known markers would be required.

      (5) In human infant MNX1 AML, the mutation is thought to arise at the fetal liver stage of development. There is no evidence that this developmental stage is mimicked in the hGx model.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors develop an exciting new hemogenic gastruloid (hGx) system, which they claim reproduces the sequential generation of various blood cell types. The key advantage of this cellular system would be its potential to more accurately recapitulate the spatiotemporal emergence of hematopoietic progenitors within their physiological niche compared to other available in vitro systems. The authors present a large set of data and also validate their new system in the context of investigating infant leukemia.

      Strengths:

      The development of this new in vitro system for generating hematopoietic cells is innovative and addresses a significant drawback of current in vitro models. The authors present a substantial dataset to characterize this system, and they also validate its application in the context of investigating infant leukemia.

      Weaknesses:

      The characterization of the system, and the demonstration that the cells produced truly represent distinct waves of hematopoietic progenitors, are incomplete. The data presented to support the generation of late yolk sac (YS) progenitors, such as lymphoid cells, and aorta-gonad-mesonephros (AGM)-like progenitors, including pre-hematopoietic stem cells (pre-HSCs), by this system are not entirely convincing. Given that this is likely the manuscript's most crucial claim, it warrants further scrutiny and direct experimental validation. Ideally, the identity of these progenitors should be further demonstrated by directly assessing their ability to differentiate into lymphoid cells or fully functional HSCs. Instead, the authors primarily rely on scRNA-seq data and a very limited set of markers (e.g., Ikzf1 and Mllt3) to infer the identity and functionality of these cells. Many of these markers are shared among various types of blood progenitors, and only a well-defined combination of markers could offer some assurance of the lymphoid and pre-HSC nature of these cells, although this would still be limited in the absence of functional assays.

      The identification of a pre-HSC-like CD45⁺CD41⁻/lo c-Kit⁺VE-Cadherin⁺ cell population is presented as evidence supporting the generation of pre-HSCs by this system, but this claim is questionable. This FACS profile may also be present in progenitors generated in the yolk sac such as early erythro-myeloid progenitors (EMPs). It is only within the AGM context, and in conjunction with further functional assays demonstrating the ability of these cells to differentiate into HSCs and contribute to long-term repopulation, that this profile could be strongly associated with pre-HSCs. In the absence of such data, the cells exhibiting this profile in the current system cannot be conclusively identified as true pre-HSCs.

      The engraftment data presented are also not fully convincing, as the observed repopulation is very limited and evaluated only at 4 weeks post-transplantation. The cells detected after 4 weeks could represent the progeny of EMPs that have been shown to provide transient repopulation rather than true HSCs.

    4. Reviewer #3 (Public review):

      In this study, the authors employ a mouse ES-derived "hemogenic gastruloid" model that they generated and that they claim can deconvolute the YS and AGM stages of blood production in vitro. This work could represent a valuable resource for the field. However, in general, I find the conclusions in this manuscript poorly supported by the data presented. Importantly, it isn't clear what exactly the "YS"-like and "AGM"-like stages identified in the culture are, or where the data backing up this claim can be found. In my opinion, the data in this manuscript do not provide convincing evidence of what kind of hematopoietic progenitor cells are generated in this system. Therefore, the statement that "our study has positioned the MNX1-OE target cell within the YS-EMP stage" (line 540) is not supported by the evidence presented in this study. Overall, the system seems to be very preliminary and requires further optimization before those claims can be made.

      Specific comments below:

      (1) The flow cytometric analysis of gastruloids presented in Figure 1C-D is puzzling. A large percentage of c-Kit+ cells is generated, but there are few VE-Cad+ Kit+ double-positive cells. Similarly, there are many CD41+ cells, but very few CD45+ cells, which one would expect to appear toward the end of the differentiation process if blood cells are actually generated. It would be useful to present this analysis as consecutive gating (i.e. evaluating CD41 and CD45 within VE-Cad+ Kit+ cells, especially if the authors think that the presence of VE-Cad+ Kit+ cells is suggestive of EHT). The quantification presented in D is misleading, as the scale of each graph is different.

      (2) The imaging presented in Figure 1E is very unconvincing. The c-Kit and CD45 signals appear as speckles rather than outlining membranes/cell surfaces as they should. This experiment should be repeated and a nuclear stain (i.e. DAPI) should be included.

      (3) Overall, I am not convinced that hematopoietic cells are consistently generated in these organoids. The authors should sort hematopoietic cells and perform May-Grunwald Giemsa stainings as they did in Figure 6 to confirm the nature of the blood cells generated.

      (4) The scRNA-seq in Figure 2 is very difficult to interpret. Specific points related to this:

      - Cluster annotation in Figure 2A is missing and should be included.

      - Why do the heatmaps show the expression of genes within sorted cells? Couldn't the authors show expression within clusters of hematopoietic cells as identified transcriptionally (which ones are they? See previous point)? Gene names are illegible.

      - I see no expression of Hlf or Myb in CD45+ cells (Figure 2G). Hlf is not expressed by any of the populations examined (panels E, F, G). This suggests no MPP or pre-HSC are generated in the culture, contrary to what is stated in lines 242-245. (PMID 31076455 and 34589491).

      Later on, it is again stated that "hGx cells... lacked detection of HSC genes like Hlf, Gfi1, or Hoxa9" (lines 281-283). To me, this is proof of the absence of AGM-like hematopoiesis generated in those gastruloids.

      (5) Mapping of scRNA-seq data onto the dataset by Thambyrajah et al. is not proof of the generation of AGM HE. The dataset they are mapping to contains only AGM cells, so cells do not have the option to map onto anything that is not AGM. The authors should try mapping to other publicly available datasets that also include YS cells.

      (6) The conclusions drawn from Figure 3, titled "hGx specify cells with preHSC characteristics", are not supported by the data presented here. Again, I am not convinced that hematopoietic cells can be efficiently generated in this system, and certainly not HSCs or pre-HSCs.

      - FACS analysis in 3A is again very unconvincing. I do not think the population identified as c-Kit+ CD144+ is real. Also, why not try gating the other way around, as commonly done (e.g. VE-Cad+ Kit+ and then CD41/CD45)?

      - The authors must have tried really hard, but the lack of short- or long-term engraftment in a number of immunodeficient mouse models (lines 305-313) really suggests that no blood progenitors are generated in their system. I am not familiar with the adrenal gland transplant system, but it seems like a very non-physiological system for trying to assess the maturation of putative pre-HSCs. The data supporting the engraftment of these mice, essentially seen only by PCR and in some cases with a very low threshold for detection, are very weak, and again unconvincing. It is stated that "BFP engraftment of the Spl and BM by flow cytometry was very low level albeit consistently above control (Fig. S4E)" (lines 337-338). I do not think that two dots in a dot plot can be presented as evidence of engraftment.

      (7) Given the above, I find that the foundations needed for extracting meaningful data from the system when perturbed are very shaky at best. Nevertheless, the authors proceed to overexpress MNX1 by LV transduction, a system previously shown to transform fetal liver cells, mimicking the effect of the t(7;12) AML-associated translocation. Comments on this section:

      - The increase in the size of the organoid when MNX1 is expressed is a very unspecific finding and not necessarily an indication of any hematopoietic effect of MNX1 OE.

      - The mild increase in c-Kit+ cells (Figure 4E) at the 144hr timepoint and the lack of any changes in CD41+ or CD45+ cells suggest that the increase in the percentage of Kit+ cells is not due to any hematopoietic effect of MNX1 OE. No hematopoietic GO categories are seen in the RNA-seq analysis, which supports this interpretation. Could it be that just endothelial cells are being generated?

      (8) There seems to be a relatively convincing increase in replating potential upon MNX1-OE, but this experiment has been poorly characterized. What type of colonies are generated? What exactly is the "proportion of colony forming cells" in Figures 5B-D? The colony increase is accompanied by an increase in Kit+ cells; however, the flow cytometry analysis has not been quantified.

      (9) Do hGx cells engraft upon MNX1-OE? This experiment, which appears not to have been performed, is essential to conclude that leukemic transformation has occurred.