  1. Last 7 days
    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review): 

      Summary: 

      Dr. Santamaria's group previously utilized antigen-specific nanomedicines to induce immune tolerance in treating autoimmune diseases. The success of this therapeutic strategy has been linked to expanded regulatory mechanisms, particularly the role of T-regulatory type-1 (TR1) cells. However, the differentiation program of TR1 cells remained largely unclear. Previous work from the authors suggested that TR1 cells originate from T follicular helper (TFH) cells. In the current study, the authors aimed to investigate the epigenetic mechanisms underlying the transdifferentiation of TFH cells into IL-10-producing TR1 cells. Specifically, they sought to determine whether this process involves extensive chromatin remodeling or is driven by preexisting epigenetic modifications. Their goal was to understand the transcriptional and epigenetic changes facilitating this transition and to explore the potential therapeutic implications of manipulating this pathway. 

      The authors successfully demonstrated that the TFH-to-TR1 transdifferentiation process is driven by pre-existing epigenetic modifications rather than extensive new chromatin remodeling. The comprehensive transcriptional and epigenetic analyses provide robust evidence supporting their conclusions. 

      Strengths: 

      (1) The study employs a broad range of bulk and single-cell transcriptional and epigenetic tools, including RNA-seq, ATAC-seq, ChIP-seq, and DNA methylation analysis. This comprehensive approach provides a detailed examination of the epigenetic landscape during the TFH-to-TR1 transition. 

      (2) The use of high-throughput sequencing technologies and sophisticated bioinformatics analyses strengthens the foundation for the conclusions drawn. 

      (3) The data generated can serve as a valuable resource for the scientific community, offering insights into the epigenetic regulation of T-cell plasticity. 

      (4) The findings have significant implications for developing new therapeutic strategies for autoimmune diseases, making the research highly relevant and impactful. 

      We thank the reviewer for providing constructive feedback on the manuscript.

      Weaknesses: 

      (1) While the scope of this study lies in transcriptional and epigenetic analyses, the conclusions need to be validated by future functional analyses. 

      We fully agree with the reviewer’s suggestion. The current study provides a foundational understanding of how the epigenetic landscape of TFH cells evolves as they transdifferentiate into TR1 progeny in response to chronic ligation of cognate TCRs using pMHCII-NPs. Functional validation is indeed the focus of our current studies, where we are carrying out extensive perturbation studies of the TFH-TR1 transdifferentiation pathway in conditional transcription factor gene knock-out mice. In these ongoing studies, genes coding for a series of transcription factors expressed along the TFH-TR1 pathway are selectively knocked out in T cells, to ascertain (i) the specific roles of key transcription factors in the various cell conversion events and transcriptional changes that take place along the TFH-TR1 cell axis; (ii) the roles that such transcription factors play in the chromatin re-modeling events that underpin the TFH-TR1 transdifferentiation process; and (iii) the effects of transcription factor gene deletion on phenotypic and functional readouts of TFH and regulatory T cell function.

      (2) This study successfully identified key transcription factors and epigenetic marks. How these factors mechanistically drive chromatin closure and gene expression changes during the TFH-to-TR1 transition requires further investigation. 

      Agreed. Please see our response to point #1 above.  

      (3) The study provides a snapshot of the epigenetic landscape. Future dynamic analysis may offer more insights into the progression and stability of the observed changes. 

      We have previously shown that the first event in the pMHCII-NP-induced TFH-TR1 transdifferentiation process involves proliferation of cognate TFH cells in the splenic germinal centers. This event is followed by immediate conversion of the proliferated TFH cells into transitional and terminally differentiated TR1 subsets. Although the snapshot provided by our single cell studies reported herein documents the simultaneous presence of the different subsets composing the TFH-TR1 cell pathway upon the termination of treatment, the transdifferentiation process itself is extremely fast, such that proliferated TFH cells already transdifferentiate into TR1 cells after a single pMHCII-NP dose (Sole et al., 2023a). This makes it extremely challenging to pursue dynamic experiments. Notwithstanding this caveat, ongoing studies of cognate T cells post treatment withdrawal, coupled to single cell studies of the TFH-TR1 pathway in transcription factor gene knockout mice exhibiting perturbed transdifferentiation processes, are likely to shed light on the progression and stability of the epigenetic changes reported herein. 

      We will revise the manuscript accordingly, to address the three concerns raised by the reviewer, in the context of the ongoing studies mentioned above. 

      Reviewer #2 (Public Review): 

      Summary: 

      This study, based on their previous findings that TFH cells can be converted into TR1 cells, conducted a highly detailed and comprehensive epigenetic investigation to answer whether TR1 differentiation from TFH is driven by epigenetic changes. Their evidence indicated that the downregulation of TFH-related genes during the TFH to TR1 transition depends on chromatin closure, while the upregulation of TR1-related genes does not depend on epigenetic changes. 

      Strengths: 

      (1) A significant advantage of their approach lies in its detailed and comprehensive assessment of epigenetics. Their analysis of epigenetics covers chromatin open regions, histone modifications, and DNA methylation, and uses both single-cell and bulk techniques to validate their findings. As for their results, observations from different epigenetic perspectives mutually supported each other, lending greater credibility to their conclusions. This study effectively demonstrates that (1) the TFH-to-TR1 differentiation process is associated with massive closure of OCRs, and (2) the TR1-poised epigenome of TFH cells is a key enabler of this transdifferentiation process. Considering the extensive changes in epigenetic patterns involved in other CD4+ T lineage commitment processes, the similarity between TFH and TR1 in their epigenetics is intriguing. 

      (2) They performed correlation analysis to answer the association between "pMHC-NP-induced epigenetic change" and "gene expression change in TR1". Also, they have made their raw data publicly available, providing a comprehensive epigenomic database of pMHC-NP-induced TR1 cells. This will serve as a valuable reference for future research. 

      We thank the reviewer for his/her constructive feedback and suggestions for improvement of the manuscript.

      Weaknesses: 

      (1) A major limitation is that this study heavily relies on a premise from the previous studies performed by the same group on pMHC-NP-induced T-cell responses. This significantly limits the relevance of their conclusion to a broader perspective. Specifically, differential OCRs between Tet+ and naïve T cells were limited to only 821, as compared to 10,919 differential OCRs between KLH-TFH and naïve T cells (Figure 2A), indicating that the precursors and T cell clonotypes that responded to pMHC-NP were extremely limited. This limitation should be clearly discussed in the Discussion section. 

      We agree that this study focuses on a very specific, previously unrecognized pathway discovered in mice treated with pMHCII-NPs. Despite this apparent narrow perspective, we now have evidence that this is a naturally occurring pathway that also develops in other contexts (i.e., in mice that have not been treated with pMHCII-NPs). Furthermore, this pathway affords a unique opportunity to further understand the transcriptional and epigenetic mechanisms underpinning T cell plasticity; the findings reported here can help guide/inform not only upcoming translational studies of pMHCII-NP therapy in humans, but also other research in this area. We will discuss the limitations and opportunities that this research provides more explicitly in a revised manuscript to provide a clearer context for the scope and applicability of our findings.

      We acknowledge that, in the bulk ATAC-seq studies, the differences in the number of OCRs found in tetramer+ cells or KLH-induced TFH cells vs. naïve T cells may be influenced by the intrinsic oligoclonality of the tetramer+ T cell pool arising in response to repeated pMHCII-NP challenge (Sole et al., 2023a). However, we note that scATAC-seq studies of the tetramer+ T cell pool found similar differences between the oligoclonal tetramer+ TFH subpool and its (also oligoclonal) tetramer+ TR1 counterparts (i.e., substantially higher number of OCRs in the former vs. the latter relative to naïve T cells). This will be clarified in a revised version of the manuscript.

      (2) This article uses peak calling to determine whether a region has histone modifications, claiming that the regions with histone modifications in TFH and TR1 are highly similar. However, they did not discuss the differences in histone modification intensities measured by ChIP-seq. For example, as shown in Figure 6C, IL10 H3K27ac modification in Tet+ cells showed significantly higher intensity than KLH-TFH, while in this article, it may be categorized as "possessing same histone modification region". This will strengthen their conclusions.

      We appreciate the reviewer’s suggestion to discuss differences in histone modification intensities as measured by ChIP-seq. However, we respectfully disagree with the reviewer’s interpretation of these data.

      Our study primarily focuses on the identification of epigenetic similarities and differences between pMHCII-NP-induced tetramer+ cells and KLH-induced TFH cells relative to naive T cells. The outcome of direct comparisons of histone deposition (ChIP-seq) between these cell types is summarized in the lower part of Figure 4B and detailed in Datasheet 5. Throughout this section, we report the number of differentially enriched regions, their overlap with OCRs shared between tetramer+ TFH and tetramer+ TR1 cells based on scATAC-seq data, and the associated genes. Clearly, most of the epigenetic modifications that TR1 cells inherit from TFH cells had already been acquired by TFH cells upon differentiation from naïve T cell precursors. 

      Regarding the specific point raised by the reviewer on differences in the intensity of the H3K27Ac peaks linked to Il10 in Figure 6C, we note that the genomic tracks shown are illustrative. However, thorough statistical analyses involving signal background for each condition and p-value adjustment did not support differential enrichment for H3K27Ac deposition around the Il10 gene between pMHCII-NP-induced tetramer+ T cells and KLH-induced TFH cells. 

      We acknowledge that peak calling alone does not account for intensity variations of histone modifications. However, our analysis includes both qualitative and quantitative assessments to ensure robust conclusions. We will edit the relevant sections of the manuscript to clarify these points and better communicate our methodology and findings to the readers.

      (3) Last, the key findings of this study are clear and convincing, but some results and figures are unnecessary and redundant. Some results are largely a mere confirmation of the relationship between histone marks and chromatin status. I propose to reduce the number of figures and text that are largely confirmatory. Overall, I feel this paper is too long for its current contents. 

      We understand this reviewer’s concern about the potential redundancy of some results and figures. The goal of including these analyses is to provide a comprehensive understanding of the intricate relationships between epigenetic features and transcriptomic differences. We believe that a detailed examination of these relationships is crucial for several reasons: (i) the breadth of the data allows for a thorough exploration of the relationships between histone marks, chromatin accessibility and transcriptional differences. This comprehensive approach helps ensure that our conclusions are robust and well-supported by the data; (ii) some of the results that may appear confirmatory are, in fact, important for validating and reinforcing the consistency of our findings across different contexts. These details intend to provide a nuanced understanding of the interactions between epigenetic features and gene expression; and (iii) by presenting a detailed analysis, we aim to offer a solid foundation for future research in this area. The extensive datasets that are presented in this paper will serve as a valuable resource for others in the field who may seek to build upon our findings.

      That said, we will carefully review the manuscript to identify and streamline any elements that may be overly redundant. We will consider consolidating figures and refining the text to ensure that the paper remains concise and focused while retaining the depth of analysis that we believe is essential.

    2. eLife assessment

      This study provides important information on pre-existing epigenetic modification in T cell plasticity. The evidence supporting the conclusions is compelling, supported by comprehensive transcriptional and epigenetic analyses. The work will be of interest to immunologists and colleagues studying transcriptional regulation.

    3. Reviewer #1 (Public Review):

      Summary:

      Dr. Santamaria's group previously utilized antigen-specific nanomedicines to induce immune tolerance in treating autoimmune diseases. The success of this therapeutic strategy has been linked to expanded regulatory mechanisms, particularly the role of T-regulatory type-1 (TR1) cells. However, the differentiation program of TR1 cells remained largely unclear. Previous work from the authors suggested that TR1 cells originate from T follicular helper (TFH) cells. In the current study, the authors aimed to investigate the epigenetic mechanisms underlying the transdifferentiation of TFH cells into IL-10-producing TR1 cells. Specifically, they sought to determine whether this process involves extensive chromatin remodeling or is driven by pre-existing epigenetic modifications. Their goal was to understand the transcriptional and epigenetic changes facilitating this transition and to explore the potential therapeutic implications of manipulating this pathway.

      The authors successfully demonstrated that the TFH-to-TR1 transdifferentiation process is driven by pre-existing epigenetic modifications rather than extensive new chromatin remodeling. The comprehensive transcriptional and epigenetic analyses provide robust evidence supporting their conclusions.

      Strengths:

      (1) The study employs a broad range of bulk and single-cell transcriptional and epigenetic tools, including RNA-seq, ATAC-seq, ChIP-seq, and DNA methylation analysis. This comprehensive approach provides a detailed examination of the epigenetic landscape during the TFH-to-TR1 transition.

      (2) The use of high-throughput sequencing technologies and sophisticated bioinformatics analyses strengthens the foundation for the conclusions drawn.

      (3) The data generated can serve as a valuable resource for the scientific community, offering insights into the epigenetic regulation of T-cell plasticity.

      (4) The findings have significant implications for developing new therapeutic strategies for autoimmune diseases, making the research highly relevant and impactful.

      Weaknesses:

      (1) While the scope of this study lies in transcriptional and epigenetic analyses, the conclusions need to be validated by future functional analyses.

      (2) This study successfully identified key transcription factors and epigenetic marks. How these factors mechanistically drive chromatin closure and gene expression changes during the TFH-to-TR1 transition requires further investigation.

      (3) The study provides a snapshot of the epigenetic landscape. Future dynamic analysis may offer more insights into the progression and stability of the observed changes.

    4. Reviewer #2 (Public Review):

      Summary:

      This study, based on their previous findings that TFH cells can be converted into TR1 cells, conducted a highly detailed and comprehensive epigenetic investigation to answer whether TR1 differentiation from TFH is driven by epigenetic changes. Their evidence indicated that the downregulation of TFH-related genes during the TFH to TR1 transition depends on chromatin closure, while the upregulation of TR1-related genes does not depend on epigenetic changes.

      Strengths:

      A significant advantage of their approach lies in its detailed and comprehensive assessment of epigenetics. Their analysis of epigenetics covers chromatin open regions, histone modifications, and DNA methylation, and uses both single-cell and bulk techniques to validate their findings. As for their results, observations from different epigenetic perspectives mutually supported each other, lending greater credibility to their conclusions. This study effectively demonstrates that (1) the TFH-to-TR1 differentiation process is associated with massive closure of OCRs, and (2) the TR1-poised epigenome of TFH cells is a key enabler of this transdifferentiation process. Considering the extensive changes in epigenetic patterns involved in other CD4+ T lineage commitment processes, the similarity between TFH and TR1 in their epigenetics is intriguing.

      They performed correlation analysis to answer the association between "pMHC-NP-induced epigenetic change" and "gene expression change in TR1". Also, they have made their raw data publicly available, providing a comprehensive epigenomic database of pMHC-NP-induced TR1 cells. This will serve as a valuable reference for future research.

      Weaknesses:

      A major limitation is that this study heavily relies on a premise from the previous studies performed by the same group on pMHC-NP-induced T-cell responses. This significantly limits the relevance of their conclusion to a broader perspective. Specifically, differential OCRs between Tet+ and naïve T cells were limited to only 821, as compared to 10,919 differential OCRs between KLH-TFH and naïve T cells (Figure 2A), indicating that the precursors and T cell clonotypes that responded to pMHC-NP were extremely limited. This limitation should be clearly discussed in the Discussion section.

      This article uses peak calling to determine whether a region has histone modifications, claiming that the regions with histone modifications in TFH and TR1 are highly similar. However, they did not discuss the differences in histone modification intensities measured by ChIP-seq. For example, as shown in Figure 6C, IL10 H3K27ac modification in Tet+ cells showed significantly higher intensity than KLH-TFH, while in this article, it may be categorized as "possessing same histone modification region". This will strengthen their conclusions.

      Last, the key findings of this study are clear and convincing, but some results and figures are unnecessary and redundant. Some results are largely a mere confirmation of the relationship between histone marks and chromatin status. I propose to reduce the number of figures and text that are largely confirmatory. Overall, I feel this paper is too long for its current contents.

    1. eLife assessment

      This study employed a comprehensive approach to examining how the MT+ region integrates into a complex cognition system in mediating human visuo-spatial intelligence. While the findings are useful, the experimental evidence is incomplete and the study designs, hypotheses, and data analyses need to be improved. The work will be of interest to researchers in psychology, cognitive science, and neuroscience.

    2. Reviewer #1 (Public Review):

      Summary:

      The study of human intelligence has been the focus of cognitive neuroscience research, and finding some objective behavioral or neural indicators of intelligence has been an ongoing problem for scientists for many years. Melnick et al, 2013 found for the first time that the phenomenon of spatial suppression in motion perception predicts an individual's IQ score. This is because IQ is likely associated with the ability to suppress irrelevant information. In this study, a high-resolution MRS approach was used to test this theory. In this paper, the phenomenon of spatial suppression in motion perception was found to be correlated with the visuo-spatial subtest of gF, while both variables were also correlated with the GABA concentration of MT+ in the human brain. In addition, there was no significant relationship with the excitatory transmitter Glu. At the same time, SI was also associated with MT+ and several frontal cortex FCs.

      Strengths:

      (1) 7T high-resolution MRS is used.

      (2) This study combines the behavioral tests, MRS, and fMRI.

      Weaknesses:

      Major:

      (1) In Melnick (2013), IQ scores were measured by the full set of WAIS-III, including all subtests. However, this study only used the visuo-spatial domain of gF. Why was only the visuo-spatial subtest used rather than the full WAIS-III? If other subtests were conducted, please include those results as well to allow a comprehensive comparison with Melnick (2013).

      Minor:

      (1) Table 1 and Table supplementary 1-3 contain many correlation results. But what are the main points of these values? Which values do the authors want to highlight? Why are only p-values shown with significance symbols in Table supplementary 2?

      (2) Line 27, it is unclear to me what is "the canonical theory".

      (3) Throughout the paper, the authors use "MT+", I would suggest using "hMT+" to indicate the human MT complex, and to be consistent with the human fMRI literature.

      (4) At the beginning of the results section, I suggest including the total number of subjects. It is confusing what "31/36 in MT+, and 28/36 in V1" means.

      (5) Line 138, "This finding supports the hypothesis that motion perception is associated with neural activity in MT+ area". This sentence is strange because it is a well established finding in numerous human fMRI papers. I think the authors should be more specific about what this finding implies.

      (6) There are no unit labels on any of the x- and y-axes in Figure 1. The only unit I see is for Conc (mmol per kg wet weight).

      (7) Although the correlations are not significant in Figure supplement 2&3, please also include the correlation line, 95% confidence interval, and report the r values and p values (i.e., similar format as in Figure 1C).

      (8) There is no need to separate different correlation figures into Figure supplementary 1-4. They can be combined into the same figure.

      (9) Line 213, as far as I know, the study (Melnick et al., 2013) is a psychophysical study and did not provide evidence that the spatial suppression effect is associated with MT+.

      (10) At the beginning of the results, I suggest providing more details about the motion discrimination tasks and the measurement of the BDT.

      (11) Please include the absolute duration thresholds of the small and large sizes of all subjects in Figure 1.

      (12) Figure 5 is too small. The items in plots a and b are barely visible.

    3. Reviewer #3 (Public Review):

      (1) Throughout the manuscript, hMT+ connectivity with the frontal cortex has been treated as an a priori hypothesis/space. However, there is no such motivation or background literature mentioned in the Introduction. Can the authors clarify the necessity of functional connectivity? In other words, can BOLD activity of hMT+ in the localizer task substitute for functional connectivity between hMT+ and the frontal cortex?

      (2) There is an obvious mismatch between the in-text description and the content of the figure:

      "In contrast, there was no correlation between BDT and GABA levels in V1 voxels (figure supplement 1a). Further, we show that SI significantly correlates with GABA levels in hMT+ voxels (r = 0.44, P = 0.01, n = 31, Figure 3d). In contrast, no significant correlation between SI and GABA concentrations in V1 voxels was observed (figure supplement 1b)."

      (3) The authors' response to my previous round of review indicated that the "V1 ROIs" covered a substantial amount of V3 (32%). Therefore, it would no longer be appropriate to call these "V1 ROIs". I'd suggest renaming them as "Early Visual Cortex (EVC) ROIs" to be more accurate. Can the authors justify choosing the left hemisphere for a visual intelligence task, which is typically believed to be right-lateralized?

      (4) "Small threshold" and "large threshold" are not standard descriptions, and it is unclear what "small threshold" refers to in the following figure caption. Additionally, the unit (ms) is confusing. Does it refer to timing?

      "(f) Peason's correlation showing significant negative correlations between BDT and small threshold."

      (5) In the response letter, the authors mentioned incorporating the neural efficiency hypothesis in the Introduction, but the revised Introduction does not contain such information.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study of human intelligence has been the focus of cognitive neuroscience research, and finding some objective behavioral or neural indicators of intelligence has been an ongoing problem for scientists for many years. Melnick et al, 2013 found for the first time that the phenomenon of spatial suppression in motion perception predicts an individual's IQ score. This is because IQ is likely associated with the ability to suppress irrelevant information. In this study, a high-resolution MRS approach was used to test this theory. In this paper, the phenomenon of spatial suppression in motion perception was found to be correlated with the visuo-spatial subtest of gF, while both variables were also correlated with the GABA concentration of MT+ in the human brain. In addition, there was no significant relationship with the excitatory transmitter Glu. At the same time, SI was also associated with MT+ and several frontal cortex FCs.

      Strengths:

      (1) 7T high-resolution MRS is used.

      (2) This study combines the behavioral tests, MRS, and fMRI.

      Weaknesses:

      (1) In the intro, it seems to me that the multiple-demand (MD) regions are the key in this study. However, I didn't see any results associated with the MD regions. Did I miss something?

      We thank the reviewer for pointing this out. After careful consideration, we agree with the reviewer's point of view. According to the results of Melnick et al. (2013), motion surround suppression (SI) and the time thresholds of small and large gratings, which represent hMT+ functionality, are correlated with the Verbal Comprehension, Perceptual Reasoning, Working Memory, and Processing Speed Indexes, with correlation coefficients of 0.69, 0.47, 0.49, and 0.50, respectively. This suggests that hMT+ does have the potential to be the core of the MD system. However, because our results only address how “the GABA-ergic inhibition in human MT predicts visuo-spatial intelligence mediated through the frontal cortex”, they are not yet sufficient to prove that hMT+ is the core node of the MD system. We have therefore adjusted the explanatory logic of the article. Briefly, we emphasize the role of hMT+ in reducing redundancy in visuo-spatial intelligence and in improving information processing efficiency, while de-emphasizing the significance of hMT+ in the MD system.

      (2) How was the sample size determined? Is it sufficient?

      We thank the reviewer for pointing this out. We used G*Power to determine our sample size. Melnick et al. (2013) reported a medium effect between SI and the Perceptual Reasoning sub-ability (r = 0.47). We used this r value as the correlation coefficient (ρ H1), setting the power at the commonly used threshold of 0.8 and the alpha error probability at 0.05. The required sample size was calculated to be 26. This ensures that our study has reasonable power to yield valid statistical results. Furthermore, compared with earlier within-subject studies such as Schallmo et al. (2018), which used 22 datasets to examine GABA levels in MT+ and the early visual cortex (EVC), our study includes a sufficiently large dataset.
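
      For readers who want to sanity-check a power calculation of this kind, the following is a minimal sketch using the Fisher z approximation for a Pearson correlation. It is not the authors' G*Power computation: the function name `required_n` and the `tails` parameter are illustrative assumptions, the response does not state whether a one- or two-tailed test was specified, and G*Power's exact routine can differ from this approximation by a subject or so.

```python
import math
from statistics import NormalDist


def required_n(r: float, alpha: float = 0.05, power: float = 0.8, tails: int = 2) -> int:
    """Approximate sample size needed to detect a Pearson correlation r (H0: rho = 0).

    Hypothetical helper based on the Fisher z approximation; it is not
    G*Power's exact bivariate-normal routine.
    """
    if not 0 < abs(r) < 1:
        raise ValueError("r must be non-zero and strictly between -1 and 1")
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    normal = NormalDist()
    z_alpha = normal.inv_cdf(1 - alpha / tails)  # critical value of the test
    z_power = normal.inv_cdf(power)              # quantile for the desired power
    effect = math.atanh(abs(r))                  # Fisher z transform of r
    # The variance of Fisher's z is 1 / (n - 3), hence the "+ 3".
    return math.ceil(((z_alpha + z_power) / effect) ** 2 + 3)


if __name__ == "__main__":
    # r = 0.47, alpha = 0.05 and power = 0.8 are the values quoted in the response.
    for tails in (1, 2):
        print(f"tails={tails}: n >= {required_n(0.47, tails=tails)}")
```

      Under this approximation the one-tailed setting gives a value close to the reported 26, whereas the two-tailed setting requires a noticeably larger sample, so the tail choice matters when interpreting the reported figure.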

      (3) In Schallmo elife 2018, there was no correlation between GABA concentration and SI. How can we justify the different results here?

      We thank the reviewer for pointing this out. There are several differences between the two studies:

      a. While the earlier study by Schallmo et al. (2018) employed 3T MRS, we utilize 7T MRS, enhancing our ability to detect and measure GABA with greater accuracy.

      b. Schallmo et al. (2018) chose the bilateral hMT+ as the MRS measurement region, while we use the left hMT+. The reason why we focus on the left hMT+ is described in our response to Reviewer #1, point (6). Briefly, use of left MT/V5 as a target was motivated by studies demonstrating that left MT/V5 TMS is more effective at causing perceptual effects (Tadin et al., 2011).

      c. The MRS voxel in Schallmo et al. (2018) was 3 cm isotropic, while ours is 2 cm isotropic. This helps us locate hMT+ more precisely and exclude more white matter signal.

      (4) Basically this study contains the data of SI, BDT, GABA in MT+ and V1, Glu in MT+ and V1 (6 measurements in all). There should be 6x5/2 = 15 pairwise correlations. However, not all of these results are included in Figure 1 and supplementary 1-3. I understand that it is not necessary to include all figures. But I suggest reporting all values in one Table.

      We thank the reviewer for the good suggestion. We have made a correlation matrix reporting all values in Figure Supplementary 9.
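
      The reviewer's count of 6x5/2 = 15 pairwise correlations can be checked by enumerating the pairs directly. The sketch below is illustrative only; the labels in `MEASURES` are shorthand chosen here for the six measurements named in the comment, not identifiers taken from the manuscript.

```python
from itertools import combinations

# Shorthand labels (assumed here) for the six measurements listed by the reviewer.
MEASURES = ["SI", "BDT", "GABA_MT", "GABA_V1", "Glu_MT", "Glu_V1"]


def pairwise(measures):
    """Return every unordered pair of distinct measurements."""
    return list(combinations(measures, 2))


if __name__ == "__main__":
    pairs = pairwise(MEASURES)
    print(len(pairs))  # n * (n - 1) / 2 with n = 6
    for a, b in pairs:
        print(f"{a} vs {b}")
```

      These pairs correspond to the off-diagonal cells in one triangle of a 6 x 6 correlation matrix.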

      (5) In Melnick (2013), the IQ scores were measured by the full set of WAIS-III, including all subtests. However, this study only used the visual spatial domain of gF. I wonder why only the visuo-spatial subtest was used not the full WAIS-III?

      We thank the reviewer for pointing this out. The decision was informed by Melnick’s findings, which indicated high correlations between surround suppression (SI) and the Verbal Comprehension, Perceptual Reasoning, Working Memory, and Processing Speed Indexes, with correlation coefficients of 0.69, 0.47, 0.49, and 0.50, respectively. It is well-established that the hMT+ region of the brain is a sensory cortex involved in visual perception processing (3D perception). Furthermore, motion surround suppression (SI), a specific function of hMT+, aligns closely with this region's activities. Given this context, the Perceptual Reasoning sub-ability was deemed to have the clearest mechanism for further exploration. Consequently, we selected the most representative subtest of Perceptual Reasoning, the Block Design Test, which primarily assesses 3D visual intelligence.

      (6) In the functional connectivity part, there is no explanation as to why only the left MT+ was set to the seed region. What is the problem with the right MT+?

      We thank the reviewer for pointing this out. The main reason is that our MRS ROI is the left hMT+, and we wanted the ROIs of the different models to be consistent with one another. Use of left MT/V5 as a target was motivated by studies demonstrating that left MT/V5 TMS is more effective at causing perceptual effects (Tadin et al., 2011).

      (7) In Melnick (2013), the authors also reported the correlation between IQ and absolute duration thresholds of small and large stimuli. Please include these analyses as well.

      We thank the reviewer for the good advice. Including these results does help readers compare our findings with those of Melnick et al. We have added the corresponding figures in the revised version (Figure 3f, g).

      Reviewer #2 (Public Review):

      Summary:

      Recent studies have identified specific regions within the occipito-temporal cortex as part of a broader fronto-parietal, domain-general, or "multiple-demand" (MD) network that mediates fluid intelligence (gF). According to the abstract, the authors aim to explore the mechanistic roles of these occipito-temporal regions by examining GABA/glutamate concentrations. However, the introduction presents a different rationale: investigating whether area MT+ specifically could be a core component of the MD network.

      Strengths:

      The authors provide evidence that GABA concentrations in MT+ and its functional connectivity with frontal areas significantly correlate with visuo-spatial intelligence performance. Additionally, serial mediation analysis suggests that inhibitory mechanisms in MT+ contribute to individual differences in a specific subtest of the Wechsler Adult Intelligence Scale, which assesses visuo-spatial aspects of gF.

      Weaknesses:

      (1) While the findings are compelling and the analyses robust, the study's rationale and interpretations need strengthening. For instance, Assem et al. (2020) have previously defined the core and extended MD networks, identifying the occipito-temporal regions as TE1m and TE1p, which are located more rostrally than MT+. Area MT+ might overlap with brain regions identified previously in Fedorenko et al., 2013; however, the authors attribute these activations to attentional enhancement of visual representations in the more difficult conditions of their tasks. For the aforementioned reasons, it is unclear why the authors chose MT+ as their focus. A stronger rationale for this selection is necessary, along with an account of how it fits with the core/extended MD networks.

      We really appreciate the reviewer’s opinions. The reasons why we focus on hMT+ are as follows. According to the results of Melnick et al. (2013), motion surround suppression (SI) and the duration thresholds for small and large gratings, which represent hMT+ functionality, are correlated with the Verbal Comprehension, Perceptual Reasoning, Working Memory, and Processing Speed Indexes, with high correlation coefficients of 0.69, 0.47, 0.49, and 0.50, respectively. In addition, in Fedorenko et al. (2013), the averaged MD activity region appears to overlap with hMT+. Based on these findings, we assumed that hMT+ has the potential to become a core of the MD system.

      (2) Moreover, although the study links MT+ inhibitory mechanisms to a visuo-spatial component of gF, this evidence alone may not suffice to position MT+ as a new core of the MD network. The MD network's definition typically encompasses a range of cognitive domains, including working memory, mathematics, language, and relational reasoning. Therefore, the claim that MT+ represents a new core of MD needs to be supported by more comprehensive evidence.

      We thank the reviewer for pointing this out. After careful consideration, we agree with your point of view. Because our results address only visuo-spatial intelligence, they are not yet sufficient to prove that hMT+ is the core node of the MD system. We will adjust the explanatory logic of the article, that is, emphasize the de-redundancy role of hMT+ in visuo-spatial intelligence and the improvement of information processing efficiency, while weakening the significance of hMT+ in MD systems.

      Reviewer #3 (Public Review):

      Summary:

      This manuscript aims to understand the role of GABA-ergic inhibition in the human MT+ region in predicting visuo-spatial intelligence through a combination of behavioral measures, fMRI (for functional connectivity measurement), and MRS (for GABA/glutamate concentration measurement). While this is a commendable goal, it becomes apparent that the authors lack fundamental understanding of vision, intelligence, or the relevant literature. As a result, the execution of the research is less coherent, dampening the enthusiasm of the review.

      Strengths:

      (1) Comprehensive Approach: The study adopts a multi-level approach, i.e., neurochemical analysis of GABA levels, functional connectivity, and behavioral measures to provide a holistic understanding of the relationship between GABA-ergic inhibition and visuo-spatial intelligence.

      (2) Sophisticated Techniques: The use of ultra-high field magnetic resonance spectroscopy (MRS) technology for measuring GABA and glutamate concentrations in the MT+ region is a recent development.

      Weaknesses:

      Study Design and Hypothesis

      (1) The central hypothesis of the manuscript posits that "3D visuo-spatial intelligence (the performance of BDT) might be predicted by the inhibitory and/or excitation mechanisms in MT+ and the integrative functions connecting MT+ with the frontal cortex." However, several issues arise:

      (1.1) The Suppression Index depicted in Figure 1a, labeled as the "behavior circle," appears irrelevant to the central hypothesis.

      We thank the reviewer for pointing this out. In our study, the inhibitory mechanisms in hMT+ are conceptualized through two models: the neurotransmitter model and the behavioral model. The Suppression Index is essential for elucidating the local inhibitory mechanisms within the behavioral model. However, we acknowledge that our initial presentation in the introduction may not have clearly articulated our hypothesis, potentially leading to misunderstandings. We have revised the introduction to better clarify these connections and ensure the relevance of the Suppression Index is comprehensively understood.

      (1.2) The construct of 3D visuo-spatial intelligence, operationalized as the performance in the Block Design task, is inconsistently treated as another behavioral task throughout the manuscript, leading to confusion.

      We thank the reviewer for pointing this out. We acknowledge that our manuscript may have presented this construct inconsistently across different sections, causing confusion. To address this, we have ensured a consistent description of 3D visuo-spatial intelligence in both the introduction and the discussion sections. However, we retained "Block Design task score" in the results section to make clear to readers which subtest we used.

      (1.3) The schematics in Figure 1a and Figure 6 appear too high-level to be falsifiable. It is suggested that the authors formulate specific and testable hypotheses and preregister them before data collection.

      We thank the reviewer for pointing this out. We have revised Figure 1a and made it less abstract and more logical. The schematic in Figure 6 represents our theoretical framework of how hMT+ contributes to 3D visuo-spatial intelligence. We believe the elements within this framework are grounded in related theories and supported by the evidence discussed in our results and discussion sections, making them specific and testable.

      (2) Central to the hypothesis and design of the manuscript is a misinterpretation of a prior study by Melnick et al. (2013). While the original study identified a strong correlation between WAIS (IQ) and the Suppression Index (SI), the current manuscript erroneously asserts a specific relationship between the block design test (from WAIS) and SI. It should be noted that in the original paper, WAIS comprises Similarities, Vocabulary, Block design, and Matrix reasoning tests in Study 1, while the complete WAIS is used in Study 2. Did the authors conduct other WAIS subtests other than the block design task?

      Thank you for pointing this out. Reviewer #1 also asked this question, so we copy our answer here: “The decision was informed by Melnick’s findings, which indicated high correlations between surround suppression (SI) and the Verbal Comprehension, Perceptual Reasoning, Working Memory, and Processing Speed Indexes, with correlation coefficients of 0.69, 0.47, 0.49, and 0.50, respectively. It is well established that hMT+ is a sensory cortical region involved in visual perception processing (3D perception). Furthermore, motion surround suppression (SI), a specific function of hMT+, aligns closely with this region's activities. Given this context, the Perceptual Reasoning sub-ability was deemed to have the clearest mechanism for further exploration. Consequently, we selected the most representative subtest of Perceptual Reasoning, the Block Design Test, which primarily assesses 3D visual intelligence.”

      (3) Additionally, there are numerous misleading references and unsubstantiated claims throughout the manuscript. As an example of misleading reference, "the human MT ... a key region in the multiple representations of sensory flows (including optic, tactile, and auditory flows) (Bedny et al., 2010; Ricciardi et al., 2007); this ideally suits it to be a new MD core." The two references in this sentence are claims about plasticity in the congenitally blind with sensory deprivation from birth, which is not really relevant to the proposal that hMT+ is a new MD core in healthy volunteers.

      Thank you for pointing this out. We have carefully read the corresponding references, considered the corresponding theories, and agree with these comments. Because our results address only “the GABA-ergic inhibition in human MT predicts visuo-spatial intelligence mediated by reverberation with frontal cortex”, they are not yet sufficient to prove that hMT+ is the core node of the MD system. We will adjust the explanatory logic of the article, that is, emphasize the de-redundancy role of hMT+ in visuo-spatial intelligence and the improvement of information processing efficiency, while weakening the significance of hMT+ in MD systems. In addition, regarding the potential central role of hMT+ in the MD system, we agree with your view that research on hMT+ as a multisensory integration hub mainly focuses on developmental processes. Meanwhile, in adults, the MST region of hMT+ is considered a multisensory integration area for visual and vestibular inputs, which potentially supports the role of hMT+ in multitasking multisensory systems (Gu et al., J. Neurosci, 26(1), 73–85, 2006; Fetsch et al., Nat. Neurosci, 15, 146–154, 2012). Further research could explore how other intelligence sub-abilities, such as working memory and language comprehension, are facilitated by hMT+'s features.

      Another example of unsubstantiated claim: the rationale for selecting V1 as the control region is based on the assertion that "it mediates the 2D rather than 3D visual domain (Born & Bradley, 2005)". That's not the point made in the Born & Bradley (2005) paper on MT. It's crucial to note that V1 is where the initial binocular convergence occurs in cortex, i.e., inputs from both the right and left eyes to generate a perception of depth.

      Thank you for pointing this out. We acknowledge the inappropriate citation of "Born & Bradley, 2005," which focuses solely on the structure and function of the visual area MT. However, we believe that choosing hMT+ as the domain for 3D visual analysis and V1 as the control region is justified. Cumming and DeAngelis (Annu Rev Neurosci, 24:203–238, 2001) state that binocular disparity provides the visual system with information about the three-dimensional layout of the environment, and that the link between perception and neuronal activity is stronger in the extrastriate cortex (especially MT) than in the primary visual cortex. This supports our choice and emphasizes the relevance of hMT+ in our study. We have corrected this reference in the revised version.

      Results & Discussion

      (1) The missing correlation between SI and BDT is crucial to the rest of the analysis. The authors should discuss whether they replicated the pattern of results from Melnick et al. (2013) despite using only one WAIS subtest.

      We thank the reviewer for the suggestion. We have placed this correlation in the main text (Figure 3e).

      (2) ROIs: can the authors clarify if the results are based on bilateral MT+/V1 or just those in the left hemisphere? Can the authors plot the MRS scan area in V1? I would be surprised if it's precise to V1 and doesn't spread to V2/3 (which is fine to report as early visual cortex).

      We thank the reviewer for the suggestion. We have drawn the V1 ROI MRS scanning area (Figure supplement 1). Using the template, we checked the coverage of V1, V2, and V3. Although the MRS overlap regions extend to V2 (3%) and V3 (32%), the major coverage of the MRS scanning area is in V1, with 65% overlap across subjects.

      (3) Did the authors examine V1 FC with either the frontal regions and/or whole brain, as a control analysis? If not, can the author justify why V1 serves as the control region only in the MRS but not in FC (Figure 4) or the mediation analysis (Figure 5)? That seems a little odd given that control analyses are needed to establish the specificity of the claim to MT+

      We thank the reviewer for the suggestion. We have run the V1 FC-behavior analysis as a control (Figure supplement 7). Only positive correlations in the frontal cortex were detected, suggesting that in the 3D visuo-spatial intelligence task, V1 plays a role in feedforward information processing. By contrast, hMT+, which showed specific negative correlations in the frontal cortex, is involved in the inhibition mechanism. These results further emphasize the de-redundancy function of hMT+ in 3D visuo-spatial intelligence.

      Regarding the mediation analysis, since GABA/Glu concentration in V1 is not correlated with BDT score, there is no basis for applying a mediation analysis to V1.

      (4) It is not clear how to interpret the similarity or difference between panels a and b in Figure 4.

      We thank the reviewer for pointing this out. We have further interpreted the difference between panels a and b in the revised version. Panel a shows the hMT+-region FC correlated with BDT score, which clearly involves the frontal cortex, whereas panel b shows the hMT+-region FC correlated with SI, which involves relatively fewer regions. The overlapping region is what we are interested in, and it explains how local inhibitory mechanisms work in 3D visuo-spatial intelligence. In addition, we have revised Figure 4 to point out the overlapping region.

      (5) SI is not relevant to the authors' a priori hypothesis, but is included in several mediation analyses. Can the authors do model comparisons between the ones in Figure 5c, d, and Figure S6? In other words, is SI necessary in the mediation model? There seem to be discrepancies between the necessity of SI in Figures 5c/S6 vs. Figure 5d.

      We thank the reviewer for highlighting this point. The relationship between the Suppression Index (SI) and our a priori hypotheses is elaborated in the response to reviewer 3, section (1). SI plays a crucial role in explicating how local inhibitory mechanisms, on the psychological level, function within the context of the 3D visuo-spatial task. Additionally, Figure 5c illustrates the interaction between the frontal cortex and hMT+, showing how the effects from the frontal cortex (BA46) on the Block Design Task are fully mediated by SI. This further underscores the significance of SI in our model.

      (6) The sudden appearance of "efficient information" in Figure 6, referring to the neural efficiency hypothesis, raises concerns. Efficient visual information processing occurs throughout the visual cortex, starting from V1. Thus, it appears somewhat selective to apply the neural efficiency hypothesis to MT+ in this context.

      We thank the reviewer for highlighting this point. There is no doubt that V1 is involved in efficient visual information processing. However, in our results, V1 GABA shows no significant correlation with BDT score, suggesting that efficient processing in V1 might not sufficiently account for the individual differences in 3D visuo-spatial intelligence. Additionally, we will clarify our use of the neural efficiency hypothesis by incorporating it into the introduction of our paper to better frame our argument.

      Transparency Issues:

      (1) Don't think it's acceptable to make the claim that "All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary information". It is the results or visualizations of data analysis, rather than the raw data themselves, that are presented in the paper/supp info.

      We thank the reviewer for pointing this out. We realize that this statement could lead to confusion, and we have deleted it.

      (2) No GitHub link has been provided in the manuscript to access the source data, which limits the reproducibility and transparency of the study.

      We thank the reviewer for pointing this out. We have attached the GitHub link in the revised version.

      Minor:

      "Locates" should be replaced with "located" throughout the paper. For example: "To investigate this issue, this study selects the human MT complex (hMT+), a region located at the occipito-temporal border, which represents multiple sensory flows, as the target brain area."

      We thank the reviewer for pointing this out. We have revised it.

      Use "hMT+" instead of "MT+" to be consistent with the term in the literature.

      We thank the reviewer for pointing this out. We agree and now use hMT+, consistent with the literature.

      "Green circle" in Figure 1 should be corrected to match its actual color.

      We thank the reviewer for pointing this out. We have revised it.

      The abbreviation for the Wechsler Adult Intelligence Scale should be "WAIS," not "WASI."

      We thank the reviewer for pointing this out. We have revised it.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The figures and tables should be substantially improved.

      We thank the reviewer for pointing this out. We have improved some of the figures’ quality.

      (2) Please explain the sample size, and the difference between Schallmo eLife 2018, and Melnick, 2013.

      We thank the reviewer for pointing this out. These questions are answered in the public review. We copy the answers here.

      (2.1) How was the sample size determined? Is it sufficient?

      We thank the reviewer for pointing this out. We used G*Power to determine our sample size. Melnick et al. (2013) reported a medium effect between SI and the Perceptual Reasoning sub-ability (r=0.47). Here we use this r value as the correlation coefficient (ρ H1), setting the power at the commonly used threshold of 0.8 and the alpha error probability at 0.05. The required sample size is calculated to be 26. This ensures that our study has adequate power to yield valid statistical results. Furthermore, compared to earlier within-subject studies such as Schallmo et al. (2018), which used 22 subjects to examine GABA levels in MT+ and the early visual cortex (EVC), our study includes a sufficient dataset.
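      For readers who wish to check the order of magnitude of such a calculation, the following is a minimal sketch based on the Fisher z approximation for a correlation test. It is not G*Power's own routine, which uses an exact method and may return a slightly different number, and the response does not state whether a one- or two-tailed test was used, so both are computed. The function name `n_for_correlation` and its `tails` parameter are introduced here for illustration.

```python
from math import atanh, ceil
from statistics import NormalDist


def n_for_correlation(r, alpha=0.05, power=0.80, tails=2):
    """Approximate N needed to detect a correlation r against rho = 0.

    Uses the Fisher z approximation:
        N = ((z_alpha + z_beta) / atanh(r))**2 + 3
    This is an approximation, not G*Power's exact computation.
    """
    z = NormalDist().inv_cdf        # standard normal quantile function
    z_alpha = z(1 - alpha / tails)  # critical value for the chosen test
    z_beta = z(power)               # quantile matching the desired power
    return ceil(((z_alpha + z_beta) / atanh(r)) ** 2 + 3)


# Inputs taken from the response: r = 0.47, alpha = 0.05, power = 0.8.
one_tailed = n_for_correlation(0.47, tails=1)
two_tailed = n_for_correlation(0.47, tails=2)
print(one_tailed, two_tailed)
```

      The one-tailed and two-tailed results differ noticeably, so reporting which option was selected makes a sample-size justification easier to reproduce.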

      (2.2) In Schallmo et al. (eLife, 2018), there was no correlation between GABA concentration and SI. How can we justify the different results here?

      We thank the reviewer for pointing this out. There are several differences between our study and theirs:

      a. While the earlier study by Schallmo et al. (2018) employed 3T MRS, we utilize 7T MRS, enhancing our ability to detect and measure GABA with greater accuracy.

      b. Schallmo et al. (2018) chose the bilateral hMT+ as the MRS measurement region, while we use the left hMT+. The reasons why we focus on the left hMT+ are described in our response to Reviewer #1, point (6). Briefly, the use of left MT/V5 as a target was motivated by studies demonstrating that left MT/V5 TMS is more effective at causing perceptual effects (Tadin et al., 2011).

      c. The MRS sequence in Schallmo et al. (2018) used a 3 cm isotropic voxel, while we used a 2 cm isotropic voxel. This helps us locate hMT+ more precisely and exclude more white matter signal.
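      To make the size difference in point c concrete, the short sketch below compares the volumes of the two isotropic voxels using the edge lengths quoted above. The helper name `isotropic_voxel_volume_ml` is hypothetical and introduced only for this illustration.

```python
def isotropic_voxel_volume_ml(edge_cm):
    """Volume of a cubic voxel in millilitres (1 cm^3 equals 1 mL)."""
    return edge_cm ** 3


vol_3cm = isotropic_voxel_volume_ml(3)  # edge length quoted for Schallmo et al. (2018)
vol_2cm = isotropic_voxel_volume_ml(2)  # edge length quoted for the present study
ratio = vol_3cm / vol_2cm
print(vol_3cm, vol_2cm, ratio)
```

      Because volume scales with the cube of the edge length, a one-third reduction in edge length shrinks the sampled tissue volume by more than a factor of three, which is the basis for the claim of better localization.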

      (3) Table 1 and Table Supplementary 1-3 contain many correlation results. But what are the main points of these values? Which values do the authors want to highlight? Why are only p-values shown with significance symbols in Table Supplementary 2?

      (3.1) what are the main points of these values?

      We thank the reviewer for pointing this out. These correlations represent the relationship between the behavioral tasks (SI/BDT) and resting-state functional connectivity. They indicate that the left hMT+ is involved in the efficient information integration network during the BDT task. In addition, surround suppression in the left hMT+ is related to several hMT+-frontal connections. Furthermore, the overlapping regions between the two tasks indicate a shared underlying mechanism.

      (3.2) Which values do the authors want to highlight?

      Table 1 and Table Supplementary 1-3 present the preliminary analysis results for Table 2 and Table Supplementary 4-6, so we report all values. In contrast, in Table 2 and Table Supplementary 4-6, we highlight in bold the significant correlations that survived correction for multiple comparisons.

      (3.3) Why are only p-values shown with significance symbols in Table Supplementary 2?

      Thank you for pointing this out. It was a mistake. We have revised the table and deleted the significance symbols.

      (4) Line 27, it is unclear to me what is "the canonical theory".

      We thank the reviewer for pointing this out. We have revised "the canonical theory" to "the prevailing opinion".

      (5) Throughout the paper, the authors use "MT+", I would suggest using "hMT+" to indicate the human MT complex, and to be consistent with the human fMRI literature.

      We thank the reviewer for pointing this out. We have revised them and used "hMT+" to be consistent with the human fMRI literature.

      (6) At the beginning of the results section, I suggest including the total number of subjects. It is confusing what "31/36 in MT+, and 28/36 in V1" means.

      We thank the reviewer for pointing this out. We have included the total number of subjects at the beginning of the results section.

      (7) Line 138, "This finding supports the hypothesis that motion perception is associated with neural activity in MT+ area". This sentence is strange because it is a well-established finding in numerous human fMRI papers. I think the authors should be more specific about what this finding implies.

      We thank the reviewer for pointing this out. We have deleted the inappropriate sentence "This finding supports the hypothesis that motion perception is associated with neural activity in MT+ area".

      (8) There are no unit labels for all x- and y-axes in Figure 1. I only see the unit for Conc is mmol per kg wet weight.

      We thank the reviewer for pointing this out. Figure 1 is a schematic and workflow chart, so labels for x- and y-axes are not needed. We believe this comment might pertain to Figure 3. In Figures 3a and 3b, the MRS spectrum does not have a standard y-axis unit because it varies with the individual physical conditions of the scanner; it is widely accepted that no y-axis unit is used. The x-axis unit is ppm, which indicates the chemical shift of different metabolites. In Figure 3c, the BDT represents IQ scores, which do not have a standard unit. Similarly, in Figures 3d and 3e, the Suppression Index does not have a standard unit.

      (9) Although the correlations are not significant in Figure Supplement 2&3, please also include the correlation line, 95% confidence interval, and report the r values and p values (i.e., similar format as in Figure 1C).

      We thank the reviewer for pointing this out. We have revised them.

      (10) There is no need to separate different correlation figures into Figure Supplementary 1-4. They can be combined into the same figure.

      We thank the reviewer for the suggestion. However, each correlation figure in the supplementary figures has its own specific topic and conclusion. The correlation figures in Supplementary Figure 1 indicate that GABA in V1 does not show any correlation with BDT and SI, illustrating that inhibition in V1 is unrelated to both 3D visuo-spatial intelligence and motion suppression processing. The correlations in Supplementary Figure 2 indicate that the excitation mechanism, represented by Glutamate concentration, does not contribute to 3D visuo-spatial intelligence in either hMT+ or V1. Supplementary Figure 3 validates our MRS measurements. Supplementary Figure 4 addresses potential concerns regarding the impact of outliers on correlation significance. Even after excluding two “outliers” from Figures 3d and 3e, the correlation results remain stable.

      (11) Line 213, as far as I know, the study (Melnick et al., 2013) is a psychophysical study and did not provide evidence that the spatial suppression effect is associated with MT+.

      We thank the reviewer for pointing this out. It was a mistake to use this reference, and we have revised it accordingly.

      (12) At the beginning of the results, I suggest providing more details about the motion discrimination tasks and the measurement of the BDT.

      We thank the reviewer for pointing this out. We have included a brief description of the tasks at the beginning of the results section.

      (13) Please include the absolute duration thresholds of the small and large sizes of all subjects in Figure 1.

      We thank the reviewer for the suggestion. We have included these results in Figure 3.

      (14) Figure 5 is too small. The items in plots a and b are barely visible.

      We thank the reviewer for pointing this out. We have increased the size and resolution of Figure 5.

      Reviewer #2 (Recommendations For The Authors):

      Recommendations for improving the writing and presentation.

      I highly recommend editing the manuscript for readability and the use of the English language. I had significant difficulties following the rationale of the research due to issues with the way language was used.

      We thank the reviewer for pointing this out. We apologize for any shortcomings in our initial presentation. We have invited a native English speaker to revise our manuscript.

    1. Reviewer #1 (Public Review):

      In this revised manuscript, authors have conducted epigenetic and transcriptomic profiling to understand how environmental chemicals such as BPS can cause epimutations that can propagate to future generations. They used isolated somatic cells from mice (Sertoli, granulosa), pluripotent cells to model preimplantation embryos (iPSCs) and cells to model the germline (PGCLCs). This enabled them to model sequential steps in germline development, and when/how epimutations occur. The major findings were that BPS induced unique epimutations in each cell type, albeit with qualitative and quantitative cell-specific differences; that these epimutations are prevalent in regions associated with estrogen-response elements (EREs); and that epimutations induced in iPSCs are corrected as they differentiate into PGCLCs, concomitant with the emergence of de novo epimutations. This study will be useful in understanding the multigenerational effects of EDCs, and underlying mechanisms.

      Strengths include:

      (1) Using different cell types representing life stages of epigenetic programming and during which exposures to EDCs have different effects. This progression revealed information both about the correction of epimutations and the emergence of new ones in PGCLCs.

      (2) Work conducted by exposing iPSCs to BPS or vehicle, then differentiating to PGCLCs, revealed that novel epimutations emerged.

      (3) Relating epimutations to promoter and enhancer regions

      During the review process, authors improved the manuscript through better organization, clarifying previous points from reviewers, and providing additional data.

    2. Reviewer #2 (Public Review):

      Summary:

      This manuscript uses cell lines representative of germ line cells, somatic cells and pluripotent cells to address the question of how the endocrine disrupting compound BPS affects these various cells with respect to gene expression and DNA methylation. They find a relationship between the presence of estrogen receptor gene expression and the number of DNA methylation and gene expression changes. Notably, PGCLCs do not express estrogen receptors and although they do have fewer changes, changes are nevertheless detected, suggesting a noncanonical pathway for BPS-induced perturbations. Additionally, there was a significant increase in the occurrence of BPS-induced epimutations near EREs in somatic and pluripotent cell types compared to germ cells. Epimutations in the somatic and pluripotent cell types were predominantly in enhancer regions whereas those in the germ cell type were predominantly in gene promoters.

      Strengths:

      The strengths of the paper include the use of various cell types to address sensitivity of the lineages to BPS as well as the observed relationship between the presence of estrogen receptors and changes in gene expression and DNA methylation.

      Weaknesses:

      The weakness, which has been addressed by the authors, includes the fact that exposures are more complicated in a whole organism than in an isolated cell line.

    3. eLife assessment

      This important study, characterizing the epigenetic and transcriptomic response of a variety of cell types representative of somatic, germline, and pluripotent cells to BPS, reveals the cell type-specific changes in DNA methylation and the relationship with the genome sequence. The findings are convincing and provide a basis for future analyses in vivo. This work should be of interest to biomedical researchers who work on epigenetic reprogramming and epigenetic inheritance.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewing editor’s list of items remaining to be addressed followed by our responses/actions:

      (1) The order and organization of supplemental figures and tables is almost impossible to navigate. Please put them in order. 

      All the sections from the previous Supplementary files have been divided into individual Supplementary files so that each can be referenced without confusion from the text. All of the references in the body of the text and the author responses have been updated to reflect this change.

      (2) The question of sample sizes was partially addressed, with authors stating that cell culture work in iPSCs and PGCLCs was done in replicates of 3. Sertoli and granulosa cells were generated from pooled preps - how many individuals, were they littermates? 

      Sertoli and granulosa primary cultures were generated from littermates and each prep used 5 animals (males for Sertoli cells and females for granulosa cells). These changes have been added to the body of the text on pages 39 and 40.

      (3) Authors need to discuss the limitations of doing work in triplicates. Their PCA (Supplement Figure 9) reveals that in several cases samples from the same treatment were not discriminated by PC1 and/or PC2. This is especially true in e and f, the variance of which was explained by PC1 for cell type, but for which treatments showed poor discrimination by PC2. Some discussion of the limitations of sample size should be provided.

      Additional text has been added to what is now Supplementary file 15 to acknowledge this limitation imposed by the limited number of replicates (three) and the ability to resolve the differences in treatments by PCA in subplots e and f. However, we also note that the differences were sufficient to identify significant DMCs/DMRs/DEGs.

      Reviewer 2 also noted a potential weakness that “exposures are more complicated in a whole organism than in an isolated cell line.”

      We note that in our revised manuscript we included wording noting that despite the advantages of using an in vitro approach to deduce underlying molecular mechanisms, results of such in vitro studies “ultimately warrant validation of results discerned from studies of in vitro models to ensure they also reflect functions ongoing in the more complex and heterogeneous environment of the intact animal in vivo.” Thus we have endeavored to acknowledge the reviewer’s point.

      Reviewer #1 (Public Review): 

      Critiques/Comments: 

      (1) A problem with in vitro work is that homogeneous cell lines/cultures are, by nature, absent from the rest of the microenvironment. The authors need to discuss this. 

      [Addressed on pages: 24-25] – We have added two sentences to the second paragraph of the Discussion section in which we now acknowledge this concern, but also point out that in vitro models of this sort also provide an experimental advantage in that they facilitate a deconvolution of the extensive complexity resident within the intact animal. Nevertheless, we acknowledge that this deconvolution requires ultimate validation of findings obtained within an in vitro model system to ensure they accurately recapitulate functions that occur in the intact animal in vivo.

      In response to Reviewer 2’s stated weakness of our study that “The weakness includes the fact that exposures are more complicated in a whole organism than in an isolated cell line,” please note that this added text includes the statement that despite the advantages of using an in vitro approach to deduce underlying molecular mechanisms, results of such in vitro studies “ultimately warrant validation of results discerned from studies of in vitro models to ensure they also reflect functions ongoing in the more complex and heterogeneous environment of the intact animal in vivo.” Thus we have endeavored to acknowledge the reviewer’s point.

      (2) What are n's/replicates for each study? Were the same or different samples used to generate the data for RNA sequencing, methylation beadchip analysis, and EM-seq? This clarification is important because if the same cultures were used, this would allow comparisons and correlations within samples.  

      Addressed on pages: 39-45 and in new Supplementary file 15 – Additional text has been added in the Methods section to indicate that all samples involving cell culture models which include iPSCs and PGCLCs came from a single XY iPS cell line aliquoted into replicates and all primary cultures which included Sertoli and granulosa cells were generated from pooled tissue preps from mice and then aliquoted into replicates. Finally, all experiments in the study were performed on three replicates. Because this experimental design did indeed allow for comparisons among samples, we have added a new Supplementary file 15, which displays PCA plots showing clustering among control and treatment datasets, respectively, as well as distinctions between each cluster representing each experimental condition.

      (3) In Figure 1, it is interesting that the 50 uM BPS dose mainly resulted in hypermethylation whereas 100 uM appears to be mainly hypomethylation. (This is based on the subjective appearance of graphs). The authors should discuss and/or present these data more quantitatively. For example, what percentage of changes were hypo/hypermethylation for each treatment? How many DMRs did each dose induce? For the RNA-seq results, again, what were the number of up/down-regulated genes for each dose?  

      Addressed on pages: 6-7 and in new Supplementary files 1-3  – The experiment shown in Figure 1 was designed to 1) serve as proof of principle that cells maintained in culture could be susceptible to EDC-induced epimutagenesis at all, 2) determine if any response observed would be dose-dependent, and 3) identify a minimally effective dose of BPS to be used for the remaining experiments in this study (which we identified as 1 μM). We agree that it is interesting that the 50 µM dose of BPS induced predominantly hypermethylation changes whereas the 1 µM and 100 µM doses induced predominantly hypomethylation changes, but are not in a position to offer a mechanistic explanation for this outcome at this time. As the results shown satisfied our primary objectives of demonstrating that exposure of cells in culture to BPS could indeed induce DNA methylation epimutations, that this occurs in a dose-dependent manner, and that a dose of as low as 1 µM of BPS was sufficient to induce epimutagenesis, the data obtained satisfied all of the initial objectives of this experiment. That said, in response to the reviewer’s request we have now added text on pages 6-7 alluding to new Supplementary files 1-3 indicating the total number of DMCs and DMRs, as well as the number of DEGs, detected in response to exposure to each dose of BPS shown in Figure 1, as well as stratifying those results to indicate the numbers of hyper- and hypomethylation epimutations and up- and down-regulated DEGs induced in response to each dose of BPS. While, as noted above, investigating the mechanistic basis for the difference in responses induced by the 50 µM versus 1 and 100 µM doses of BPS was beyond the scope of the study presented in this manuscript, we do find this result reminiscent of the “U-shaped” response curves often observed in toxicology studies. Importantly, this result does demonstrate the elevated resolution and specificity of analysis facilitated by our in vitro cell culture model system.

      (4) Also in Figure 1, were there DMRs or genes in common across the doses? How did DMRs relate to gene expression results? This would be informative in verifying or refuting expectations that greater methylation is often associated with decreased gene expression.  

      Addressed on pages: 6-7 and new Supplementary files 1-6 – In general, we observed a coincidence between changes in DNA methylation and changes in gene expression (Supplementary files 1-3). Pertaining directly to the reviewer’s question about the extent to which we observed common DMRs and DEGs across all doses, while we only found 3 overlapping DMRs conserved across all doses tested, we did find an average of 51.25% overlap in DMCs and an average of 80.45% overlap in DEGs across iPSCs exposed to the different doses of BPS shown in Figure 1. In addition, within each dose of BPS tested in iPSCs, we also found that there was an overlap between DMCs and the promoters or gene bodies of many DEGs (Supplementary file 5). Specifically within gene promoters, we observed a correlation between hypermethylated DMCs and decreased gene expression and hypomethylated DMCs and increased gene expression, respectively (Supplementary file 6).

      (5) In Figure 2, was there an overlap in the hypo- and/or hyper-methylated DMCs? Please also add more description of the data in 2b to the legend including what the dot sizes/colors mean, etc. Some readers (including me) may not be familiar with this type of data presentation. Some of this comes up in Figure 4, so perhaps allude to this earlier on, or show these data earlier.  

      Addressed on pages: 8-9 and new Supplementary file 4 – We observed an average of 11.05% overlapping DMCs between different pairs of cell types, but we did not observe any DMCs that were shared among all four cell types. Indeed, this limited overlap of DMCs among different cell types exposed to BPS was the primary motivation for the analysis described in Figure 2. Thus, instead of focusing solely on direct overlap between specific DMCs, we examined similarities among the different cell types tested in the occurrence of epimutations within different annotated genomic regions. To better describe this, we have now added additional text to page 9. We have also added more detail to the legend for Figure 2 on page 8 to more clearly explain the significance of the dot sizes and colors, explaining that the dot sizes are indicative of the relative number of differentially methylated probes that were detected within each specific annotated genomic region, and that the dot colors are indicative of the calculated enrichment score reflecting the relative abundance of epimutations occurring within a specific annotated genomic region. The relative score is calculated by iterating down the list of DMCs and increasing a running-sum statistic when encountering a DMC within the specific annotated genomic region of interest and decreasing the sum when the epimutation is not in that annotated region. The magnitude of the increment depends upon the relative occurrence of DMCs within a specific annotated genomic region.
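
      For readers unfamiliar with running-sum enrichment scores, the calculation described above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function name `running_sum_enrichment`, the unweighted step sizes (1/number of hits and 1/number of misses), and the choice to report the largest deviation from zero are all assumptions made for the example.

```python
def running_sum_enrichment(in_region):
    """Return a running-sum enrichment score for a ranked list of DMCs.

    in_region: list of booleans, one per DMC in ranked order; True means
    the DMC falls inside the annotated genomic region of interest.
    Assumption: unweighted steps, so the sum starts and ends at zero.
    """
    n_hits = sum(in_region)
    n_miss = len(in_region) - n_hits
    if n_hits == 0 or n_miss == 0:
        return 0.0  # score is undefined without both hits and misses

    hit_step = 1.0 / n_hits    # increment when a DMC is in the region
    miss_step = 1.0 / n_miss   # decrement when a DMC is outside it

    running = 0.0
    best = 0.0
    for flag in in_region:
        running += hit_step if flag else -miss_step
        # Keep the deviation from zero with the largest magnitude.
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranked list: region DMCs concentrated at the top.
print(running_sum_enrichment([True, True, False, False]))
```

      In this sketch a positive score means that DMCs in the region cluster toward the top of the ranked list, and a negative score means they cluster toward the bottom.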

      (6) iPSCs were derived from male mice MEFs, and subsequently used to differentiate into PGCLCs. The only cell type from an XX female is the granulosa cells. This might be important, and should be mentioned and its potential significance discussed (briefly).  

      Addressed on page: 29 – We have added a new paragraph just before the final paragraph of the Discussion section in which we acknowledge that most of the cell types analyzed during our study were XY-bearing “male” cells and that the manner in which XX-bearing “female” cells might respond to similar exposures could differ from the responses we observed in XY cells. However, we also noted that our assessment of XX-bearing granulosa cells yielded results very similar to those seen in XY Sertoli cells suggesting that, at least for differentiated somatic cell types, there does not appear to be a significant sex-specific difference in response to exposure to a similar dose of the same EDC. That said, we also acknowledged that in cell types in which dosage compensation based on X-chromosome inactivation is not in place, differences between XY- and XX-bearing cells could accrue.

      (7) EREs are only one type of hormone response element. The authors make the point that other mechanisms of BPS action are independent of canonical endocrine signaling. Would authors please briefly speculate on the possibility that other endocrine pathways including those utilizing AREs or other HREs may play a role? In other words, it may not be endocrine signaling independent. The statement that the differences between PGCLCs and other cells are largely due to the absence of ERs is overly simplistic.  

      Addressed on page: 11 and in a new Supplementary file 8 – Previous reports have indicated that BPS does not have the capacity to bind with the androgen receptor (Pelch et al., 2019; Yang et al., 2024). However, there have been reports indicating that BPS can interact with other endocrine receptors including PPARγ and RXRα, which play a role in lipid accumulation and could potentially be linked to obesity phenotypes (Gao et al., 2020; Sharma et al., 2018). To address the reviewer’s comment we assessed the expression of a panel of hormone receptors including PPARγ, RXRα, and AR in each of the cell types examined in our study and these results are now shown in a new Supplementary file 8. We show that in addition to not expressing either estrogen receptor (ERα or ERβ), germ cells also do not express any of the other endocrine receptors we tested including AR, PPARγ, and RXRα. Thus we now note that these results support our suggestion that the induction of epimutations we observed in germ cells in response to exposure to BPS appears to reflect disruption of non-canonical endocrine signaling. We also note that non-canonical endocrine signaling is well established (Brenker et al., 2018; Ozgyin et al., 2015; Song et al., 2011; Thomas and Dong, 2006). Thus we feel the suggestion that the effects of BPS exposure could conceivably reflect either disruption of canonical or non-canonical signaling in any cell type is well justified and that our data suggest that both of these effects appear to have accrued in the cells examined in our study as suggested in the text of our manuscript.

      (8) Interpretation of data from the GO analysis is similarly overly simplistic. The pathways identified and discussed (e.g. PI3K/AKT and ubiquitin-like protease pathways) are involved in numerous functions, both endocrine and non-endocrine. Also, are the data shown in Figure 6a from all 4 cell types? I am confused by the heatmap in 6c, which genes were significantly affected by treatment in which cell types?  

      Addressed on pages: 19-21 – Per the reviewer’s request, we have added text to indicate that Figure 6a is indeed data from all four cell types examined. We have also modified the text to further clarify that Figure 6c displays the expression of other G-coupled protein receptors which are expressed at similar, if not higher, levels than either ER in all cell types examined, and that these have been shown to have the potential to bind to either 17β-estradiol or BPA in rat models. As alluded to by the reviewer, this is indicative of a wide variety of distinct pathways and/or functions that can potentially be impacted by exposure to an EDC such as BPS. Thus, we have attempted to acknowledge the reviewer’s primary point that BPS may interact with a variety of receptors or other factors involved with a wide variety of different pathways and functions. Importantly, this illustrates the strength of our model system in that it can be used to identify potential impacted target pathways that can then be subsequently pursued further as deemed appropriate.

      (9) In Figure 7, what were the 138 genes? Any commonalities among them? 

      Addressed on page: 22 and in new Supplementary files 13 and 14 – We have now added a new supplemental Excel file (Supplementary file 13) that lists the 138 overlapping conserved DEGs that did not become reprogrammed/corrected during the transition from iPSCs to PGCLCs. In addition, we have added new text on page 22 and a new Supplementary file 14 which displays KEGG analysis of pathways associated with these 138 retained DEGs. We find that these genes are primarily involved with cell cycle and apoptosis pathways which, interestingly, have the potential to be linked to cancer development which is often linked to disruptions in chromatin architecture.

      (10) The Introduction is very long. The last paragraph, beginning line 105, is a long summary of results and interpretations that better fit in a Discussion section.

      Addressed on page: 6 – We have now significantly reduced the length and scope of the final paragraph of the Introduction per the reviewer’s recommendation.

      (11) Provide some details on husbandry: e.g. were they bred on-site? What food was given, and how was water treated? These questions are to get at efforts to minimize exposure to other chemicals.  

      Addressed on page: 37 – We have added additional text detailing that all mice used in the project were bred onsite, water was non-autoclaved conventional RO water, and our selection of 5V5R extruded feed for mice used in this study which was highly controlled for the presence of isoflavones and has been certified to be used for estrogen-sensitive animal protocols.

      Reviewer #2 (Public Review): 

      Summary: 

      This manuscript uses cell lines representative of germ line cells, somatic cells, and pluripotent cells to address the question of how the endocrine-disrupting compound BPS affects these various cells with respect to gene expression and DNA methylation. They find a relationship between the presence of estrogen receptor gene expression and the number of DNA methylation and gene expression changes. Notably, PGCLCs do not express estrogen receptors and although they do have fewer changes, changes are nevertheless detected, suggesting a noncanonical pathway for BPS-induced perturbations. Additionally, there was a significant increase in the occurrence of BPS-induced epimutations near EREs in somatic and pluripotent cell types compared to germ cells. Epimutations in the somatic and pluripotent cell types were predominantly in enhancer regions whereas those in the germ cell type were predominantly in gene promoters. 

      Strengths: 

      The strengths of the paper include the use of various cell types to address the sensitivity of the lineages to BPS as well as the observed relationship between the presence of estrogen receptors and changes in gene expression and DNA methylation. 

      Weaknesses: 

      The weaknesses include the lack of reporting of replicates, superficial bioinformatic analysis, and the fact that exposures are more complicated in a whole organism than in an isolated cell line. 

      Recommendations for the authors: please note that you control which revisions to undertake from the public reviews and recommendations for the authors. 

      Reviewer #2 (Recommendations For The Authors): 

      Overall, this is an intriguing paper but more transparency in the replicates and methods and a more rigorous bioinformatic treatment of the data are required. 

      Specific comments: 

      (1) End of abstract "These results suggest a unique mechanism by which an EDC-induced epimutated state may be propagated transgenerationally following a single exposure to the causative EDC." This is overly speculative for an abstract. There is only epigenetic inheritance following mitosis or differentiation presented in this study. There is no meiosis and therefore no ability to assess multi- or transgenerational inheritance. 

      Addressed on page: 2 – We have modified the text at the end of the abstract to more precisely reflect our intended conclusions based on our data. In our view, the ability of induced epimutations to transcend meiosis per se is not as relevant to the mechanism of transgenerational inheritance as their ability to transcend major waves of epigenetic reprogramming that normally occur during development of the germ line. In this regard the transition from pluripotent iPSCs to germline PGCLCs has been shown to recapitulate at least the first portion of normal germline reprogramming, and now our data provide novel insight into the fate of induced epimutations during this process. Specifically, we show that a prevalence of epimutations was conserved during the iPSC → germ cell transition but that very few (< 5%) of the specific epimutations present in the BPS-exposed iPSCs were retained when those cells were induced to form PGCLCs. Rather, we observed apparent correction of a large majority of the initially induced epimutations during this transition, but this was accompanied by the apparent de novo generation of novel epimutations in the PGCLCs. We suggest, based on other recent reports in the literature, that this is a result of the BPS exposure inducing changes in the chromatin architecture in the exposed iPSCs such that when the normal germline reprogramming mechanism is imposed on this disrupted chromatin template there is both correction of many existing epimutations and the genesis of many novel epimutations. This observation has the potential to explain the long-standing question of why the prevalence of epimutations persists across multiple generations despite the occurrence of epigenetic reprogramming during each generation. Nevertheless, as noted above, we have modified the text at the end of the abstract to temper this interpretation given that it is still somewhat speculative at this point.

      (2) Doses used in the experiments. One needs to be careful when stating that the dose used is "below FDA's suggested safe environmental level established for BPA" because a different bisphenol is being used here (BPA vs BPS) and the safe level is that which the entire organism experiences. It is likely that cell lines experience a higher effective dose.  

      Addressed on pages: 3, 5, and 26 – We have now made a point of noting that our reference to an EPA-recommended “safe dose” of BPA was for humans and/or intact animals. Changes to this effect have been made in the second and sixth paragraphs of the Introduction section. In addition, we have added text at the end of the fourth paragraph of the Discussion section acknowledging that, as the reviewer suggests, the same dose of an EDC could exert greater effects on cells in a homogeneous culture than on the same cell type within an intact animal given the potential for mitigating metabolic effects in the latter. However, we also note that the ability we demonstrated to quantify the effects of such exposures on the basis of numbers of epimutations (DMCs or DMRs) induced could potentially be used in future studies to study this question by assessing the effects of a specific dose of a specific EDC on a specific cell type when exposed either within a homogeneous culture or within an intact animal.

      (3) Figure 1: In the dose response, what was the overlap in DMCs and DEGs among the 3 doses? Are the responses additive, synergistic, or completely non-overlapping? This is an important point that should be addressed. 

      Addressed on page: 6-7 and in Supplementary files 1-5 – Please see our response to Reviewer 1 critique #4 above where we address similar concerns. While we do find overlap among different cell types with respect to the DMCs, DMRs, and DEGs displayed in Figure 1, we found the effect to be only partially additive as opposed to synergistic in any apparent manner. The fold increase in DMCs, DMRs, and DEGs resulting from exposure to doses of 1 μM or 50 μM ranged from 2.5x to 4.4x, which was well below the 50x increase that would have been expected from a strictly additive effect, and the effect increased even less, if at all, in response to exposure to doses of 50 μM versus 100 μM BPS. Finally, as now noted in the Discussion section on page 25, our conclusion is that these results display a limited dose-dependent effect that was partially additive but also plateaued at the highest doses tested.

      (4) Methods: How many times was each exposure performed on a given cell type? This information should be in the figure legends and methods. In the case of multiple exposures for a given line, do the biological replicates agree? 

      Addressed on pages: 39-45 and in new Supplementary file 15 –  Please see our response to Reviewer 1 critique #2 where we address similar concerns with newly added text and analysis. We now note repeatedly on pages 39-45 that each analysis was conducted on three replicate samples, and we display the similarity among those replicates graphically in a new Supplementary file 15.

      (5) DNA methylation analyses. Very little analysis is presented on the BeadChip array other than hypermethylated/hypomethylated and genomic regions of DMCs. What is the range of methylation changes? Does it vary between hypo vs. hyper DMCs? How many array experiments were performed (biological replicates) and what stats were used to determine the DMCs? Are there DMCs in common among the various cell types? As an example of more meaningful analysis, one can plot the %5mC over a given array for comparisons between control and treated cell types. For more granularity, the %5mC can be presented according to the element type (enhancers vs promoters). 

      Addressed on pages: 10 and 39-45 and in new Supplementary files 1-5, 15 –  Please see our response to Reviewer 1 critique #2 above where we address similar concerns regarding the number of biological replicates used in this study. DMCs on the Infinium array are identified using mixed linear models. This general supervised learning framework identifies CpG loci at which differential methylation is associated with known control vs. treated co-variates. CpG probes on the array were defined as having differential changes that met both p-value and FDR (≤ 0.05) significant thresholds between treatment and control samples for each cell type analyzed. The range of medians across all samples was 0.0278 to 0.0059 for hypermethylated beta values and -0.0179 to -0.0033 for hypomethylated beta values. As noted above, we did observe an overlap in DMCs between cell types. Thus, we observed an average of 11.05% overlapping DMCs between two or more cell types but we did not observe any DMCs shared between all four cell types. We have added additional text on page 9 and new Supplementary files 1-5 to now more clearly describe that this limited similarity in direct overlap of DMCs was the underlying motivation for the analysis described in Figure 2. Finally, the enrichment dot plots shown in Figure 2 provide the information the reviewer requested regarding the %5mC observed at different annotated genomic element types.
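
      As an illustration of the dual-threshold criterion described above, the following sketch flags probes that pass both a raw p-value cutoff and an FDR cutoff of 0.05. It is not the authors' pipeline: the use of the Benjamini-Hochberg procedure for the FDR, the function names, and the example p-values are all assumptions made for the example.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def call_dmcs(pvalues, alpha=0.05):
    """Flag probes whose raw p-value and adjusted p-value are both <= alpha."""
    adjusted = benjamini_hochberg(pvalues)
    return [p <= alpha and q <= alpha for p, q in zip(pvalues, adjusted)]


# Hypothetical per-probe p-values from a control vs. treated comparison.
example_pvalues = [0.01, 0.04, 0.03, 0.5]
print(call_dmcs(example_pvalues))
```

      With only four hypothetical probes the correction is mild; on an array with hundreds of thousands of probes, the FDR threshold would typically be far more restrictive than the raw p-value threshold.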

      (6) The investigators correlate the number of DMCs in a given cell type with the presence of estrogen receptors. Does the correlation extend to the methylation difference (delta beta) at the statistically different probes?

      Addressed in a new Supplementary file 7 – We have added a new Supplementary file 7 in which we provide data addressing this question. In brief, we find that the delta betas of probes enriched at enhancer regions and associated with relative proximity to ERE elements in Sertoli cells, granulosa cells, and iPSCs appear very similar to those associated with DMCs not located within these enriched regions. However, when we compared the similarity of the two data sets with goodness of fit tests, we found these relatively small differences were, in fact, statistically significant based on a two-sample Kolmogorov-Smirnov test. These observed significant differences appear to indicate that there is higher variability among the delta betas associated with hypomethylation, but not hypermethylation, changes occurring at DMCs associated with enhancers, potentially suggesting a greater tendency for exposure to BPS to induce hypomethylation rather than hypermethylation changes, at least in these specific regions.
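
      For reference, the statistic underlying a two-sample Kolmogorov-Smirnov test is the largest vertical gap between the two empirical cumulative distribution functions. The sketch below computes only that statistic (not the p-value) and is not the authors' analysis code; the function name and the delta-beta values are hypothetical.

```python
def ks_statistic(sample_a, sample_b):
    """Return the two-sample Kolmogorov-Smirnov statistic D.

    D is the maximum absolute difference between the empirical CDFs
    of the two samples, evaluated at every observed value.
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    n_a, n_b = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n_a and j < n_b:
        x = min(a[i], b[j])
        # Advance both samples past every value <= x.
        while i < n_a and a[i] <= x:
            i += 1
        while j < n_b and b[j] <= x:
            j += 1
        d = max(d, abs(i / n_a - j / n_b))
    return d


# Hypothetical delta betas: probes near EREs vs. all other DMCs.
near_ere = [-0.020, -0.015, -0.004, 0.006, 0.010]
elsewhere = [-0.010, -0.006, -0.003, 0.005, 0.009]
print(ks_statistic(near_ere, elsewhere))
```

      Because the test is sensitive to any difference in distribution shape, two sets of delta betas with similar medians can still differ significantly when the sample sizes are large, which is consistent with the pattern described above.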

      (7) Methylation changes relative to EREs are presented in multiple figures. Are other sequences enriched in the DMCs? 

      Addressed in a new Supplementary file 11. We profiled the genomic sequence within 500 bp of cell type-specific enriched DMCs that were either associated with enhancer regions in Sertoli, granulosa, or iPS cells or transcription factor binding sites in PGCLCs for the identification of higher abundance motif sequences. We then compared any motifs identified with the JASPAR database to potentially find transcription factors that could be binding to these regions. Interestingly we found that the two most common motifs across all cell types were associated with either the chromatin remodeling transcription factor HMG1A or the pluripotency factor KLF4.

      (8) Please present a correlation plot between the methylation differences and the adjacent DEGs. Again, the absence of consideration of the absolute changes in methylation and gene expression minimizes the impact of the data. 

      Addressed on pages 6, 7, and 17 and in a new Supplementary file 6 – We analyzed the relationship between DMCs at DEGs promoter regions and the corresponding change in expression of that DEG. Our data support a relationship between up-regulated genes showing decreased methylation in promoter regions and down-regulated genes showing increased methylation at promoter regions, although there were some exceptions to this relationship.

      (9) EM-Seq is mentioned in Figure 7 and in the material and methods. Where is it used in this study? 

      Addressed on page 22 – We now note in the text on page 22 that EM-seq was used during experiments assessing the propagation of BPS-induced epimutations during the iPSC → EpiLC → PGCLC cell state transitions to gather higher resolution data of changes to DNA methylation differences at the whole-epigenome level.

      References

      Brenker C, Rehfeld A, Schiffer C, Kierzek M, Kaupp UB, Skakkebæk NE, Strünker T. 2018. Synergistic activation of CatSper Ca2+ channels in human sperm by oviductal ligands and endocrine disrupting chemicals. Hum Reprod 33:1915–1923. doi:10.1093/humrep/dey275

      Gao P, Wang L, Yang N, Wen J, Zhao M, Su G, Zhang J, Weng D. 2020. Peroxisome proliferator-activated receptor gamma (PPARγ) activation and metabolism disturbance induced by bisphenol A and its replacement analog bisphenol S using in vitro macrophages and in vivo mouse models. Environ Int 134. doi:10.1016/J.ENVINT.2019.105328

      Ozgyin L, Erdos E, Bojcsuk D, Balint BL. 2015. Nuclear receptors in transgenerational epigenetic inheritance. Prog Biophys Mol Biol. doi:10.1016/j.pbiomolbio.2015.02.012

      Pelch KE, Li Y, Perera L, Thayer KA, Korach KS. 2019. Characterization of Estrogenic and Androgenic Activities for Bisphenol A-like Chemicals (BPs): In Vitro Estrogen and Androgen Receptors Transcriptional Activation, Gene Regulation, and Binding Profiles. Toxicol Sci 172:23–37. doi:10.1093/TOXSCI/KFZ173

      Sharma S, Ahmad S, Khan MF, Parvez S, Raisuddin S. 2018. In silico molecular interaction of bisphenol analogues with human nuclear receptors reveals their stronger affinity vs. classical bisphenol A. Toxicol Mech Methods 28:660–669. doi:10.1080/15376516.2018.1491663

      Song K-H, Lee K, Choi H-S. 2011. Endocrine Disrupter Bisphenol A Induces Orphan Nuclear Receptor Nur77 Gene Expression and Steroidogenesis in Mouse Testicular Leydig Cells. Endocrinology 143:2208–2215. doi:10.1210/endo.143.6.8847

      Thomas P, Dong J. 2006. Binding and activation of the seven-transmembrane estrogen receptor GPR30 by environmental estrogens: A potential novel mechanism of endocrine disruption. J Steroid Biochem Mol Biol 102:175–179. doi:10.1016/j.jsbmb.2006.09.017

      Yang Z, Wang L, Yang Y, Pang X, Sun Y, Liang Y, Cao H. 2024. Screening of the Antagonistic Activity of Potential Bisphenol A Alternatives toward the Androgen Receptor Using Machine Learning and Molecular Dynamics Simulation. Environ Sci Technol 58:2817–2829. doi:10.1021/ACS.EST.3C09779

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:  

      Reviewer #1 (Public Review):  

      Summary:  

      Heer and Sheffield used 2 photon imaging to dissect the functional contributions of convergent dopamine and noradrenaline inputs to the dorsal hippocampus CA1 in head-restrained mice running down a virtual linear path. Mice were trained to collect water rewards at the end of the track and on test days, calcium activity was recorded from dopamine (DA) axons originating in the ventral tegmental area (VTA, n=7) and noradrenaline axons from the locus coeruleus (LC, n=87) under several conditions. When mice ran laps in a familiar environment, VTA DA axons exhibited ramping activity along the track that correlated with distance to reward and velocity to some extent, while LC input activity remained constant across the track, but correlated invariantly with velocity and time to motion onset. A subset of recordings taken when the reward was removed showed diminished ramping activity in VTA DA axons, but no changes in the LC axons, confirming that DA axon activity is locked to reward availability. When mice were subsequently introduced to a new environment, the ramping to reward activity in the DA axons disappeared, while LC axons showed a dramatic increase in activity lasting 90 s (6 laps) following the environment switch. In the final analysis, the authors sought to disentangle LC axon activity induced by novelty vs. behavioral changes induced by novelty by removing periods in which animals were immobile and established that the activity observed in the first 2 laps reflected novelty-induced signal in LC axons.  

      Strengths:  

      The results presented in this manuscript provide insights into the specific contributions of catecholaminergic input to the dorsal hippocampus CA1 during spatial navigation in a rewarded virtual environment, offering a detailed analysis of the resolution of single axons. The data analysis is thorough and possible confounding variables and data interpretation are carefully considered.  

      Weaknesses:  

      Aspects of the methodology, data analysis, and interpretation diminish the overall significance of the findings, as detailed below.  

      The LC axonal recordings are well-powered, but the DA axonal recordings are severely underpowered, with recordings taken from a mere 7 axons (compared to 87 LC axons).

      Additionally, 2 different calcium indicators with differential kinetics and sensitivity to calcium changes (GCaMP6s and GCaMP7b) were used (n=3, n=4 respectively) and the data pooled. This makes it very challenging to draw any valid conclusions from the data, particularly in the novelty experiment. The surprising lack of novelty-induced DA axon activity may be a false negative. Indeed, at least 1 axon (axon 2) appears to be showing a novelty-induced rise in activity in Figure 3C. Changes in activity in 4/7 axons are also referred to as a 'majority' occurrence in the manuscript, which again is not an accurate representation of the observed data.  

      We appreciate the reviewer's detailed feedback regarding the analysis of VTA axons in our dataset. The relatively low sample size for VTA axons is due to their sparsity in the dCA1 region of the hippocampus and the inherent difficulty in recording from these axons. VTA axons are challenging to capture due to their low baseline fluorescence and long-range axon segments, resulting in a typical yield of only a single axon per field of view (FOV) per animal. In contrast, LC axons are more abundant in dCA1.

      To address the disparity in sample sizes between LC and VTA axons, we down-sampled the LC axons to match the number of VTA axons, repeating this process 1000 times to create a distribution. However, we acknowledge the reviewer's concern that the relatively low sample size for VTA axons might result in insufficient sampling of this population. Increasing the baseline expression of GCaMP to record from VTA axons requires several months, limiting our ability to quickly expand the sample size.
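
      The down-sampling step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the per-axon summary values, and the use of the mean as the summary statistic are assumptions made for the example. The one-axon-per-session constraint follows the description of the randomization given elsewhere in this response.

```python
import numpy as np

def downsample_distribution(lc_values, lc_session_ids, n_vta, n_iter=1000, seed=0):
    """Build a distribution of means from repeated LC subsamples of size n_vta.

    lc_values      : one summary value per LC axon (hypothetical, e.g. a
                     velocity correlation coefficient)
    lc_session_ids : imaging-session label for each LC axon
    n_vta          : number of VTA axons to match
    Each draw picks n_vta distinct sessions and one axon from each session.
    """
    rng = np.random.default_rng(seed)
    lc_values = np.asarray(lc_values, dtype=float)
    lc_session_ids = np.asarray(lc_session_ids)
    sessions = np.unique(lc_session_ids)
    if n_vta > sessions.size:
        raise ValueError("need at least n_vta sessions to draw one axon per session")

    means = np.empty(n_iter)
    for i in range(n_iter):
        # Choose which sessions contribute to this draw, without replacement.
        chosen = rng.choice(sessions, size=n_vta, replace=False)
        # Within each chosen session, pick a single axon at random.
        picks = [rng.choice(np.flatnonzero(lc_session_ids == s)) for s in chosen]
        means[i] = lc_values[picks].mean()
    return means
```

      Under these assumptions, the observed VTA value can then be located within the returned distribution, for example by reporting the fraction of down-sampled LC means that lie at or beyond it.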

      In response to the reviewer's comments, we have added recordings from 2 additional VTA axons, increasing the sample size from 7 to 9. We re-analyzed all data from the familiar environment with n=9 VTA axons, comparing them to down-sampled LC axons as previously described. However, the additional axons were not recorded in the novel environment. We agree with the reviewer that the lack of novelty-induced DA axon activity may be a false negative. To address this, we have revised the description of our results to include the following sentence:

      “However, 1 VTA ROI showed an increase in activity immediately following exposure to novelty, indicating heterogeneity across VTA axons in CA1, and the lack of a novelty signal on average may be due to a small sample size.”

      Regarding the use of two different GCaMP constructs, we understand the reviewer's concern. We used GCaMP6s and GCaMP7b variants to determine if one would improve the success rate of recording from VTA axons. Given the long duration of these experiments and the low yield, we pooled the data from both GCaMP variants to increase statistical power. However, we recognize the importance of verifying that there are no differences in the signals recorded with these variants.

      With the addition of 2 VTA DA axons expressing GCaMP6s, we now have n=5 GCaMP6s and n=4 GCaMP7b VTA DA axons. This allowed us to compare the activity of the two sensors in the familiar environment. As shown in new Supplementary Figure 2, both sets of axons responded similarly to the variables measured: position in VR, time to motion onset, and animal velocity (although the GCaMP6s expressing axons showed stronger correlations). Since all LC axons recorded expressed GCaMP6s, we also specifically compared VTA GCaMP6s axons to LC GCaMP6s axons (Supp Fig. 3). Our conclusions remained consistent when comparing this subset of VTA axons to LC axons.

      Overall, our paper now includes comparisons of combined VTA axons (n=9) and separately the GCaMP6s-expressing VTA axons (n=5) with LC axons. Both datasets support our initial conclusions that VTA axons signal proximity to reward, while LC axons encode velocity and motion initiation in familiar environments.

      The authors conducted analysis on recording data exclusively from periods of running in the novelty experiment to isolate the effects of novelty from novelty-induced changes in behavior. However, if the goal is to distinguish between changes in locus coeruleus (LC) axon activity induced by novelty and those induced by motion, analyzing LC axon activity during periods of immobility would enhance the robustness of the results.  

      We appreciate the reviewer's insightful suggestion to analyze LC axon activity during periods of immobility to distinguish between changes induced by novelty and those induced by motion. This additional analysis would indeed strengthen our conclusions regarding the LC novelty signal.

      In response to this suggestion, we performed the same analysis as before, but focused on periods of immobility. Our findings indicate that following exposure to novelty, there was a significant increase in LC activity specifically during immobility. This supports the idea that LC axons produce a novelty signal that is independent of novelty-induced behavioral changes. The results of this analysis are now presented in new Supplementary Figure 5b.

      The authors attribute the ramping activity of the DA axons to the encoding of the animals' position relative to reward. However, given the extensive data implicating the dorsal CA1 in timing, and the remarkable periodicity of the behavior, the fact that DA axons could be signalling temporal information should be considered.  

      This is an insightful comment regarding the potential role of VTA DA axons in signaling temporal information. We agree that VTA DA axons could indeed be encoding temporal information, as previous work from our lab has shown that these axons exhibit ramping activity when averaged by time to reward (Krishnan et al., 2022).

      To address this, we have now examined DA axon activity relative to time to reward, as shown in new Supplementary Figure 4. Our analysis confirms that these axons ramp up in activity relative to time to reward. Given the periodicity of our mice's behavior in these experiments, as the reviewer correctly points out, we are unable to distinguish between spatial proximity to reward and time to reward. We have added a sentence to our paper highlighting this limitation and stating that further experiments are necessary to differentiate these two variables.

      Krishnan, L.S., Heer, C., Cherian, C., Sheffield, M.E. Reward expectation extinction restructures and degrades CA1 spatial maps through loss of a dopaminergic reward proximity signal. Nat Commun 13, 6662 (2022).

      The authors should explain and justify the use of a longer linear track (3m, as opposed to 2m in the DAT-cre mice) in the LC axon recording experiments.  

      We appreciate the reviewer's insightful comment regarding the use of a longer linear track (3m, as opposed to 2m in the DAT-cre mice) in the LC axon recording experiments. The choice of a 3m track for LC axon recordings was made to align with a previous experiment from our lab (Dong et al., 2021), in which mice were exposed to a novel 3m track while CA1 pyramidal cell populations were recorded. In that study, we detailed the time course of place field formation within the novel track. Our current hypothesis is that LC axons signal novelty, and we aimed to investigate whether the time course of LC axon activity aligns with the time course of place field formation. This hypothesis, and the potential role of LC axons in facilitating plasticity for new place field formation, is further discussed in the Discussion section of our paper.

      For the VTA axon recordings, we utilized a 2m track, consistent with another recent study from our lab (Krishnan et al., 2022), where reward expectation was manipulated, and CA1 pyramidal cell populations were recorded. By matching the track length to this prior study, we aimed to explore how VTA dopaminergic inputs to CA1 might influence CA1 population dynamics along the track under conditions of varying reward expectations.

      We acknowledge that using different track lengths for LC and VTA recordings introduces a variable that could potentially confound direct comparisons. To address this, we normalized the track lengths for our LC versus VTA comparison analysis. This normalization allowed us to directly compare patterns of activity across the two types of axons by adjusting the data to a common scale, thereby ensuring that any observed differences or similarities are attributable to the intrinsic properties of the axons rather than differences in track lengths. By doing so, we could assess relative changes in activity levels at matched spatial bins.
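
      The track-length normalization described above can be illustrated with a short sketch. It is written under stated assumptions: the function name, the choice of 50 bins, and the use of mean dF/F per bin are hypothetical and are not taken from the paper's Methods.

```python
import numpy as np

def binned_activity(position_m, dff, track_length_m, n_bins=50):
    """Mean dF/F in equal-width bins of normalized track position.

    position_m     : position along the track in metres, one value per frame
    dff            : fluorescence signal, one value per frame
    track_length_m : 2.0 or 3.0 in the experiments discussed here
    Returns an array of length n_bins (NaN where a bin has no samples).
    """
    position_m = np.asarray(position_m, dtype=float)
    dff = np.asarray(dff, dtype=float)
    # Rescale position so 0 is the track start and 1 is the reward end.
    norm_pos = np.clip(position_m / track_length_m, 0.0, 1.0)
    bin_idx = np.minimum((norm_pos * n_bins).astype(int), n_bins - 1)
    sums = np.bincount(bin_idx, weights=dff, minlength=n_bins)
    counts = np.bincount(bin_idx, minlength=n_bins)
    with np.errstate(invalid="ignore", divide="ignore"):
        return sums / counts
```

      With this kind of rescaling, a signal that depends only on the fraction of the track completed yields the same binned profile on a 2m and a 3m track, whereas a signal tied to absolute distance or elapsed time does not.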

      Although the experiences of the animals on the different track lengths are not identical, our observations suggest that LC and VTA axon signals are not majorly influenced by variations in track length. LC axons are associated with velocity and a pre-motion initiation signal, neither of which is affected by track length. VTA axons, which also correlate with velocity, can be compared to LC axon velocity signals because mice reach maximal velocity very quickly along the track, well before the end of the 2m track. The range of velocities is therefore captured on both track lengths. While VTA axons exhibit ramping activity as the mice approach the reward zone (a signal potentially modulated by track length), LC axons do not show such ramping to reward signals. Thus, a comparison across different track lengths is justified for this aspect of our analysis.

      To further enhance the rigor of our comparisons between axon dynamics recorded on 2m and 3m tracks, we conducted an additional analysis plotting axon activity by time to reward and actual (un-normalized) distance from reward (Supplementary Figure 4). This analysis revealed very similar signals between the two sets of axons, supporting our initial conclusions.

      We thank the reviewer for raising this important point and hope that our detailed explanation and additional analysis address their concern.

      Krishnan, L.S., Heer, C., Cherian, C., Sheffield, M.E. Reward expectation extinction restructures and degrades CA1 spatial maps through loss of a dopaminergic reward proximity signal. Nat Commun 13, 6662 (2022).

      Dong, C., Madar, A. D. & Sheffield, M.E. Distinct place cell dynamics in CA1 and CA3 encode experience in new environments. Nat Commun 12, 2977 (2021).

      Reviewer #2 (Public Review):  

      Summary:  

      The authors used 2-photon Ca2+-imaging to study the activity of ventral tegmental area (VTA) and locus coeruleus (LC) axons in the CA1 region of the dorsal hippocampus in head-fixed male mice moving on linear paths in virtual reality (VR) environments.  

      The main findings were as follows:  

      - In a familiar environment, the activity of both VTA axons and LC axons increased with the mice's running speed on the Styrofoam wheel, with which they could move along a linear track through a VR environment.  

      - VTA, but not LC, axons showed marked reward position-related activity, showing a ramping-up of activity when mice approached a learned reward position.  

      - In contrast, the activity of LC axons ramped up before the initiation of movement on the Styrofoam wheel.  

      - In addition, exposure to a novel VR environment increased LC axon activity, but not VTA axon activity.  

      Overall, the study shows that the activity of catecholaminergic axons from VTA and LC to dorsal hippocampal CA1 can partly reflect distinct environmental, behavioral, and cognitive factors. Whereas both VTA and LC activity reflected running speed, VTA, but not LC axon activity reflected the approach of a learned reward, and LC, but not VTA, axon activity reflected initiation of running and novelty of the VR environment.  

      I have no specific expertise with respect to 2-photon imaging, so cannot evaluate the validity of the specific methods used to collect and analyse 2-photon calcium imaging data of axonal activity.  

      Strengths:  

      (1) Using a state-of-the-art approach to record separately the activity of VTA and LC axons with high temporal resolution in awake mice moving through virtual environments, the authors provide convincing evidence that the activity of VTA and LC axons projecting to dorsal CA1 reflect partly distinct environmental, behavioral and cognitive factors.  

      (2) The study will help a) to interpret previous findings on how hippocampal dopamine and norepinephrine or selective manipulations of hippocampal LC or VTA inputs modulate behavior and b) to generate specific hypotheses on the impact of selective manipulations of hippocampal LC or VTA inputs on behavior.  

      Weaknesses:  

      (1) The findings are correlational and do not allow strong conclusions on how VTA or LC inputs to dorsal CA1 affect cognition and behavior. However, as indicated above under Strengths, the findings will aid the interpretation of previous findings and help to generate new hypotheses as to how VTA or LC inputs to dorsal CA1 affect distinct cognitive and behavioral functions.  

      (2) Some aspects of the methodology would benefit from clarification.  

      First, to help others to better scrutinize, evaluate, and potentially to reproduce the research, the authors may wish to check if their reporting follows the ARRIVE (Animal Research: Reporting of In Vivo Experiments) guidelines for the full and transparent reporting of research involving animals (https://arriveguidelines.org/). For example, I think it would be important to include a sample size justification (e.g., based on previous studies, considerations of statistical power, practical considerations, or a combination of these factors). The authors should also include the provenance of the mice. Moreover, although I am not an expert in 2-photon imaging, I think it would be useful to provide a clearer description of exclusion criteria for imaging data.

      We thank the reviewer for helping us formalize the scientific rigor of our study. There are ten ARRIVE Guidelines and we have addressed most of them in our study already. However, there is an opportunity to add detail. We have listed below all ten points and how we have addressed each one (and point out any new additions):

      (1) Experimental design - we go into great depth explaining the experimental set-up, how we used the autofluorescent blebs as imaging controls, how we controlled for different sample sizes between the two populations, and the statistical tests used for comparisons. We also carefully accounted for animal behavior when quantifying and describing axon dynamics both in the familiar and novel environments.

      (2) Sample size - we state both the number of ROIs and mice for each analysis. We have now also added the number of mice we observed specific types of activity in. 

      (3) Inclusion/exclusion criteria - The following has now been added to the Methods section: Out of the 36 NET-Cre mice injected, 15 were never recorded from, either because they failed to reach behavioral criteria or because of a lack of visible expression in axons. Out of the 54 DAT-Cre mice injected, imaging was never conducted in 36 of them for lack of expression or failing to reach behavioral criteria. Out of the remaining 21 NET-Cre mice, 5 were excluded for heat bubbles, z-drift, or bleaching, while 10 of the remaining DAT-Cre mice were excluded for the same reasons. This was determined by visually assessing imaging sessions, followed by using the registration metrics output by suite2p. The registration metric performs a PCA on the motion-corrected ROIs and plots the first PC. If the first PC drifted so much that no activity was apparent, the video was excluded from analysis. 

      (4) Randomization - Already included in the paper is a description of random downsampling of LC axons to make statistical comparisons with VTA axons. LC axons were selected pseudo-randomly (only one axon per imaging session) to match VTA sampling statistics. This randomization was repeated 1000 times and comparisons were made against this random distribution. 

      (5) Blinding-masking - no blinding/masking was conducted as no treatments were given that would require this. We will include this statement in the next version. 

      (6) Outcomes - We defined all outcomes measured, such as those related to animal behavior and axon signaling. 

      (7) Statistical methods - None of the reviewers had any issues regarding our description of statistical methods, which we described in great detail in this version of the paper. 

      (8) Experimental animals - We have now described that DAT-Cre mice were obtained through JAX labs, and NET-Cre mice were obtained from the Tonegawa lab (Wagatsuma et al. 2017). This was absent in the initial version of the paper.

      (9) Experimental procedure - Already listed in great detail in Methods section.

      (10) Results - Rigorously described in detail for behaviors and related axon dynamics.

      Wagatsuma, Akiko, Teruhiro Okuyama, Chen Sun, Lillian M. Smith, Kuniya Abe, and Susumu Tonegawa. “Locus Coeruleus Input to Hippocampal CA3 Drives Single-Trial Learning of a Novel Context.” Proceedings of the National Academy of Sciences 115, no. 2 (January 9, 2018): E310–16. https://doi.org/10.1073/pnas.1714082115.

      Second, why were different linear tracks used for studies of VTA and LC axon activity (from line 362)? Could this potentially contribute to the partly distinct activity correlates that were found for VTA and LC axons?  

      We thank the reviewer for pointing this out and giving us a chance to address it directly. A detailed response to this is written above for a similar comment from reviewer 1.

      Third, the authors seem to have used two different criteria for defining immobility. Immobility was defined as moving at <5 cm/s for the behavioral analysis in Figure 3a, but as <0.2 cm/s for the imaging data analysis in Figure 4 (see legends to these figures and also see Methods, from line 447, line 469, line 498)? I do not understand why, and it would be good if the authors explained this.  

      This is a typo left over from before we converted velocity from rotational units of the treadmill to cm/s. This has now been corrected.

      (3) In the Results section (from line 182) the authors convincingly addressed the possibility that less time spent immobile in the novel environment may have contributed to the novelty-induced increase of LC axon activity in dorsal CA1 (Figure 4). In addition, initially (for the first 2-4 laps), the mice also ran more slowly in the novel environment (Figure 3aIII, top panel). Given that LC and VTA axon activity were both increasing with velocity (Figure 1F), reduced velocity in the novel environment may have reduced LC and VTA axon activity, but this possibility was not addressed. Reduced LC axon activity in the novel environment could have blunted the novelty-induced increase. More importantly, any potential novelty-induced increase in VTA axon activity could have been masked by decreases in VTA axon activity due to reduced velocity. The latter may help to explain the discrepancy between the present study and previous findings that VTA neuron firing was increased by novelty (see Discussion, from line 243). It may be useful for the authors to address these possibilities based on their data in the Results section, or to consider them in their Discussion.  

      We appreciate the reviewer's insightful comment regarding the potential impact of decreased velocity on novelty responses in LC and VTA axons. The decreased velocity in the novel environment could lead to a diminished novelty response in LC axons and could mask a subtle novelty signal in VTA axons. We have now included the following points in our discussion:

      “In addition, as noted above, on average we did observe a velocity-associated signal in VTA axons. When mice were exposed to the novel environment their velocity initially decreased. This would be expected to reduce the average signal across the VTA axon population relative to the higher velocity in the familiar environment. It is possible that this decrease could somewhat mask a subtle novelty-induced signal in VTA axons. Therefore, additional experiments should be conducted to investigate the heterogeneity of these axons and their activity under different experimental conditions during tightly controlled behavior.”

      “As discussed above, the slowing down of animal behavior in the novel environment could have decreased LC axon activity and reduced the magnitude of the novelty signal we detected during running. The novelty signal we report here may therefore be an underestimate of its magnitude under matched behavioral settings.”

      However, it is important to note that although VTA axons, on average, showed activity modulated by velocity in a familiar rewarded environment, this relationship was largely due to the activity of two VTA axons that were strongly modulated by velocity, indicating heterogeneity within the VTA axon population in dCA1. We have highlighted this point in the discussion. We also discuss that:

      “It is possible that some VTA DA inputs to dCA1 respond to novel environments, and the small number of axons recorded here are not representative of the whole population.”

      (4) Sensory properties of the water reward, which the mice may be able to detect, could account for reward-related activity of VTA axons (instead of an expectation of reward). Do the authors have evidence that this is not the case? Occasional probe trials, intermixed with rewarded trials, could be used to test for this possibility.  

      Mice receive their water reward through a water spout that is immobile and positioned directly in front of their mouth. Water delivery is triggered by a solenoid when the mice reach the end of the virtual track. Therefore, because the water spout is immobile and the water reward is not delivered until they reach the end of the track, there is nothing for the mice to detect during their run. We have added clarifications about the water spout to the Methods and Results sections, along with appropriate discussion points.

      Additionally, we note that the ramping activity of VTA axons is still present on the initial laps with no reward (Krishnan et al., 2022), indicating that this activity is not directly related to the presence or absence of water but is instead associated with the animal’s reward expectation.

      We thank the reviewer for raising this point and hope that these clarifications address their concern.

      Reviewer #3 (Public Review):  

      Summary:  

      Heer and Sheffield provide a well-written manuscript that clearly articulates the theoretical motivation to investigate specific catecholaminergic projections to dorsal CA1 of the hippocampus during a reward-based behavior. Using 2-photon calcium imaging in two groups of cre transgenic mice, the authors examine the activity of VTA-CA1 dopamine and LC-CA1 noradrenergic axons during reward seeking in a linear track virtual reality (VR) task. The authors provide a descriptive account of VTA and LC activities during walking, approach to reward, and environment change. Their results demonstrate LC-CA1 axons are activated by walking onset, modulated by walking velocity, and heighten their activity during environment change. In contrast, VTA-CA1 axons were most activated during the approach to reward locations. Together the authors provide a functional dissociation between these catecholamine projections to CA1. A major strength of their approach is the methodological rigor of 2-photon recording, data processing, and analysis approaches. These important systems neuroscience studies provide solid evidence that will contribute to the broader field of learning and memory. The conclusions of this manuscript are mostly well supported by the data, but some additional analysis and/or experiments may be required to fully support the authors' conclusions.  

      Weaknesses:  

      (1) During teleportation between familiar to novel environments the authors report a decrease in the freezing ratio when combining the mice in the two experimental groups (Figure 3aiii). A major conclusion from the manuscript is the difference in VTA and LC activity following environment change, given VTA and LC activity were recorded in separate groups of mice, did the authors observe a similar significant reduction in freezing ratio when analyzing the behavior in LC and VTA groups separately?  

      In response to the comment regarding the freezing ratios during teleportation between familiar and novel environments, we have analyzed the freezing ratios and lap velocities of DAT-Cre and NET-Cre mice separately (Fig. 3Aiii). Our analysis shows that the mean lap velocities of both groups overlap in the familiar environment and significantly decrease on the first lap of the novel environment (Fig. 3Aiii, top). For subsequent laps, the velocities in both groups are not statistically significantly different from the familiar environment lap velocities.

      Freezing ratios also show a statistically significant decrease on the first lap of the novel environment compared to the familiar environment in both groups (Fig. 3Aiii, bottom). In the NET-Cre mice, the freezing ratios remain statistically lower in subsequent laps, while in the DAT-Cre mice, the following laps show a similar trend but without statistical significance. This lack of statistical significance in the DAT-Cre mice is likely due to their already lower freezing ratios in the familiar environment. Overall, the data demonstrate similar behavioral responses in the two groups of mice during the switch from the familiar to the novel environment.

      (2) The authors satisfactorily apply control analyses to account for the unequal axon numbers recorded in the LC and VTA groups (e.g. Figure 1). However, given the heterogeneity of responses observed in Figures 3c, 4b and the relatively low number of VTA axons recorded (compared to LC), there are some possible limitations to the authors' conclusions. A conclusion that LC-CA1 axons, as a general principle, heighten their activity during novel environment presentation, would require this activity profile to be observed in some of the axons recorded in most all LC-CA1 mice.

      We agree with the reviewer’s point. To address this issue, when downsampling LC axons to compare to VTA axons, we matched the sampling statistics of the VTA axons/mice by only selecting one LC axon from each mouse to match the VTA dataset.

      Additionally, we have now included the number of recording sessions and the number of mice in which we observed each type of activity. This information has been added to further clarify and support our conclusions.

      Additionally, if the general conclusion is that VTA-CA1 axons ramp activity during the approach to reward, it would be expected that this activity profile was recorded in the axons of most all VTA-CA1 mice. Can the authors include an analysis to demonstrate that each LC-CA1 mouse contained axons that were activated during novel environments and that each VTA-CA1 mouse contained axons that ramped during the approach to reward?  

      As above, we have now added to the paper the number of mice in which each reported activity type was observed.  

      (3) A primary claim is that LC axons projecting to CA1 become activated during novel VR environment presentation. However, the experimental design did not control for the presentation of a familiar environment. As I understand, the presentation order of environments was always familiar, then novel. For this reason, it is unknown whether LC axons are responding to novel environments or environmental change. Did the authors re-present the familiar environment after the novel environment while recording LC-CA1 activity?  

      While we did not vary the presentation order of familiar and novel environments, we recorded the activity of LC axons in some mice when exposed to a dark environment (no VR cues) prior to exposure to the familiar environment. Our analysis of this data demonstrates that LC axons are also active following abrupt exposure to the familiar environment.

      We have added a new figure showing this response (Supplementary Figure 5A) and expanded on our original discussion point that LC axon activity generally correlates with arousal, as this result also supports that interpretation.

      We thank the reviewer for highlighting this important consideration. It certainly helps with the interpretation regarding what LC axons generally encode.  

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):  

      In addition to what has been described in the public review, I have the following recommendations:  

      The sample size of DA axon recordings should be increased with the use of a single GCaMP for valid conclusions to be made about the lack of novelty-induced activity in these axons.  

      We have increased the n of VTA GCaMP6s axons in the familiar environment by including two axons that were recorded in the familiar rewarded condition. We have also conducted an analysis comparing GCaMP6s versus GCaMP7b, which is discussed in detail above.

      Regarding the concerns about valid conclusions of novelty-induced activity in VTA axons, we have added a comment in the discussion to tone down our conclusions regarding the lack of a novelty signal in the VTA axons. This valid concern is discussed in detail above.  

      The title is currently very generic, and non-informative. I recommend the use of more specific language in describing the type of behavior under investigation. It is not clear to the reviewer why 'learning' is included here.  

      Original title: “Distinct catecholaminergic pathways projecting to hippocampal CA1 transmit contrasting signals during behavior and learning”

      To make it more specific to the experiments conducted here, we have changed the title to this:

      New title: “Distinct catecholaminergic pathways projecting to hippocampal CA1 transmit contrasting signals during navigation in familiar and novel environments”

      Error noted in Figure 4C legend - remove reference to VTA ROIs.  

      The reference to VTA ROIs has been removed from the figure legend.

      Reviewer #2 (Recommendations For The Authors):  

      (1) The concluding sentence of the Abstract could be more specific: which distinct types of information are reflected/'signaled'/'encoded' by LC and VTA inputs to dorsal CA1?  

      The abstract has been adjusted accordingly. The new sentence is more specific: “These inputs encode unique information, with reward information in VTA inputs and novelty and kinematic information in LC inputs, likely contributing to differential modulation of hippocampal activity during behavior and learning.”

      (2) Line 46/47: The study by Mamad et al. (2017) did not quite show that VTA dopamine input to dorsal CA1 'drives place preference'. To my understanding, the study showed that suppression of VTA dopamine signaling in a specific place caused avoidance of this place and that VTA dopamine signaling modulated hippocampal place-related firing. So, please consider rephrasing.  

      Corrected, thanks for pointing this out.

      (3) Legend to Figure 3AIII: 'Each lap was compared to the first lap in F . . .' Could you clarify if 'F' refers to the 'familiar environment'?  

      The figure legend has been changed accordingly.

      (4) Line 176: '36 LC neurons' - should this not be '36 imaged axon terminals in dorsal CA1' or something along these lines?  

      This reference has been changed to “LC axon ROIs”

      (5) Line 353: Why was water restriction started before the hippocampal window implant, if behavioral training to run for water reward only started after the implant? Please clarify.

      A sentence was added to the methods to explain that this was done to reduce bleeding and swelling during the hippocampal window implantation.  

      (6) Line 377: '. . . which took 10-14 days (although some mice never reached this threshold).' How many mice did not reach the criterion within 14 days? I think it is not accurate to say the mice 'never' reached the threshold, as they were only tested for a limited period of time.  

      We have added details of how many mice were excluded from each group and the reason why they were excluded.

      (7) Exclusion criteria for imaging data: The authors state (from line 402): 'Imaging sessions with large amounts of drift or bleaching were excluded from analysis (8 sessions for NET mice, 6 sessions for LC Mice).' What exactly were the quantitative exclusion criteria? Were these defined before the onset of the study or throughout the study?  

      Imaging sessions were first qualitatively assessed by looking for disappearance or movement of structures in the Z-plane throughout the imaging FOV. Additionally, following motion correction in suite2p, we used the registration metrics, which plot the first principal component (PC) of the motion-corrected images, to assess for drift, bleaching, or heat bubbles. If this variable increased or decreased greatly throughout a session, to the point where any apparent activity was not visible in the first PC, the dataset was excluded. We have added these exclusion criteria to the methods section.

      Reviewer #3 (Recommendations For The Authors):  

      Please provide a justification or rationale for having two different criteria for immobility (< 5cm/sec) and freezing (<0.2 cm/sec). If VTA and LC axon activities are different between these two velocities, please provide some commentary on this difference.  

      This is a typo leftover from before we converted velocity from rotational units to cm/s.

    2. eLife assessment

      This manuscript provides important results that assessed the contribution of two catecholaminergic projections to the hippocampus during environment-guided reward behavior. The authors use 2-photon imaging in the hippocampus of behaving mice to provide solid evidence that there are dissociable roles of dopamine and norepinephrine in this structure. Although of great interest to the field of learning and memory, the results would be strengthened by additional data collected from dopaminergic projections to the hippocampus.

    3. Reviewer #1 (Public Review):

      Summary:

      Heer and Sheffield used 2-photon imaging to dissect the functional contributions of convergent dopamine and noradrenaline inputs to the dorsal hippocampus CA1 in head-restrained mice running down a virtual linear path. Mice were trained to collect water reward at the end of the track, and on test days calcium activity was recorded from dopamine (DA) axons originating in ventral tegmental area (VTA, n=7) and noradrenaline axons from the locus coeruleus (LC, n=87) under several conditions. When mice ran laps in a familiar environment, VTA DA axons exhibited ramping activity along the track that correlated with distance to reward and, to some extent, velocity, while LC input activity remained constant across the track but correlated invariantly with velocity and time to motion onset. A subset of recordings taken when the reward was removed showed diminished ramping activity in VTA DA axons, but no changes in the LC axons, confirming that DA axon activity is locked to reward availability. When mice were subsequently introduced to a new environment, the ramping-to-reward activity in the DA axons disappeared, while LC axons showed a dramatic increase in activity lasting 90 s (6 laps) following the environment switch. In the final analysis, the authors sought to disentangle LC axon activity induced by novelty from behavioral changes induced by novelty by removing periods in which animals were immobile, and established that the activity observed in the first 2 laps reflected a novelty-induced signal in LC axons.

      The revised manuscript included additional evidence of increased (but transient) signal in LC axons after a transition to a novel environment during periods of immobility, and also that a change from dark to familiar environment induces a peak in LC axon activity, showing that LC input to dCA1 may not solely signal novelty.

      Strengths:

      The results presented in this manuscript provide insights into the specific contributions of catecholaminergic input to the dorsal hippocampus CA1 during spatial navigation in a rewarded virtual environment, offering a detailed analysis at the resolution of single axons. The data analysis is thorough and possible confounding variables and data interpretation are carefully considered.

      The authors have addressed my concerns in a thorough manner. The reviewer also appreciates the increased transparency of reporting in the revised manuscript.

      Weaknesses:

      Listed below are some remaining comments.

      The increase in LC activity with any change in environment (from familiar to novel or from dark to familiar) suggests that LC input acts not solely as a novelty signal, but as a general arousal or salience signal in response to environmental changes. Based on this, I have a couple of questions:

      • Is the overall claim that LC input to the dHC signals novelty still valid based on observed findings - as claimed throughout the manuscript?
      • Would the omission of a reward be considered a salient change in the environment that activates LC signals, or is the LC not involved with processing reward-related information? Has the activity of LC and VTA axons been analysed in the seconds following reward presentation and/or omission?

    4. Reviewer #2 (Public Review):

      Summary:

      The authors used 2-photon Ca2+-imaging to study the activity of ventral tegmental area (VTA) and locus coeruleus (LC) axons in the CA1 region of the dorsal hippocampus in head-fixed male mice moving on linear paths in virtual reality (VR) environments.

      The main findings were as follows:
      - In a familiar environment, activity of both VTA axons and LC axons increased with the mice's running speed on the Styrofoam wheel, with which they could move along a linear track through a VR environment.
      - VTA, but not LC, axons showed marked reward position-related activity, showing a ramping-up of activity when mice approached a learned reward position.
      - In contrast, activity of LC axons ramped up before initiation of movement on the Styrofoam wheel.
      - In addition, exposure to a novel VR environment increased LC axon activity, but not VTA axon activity.

      Overall, the study shows that the activity of catecholaminergic axons from VTA and LC to dorsal hippocampal CA1 can partly reflect distinct environmental, behavioral and cognitive factors. Whereas both VTA and LC activity reflected running speed, VTA, but not LC axon activity reflected the approach of a learned reward and LC, but not VTA, axon activity reflected initiation of running and novelty of the VR environment.

      I have no specific expertise with respect to 2-photon imaging, so cannot evaluate the validity of the specific methods used to collect and analyse 2-photon calcium imaging data of axonal activity.

      Strengths:

      (1) Using a state-of-the-art approach to record separately the activity of VTA and LC axons with high temporal resolution in awake mice moving through virtual environments, the authors provide convincing evidence that the activity of VTA and LC axons projecting to dorsal CA1 reflects partly distinct environmental, behavioral and cognitive factors.

      (2) The study will help a) to interpret previous findings on how hippocampal dopamine and norepinephrine or selective manipulations of hippocampal LC or VTA inputs modulate behavior and b) to generate specific hypotheses on the impact of selective manipulations of hippocampal LC or VTA inputs on behavior.

      Weaknesses:

      (1) The findings are correlational and do not allow strong conclusions on how VTA or LC inputs to dorsal CA1 affect cognition and behavior. However, as indicated above under Strengths, the findings will aid the interpretation of previous findings and help to generate new hypotheses as to how VTA or LC inputs to dorsal CA1 affect distinct cognitive and behavioral functions.

      (2) Some aspects of the methodology would benefit from clarification.

      First, to help others to better scrutinize, evaluate and potentially to reproduce the research, the authors may wish to check if their reporting follows the ARRIVE (Animal Research: Reporting of In Vivo Experiments) guidelines for the full and transparent reporting of research involving animals (https://arriveguidelines.org/). For example, I think it would be important to include a sample size justification (e.g., based on previous studies, considerations of statistical power, practical considerations or a combination of these factors). The authors should also include the provenance of the mice. Moreover, although I am not an expert in 2-photon imaging, I think it would be useful to provide a clearer description of exclusion criteria for imaging data (see below, Recommendations for the authors).

      Second, why were different linear tracks used for studies of VTA and LC axon activity (from line 362)? Could this potentially contribute to the partly distinct activity correlates that were found for VTA and LC axons?

      Third, the authors seem to have used two different criteria for defining immobility. Immobility was defined as moving at <5 cm/s for the behavioral analysis in Fig. 3a, but as <0.2 cm/s for the imaging data analysis in Fig. 4 (see legends to these figures and also see Methods, from line 447, line 469, line 498)? I do not understand why, and it would be good if the authors explained this.

      (3) In the Results section (from line 182) the authors convincingly addressed the possibility that less time spent immobile in the novel environment may have contributed to the novelty-induced increase of LC axon activity in dorsal CA1 (Fig. 4). In addition, initially (for the first 2-4 laps), the mice also ran more slowly in the novel environment (Fig. 3aIII, top panel). Given that LC and VTA axon activity were both increasing with velocity (Fig. 1F), reduced velocity in the novel environment may have reduced LC and VTA axon activity, but this possibility was not addressed. Reduced LC axon activity in the novel environment could have blunted the novelty-induced increase. More importantly, any potential novelty-induced increase in VTA axon activity could have been masked by decreases in VTA axon activity due to reduced velocity. The latter may help to explain the discrepancy between the present study and previous findings that VTA neuron firing was increased by novelty (see Discussion, from line 243). It may be useful for the authors to address these possibilities based on their data in the Results section, or to consider them in their Discussion.

      (4) Sensory properties of the water reward, which the mice may be able to detect, could account for reward-related activity of VTA axons (instead of an expectation of reward). Do the authors have evidence that this is not the case? Occasional probe trials, intermixed with rewarded trials, could be used to test for this possibility.

      REVIEW OF THE REVISED MANUSCRIPT

      I thank the authors for their responses addressing some of the weaknesses I raised in my original comments.

      Regarding their clarification of some methodological issues [Point (2) above], I have a few additional comments:
      - I appreciate that the authors clearly state the sample sizes contributing to the data. However, sample size justifications (e.g. based on previous studies, considerations of statistical power, practical considerations or a combination of these factors) are still lacking.
      - It is good that the authors have now clearly indicated how many mice they excluded due to lack of GCaMP expression or due to failure to reach the behavioral criteria. They also indicated that they discarded some of the collected datasets, based on the visual assessment of imaging sessions and the registration metrics output by suite2p. I appreciate that this may be common practice (although I am not using 2-photon imaging myself). However, I note that to minimize the risk of experimenter bias and improve reproducibility, it would be preferable to have more clearly defined quantitative criteria for such exclusions.
      - The authors clarified in their response why they used two different linear tracks for their studies of VTA and LC axon activity. I would encourage them to include this clarification in the manuscript. From the authors' response, I understand that they chose the different track lengths to facilitate comparison to previous studies involving LC and VTA axon recordings. However, given that the present paper aimed to compare LC and VTA axon recordings, the use of different track lengths for LC and VTA axon recordings remains a limitation of the present paper.

    5. Reviewer #3 (Public Review):

      Summary:

      Heer and Sheffield provide a well-written manuscript that clearly articulates the theoretical motivation to investigate specific catecholaminergic projections to dorsal CA1 of the hippocampus during a reward-based behavior. Using 2-photon calcium imaging in two groups of cre transgenic mice, the authors examine activity of VTA-CA1 dopamine and LC-CA1 noradrenergic axons during reward seeking in a linear track virtual reality (VR) task. The authors provide a descriptive account of VTA and LC activities during walking, approach to reward, and environment change. Their results demonstrate LC-CA1 axons are activated by walking onset, modulated by walking velocity, and heighten their activity during environment change. In contrast, VTA-CA1 axons were most activated during approach to reward locations. Together the authors provide a functional dissociation between these catecholamine projections to CA1. A major strength to their approach is the methodological rigor of 2-photon recording, data processing, and analysis approaches to accommodate their unequal LC-CA1 and VTA-CA1 sample sizes. These important systems neuroscience studies provide solid evidence that will contribute to the broader field of navigation and memory.

      Weaknesses:

      The conclusions of this manuscript are mostly well supported by the data. However, increasing the sample size of the VTA-CA1 group and using experimental methods that are identical among LC-CA1 and VTA-CA1 groups would help to fully support the authors' conclusions.

    1. eLife assessment

      This study presents a valuable framework and findings that contribute to our understanding of the brain as a fractal object by observing the stability of its shape property within 11 primate species and by highlighting an application to the effects of aging on the human brain. The evidence provided is solid but the link between brain shape and the underlying anatomy remains unclear. This study will be of interest to neuroscientists interested in brain morphology, whether from an evolutionary, fundamental or pathological point of view, and to physicists and mathematicians interested in modeling the shapes of complex objects.

    2. Reviewer #2 (Public Review):

      In this manuscript, the authors analyze the shapes of cerebral cortices from several primate species, including subgroups of young and old humans, to characterize commonalities in patterns of gyrification, cortical thickness, and cortical surface area. The authors state that the observed scaling law shares properties with fractals, where shape properties are similar across several spatial scales. One way the authors assess this is to perform a "cortical melting" operation that they have devised on surface models obtained from several primate species. The authors also explore differences in shape properties between brains of young (~20 year old) and old (~80) humans. A challenge the authors acknowledge struggling with in revising the manuscript is merging "complex mathematical concepts and a perplexing biological phenomenon." This reviewer remains a bit skeptical about whether the complexity of the mathematical concepts being drawn from is justified by the advances made in our ability to infer new things about the shape of the cerebral cortex.

      (1) The series of operations to coarse-grain the cortex illustrated in Figure 1 produces image segmentations that do not resemble real brains. The process to assign voxels in downsampled images to cortex and white matter is biased towards the former, as only 4 corners of a given voxel are needed to intersect the original pial surface, but all 8 corners are needed to be assigned a white matter voxel. The reason for introducing this bias (and to the extent that it is present in the authors' implementation) is not provided. The authors provide an intuitive explanation of why thickness relates to folding characteristics, but ultimately an issue for this reviewer is, e.g., for the right-most panel in Figure 2b, the cortex consists of several 4.9-sided voxels and thus a >2 cm thick cortex. A structure with these morphological properties is not consistent with the anatomical organization of typical mammalian neocortex.

      (2) For the comparison between 20-year-old and 80-year-old brains, a well-documented difference is that the older age group possesses more cerebrospinal fluid due to tissue atrophy, and the distances between the walls of gyri become greater. This difference is borne out in the left column of Figure 4b. It seems this additional spacing between gyri in 80 year olds requires more extensive down-sampling (larger scale values in Figure 4a) to achieve a similar shape parameter K as for the 20 year olds. The authors assert that K provides a more sensitive measure (associated with a large effect size) than currently used ones for distinguishing brains of young vs. old people. A more explicit, or elaborate, interpretation of the numbers produced in this manuscript, in terms of brain shape, might make this analysis more appealing to researchers in the aging field.

      (3) In the Discussion, it is stated that self-similarity, operating on all length scales, should be used as a test for existing and future models of gyrification mechanisms. Given the lack of association between the abstract mathematical parameters described in this study and explicit properties of brain tissue and its constituents, it is difficult to envision how the coarse-graining operation can be used to guide development of "models of cortical gyrification."

      (4) There are several who advocate for analyzing cortical mid-thickness surfaces, as the pial surface over-represents gyral tips compared to the bottoms of sulci in the surface area. The authors indicate that analyses of mid-thickness representations will be taken on in future work, but this seems to be a relevant control for accepting the conclusions of this manuscript.

    3. Reviewer #3 (Public Review):

      Summary: Through a rigorous methodology, the authors demonstrated that within 11 different primates, the shape of the brain followed a universal scaling law with fractal properties. They enhanced the universality of this result by showing the concordance of their results with a previous study investigating 70 mammalian brains, and the discordance of their results with other folded objects that are not brains. They incidentally illustrated potential applications of this fractal property of the brain by observing a scale-dependent effect of aging on the human brain.

      Strengths:
      - New hierarchical way of expressing cortical shapes at different scales, derived from a previous report through implementation of a coarse-graining procedure
      - Investigation of 11 primate brains and contextualisation with other mammals based on prior literature
      - Proposal of a tool to analyse cortical morphology that requires no fine tuning and is computationally achievable
      - Positioning of results in comparison to previous works, reinforcing the validity of the observation
      - Illustration of the scale-dependence of the effects of brain aging in humans

      Weaknesses:
      - The notion of cortical shape, while being central to the article, is not really defined, leaving some interpretation to the reader
      - The organization of the manuscript is unconventional, leading to mixed contents in different sections (sections mixing introduction and method, methods and results, results and discussion...). As a result, the reader discovers the content of the article along the way, it is not obvious at what stages the methods are introduced, and the results are sometimes presented and argued in the same section, hindering objectivity.

      To improve the document, I would suggest a modification and restructuring of the article such that: 1) by the end of the introduction the reader understands clearly what question is addressed and the value it holds for the community, 2) by the end of the methods the reader understands clearly all the tools that will be used to answer that question (not just the new method), 3) by the end of the results the reader holds the objective results obtained by applying these tools on the available data (without subjective interpretations and justifications), and 4) by the end of the discussion the reader understands the interpretation and contextualisation of the study, and clearly grasps the potential of the method depicted for the better understanding of brain folding mechanisms and properties.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment:

      This study presents a valuable framework and findings that contribute to our understanding of the brain as a fractal object by observing the stability of its shape property within 11 primate species and by highlighting an application to the effects of aging on the human brain. The evidence provided is solid but the link between brain shape and the underlying anatomy remains unclear. This study will be of interest to neuroscientists interested in brain morphology, whether from an evolutionary, fundamental or pathological point of view, and to physicists and mathematicians interested in modeling the shapes of complex objects.

      We have now clarified the outstanding question of whether our model outputs can be related to actual primate brain anatomy. We believe this question arose mainly from comments on the validity of our outputs, which appeared to show thicker cortices than nature can produce.

      We address this point in more detail in the point-by-point response below, but want to address this misunderstanding directly here: Our algorithm does not produce thicker cortices with increasing coarse-graining scales; in fact, the cortical thickness never exceeds the actual cortical thickness in our outputs, but rather thins with each coarse-graining scale. In other words, we believe that our outputs are fully in line with neuroanatomy across species.

      Reviewer #2 (Public Review): 

      In this manuscript, the authors analyze the shapes of cerebral cortices from several primate species, including subgroups of young and old humans, to characterize commonalities in patterns of gyrification, cortical thickness, and cortical surface area. The authors state that the observed scaling law shares properties with fractals, where shape properties are similar across several spatial scales. One way the authors assess this is to perform a "cortical melting" operation that they have devised on surface models obtained from several primate species. The authors also explore differences in shape properties between brains of young (~20 year old) and old (~80) humans. A challenge the authors acknowledge struggling with in revising the manuscript is merging "complex mathematical concepts and a perplexing biological phenomenon." This reviewer remains a bit skeptical about whether the complexity of the mathematical concepts being drawn from is justified by the advances made in our ability to infer new things about the shape of the cerebral cortex.

      To allow scientists from all backgrounds to adopt these complex ideas, we have made our code for “melting” the brains and for further downstream analysis publicly available. We have now also provided a graphical user interface, to allow users without substantial coding experience to run the analysis. We also believe that the algorithmic concepts are easy to understand because they resemble the coarse-graining procedures found in long-standing and well-accepted box-counting algorithms.

      Beyond the theoretical insight of the fractal nature of cortices and providing an explicit and crucial link between vastly different brains that are gyrified and those that are not, we believe that the advance gained by our methods for future applications is clearly demonstrated in our proof-of-principle with a four-fold increase in effect size. For reference, an effect size of 8 would translate to an almost perfect separation of groups, i.e. an ideal biomarker with near 100% sensitivity and specificity.
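
      As a rough illustration of the effect-size remark above, the sketch below converts a standardised effect size d into sensitivity and specificity. It assumes two normally distributed groups with equal variance and a decision threshold placed midway between the group means; these assumptions, and the function names, are ours for illustration and are not taken from the manuscript.

```python
import math


def normal_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def midpoint_sensitivity(d):
    """Sensitivity (equal to specificity) for a midpoint threshold.

    Assumes two equal-variance normal groups whose means differ by
    d standard deviations; the threshold sits d/2 from each mean.
    """
    return normal_cdf(d / 2.0)


for d in (2.0, 8.0):
    # Under these assumptions the value is the fraction of each group
    # falling on the correct side of the threshold.
    print(f"d = {d}: sensitivity = specificity = {midpoint_sensitivity(d):.5f}")
```

      Under these assumptions, an effect size of 8 places the threshold four standard deviations from each group mean, which is why the two groups barely overlap.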

      (1) The series of operations to coarse-grain the cortex illustrated in Figure 1 produces image segmentations that do not resemble real brains.

      As reiterated in our Methods and Discussion: “Note, of course, that the coarse-grained brain surfaces are an output of our algorithm alone and are not to be directly/naively likened to actual brain surfaces, e.g. in terms of the location or shape of the folds. Our comparisons here between coarse-grained brains and actual brains are purely on the level of morphometrics across the whole cortex.”

      Fig. 1 therefore serves as an explanation to the reader on the algorithmic outputs, but each melted brain is not supposed to be directly/visually compared to actual brains. Similar to algorithms measuring the fractal dimension, or the exposed surface area of a given brain, the intermediate outputs of these algorithms are not supposed to represent any biologically observed brain structures, but rather serve as an abstraction to obtain meaningful morphometrics.

      We additionally added a note to the caption of Fig. 1 to clarify this point:

      “Note that the actual size of the brains for analysis are rescaled (see Methods and Fig. 3); we display all brains scaled at an equal size here for the ease of visualisation of the method.”

      Finally, we also edited the entire paper for terminology to clearly distinguish the terms of (1) the cortex as a 3D object, (2) coarse-grained and voxelised versions thereof, and (3) summary morphological measures derived from the former. When we invite comparisons in our paper between real brains and coarse-grained brains, this is always at the level of summary morphological measures, not at the level of the 3D objects/voxelisations themselves.

      The process to assign voxels in downsampled images to cortex and white matter is biased towards the former, as only 4 corners of a given voxel are needed to intersect the original pial surface, but all 8 corners are needed to be assigned a white matter voxel. The reason for introducing this bias (and to the extent that it is present in the authors' implementation) is not provided.

      This detail was in the Supplementary, and we have now added additional clarification on this specific point to our Supplementary:

      “In detail, we assign all voxels in the grid with at least four corners inside the original pial surface to the pial voxelization. This process allows the exposed surface to remain approximately constant with increasing voxel sizes. A constant exposed surface is desirable, as we only want to gradually ‘melt’ and fuse the gyri, but not grow the bounding/exposed surface as well. We want the extrinsic area to remain approximately constant as we decrease the intrinsic area via coarse-graining; it is like generating iterates of a Koch curve in reverse, from more to less detailed, by increasing the length of smallest line segment.

      We then assign voxels with all eight corners inside the original white matter surface to the white matter voxelization. This is to ensure integrity of the white matter, as otherwise white matter voxels in gyri may become detached from the core white matter, and thus artificially increase white matter surface area. Indeed, the main results of the paper are not very sensitive to this decision using all eight corners, vs. e.g. only four corners, as we do not directly use white matter surface area for the scaling law measurements. However, we still maintained this choice in case future work wants to make use of the white matter voxelisations or derivative measures.”

      On the point of white matter integrity, note that if both the grey and white matter voxelisations required all 8 corners to be inside the respective mesh, some voxels at the grey/white matter interface would not be assigned to either, causing potential downstream issues.
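
      The corner-counting rule quoted above can be sketched as follows. This is a minimal illustration and not the released code: the surfaces are replaced by two concentric spheres (hypothetical stand-ins for the pial and white matter meshes), and the names `count_corners_inside`, `classify_voxel`, `inside_pial` and `inside_white` are assumptions introduced for this example. The "grey" label is derived here as a voxel in the pial voxelisation that is not in the white matter voxelisation.

```python
import itertools

import numpy as np


def count_corners_inside(origin, size, inside):
    """Count how many of a cubic voxel's 8 corners satisfy inside(point)."""
    origin = np.asarray(origin, dtype=float)
    offsets = itertools.product((0.0, size), repeat=3)
    return sum(bool(inside(origin + np.asarray(o))) for o in offsets)


def classify_voxel(origin, size, inside_pial, inside_white):
    """Label a voxel 'white', 'grey' or 'outside' using the corner rule.

    White matter needs all 8 corners inside the white matter surface;
    the pial voxelisation needs at least 4 corners inside the pial surface.
    """
    if count_corners_inside(origin, size, inside_white) == 8:
        return "white"
    if count_corners_inside(origin, size, inside_pial) >= 4:
        return "grey"
    return "outside"


# Toy stand-ins for the two surfaces: concentric spheres (radii are arbitrary).
def inside_pial(p):
    return np.linalg.norm(p) < 10.0


def inside_white(p):
    return np.linalg.norm(p) < 7.0


for origin in [(0, 0, 0), (6.5, 0, 0), (9.5, 0, 0), (9.9, 0, 0)]:
    print(origin, classify_voxel(origin, 1.0, inside_pial, inside_white))
```

      In this toy setup the voxel at (9.5, 0, 0) has exactly four corners inside the outer sphere and is therefore kept, whereas the voxel at (9.9, 0, 0) has only three and is dropped.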

      We further acknowledge:

      “Of course, our proposed procedure is not the only conceivable way to erase shape details below a given scale; and we are actively working on related algorithms that are also computationally cheaper. Nevertheless, the current version requires no fine-tuning, is computationally feasible and conceptually simple, thus making it a natural choice for introducing the methodology and approach.”

      The authors provide an intuitive explanation of why thickness relates to folding characteristics, but ultimately an issue for this reviewer is, e.g., for the right-most panel in Figure 2b, the cortex consists of several 4.9-sided voxels and thus a >2 cm thick cortex. A structure with these morphological properties is not consistent with the anatomical organization of typical mammalian neocortex. 

      We assume the reviewer refers to Fig. 1B with the panel on scale=4.9mm. We would like to point out that Fig. 1 serves as an explanation of the voxelisation method. For the actual analysis and Results, we are using re-scaled brains (see Fig. 2 with the ever-decreasing brain sizes). The description of the rescaling procedure is now expanded as below:

      “Morphological properties, such as cortical thicknesses measured in our ‘melted’ brains, are to be understood as a thickness relative to the size of the brain. Therefore, to analyse the scaling behaviour of the different coarse-grained realisations of the same brain, we apply an isometric rescaling process that leaves all dimensionless shape properties unaffected (more details in Suppl. S3.1). Conceptually, this process fixes the voxel size, and instead resizes the surfaces relative to the voxel size, which ensures that we can compare the coarse-grained realisations to the original cortices, and test if the former, like the latter, also scale according to Eqn. (1). Resizing, or more precisely, shrinking the cortical surface is mathematically equivalent to increasing the box size in our coarse-graining method. Both achieve an erasure of folding details below a certain threshold. After rescaling, as an example, the cortical thickness also shrinks with increasing levels of coarse-graining, and never exceeds the thickness measured at native scale.”
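
      To make the statement about isometric rescaling concrete, the sketch below scales a toy triangle mesh by a factor s and checks that areas scale with s squared while a dimensionless ratio (thickness divided by the square root of area) is unchanged. The tetrahedron, the thickness value and the function name `mesh_area` are hypothetical choices for illustration only; they are not the cortical meshes or the code used in the paper.

```python
import numpy as np


def mesh_area(vertices, faces):
    """Total surface area of a triangle mesh.

    vertices: (n, 3) float array; faces: (m, 3) integer index array.
    """
    tri = vertices[faces]  # (m, 3, 3): three corner points per triangle
    cross = np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0])
    return 0.5 * np.linalg.norm(cross, axis=1).sum()


# Toy closed surface: a tetrahedron standing in for a cortical mesh.
vertices = np.array([[0.0, 0.0, 0.0],
                     [1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [0.0, 0.0, 1.0]])
faces = np.array([[0, 1, 2], [0, 1, 3], [0, 2, 3], [1, 2, 3]])
thickness = 0.1  # hypothetical thickness, in the same length units

s = 0.5  # isometric scale factor: every length is multiplied by s
area = mesh_area(vertices, faces)
area_scaled = mesh_area(s * vertices, faces)

# Lengths scale by s and areas by s**2, so this ratio is scale-free.
ratio = thickness / np.sqrt(area)
ratio_scaled = (s * thickness) / np.sqrt(area_scaled)
print(area, area_scaled, ratio, ratio_scaled)
```

      The same reasoning is why shrinking the surface at a fixed voxel size and enlarging the voxel at a fixed surface size give equivalent coarse-grained shapes up to an overall scale.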

      We additionally added a note to the caption of Fig. 1 to clarify this point:

      “Note that the actual size of the brains for analysis are rescaled (see Methods and Fig. 3); we display all brains scaled at an equal size here for the ease of visualisation of the method.”

      Finally, we also edited the entire paper for terminology to clearly distinguish the terms of (1) the cortex as a 3D object, (2) coarse-grained versions thereof, and (3) summary morphological measures derived from the former. When we invite comparisons in our paper between real brains and coarse-grained brains, this is always at the level of summary morphological measures, not at the level of the 3D objects themselves and their detailed anatomical features.

      (2) For the comparison between 20-year-old and 80-year-old brains, a well-documented difference is that the older age group possesses more cerebrospinal fluid due to tissue atrophy, and the distances between the walls of gyri become greater. This difference is borne out in the left column of Figure 4b. It seems this additional spacing between gyri in 80-year-olds requires more extensive down-sampling (larger scale values in Figure 4a) to achieve a similar shape parameter K as for the 20-year-olds. The authors assert that K provides a more sensitive measure (associated with a large effect size) than currently used ones for distinguishing brains of young vs. old people. A more explicit, or elaborate, interpretation of the numbers produced in this manuscript, in terms of brain shape, might make this analysis more appealing to researchers in the aging field.

      We already removed the main results relating to K and aging in our last revision to avoid confusion. They now appear only in the supplementary analysis, and our claim that K is a more sensitive measure for age and ageing, whilst still true, will be presented in more detail in a series of upcoming papers.

      (3) In the Discussion, it is stated that self-similarity, operating on all length scales, should be used as a test for existing and future models of gyrification mechanisms. Given the lack of association between the abstract mathematical parameters described in this study and explicit properties of brain tissue and its constituents, it is difficult to envision how the coarse-graining operation can be used to guide development of "models of cortical gyrification."

      We have clarified in more detail what we meant originally in Discussion:

      “Finally, this dual universality is also a more stringent test for existing and future models of cortical gyrification mechanisms at relevant scales, and one that moreover is applicable to individual cortices. For example, any model that explicitly simulates a cortical surface as an output could be directly coarse-grained with our method, and the morphological trajectories can be compared with those of actual human and primate cortices. The simulated cortices would only be ‘valid’ in terms of the dual universality if they also produce the same morphological trajectories.”

      However, we agree with the reviewer that our paper could be misread as demanding direct comparisons of each coarse-grained brain with an actual brain, and we have now added the following text to clarify that this is not our intention for the proposed method or outputs.

      “Note, we do not suggest directly comparing coarse-grained brain surfaces with actual biological brain surfaces. As we noted earlier, the coarse-grained brain surfaces are an output of our algorithm alone and not to be directly/naively likened to actual brain surfaces, e.g. in terms of the location or shape of the folds. Our comparisons here between coarse-grained brains and actual brains are purely on the level of morphometrics across the whole cortex.”

      Indeed, the dual universality imposes restrictive constraints on the possible shapes of real cortices, but does not fully specify them. Presumably, the location of individual folds in different individuals and species will depend on their respective evolutionary histories, so there is no reason to expect a match in fold location between the ‘melted’ cortices of more gyrified species, on one hand, and the cortex of a less-gyrified one, on the other, even if their global morphological parameters and global mechanism of folding coincide.

      (4) There are several who advocate for analyzing cortical mid-thickness surfaces, as the pial surface over-represents gyral tips compared to the bottoms of sulci in the surface area. The authors indicate that analyses of mid-thickness representations will be taken on in future work, but this seems to be a relevant control for accepting the conclusions of this manuscript.

      In the context of some applications and methods, we agree that the mid-surface is a meaningful surface to analyse. However, in our work, the mid-surface is not. The fractal estimation rests on the assumption that the exposed area hugs the object of interest (hence the convex hull of the pial surface), as the relationship between the extrinsic and intrinsic areas across scales determines the fractal relationship (Eq. 2). If we used the mid-surface instead of the pial surface for all estimations, it would not represent the actual object of interest, and it would be separated from the convex hull. Estimating a new convex hull based on the mid-surface would be the equivalent of asking for the fractal dimension of the mid-surface, not of the cortical ribbon. In other words, it would be a different question, bound to yield a different answer.

      Hence, we indicated in our original response that we only have a provisional answer, but more work beyond the scope of this paper is required to answer this question, as it is a separate question. The mid-surface, as a morphological structure in its own right, will have its own scaling properties, and our provisional understanding is that these also yield a scaling law parallel to those of the cortical ribbon with the same or a similar fractal dimension. But more systematic work is required to investigate this question at native scale and across scales.

      Reviewer #3 (Public Review):

      Summary: Through a rigorous methodology, the authors demonstrated that within 11 different primates, the shape of the brain followed a universal scaling law with fractal properties. They enhanced the universality of this result by showing the concordance of their results with a previous study investigating 70 mammalian brains, and the discordance of their results with other folded objects that are not brains. They incidentally illustrated potential applications of this fractal property of the brain by observing a scale-dependent effect of aging on the human brain.

      Strengths: 

      - New hierarchical way of expressing cortical shapes at different scales derived from previous report through implementation of a coarse-graining procedure 

      - Investigation of 11 primate brains and contextualisation with other mammals based on prior literature 

      - Proposition of tool to analyse cortical morphology requiring no fine tuning and computationally achievable 

      - Positioning of results in comparison to previous works reinforcing the validity of the observation. 

      - Illustration of scale-dependence of effects of brain aging in the human.

      Weaknesses: 

      - The notion of cortical shape, while being central to the article, is not really defined, leaving some interpretation to the reader 

      - The organization of the manuscript is unconventional, leading to mixed contents in different sections (sections mixing introduction and method, methods and results, results and discussion...). As a result, the reader discovers the content of the article along the way, it is not obvious at what stages the methods are introduced, and the results are sometimes presented and argued in the same section, hindering objectivity. 

      To improve the document, I would suggest a modification and restructuring of the article such that: 1) by the end of the introduction the reader understands clearly what question is addressed and the value it holds for the community, 2) by the end of the methods the reader understands clearly all the tools that will be used to answer that question (not just the new method), 3) by the end of the results the reader holds the objective results obtained by applying these tools on the available data (without subjective interpretations and justifications), and 4) by the end of the discussion the reader understands the interpretation and contextualisation of the study, and clearly grasps the potential of the method depicted for the better understanding of brain folding mechanisms and properties. 

      We thank this reviewer again for their attention to detail and constructive comments. We have followed the detailed suggestions provided by the reviewer in the Recommendations For The Authors, and summarise the main changes here:

      - We have restructured all sections to be more clearly following Introduction, Methods, Results, and Discussion; by using subsections, we believe the structure is now more accessible to readers.

      - We have now clarified the concept of “cortical shape”, as we use it in our paper, in several places, by distinguishing clearly the object of study and the morphological properties measured from it.

      Recommendations for the authors: 

      Reviewer #2 (Recommendations For The Authors): None 

      Reviewer #3 (Recommendations For The Authors): 

      I once again compliment the authors for their elegant work. I am happy with the way they covered my first feedback. My second review takes into account some comments made by other reviewers with which I agree. 

      We thank this reviewer again for their attention to detail and constructive comments.

      Recommendations for clarifications: 

      General comments: The purpose of the article could be made clearer in the introduction. When I differentiate results from discussion, I think of results as objective measures or observations, while discussion will relate to the interpretation of these results (including comparison with previous literature, in most cases). 

      We have restructured all sections to be more clearly following Introduction, Methods, Results, and Discussion; by using subsections, we believe the structure is now more accessible to readers.

      - l.39: define or discuss "cortical shape" 

      We have gone through the entire paper and corrected for any ambiguities. We specifically distinguish between the cortex as a structure overall, shape measures derived from this structure, and coarse-grained versions of the structure.

      - l.48-74: this would match either an introduction or a discussion rather than a methods section. 

      Done

      - l.98-106: this would match a discussion rather than a methods section. 

      Done

      - l.111: here could be a good spot to discuss the 4 vs 8 corners for inclusion of pial vs white matter voxelization 

      We have discussed this in the more detailed Supplementary section now, as after restructuring, this appears to be the more suitable place.

      - l.140-180: it feels that this section mixes methods, results and discussion of the results 

      We agree and we have resolved this by removing sentences and re-arranging sections.

      - l.183-217: mix of results and discussion 

      We agree and we have resolved this by removing sentences and re-arranging sections.

      Small cosmetic suggestions: 

      - l.44: conservation of 'some' quantities: vague 

      Changed to conservation of morphological relationships across evolution

      - l.66: order of citations ([24, 22,23]) 

      Will be fixed at proof stage depending on format of references.

      - l.77: delete space between citation and period 

      Done

      - l.77: I would delete 'say' 

      Done

      - l.86: 'but to also analyse' -> 'to analyse' 

      Done

      - l.105: remove 'we are encouraged that' 

      Done

      - l.111: 'also see' -> 'see also' 

      Done

      - l.164: 'remarkable': subjective 

      Done

      - l.189: define approx. abbreviation 

      Done

      - l.190: 'approx' -> 'approx.' 

      Revised

      - l.195: 'dramatic': subjective 

      Removed

      - l.246: 'much' -> vague 

      Explained

    1. eLife assessment

      This study presents a valuable finding on predator threat detection in C. elegans and the role of neuropeptide systems in defensive behavioral strategies. The evidence supporting the conclusions is solid, although additional analyses and control experiments would strengthen the claims of the study. Overall, the work is of interest to the C. elegans community as well as neuroethologists and ecologists studying predator-prey interactions.

    2. Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Quach et al. report a detailed investigation into the defense mechanisms of Caenorhabditis elegans in response to predatory threats from Pristionchus pacificus. Based on principles from predatory imminence and prey refuge theories, the authors delineate three defense modes (pre-encounter, post-encounter, and circa-strike) corresponding to increasing levels of threat proximity. These modes are observed in a controlled but naturalistic setup and are quantified by multiple behavioral outputs defined in time and/or space domains allowing nuanced phenotypic assays. The authors demonstrate that C. elegans displays graded defense behavioral responses toward varied lethality of threats and that only life-threatening predators trigger all three defense modes. The study also offers a narrative on the behavioral strategies and underlying molecular regulation, focusing on the roles of SEB-3 receptors and NLP-49 peptides in mediating responses in these defense modes. They found that the interplay between SEB-3 and NLP-49 peptides appears complex, as evidenced by the diverse outcomes when either or both genes are manipulated in various behavioral modes.

      Strengths:

      The paper presents an interesting story, with carefully designed experiments and necessary controls, and novel findings and implications about predator-induced defensive behaviors and underlying molecular regulation in this important model organism. The design of experiments and description of findings are easy to follow and well-motivated. The findings contribute to our understanding of stress response systems and offer broader implications for neuroethological studies across species.

      Weaknesses:

      Although the study is overall well designed and motivated, the paper could benefit from further improvements to some of the methods descriptions and experiment interpretations.

    3. Reviewer #2 (Public Review):

      In this study, the authors characterize the defensive responses of C. elegans to the predatory Pristionchus species. Drawing parallels to ecological models of predatory imminence and prey refuge theory, they outline various behaviors exhibited by C. elegans when faced with predator threats. They also find that these behaviors can be modulated to varying degrees by the peptide NLP-49 and its receptor SEB-3.

      The conclusions of this paper are mostly well-supported, and the writing and the figures are clear and easy to interpret. However, some of the claims need to be better supported and the unique findings of this work should be clarified better in the text.

      (1) Previous work by the group (Quach, 2022) showed that Pristionchus adopt a "patrolling strategy" on a lawn with adult C. elegans and this depends on bacterial lawn thickness. Consequently, it may be hypothesized that C. elegans themselves will adopt different predator avoidance strategies depending on predator tactics differing due to lawn variations. The authors have not shown why they selected a particular size and density of bacterial lawn for the experiments in this paper, and should run control experiments with thinner and denser lawns with differing edge densities to make broad arguments about predator avoidance strategies for C. elegans. In addition, C. elegans leaving behavior from bacterial lawns (without predators) is also heavily dependent on the density of bacteria, especially at the edges where it affects oxygen gradients (Bendesky, 2011), and might alter the baseline leaving rates irrespective of predation threats. The authors also do not mention if all strains or conditions in each figure panel were run as day-matched controls. Given that bacterial densities and ambient conditions can affect C. elegans behavior, especially that of lawn-leaving, it is important to run day-matched controls.

      (2) Both the patch-leaving and feeding in outstretched posture behaviors described here in this study were reported in an earlier paper by the same group (Quach, 2022) as mentioned by the authors in the first section of the results. While they do characterize these further in this study, these are not novel findings of this work.

      (3) For Figures 1F-H, given that animals can reside on the lawn edges as well as the center, bins explored are not a definitive metric of exploration since the animals can decide to patrol the lawn boundary (especially since the lawns have thick edges). The authors should also quantify tracks along the edge from videographic evidence as they have done previously in Figure 5 of Quach, 2022 to get a total measure of distance explored.

      (4) Where were the animals placed in the wide-arena predator-free patch post encounter? It is mentioned that the animal was placed at the center of the arena in lines 220-221. While this makes sense for the narrow-arena, it is unclear how far from the patch animals were positioned for the wide exit arena. Is it the same distance away as the distance of the patch from the center of the narrow exit arena? Please make this clear in the text or in the methods.

      (5) Do exit decisions from the bacterial patch scale with number of bites or is one bite sufficient? Do all bites lead to bite-induced aversive response? This would be important to quantify especially if contextualizing to predatory imminence.

      (6) Why are the threats posed by aversive but non-lethal JU1051 and lethal PS312 evaluated similarly? Did the authors characterize whether the number of bites is different for these strains? Can the authors speculate on why this would happen in the discussion?

      (7) The authors indicate that bites from the non-aversive TU445 led to a low number of exits and thus it was consequently excluded from further analysis. If anything, this strain would have provided a good negative control and baseline metrics for other circa-strike and post-encounter behaviors.

      (8) For Figures 3G and H, the reduction in bins explored (bins_none - bins_RS5194) due to the presence of predators should be compared between wildtype and mutants, instead of the difference between none and RS5194 for each strain.

      (9) While the authors argue that baseline speeds of seb-3 are similar to wild type (Figure S3), previous work (Jee, 2012) has shown that seb-3 not only affects speed but also roaming/dwelling states which will significantly affect the exploration metric (bins explored) which the authors use in Figs 3G-H and 4E-F. Control experiments are necessary to avoid this conundrum. Authors should either visualize and quantify tracks (as suggested in 3) or quantify roaming-dwelling in the seb-3 animals in the absence of predator threat.

      (10) While it might be beyond the scope of the study, it would be nice if the authors could speculate on potential sites of actions of NLP-49 in the discussion, especially since it is expressed in a distinct group of neurons.

    1. eLife assessment

      A combination of molecular dynamics simulation and state-of-the-art statistical post-processing techniques provided valuable insight into GPCR-ligand dynamics. This manuscript provides solid evidence for differences in the binding/unbinding of classical cannabinoid drugs from new psychoactive substances. The results could aid in mitigating the public health threat these drugs pose.

    2. Reviewer #1 (Public Review):

      This manuscript presents insights into biased signaling in GPCRs, namely cannabinoid receptors. Biased signaling is of broad interest in general, and cannabinoid signaling is particularly relevant for understanding the impact of new drugs that target this receptor. Mechanistic insight from work like this could enable new approaches to mitigate the public health impact of new psychoactive drugs. Towards that end, this manuscript seeks to understand how new psychoactive substances (NPS, e.g. MDMB-FUBINACA) elicit more signaling through β-arrestin than classical cannabinoids (e.g. HU-210). The authors use an interesting combination of simulations and machine learning.

      The caption for Figure 3 doesn't explain the color scheme, so it's not obvious what the start and end states of the ligand are.

      For the metadynamics simulations, were multiple Gaussian heights/widths tried to see what, if any, impact that has on the unbinding pathway? That would be useful to help ensure all the relevant pathways were explored.

      It would be nice to acknowledge previous applications of metadynamics+MSMs and (separately) TRAM, such as the Simulation of spontaneous G protein activation... (Sun et al. eLife 2018) and Estimation of binding rates and affinities... (Ge and Voelz JCP 2022).

      What is KL divergence analysis between macrostates? I know KL divergence compares probability distributions, but it is not clear what distributions are being compared.
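
      For context, the quantity in question can be sketched as follows for two discrete distributions P and Q, D_KL(P || Q) = sum_i p_i ln(p_i / q_i). Which distributions the manuscript compares between macrostates is not stated here, so the two histograms below are hypothetical placeholders (for example, binned values of one structural feature in each macrostate), and the function name is an assumption made for illustration.

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) = sum_i p_i * ln(p_i / q_i) for discrete distributions, in nats."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of bins")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue  # a bin with no mass in P contributes nothing
        if q_i == 0.0:
            return math.inf  # P has mass where Q has none
        total += p_i * math.log(p_i / q_i)
    return total

# Hypothetical normalised histograms of one feature in two macrostates.
state_a = [0.5, 0.5]
state_b = [0.9, 0.1]

forward = kl_divergence(state_a, state_b)
reverse = kl_divergence(state_b, state_a)
```

      The divergence is zero only when the two distributions coincide and is not symmetric, so an analysis of this kind needs to state both the distributions and the direction of comparison.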

      I suggest being more careful with the language of universality. It can be "supported", but "showing" or "proving" that it is universal would require looking at all possible chemicals in the class.

    3. Reviewer #2 (Public Review):

      Summary:

      The investigation provides computational as well as biochemical insights into the (un)binding mechanisms of a pair of psychoactive substances at cannabinoid receptors. A combination of molecular dynamics simulation and a set of state-of-the-art statistical post-processing techniques was employed to exploit GPCR-ligand dynamics.

      Strengths:

      The strength of the manuscript lies in the usage and comparison of TRAM as well as Markov state modelling (MSM) for investigating ligand binding kinetics and thermodynamics. Usually, MSMs have been more commonly used for this purpose. But as the authors have pointed out, implicit in the usage of MSMs lies the assumption of detailed balance, which would not hold true for many cases, especially those with skewed binding affinities. In this regard, the authors' usage of TRAM, which harnesses both biased and unbiased simulations for extracting the same, provides a more appropriate way out.

      Weaknesses:

      (1) While the authors have used TRAM (by citing MSM to be inadequate in these cases), the thermodynamic comparisons of both techniques provide similar values. In this case, one would wonder what advantage TRAM would hold in this particular case.

      (2) The initiation of unbiased simulations from previously run biased metadynamics simulations would almost surely introduce hysteresis in the analysis. The authors need to address these issues.

      (3) The choice of ligands in the current work seems very forced and none of the results compare directly with any experimental data. An ideal case would have been to use the seminal D.E. Shaw Research paper on GPCR/ligand binding as a benchmark and then show how TRAM, using much shorter biased simulation times, would fare against the experimental kinetics or even the unbiased simulated kinetics of the previous report.

      (4) The method section of the manuscript seems to suggest all the simulations were started from a docked structure. This casts doubt on the reliability of the kinetics derived from these simulations, which were spawned from a docked structure instead of any crystallographic pose. Ideally, the authors should have been more careful in choosing the ligands in this work based on the availability of the crystallographic structures.

      (5) The last part, using a machine learning-based approach to analyse allosteric interaction, seems to be very much forced, as there are numerous more traditional distance-based analyses with precedent that do a fair job of identifying allosteric interactions.

      (6) While getting busy with the methodological details of TRAM vs MSM, the manuscript fails to share with sufficient clarity what the distinctive features of the two ligand binding mechanisms are.

    1. eLife assessment

      This useful study reports on the discovery of an antimicrobial agent that kills Neisseria gonorrhoeae. Sensitivity is attributed to a combination of DedA-assisted uptake of oxydifficidin into the cytoplasm and the presence of an oxydifficidin-sensitive RplL ribosomal protein. Due to the narrow scope, the broader antibacterial spectrum remains unclear and therefore the evidence supporting the conclusions is incomplete with key methods and data lacking. This work will be of interest to microbiologists and synthetic biologists.

    2. Reviewer #1 (Public Review):

      Summary:

      Kan et al. report the serendipitous discovery of a Bacillus amyloliquefaciens strain that kills N. gonorrhoeae. They use TnSeq to identify that the anti-gonococcal agent is oxydifficidin and show that it acts at the ribosome and that one of the dedA gene products in N. gonorrhoeae MS11 is important for moving the oxydifficidin across the membrane.

      Strengths:

      This is an impressive amount of work, moving from a serendipitous observation through TnSeq to characterize the mechanism by which Oxydifficidin works.

      Weaknesses:

      (1) There are important gaps in the manuscript's methods.

      (2) The work should evaluate antibiotics relevant to N. gonorrhoeae.

      (3) The genetic diversity of dedA and rplL in N. gonorrhoeae is not clear, neither is it clear whether oxydifficidin is active against more relevant strains and species than tested so far.

    3. Reviewer #2 (Public Review):

      Summary:

      Kan et al. present the discovery of oxydifficidin as a potential antimicrobial against N. gonorrhoeae, including multi-drug resistant strains. The authors show the role of DedA flippase-assisted uptake and the specificity of RplL in the mechanism of action for oxydifficidin. This novel mode of action could potentially offer a new therapeutic avenue, providing a critical addition to the limited arsenal of antibiotics effective against gonorrhea.

      Strengths:

      This study underscores the potential of revisiting natural products for antibiotic discovery of modern-day-concerning pathogens and highlights a new target mechanism that could inform future drug development. Indeed there is a recent growing body of research utilising AI and predictive computational informatics to revisit potential antimicrobial agents and metabolites from cultured bacterial species. The discovery of oxydifficidin interaction with RplL and its DedA-assisted uptake mechanism opens new research directions in understanding and combating antibiotic-resistant N. gonorrhoeae. Methodologically, the study is rigorous employing various experimental techniques such as genome sequencing, bioassay-guided fractionation, LCMS, NMR, and Tn-mutagenesis.

      Weaknesses:

      The scope is somewhat narrow, focusing primarily on N. gonorrhoeae. This limits the generalizability of the findings and leaves questions about its broader antibacterial spectrum. Moreover, while the study demonstrates the in vitro effectiveness of oxydifficidin, there is a lack of in vivo validation (i.e., animal models) for assessing pre-clinical potential of oxydifficidin. Potential SNPs within dedA or RplL raise concerns about how quickly resistance could emerge in clinical settings.

    4. Reviewer #3 (Public Review):

      Summary:

      The authors have shown that oxydifficidin is a potent inhibitor of Neisseria gonorrhoeae. They were able to identify the target of action as RplL and showed that resistance could occur via mutation in the DedA flippase and RplL.

      Strengths:

      This was a very thorough and clearly argued set of experiments that supported their conclusions.

      Weaknesses:

      There was no obvious weakness in the experimental design. Although it is promising that the DedA mutations resulted in attenuation of fitness, it remains an open question whether secondary rounds of mutation could overcome this selective disadvantage which was untried in this study.

    5. Author response:

      eLife assessment

      This useful study reports on the discovery of an antimicrobial agent that kills Neisseria gonorrhoeae. Sensitivity is attributed to a combination of DedA-assisted uptake of oxydifficidin into the cytoplasm and the presence of an oxydifficidin-sensitive RplL ribosomal protein. Due to the narrow scope, the broader antibacterial spectrum remains unclear and therefore the evidence supporting the conclusions is incomplete with key methods and data lacking. This work will be of interest to microbiologists and synthetic biologists.

      General comment about narrow scope: The broader antibacterial spectrum of oxydifficidin has been reported previously (S B Zimmerman et al., 1987). The main focus of this study is on its previously unreported potent anti-gonococcal activity and mode of action. While it is true that broad-spectrum antibiotics have historically played a role in effectively controlling a wide range of infections, we and others believe that narrow-spectrum antibiotics have an overlooked importance in addressing bacterial infections. Their advantage lies in their ability to target specific pathogens without markedly disrupting the human microbiota.

      We are troubled by the statement that our paper is narrow in scope and that evidence supporting our conclusions is incomplete. We do not feel the reviews as presented substantiate drawing this conclusion about our work.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Kan et al. report the serendipitous discovery of a Bacillus amyloliquefaciens strain that kills N. gonorrhoeae. They use TnSeq to identify that the anti-gonococcal agent is oxydifficidin and show that it acts at the ribosome and that one of the dedA gene products in N. gonorrhoeae MS11 is important for moving the oxydifficidin across the membrane.

      Strengths:

      This is an impressive amount of work, moving from a serendipitous observation through TnSeq to characterize the mechanism by which Oxydifficidin works.

      Weaknesses:

      (1) There are important gaps in the manuscript's methods.

      The requested additions to the Methods describing bacterial sequencing and anti-gonococcal activity screening will be made. However, we do not think the absence of these generic methods reduces the significance of our findings.

      (2) The work should evaluate antibiotics relevant to N. gonorrhoeae.

      (1) It is not clear to us why reevaluating the activity of well-characterized antibiotics against known N. gonorrhoeae clinical strains would add value to this manuscript. The activity of clinically relevant antibiotics against antibiotic-resistant N. gonorrhoeae clinical isolates is well described in the literature. Our use of antibiotics in this study was intended to aid in the identification of oxydifficidin’s mode of action. This is true for both Tables 1 and 2.

      (2) If the reviewer insists, we would be happy to include MIC data for the following clinically relevant antibiotics: ceftriaxone (cephalosporin/beta-lactam), gentamicin (aminoglycoside), azithromycin (macrolide), and ciprofloxacin (fluoroquinolone).

      (3) The genetic diversity of dedA and rplL in N. gonorrhoeae is not clear, neither is it clear whether oxydifficidin is active against more relevant strains and species than tested so far.

      (1) We thank the reviewer for this suggestion. We aligned the DedA sequence from strain MS11 with DedA proteins from 220 N. gonorrhoeae strains that have high-quality assemblies in NCBI. The result showed that there are no amino acid changes in this protein. Using the same method, we observed several single amino acid changes in RplL. These comprised changes at A64, G25 and S82 in 4 strains, with one change per strain. These sites differ from R76 and K84, where we identified changes that provide resistance to oxydifficidin. Notably, in a similar search of representative Escherichia, Chlamydia, Vibrio, and Pseudomonas genomes deposited in NCBI, we did not identify changes in RplL at position R76 or K84.
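
      The kind of per-strain comparison described above can be sketched as follows. This is a minimal illustration that assumes the sequences have already been aligned to the reference; the function name is hypothetical and the toy sequences are not DedA or RplL sequences, nor is this the pipeline used for the 220 assemblies.

```python
def list_substitutions(reference, aligned):
    """Return substitutions such as 'A4G' between two equal-length aligned protein sequences."""
    if len(reference) != len(aligned):
        raise ValueError("sequences must be aligned to the same length")
    changes = []
    position = 0  # 1-based residue index in the ungapped reference
    for ref_res, alt_res in zip(reference, aligned):
        if ref_res != "-":
            position += 1
        # Report only residue-to-residue changes; gaps are skipped.
        if ref_res != alt_res and "-" not in (ref_res, alt_res):
            changes.append(f"{ref_res}{position}{alt_res}")
    return changes

# Toy sequences for illustration only.
reference = "MKTAYIAK"
strains = {
    "strain_1": "MKTAYIAK",
    "strain_2": "MKTGYIAK",
}
variants = {name: list_substitutions(reference, seq) for name, seq in strains.items()}
```

      Positions are numbered along the ungapped reference, so a reported change can be checked directly against resistance-associated sites such as R76 or K84.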

      (2) While the usefulness of screening more clinically relevant antibiotics against clinical isolates as suggested in comment 2 was not clear to us, we agree that screening these strains for oxydifficidin activity would be beneficial. We have ordered Neisseria gonorrhoeae strains AR1280 and AR1281 (CDC) and Neisseria meningitidis ATCC 13090. They will be tested when they arrive.
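
      As an illustration only, the kind of per-position comparison described in point (1) above can be sketched in a few lines of Python. This is a minimal sketch, not the authors' pipeline: the function names `substitutions` and `survey`, the toy sequences, and the strain labels are all hypothetical, and the sequences are assumed to be pre-aligned, ungapped, and of equal length.

```python
from typing import Dict, List


def substitutions(reference: str, other: str) -> List[str]:
    """List substitutions (e.g. 'A3T') between two pre-aligned, equal-length sequences."""
    if len(reference) != len(other):
        raise ValueError("sequences must be aligned to the same length")
    changes = []
    # Positions are 1-based, matching the usual residue notation (e.g. R76, K84).
    for position, (ref_aa, alt_aa) in enumerate(zip(reference, other), start=1):
        if ref_aa != alt_aa:
            changes.append(f"{ref_aa}{position}{alt_aa}")
    return changes


def survey(reference: str, strains: Dict[str, str]) -> Dict[str, List[str]]:
    """Return only the strains that differ from the reference, with their substitutions."""
    result = {}
    for name, sequence in strains.items():
        diffs = substitutions(reference, sequence)
        if diffs:
            result[name] = diffs
    return result


# Toy data (hypothetical, not real DedA or RplL sequences).
REFERENCE = "MKALRDKG"
STRAINS = {
    "strain_a": "MKALRDKG",  # identical to the reference
    "strain_b": "MKTLRDKG",  # one change at position 3
    "strain_c": "MKALRDKS",  # one change at position 8
}

if __name__ == "__main__":
    print(survey(REFERENCE, STRAINS))
```

      A real analysis would first need a proper alignment step, since gaps shift the residue numbering; the sketch only shows how changed positions can be reported against a reference once the sequences are aligned.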

      Reviewer #2 (Public Review):

      Summary:

      Kan et al. present the discovery of oxydifficidin as a potential antimicrobial against N. gonorrhoeae, including multi-drug resistant strains. The authors show the role of DedA flippase-assisted uptake and the specificity of RplL in the mechanism of action for oxydifficidin. This novel mode of action could potentially offer a new therapeutic avenue, providing a critical addition to the limited arsenal of antibiotics effective against gonorrhea.

      Strengths:

      This study underscores the potential of revisiting natural products for antibiotic discovery of modern-day-concerning pathogens and highlights a new target mechanism that could inform future drug development. Indeed there is a recent growing body of research utilizing AI and predictive computational informatics to revisit potential antimicrobial agents and metabolites from cultured bacterial species. The discovery of oxydifficidin interaction with RplL and its DedA-assisted uptake mechanism opens new research directions in understanding and combating antibiotic-resistant N. gonorrhoeae. Methodologically, the study is rigorous employing various experimental techniques such as genome sequencing, bioassay-guided fractionation, LCMS, NMR, and Tn-mutagenesis.

      Weaknesses:

      The scope is somewhat narrow, focusing primarily on N. gonorrhoeae. This limits the generalizability of the findings and leaves questions about its broader antibacterial spectrum. Moreover, while the study demonstrates the in vitro effectiveness of oxydifficidin, there is a lack of in vivo validation (i.e., animal models) for assessing pre-clinical potential of oxydifficidin. Potential SNPs within dedA or RplL raise concerns about how quickly resistance could emerge in clinical settings.

      (1) Spectrum/narrow scope: The broader antibacterial spectrum of oxydifficidin has been reported previously (S B Zimmerman et al., 1987). The focus of this study is on its previously unreported potent anti-gonococcal activity and its mode of action. While it is true that broad-spectrum antibiotics have historically played a role in effectively controlling a wide range of infections, we and others believe that narrow-spectrum antibiotics have an overlooked importance in addressing bacterial infections. Their advantage lies in their ability to target specific pathogens without markedly disrupting the human microbiota.

      (2) Animal models: We acknowledge the reviewer’s insight regarding the importance of in vivo validation to enhance oxydifficidin’s pre-clinical potential. However, due to the labor-intensive process needed to isolate oxydifficidin, obtaining a sufficient quantity for animal studies is beyond the scope of this study. Our future work will focus on optimizing the yield of oxydifficidin and developing a topical mouse model for subsequent investigations.

      (3) Potential SNPs: Please see our response to Reviewer #1’s comment 3. We acknowledge that potential SNPs within dedA and rplL raise concerns regarding clinical resistance, which is a common issue for protein-targeting antibiotics. Yet, as pointed out in the manuscript, obtaining mutants in the lab was a very low yield endeavor.

      Reviewer #3 (Public Review):

      Summary: The authors have shown that oxydifficidin is a potent inhibitor of Neisseria gonorrhoeae. They were able to identify the target of action to rplL and showed that resistance could occur via mutation in the DedA flippase and RplL.

      Strengths:

      This was a very thorough and clearly argued set of experiments that supported their conclusions.

      Weaknesses:

      There was no obvious weakness in the experimental design. Although it is promising that the DedA mutations resulted in attenuation of fitness, it remains an open question whether secondary rounds of mutation could overcome this selective disadvantage which was untried in this study.

    1. eLife assessment

      This study convincingly shows that aquaporins play a key role in blood vessel formation during zebrafish development. In particular, the paper implicates hydrostatic pressure and water flow as mechanisms controlling endothelial cell migration during angiogenic sprouting. This important study significantly advances our understanding of cell migration during morphogenesis. As such, this work will be of great interest to developmental and cell biologists working on organogenesis, angiogenesis, and cell migration.

    2. Reviewer #1 (Public Review):

      Summary:

      This paper details a study of endothelial cell vessel formation during zebrafish development. The results focus on the role of aquaporins, which mediate the flow of water across the cell membrane, leading to cell movement. The authors show that actin and water flow together drive endothelial cell migration and vessel formation. If any of these two elements are perturbed, there are observed defects in vessels. Overall, the paper significantly improves our understanding of cell migration during morphogenesis in organisms.

      Strengths:

      The data are extensive and are of high quality. There is a good amount of quantification with convincing statistical significance. The overall conclusion is justified given the evidence.

      Weaknesses:

      There are two weaknesses, which if addressed, would improve the paper.

      (1) The paper focuses on aquaporins, which, while mediating water flow, cannot drive directional water flow. If the osmotic engine model is correct, then ion channels such as NHE1 are the driving force for water flow. Indeed, this is shown in previous studies. Moreover, NHE1 can drive water intake because the export of H+ leads to increased HCO3 due to the reaction between CO2+H2O, which increases the cytoplasmic osmolarity (see Li, Zhou and Sun, Frontiers in Cell Dev. Bio. 2021). If NHE cannot be easily perturbed in zebrafish, it might be of interest to perturb Cl channels such as SWELL1, which was recently shown to work together with NHE (see Zhang, et al, Nat. Comm. 2022).

      (2) In some places the discussion seems a little confusing where the text goes from hydrostatic pressure to osmotic gradient. It might improve the paper if some background is given. For example, mention water flow follows osmotic gradients, which will build up hydrostatic pressure. The osmotic gradients across the membrane are generated by active ion exchangers. This point is often confused in literature and somewhere in the intro, this could be made clearer.
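
      As a minimal sketch of the background suggested in point (2), assuming an ideal semipermeable membrane and dilute solutions (the symbols below are illustrative and are not taken from the manuscript), the water flux into the cell can be written as:

```latex
\begin{aligned}
J_w &= L_p \,(\Delta\Pi - \Delta P), \\
\Delta\Pi &= R\,T\,\Delta c, \qquad
\Delta c = c_{\mathrm{in}} - c_{\mathrm{out}}, \qquad
\Delta P = P_{\mathrm{in}} - P_{\mathrm{out}}.
\end{aligned}
```

      Here $J_w$ is the water influx per unit membrane area, $L_p$ is the membrane water permeability (which aquaporins increase), $\Delta\Pi$ is the osmotic pressure difference given by the van 't Hoff relation, and $\Delta P$ is the hydrostatic pressure difference. In this picture, ion transport sets $\Delta c$ and hence the direction of flow, aquaporins only scale its rate, and influx stops once the hydrostatic pressure has built up to $\Delta P = \Delta\Pi$.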

    3. Reviewer #2 (Public Review):

      Summary:

      Directional migration is an integral aspect of sprouting angiogenesis and requires a cell to change its shape and sense a chemotactic or growth factor stimulus. Kondrychyn I. et al. provide data that indicate a requirement for zebrafish aquaporins 1 and 8, in cellular water inflow and sprouting angiogenesis. Zebrafish mutants lacking aqp1a.1 and aqp8a.1 have significantly lower tip cell volume and migration velocity, which delays vascular development. Inhibition of actin formation and filopodia dynamics further aggravates this phenotype. The link between water inflow, hydrostatic pressure, and actin dynamics driving endothelial cell sprouting and migration during angiogenesis is highly novel.

      Strengths:

      The zebrafish genetics, microscopy imaging, and measurements performed are of very high quality. The study data and interpretations are very well-presented in this manuscript.

      Weaknesses:

      Some of the findings and interpretations could be strengthened by additional measurements and further discussion. Also, a better comparison and integration of the authors' findings, with other previously published findings in mice and zebrafish would strengthen the paper.

    4. Reviewer #3 (Public Review):

      Summary:

      Kondrychyn and colleagues describe the contribution of two Aquaporins Aqp1a.1 and Aqp8a.1 towards angiogenic sprouting in the zebrafish embryo. By whole-mount in situ hybridization, RNAscope, and scRNA-seq, they show that both genes are expressed in endothelial cells in partly overlapping spatiotemporal patterns. Pharmacological inhibition experiments indicate a requirement for VEGFR2 signaling (but not Notch) in transcriptional activation.

      To assess the role of both genes during vascular development the authors generate genetic mutations. While homozygous single mutants appear less affected, aqp1a.1;aqp8a.1 double mutants exhibit severe defects in EC sprouting and ISV formation.

      At the cellular level, the aquaporin mutants display a reduction of filopodia in number and length. Furthermore, a reduction in cell volume is observed indicating a defect in water uptake.

      The authors conclude that polarized water uptake mediated by aquaporins is required for the initiation of endothelial sprouting and (tip) cell migration during ISV formation. They further propose that water influx increases hydrostatic pressure within the cells, which may facilitate actin polymerization and the formation of membrane protrusions.

      Strengths:

      The authors provide a detailed analysis of Aqp1a.1 and Aqp8a.1 during blood vessel formation in vivo, using zebrafish intersomitic vessels as a model. State-of-the-art imaging demonstrates an essential role of aquaporins in different aspects of endothelial cell activation and migration during angiogenesis.

      Weaknesses:

      With respect to the connection between Aqp1/8 and actin polymerization/filopodia formation, the evidence appears preliminary and the authors' interpretation is guided by evidence from other experimental systems.

    1. Reviewer #3 (Public Review):

      Nitta et al. use a fly model of autosomal dominant optic atrophy to provide mechanistic insights into distinct disease-causing OPA1 variants. It has long been hypothesized that missense OPA1 mutations affecting the GTPase domain, which are associated with more severe optic atrophy and extra-ophthalmic neurologic conditions such as sensorineural hearing loss (DOA plus), impart their effects through a dominant negative mechanism, but no clear direct evidence for this exists particularly in an animal model. The authors execute a well-designed study to establish their model, demonstrating a mitochondrial phenotype and optic atrophy measured as axonal degeneration. They leverage this model to provide the first direct evidence for a dominant negative mechanism for 2 mutations causing DOA plus by expressing these variants in the background of a full hOPA1 complement.

      Strengths of the paper include well-motivated objectives and hypotheses, and overall solid design and execution. There is a thorough discussion of the interpretation and context of the findings. The results technically support their primary conclusions with minor limitations. First, while only partial rescue of the most clinically relevant metric for optic atrophy in this model is now acknowledged, the result nevertheless hamstrings the mechanistic experiments that follow. Second, the results statistically support a dominant negative effect of DOA plus-associated variants, yet the data show a marginal impact on axonal degeneration for these variants. In added experiments, the ability of WT hOPA1 and I382M but not 2708del, D438V or R445H to rescue ROS levels or mitophagy in the context of dOPA1 knockdown serves to support axonal number as a valid measure of mitochondrial function in this context. However, the critical experiment demonstrating a dominant negative effect was performed in the context of expressing WT hOPA1 along with a pathogenic variant, in which no differences in ROS, COXII expression or mitophagy were seen. This makes it difficult to conclude that the dominant negative effect of D438V and R445H on axon number is related to mitochondrial function.

      As an animal model of DOA that may serve for rapid assessment of suspected OPA1 variants, the results overall support utility of this model in identifying pathogenic variants but not in distinguishing haploinsufficiency from dominant negative mechanisms among those variants. The impact of this work in providing the first direct evidence of a dominant negative mechanism is under-stated considering how important this question is in development of genetic treatments for dominant optic atrophy.

      Comments on revised version:

      The authors have addressed the comments in my initial review. Through these modifications and those related to the comments from the other reviewers, the manuscript is strengthened.

      Comments on author responses to each of the reviews:

      Reviewer 1:

      Interpretation of data has been appropriately reorganized in the discussion.

      Quantified mitochondria in the model show no difference in number. There is reduced size and structural abnormalities on electron microscopy.

      Application of mito-QC revealed increased mitophagy.

      Regarding partial rescue of axonal number in the mutant model, statistical significance between control and rescue is still not depicted in Figure 4D. Detailing possible explanations for this has been addressed in the discussion. However, only partial rescue of the most clinically relevant metric for optic atrophy in this model hamstrings the mechanistic experiments that follow.

      Discussion regarding variant I382M has been improved.

      While reviewer 1's concerns about axonal number as a biomarker for OPA1 function are valid, it is worth noting that this is the most clinically relevant marker in the context of DOA. That said, I agree that the mechanistic DN/HI studies needed support using other measures of mitochondrial function, and the authors have done this. The ability of WT hOPA1 and I382M but not 2708del, D438V or R445H to rescue ROS levels or mitophagy in the context of dOPA1 knockdown serves to support axonal number as a valid measure of mitochondrial function in this context. However, the critical experiment demonstrating a dominant negative effect was performed in the context of expressing WT hOPA1 along with a pathogenic variant, in which no differences in ROS, COXII expression or mitophagy were seen. This makes it difficult to conclude that the (marginal) DN effect of D438V and R445H on axon number is related to mitochondrial function, and serves as a minor weakness of the paper.

      Which exons are included in the transcript, and therefore, which isoforms are expressed in the model, has been addressed.

      Reviewer 2:

      The authors have addressed the need to include greater methodological details.

      Language concerning the clinical utility of the model in informing treatment decisions has been appropriately modified. As pointed out by Reviewer 1, additional studies were needed to better establish the potential clinical utility of this model in screening DOA variants. The authors have completed those experiments, and the results overall support utility of this model in identifying pathogenic variants but not in distinguishing HI/DN mechanisms among those variants.

      Reviewer 3:

      The author has addressed the partial rescue effect as above.

      The authors have not modified the text to acknowledge the marginal effect sizes in the critical experiment of the study that demonstrates a DN effect. Statistically, the results indeed support a dominant negative effect of DOA plus-associated variants, yet the data show a marginal impact on axonal degeneration for these variants. This remains a weakness of the study.

    2. Author response:

      The following is the response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Nitta et al. present a manuscript titled "Drosophila model to clarify the pathological significance of OPA1 in autosomal dominant optic atrophy." The novelty of this paper lies in its use of human OPA1 (hOPA1) to try to rescue the phenotype of an OPA1 +/- Drosophila DOA model (dOPA). The authors then use this model to investigate the differences between dominant-negative and haploinsufficient OPA1 variants. The value of this paper lies in the study of DN/HI variants rather than the establishment of the Drosophila model per se, as this has existed for some time and does have some significant disadvantages compared to existing models, particularly in the extra-ocular phenotype which is common with some OPA1 variants but not in humans. I judge the findings of this paper to be valuable with regards to significance and solid with regards to the strength of the evidence.

      Suggestions for improvements:

      (1) Stylistically the results section appears to have significant discussion/conclusion/inferences in section with reference to existing literature. I feel that this information would be better placed in the separate discussion section. E.g. lines 149-154.

      We appreciate the reviewer’s suggestion to relocate the discussion, conclusions, and inferences, particularly those that reference existing literature, to a separate discussion section. For lines 149–154, we placed them in the discussion section (lines 343–347) as follows. “Our established fly model is the first simple organism to allow observation of degeneration of the retinal axons. The mitochondria in the axons showed fragmentation of mitochondria. Former studies have observed mitochondrial fragmentation in S2 cells (McQuibban et al., 2006), muscle tissue (Deng et al., 2008), segmental nerves (Trevisan et al., 2018), and ommatidia (Yarosh et al., 2008) due to the LOF of dOPA1.”

      For lines 178–181, we also placed them in the discussion section (lines 347–351) as follows. “Our study presents compelling evidence that dOPA1 knockdown instigates neuronal degeneration, characterized by a sequential deterioration at the axonal terminals and extending to the cell bodies. This degenerative pattern, commencing from the distal axons and progressing proximally towards the cell soma, aligns with the paradigm of 'dying-back' neuropathy, a phenomenon extensively documented in various neurodegenerative disorders (Wang et al., 2012). ”

      For lines 213–217, 218–220, and 222–223, we also placed them in the discussion section (lines 363– 391) as follows. “To elucidate the pathophysiological implications of mutations in the OPA1 gene, we engineered and expressed several human OPA1 variants, including the 2708-2711del mutation, associated with DOA, and the I382M mutation, located in the GTPase domain and linked to DOA. We also investigated the D438V and R445H mutations in the GTPase domain and correlated with the more severe DOA plus phenotype. The 2708-2711del mutation exhibited limited detectability via HA-tag probing. Still, it was undetectable with a myc tag, likely due to a frameshift event leading to the mutation's characteristic truncated protein product, as delineated in prior studies (Zanna et al., 2008). Contrastingly, the I382M, D438V, and R445H mutations demonstrated expression levels comparable to the WT hOPA1. However, the expression of these mutants in retinal axons did not restore the dOPA1 deficiency to the same extent as the WT hOPA1, as evidenced in Figure 5E. This finding indicates a functional impairment imparted by these mutations, aligning with established understanding (Zanna et al., 2008). Notably, while the 2708-2711del and I382M mutations exhibited limited functional rescue, the D438V and R445H mutations did not show significant rescue activity. This differential rescue efficiency suggests that the former mutations, particularly the I382M, categorized as a hypomorph (Del Dotto et al., 2018), may retain partial functional capacity, indicative of a LOF effect but with residual activity. The I382M missense mutation within the GTPase domain of OPA1 has been described as a mild hypomorph or a disease modifier. Intriguingly, this mutation alone does not induce significant clinical outcomes, as evidenced by multiple studies (Schaaf et al., 2011; Bonneau et al., 2014; Bonifert et al., 2014; Carelli et al., 2015). 
      A significant reduction in protein levels has been observed in fibroblasts originating from patients harboring the I382M mutation. However, mitochondrial volume remains unaffected, and the fusion activity of mitochondria is only minimally influenced (Kane et al., 2017; Del Dotto et al., 2018). This observation is consistent with findings reported by de la Barca et al. in Human Molecular Genetics 2020, where a targeted metabolomics approach classified I382M as a mild hypomorph. In our current study, the I382M mutation preserves more OPA1 function compared to DN mutations, as depicted in Figures 5E and F. Considering the results from our Drosophila model and previous research, we hypothesize that the I382M mutation may constitute a mild hypomorphic variant. This might explain its failure to manifest a phenotype on its own, yet its contribution to increased severity when it occurs in compound heterozygosity.”

      (2) I do think further investigation is needed as to why a reduction of mitochondria was noticed in the knockdown. There are conflicting reports on this in the literature. My own experience of this is fairly uniform mitochondrial number in WT vs OPA1 variant lines but with an increased level of mitophagy presumably reflecting a greater turnover. There are a number of ways to quantify mitochondrial load e.g. mtDNA quantification, protein quantification for tom20/hsp60 or equivalent. I feel the reliance on ICC here is not enough to draw conclusions. Furthermore, mitophagy markers could be checked at the same time either at the transcript or protein level. I feel this is important as it helps validate the drosophila model as we already have a lot of experimental data about the number and function of mitochondria in OPA+/- human/mammalian cells.

      We thank the reviewer for the insightful comments and suggestions regarding our study on the impact of mitochondrial reduction in a knockdown model. We concur with the reviewer’s observation that our initial results did not definitively demonstrate a decrease in the number of mitochondria in retinal axons. Furthermore, we measured mitochondrial quantity by conducting Western blotting using anti-COXII and found no reduction in mitochondrial content with the knockdown of dOPA1 (Figure S4A and B). Consequently, we have revised our manuscript to remove the statement “suggesting a decreased number of mitochondria in retinal axons. However, whether this decrease is due to degradation resulting from a decline in mitochondrial quality or axonal transport failure remains unclear.” Instead, we have refocused our conclusion to reflect our electron microscopy findings, which indicate reduced mitochondrial size and structural abnormalities. The reviewer’s observation of consistent mitochondrial numbers in WT versus mutant variant lines and elevated mitophagy levels prompted us to evaluate mitochondrial turnover as a significant factor in our study. Regarding verifying mitophagy markers, we incorporated the mito-QC marker in our experimental design. In our experiments, mito-QC was expressed in the retinal axons of Drosophila to assess mitophagy activity upon dOPA1 knockdown. We observed a notable increase in mCherry positive but GFP negative puncta signals one week after eclosion, indicating the activation of mitophagy (Figure 2D–H). This outcome strongly suggests that dOPA1 knockdown enhances mitophagy in our Drosophila model. The application of mito-QC as a quantitative marker for mitophagy, validated in previous studies, offers a robust approach to analyzing this process. Our findings elucidate the role of dOPA1 in mitochondrial dynamics and its implications for neuronal health.
      These results have been incorporated into Figure 2, with the corresponding text updated as follows (lines 159–167): “Given that an increase in mitophagy activity has been reported in mouse RGCs and nematode ADOA models (Zaninello et al., 2022; Zaninello et al., 2020), the mito-QC marker, an established indicator of mitophagy activity, was expressed in the photoreceptors of Drosophila. The mito-QC reporter consists of a tandem mCherry-GFP tag that localizes to the outer membrane of mitochondria (Lee et al., 2018). This construct allows the measurement of mitophagy by detecting an increase in the red-only mCherry signal when the GFP is degraded after mitochondria are transported to lysosomes. Post dOPA1 knockdown, we observed a significant elevation in mCherry positive and GFP negative puncta signals at one week, demonstrating an activation of mitophagy as a consequence of dOPA1 knockdown (Figure 2D–H).”

      (3) Could the authors comment on the failure of the dOPA1 rescue to return their biomarker, axonal number, to control levels? In Figure 4D, is there significance between the control and rescue? Presumably so, as there is between the mutant and rescue and the difference looks less.

      As the reviewer correctly pointed out, there is a significant difference between the control and rescue groups, which we have now included in the figure. Additionally, we have incorporated the following comments in the discussion section (lines 329–342) regarding this significant difference: “In our study, expressing dOPA1 in the retinal axons of dOPA1 mutants resulted in significant rescue, but it did not return to control levels. There are three possible explanations for this result. The first concerns gene expression levels. The Gal4-line used for the rescue experiments may not replicate the expression levels or timing of endogenous dOPA1. Considering that the optimal functionality of dOPA1 may be contingent upon specific gene expression levels, attaining a wild-type-like state necessitates the precise regulation of these expression levels. The second is a non-autonomous issue. Although dOPA1 gene expression was induced in the retinal axons for the rescue experiments, many retinal axons were homozygous mutants, while other cell types were heterozygous for the dOPA1 mutation. If there is a non-autonomous effect of dOPA1 in cells other than retinal axons, it might not be possible to restore the wild-type-like state fully. The third potential issue is that only one isoform of dOPA1 was expressed. In mouse OPA1, to completely restore mitochondrial network shape, an appropriate balance of at least two different isoforms, l-OPA1 and s-OPA1, is required (Del Dotto et al., 2017). This requirement implies that multiple isoforms of dOPA1 are essential for the dynamic activities of mitochondria.”

      (4) The authors have chosen an interesting if complicated missense variant to study, namely the I382M with several studies showing this is insufficient to cause disease in isolation and appears in high frequency on gnomAD but appears to worsen the phenotype when it appears as a compound het. I think this is worth discussing in the context of the results, particularly with regard to the ability for this variant to partially rescue the dOPA1 model as shown in Figure 5.

      As the reviewer pointed out, the I382M mutation is known to act as a disease modifier. However, in our system, as suggested by Figure 5, I382M appears to retain more activity than DN mutations. Considering previous studies, we propose that I382M represents a mild hypomorph. Consequently, while I382M alone may not exhibit a phenotype, it could exacerbate severity in a compound heterozygous state. We have incorporated this perspective in our revised discussion (lines 375-391).

      “Notably, while the 2708-2711del and I382M mutations exhibited limited functional rescue, the D438V and R445H mutations did not show significant rescue activity. This differential rescue efficiency suggests that the former mutations, particularly the I382M, categorized as a hypomorph (Del Dotto et al., 2018), may retain partial functional capacity, indicative of a LOF effect but with residual activity. The I382M missense mutation within the GTPase domain of OPA1 has been described as a mild hypomorph or a disease modifier. Intriguingly, this mutation alone does not induce significant clinical outcomes, as evidenced by multiple studies (Schaaf et al., 2011; Bonneau et al., 2014; Bonifert et al., 2014; Carelli et al., 2015). A significant reduction in protein levels has been observed in fibroblasts originating from patients harboring the I382M mutation. However, mitochondrial volume remains unaffected, and the fusion activity of mitochondria is only minimally influenced (Kane et al., 2017; Del Dotto et al., 2018). This observation is consistent with findings reported by de la Barca et al. in Human Molecular Genetics 2020, where a targeted metabolomics approach classified I382M as a mild hypomorph. In our current study, the I382M mutation preserves more OPA1 function compared to DN mutations, as depicted in Figures 5E and F. Considering the results from our Drosophila model and previous research, we hypothesize that the I382M mutation may constitute a mild hypomorphic variant. This might explain its failure to manifest a phenotype on its own, yet its contribution to increased severity when it occurs in compound heterozygosity.”

      (5) I feel the main limitation of this paper is the reliance on axonal number as a biomarker for OPA1 function and ultimately rescue. I have concerns because a) this is not a well validated biomarker within the context of OPA1 variants b) we have little understanding of how this is affected by over/under expression and c) if it is a threshold effect e.g. once OPA1 levels reach <x% pathology develops but develops normally when opa1 expression is >x%. I think this is particularly relevant when the authors are using this model to make conclusions on dominant negativity/HI with the authors proposing that if expression of a hOPA1 transcript does not increase opa1 expression in a dOPA1 KO then this means that the variant is DN. The authors have used other biomarkers in parts of this manuscript e.g. ROS measurement and mito trafficking but I feel this would benefit from something else particularly in the latter experiments demonstrated in figure 5 and 6.

      The reviewer raised concerns regarding the adequacy of axonal count as a validated biomarker in the context of OPA1 mutants. In response, we corroborated its validity using markers such as MitoSOX, Atg8, and COXII. Experiments employing MitoSOX revealed that the augmented ROS signals resulting from dOPA1 knockdown were mitigated by expressing human OPA1. Conversely, the mutant variants 2708-2711del, D438V, and R445H did not ameliorate these effects, paralleling the phenotype of axonal degeneration observed. These findings are documented in Figure 5F, and we have incorporated the following text into section lines 248–254 of the results:

      “Furthermore, we assessed the potential for rescuing ROS signals. Similar to its effect on axonal degeneration, wild-type hOPA1 effectively mitigated the phenotype, whereas the 2708-2711del, D438V, and R445H mutants did not (Figure 5F). Importantly, the I382M variant also reduced ROS levels comparably to the wild type. These findings demonstrate that both axonal degeneration and the increase in ROS caused by dOPA1 downregulation can be effectively counteracted by hOPA1. Although I382M retains partial functionality, it acts as a relatively weak hypomorph in this experimental setup.”

      Moreover, utilizing mito-QC, we observed elevated mitophagy in our Drosophila model, with these results now included in Figure 2D–H. Given the complexity of the genetics involved and the challenges in establishing lines, autophagy activity was quantified by comparing the ratio of Atg8-1 to Atg8-2 via Western blot analysis. However, no significant alterations were detected across any of the genotypes. Additionally, mitochondrial protein levels derived from COXII confirmed consistent mitochondrial quantities, showing no considerable variance following knockdown. These insights affirm that retinal axon degeneration and mitophagy activation are present in the Drosophila DOA model, although the Western blot analysis revealed no significant changes in autophagy activation. Such findings necessitate caution as this model may not fully replicate the molecular pathology of the corresponding human disease. These Western blot findings are presented in Figure S4, with the following addition made to section lines 255–263 of the results:

      “We also conducted Western blot analyses using anti-COXII and anti-Atg8a antibodies to assess changes in mitochondrial quantity and autophagy activity following the knockdown of dOPA1. Mitochondrial protein levels, indicated by COXII quantification, were evaluated to verify mitochondrial content, and the ratio of Atg8a-1 to Atg8a-2 was used to measure autophagy activation. For these experiments, Tub-Gal4 was employed to systemically knockdown dOPA1. Considering the lethality of a whole-body dOPA1 knockdown, Tub-Gal80TS was utilized to repress the knockdown until eclosion by maintaining the flies at 20°C. After eclosion, we increased the temperature to 29°C for two weeks to induce the knockdown or expression of hOPA1 variants. The results revealed no significant differences across the genotypes tested (Figure S4A–D).”

      In assessing the effects of dominant negative mutations, measurements including ROS levels, the ratio of Atg8-1 to Atg8-2, and the quantity of COXII protein were conducted, yet no significant differences were observed (Figure S6). This limitation of the fly model is mentioned in the results, noting the observation of the axonal degeneration phenotype but not alterations in ROS signaling, autophagy activity, or mitochondrial quantity as follows (lines 287–290):

      “We investigated the impacts of dominant negative mutations on mitochondrial oxidation levels, mitochondrial quantity, and autophagy activation levels; however, none of these parameters showed statistical significance (Figure S6).”

      The reviewer also inquired about the effects of overexpressing and underexpressing OPA1 on axonal count and whether these effects are subject to a threshold. In response, we expressed both wild-type and variant forms of human OPA1 in Drosophila in vivo and assessed their protein levels using Western blot analysis. The results showed no significant differences in expression levels between the wild-type and variant forms in the OPA1 overexpression experiments, suggesting the absence of a variation threshold effect. These findings have been newly documented as quantitative data in Figure 5C. Furthermore, we have included a statement in the results section for Figure 6A, clarifying that overexpression of hOPA1 exhibited no discernible impact, as detailed on lines 274–276.

      “The results presented in Figure 5C indicate that there are no significant differences in the expression levels among the variants, suggesting that variations in expression levels do not influence the outcomes.”

      (6) Could the authors clarify which exons in Figure 5 are included in their transcript? My understanding is that transcript NM_015560.3 contains exons 4 and 4b but not 5b. According to Song 2007, this transcript invariably produces s-OPA1 as it contains the exon 4b cleavage site. If this is true, this is a critical limitation of this study and, in my opinion, significantly undermines the likelihood of the proposed explanation of the findings presented in Figure 6. The primary functional location of OPA1 is the IMM, and l-OPA1 is the primary, and probably the only, OPA1 isoform that localizes here, as the additional AA act as an IMM anchor. Given that this is where the GTPase likely oligomerizes, s-OPA1 expressed alone is unlikely to interact in any way with the native protein. I am not aware of any evidence that s-OPA1 is involved in oligomerization. Therefore, I don't think this method, and specifically the expression of a hOPA1 transcript which only makes s-OPA1, is a reliable indicator of dominant negativity/interference with WT protein function. This could be checked by blotting UAS-hOPA1 protein with an OPA1 antibody specific to human OPA1 only and not to dOPA1. There are several available on the market, and if the authors see only s-OPA1, then it confirms they are not expressing l-OPA1 with their hOPA1 construct.

      As suggested by the reviewer, we performed a Western blot using a human OPA1 antibody to determine if the expressed hOPA1 was producing the l-OPA1 isoform, as shown in band 2 of Figure 5D. The results confirmed the presence of both l-OPA1 and what appears to be s-OPA1 in bands 2 and 4, respectively. These findings are documented in the updated Figure 5D, with a detailed description provided in the manuscript at lines 224-226. Additionally, the NM_015560.3 refers to isoform 1, which includes only exons 4 and 5, excluding exons 4b and 5b. This isoform can express both l-OPA1 and s-OPA1 (refer to Figure 1 in Song et al., J Cell Biol. 2007). We have updated the schematic diagram in the figure to include these exons. The formation of s-OPA1 through cleavage occurs at the OMA1 target site located in exon 5 and the Yme1L target site in exon 5b of OPA1. Isoform 1 of OPA1 is prone to cleavage by OMA1, but a homologous gene for OMA1 does not exist in Drosophila. Although a homologous gene for Yme1L is present in Drosophila, exon 5b is missing in isoform 1 of OPA1, leaving the origin of the smaller band resembling s-OPA1 unclear at this point.

      Reviewer #2 (Public Review):

      The data presented support and extend some previously published data using Drosophila as a model to unravel the cellular and genetic basis of human autosomal dominant optic atrophy (DOA). In humans, mutations in OPA1, a mitochondrial dynamin-like GTPase (amongst others), are the most common cause of DOA. By using Drosophila loss-of-function mutations, RNAi-mediated knockdown, and overexpression, the authors could recapitulate some aspects of the disease phenotype, which could be rescued by the wild-type version of the human gene. Their assays allowed them to distinguish between mutations causing human DOA, affecting the optic system and supposed to be loss-of-function mutations, and those mutations supposed to act as dominant negative, resulting in DOA plus, in which other tissues/organs are affected as well. Based on the lack of information in the Materials and Methods section and in several figure legends, it was not in all cases possible to follow the conclusions of the authors.

      We appreciate the reviewer's constructive feedback and the emphasis on enhancing clarity in our manuscript. We recognize the concerns raised about the lack of detailed information in the Materials and Methods section and several figure legends, which may have obscured our conclusions. In response, we have appended the detailed genotypes of the Drosophila strains used in each experiment to a supplementary table. Additionally, we realized that the description of 'immunohistochemistry and imaging' was too brief, previously referenced simply as “immunohistochemistry was performed as described previously (Sugie et al., 2017).” We have now expanded this section to include comprehensive methodological details. Furthermore, we have revised the figure legends to provide clearer and more thorough descriptions.

      Similarly, how the knowledge gained could help to "inform early treatment decisions in patients with mutations in hOPA1" (line 38) cannot be followed.

      To address the reviewer's comments, we have refined our explanation of the clinical relevance of our findings as follows. We believe this revision succinctly articulates the practical application of our research, directly responding to the reviewer’s concerns about linking the study's outcomes to treatment decisions for patients with hOPA1 mutations. By underscoring the model’s value in differential diagnosis and its influence on initiating treatment strategies, we have clarified this connection explicitly, within the constraints of the abstract’s word limit. The revised sentence now reads: "This fly model aids in distinguishing DOA from DOA plus and guides initial hOPA1 mutation treatment strategies."

      Reviewer #3 (Public Review):

      Nitta et al. establish a fly model of autosomal dominant optic atrophy, of which hundreds of different OPA1 mutations are the cause with wide phenotypic variance. It has long been hypothesized that missense OPA1 mutations affecting the GTPase domain, which are associated with more severe optic atrophy and extra-ophthalmic neurologic conditions such as sensorineural hearing loss (DOA plus), impart their effects through a dominant negative mechanism, but no clear direct evidence for this exists particularly in an animal model. The authors execute a well-designed study to establish their model, demonstrating a clear mitochondrial phenotype with multiple clinical analogs including optic atrophy measured as axonal degeneration. They then show that hOPA1 mitigates optic atrophy with the same efficacy as dOPA1, setting up the utility of their model to test disease-causing hOPA1 variants. Finally, they leverage this model to provide the first direct evidence for a dominant negative mechanism for 2 mutations causing DOA plus by expressing these variants in the background of a full hOPA1 complement.

      Strengths of the paper include well-motivated objectives and hypotheses, overall solid design and execution, and a generally clear and thorough interpretation of their results. The results technically support their primary conclusions with caveats. The first is that both dOPA1 and hOPA1 fail to fully restore optic axonal integrity, yet the authors fail to acknowledge that this only constitutes a partial rescue, nor do they discuss how this fact might influence our interpretation of their subsequent results.

      As the reviewer rightly points out, neither dOPA1 nor hOPA1 achieves a complete recovery. Therefore, we acknowledge that this represents only a partial rescue and have added the following explanations regarding this partial rescue in the results and discussion sections.

      Result:

      Significantly → partially (lines 207 and 228)

      Discussion (lines 329–342):

      In our study, expressing dOPA1 in the retinal axons of dOPA1 mutants resulted in significant rescue, but it did not return to control levels. There are three possible explanations for this result. The first concerns gene expression levels. The Gal4-line used for the rescue experiments may not replicate the expression levels or timing of endogenous dOPA1. Considering that the optimal functionality of dOPA1 may be contingent upon specific gene expression levels, attaining a wild-type-like state necessitates the precise regulation of these expression levels. The second is a non-autonomous issue. Although dOPA1 gene expression was induced in the retinal axons for the rescue experiments, many retinal axons were homozygous mutants, while other cell types were heterozygous for the dOPA1 mutation. If there is a non-autonomous effect of dOPA1 in cells other than retinal axons, it might not be possible to restore the wild-type-like state fully. The third potential issue is that only one isoform of dOPA1 was expressed. In mouse OPA1, to completely restore mitochondrial network shape, an appropriate balance of at least two different isoforms, l-OPA1 and s-OPA1, is required (Del Dotto et al., 2017). This requirement implies that multiple isoforms of dOPA1 are essential for the dynamic activities of mitochondria.

      The second caveat is that their effect sizes are small. Statistically, the results indeed support a dominant negative effect of DOA plus-associated variants, yet the data show a marginal impact on axonal degeneration for these variants. The authors might have considered exploring the impact of these variants on other mitochondrial outcome measures they established earlier on. They might also consider providing some functional context for this marginal difference in axonal optic nerve degeneration.

      In response to the reviewer’s comment regarding the modest effect sizes observed, we acknowledge that the magnitude of the reported changes is indeed small. To explore the impact of these variants on additional mitochondrial outcomes as suggested, we employed markers such as MitoSOX, Atg8, and COXII for validation. However, we could not detect any significant effects of the DOA plus-associated variants using these methods. We apologize for the redundancy, but to address Reviewer #1's fifth question, we present experimental results showing that while the increased ROS signals observed upon dOPA1 knockdown were rescued by expressing human OPA1, the mutant variants 2708-2711del, D438V, and R445H did not ameliorate this effect. This outcome mirrors the axonal degeneration phenotype and is documented in Figure 5F. The following text has been added to the results section (lines 248–254):

      “Furthermore, we assessed the potential for rescuing ROS signals. Similar to its effect on axonal degeneration, wild-type hOPA1 effectively mitigated the phenotype, whereas the 2708-2711del, D438V, and R445H mutants did not (Figure 5F). Importantly, the I382M variant also reduced ROS levels comparably to the wild type. These findings demonstrate that both axonal degeneration and the increase in ROS caused by dOPA1 downregulation can be effectively counteracted by hOPA1. Although I382M retains partial functionality, it acts as a relatively weak hypomorph in this experimental setup.”

      Moreover, utilizing mito-QC, we observed elevated mitophagy in our Drosophila model, with these results now included in Figure 2D–H. Given the complexity of the genetics involved and the challenges in establishing lines, autophagy activity was quantified by comparing the ratio of Atg8-1 to Atg8-2 via Western blot analysis. However, no significant alterations were detected across any of the genotypes. Additionally, mitochondrial protein levels derived from COXII confirmed consistent mitochondrial quantities, showing no considerable variance following knockdown. These insights affirm that retinal axon degeneration and mitophagy activation are present in the Drosophila DOA model, although the Western blot analysis revealed no significant changes in autophagy activation. Such findings necessitate caution as this model may not fully replicate the molecular pathology of the corresponding human disease. These Western blot findings are presented in Figure S4, with the following addition made to section lines 255–263 of the results:

      “We also conducted Western blot analyses using anti-COXII and anti-Atg8a antibodies to assess changes in mitochondrial quantity and autophagy activity following the knockdown of dOPA1. Mitochondrial protein levels, indicated by COXII quantification, were evaluated to verify mitochondrial content, and the ratio of Atg8a-1 to Atg8a-2 was used to measure autophagy activation. For these experiments, Tub-Gal4 was employed to systemically knockdown dOPA1. Considering the lethality of a whole-body dOPA1 knockdown, Tub-Gal80TS was utilized to repress the knockdown until eclosion by maintaining the flies at 20°C. After eclosion, we increased the temperature to 29°C for two weeks to induce the knockdown or expression of hOPA1 variants. The results revealed no significant differences across the genotypes tested (Figure S4A–D).”

      In assessing the effects of dominant negative mutations, measurements including ROS levels, the ratio of Atg8-1 to Atg8-2, and the quantity of COXII protein were conducted, yet no significant differences were observed (Figure S6). This limitation of the fly model is mentioned in the results, noting the observation of the axonal degeneration phenotype but not alterations in ROS signaling, autophagy activity, or mitochondrial quantity as follows (lines 287–290):

      “We investigated the impacts of dominant negative mutations on mitochondrial oxidation levels, mitochondrial quantity, and autophagy activation levels; however, none of these parameters showed statistical significance (Figure S6).”

      Despite these caveats, the authors provide the first animal model of DOA that also allows for rapid assessment and mechanistic testing of suspected OPA1 variants. The impact of this work in providing the first direct evidence of a dominant negative mechanism is understated considering how important this question is in the development of genetic treatments for DOA. The authors discuss important points regarding the potential utility of this model in clinical science. Comments on the potential use of this model to investigate variants of unknown significance in clinical diagnosis require further discussion of whether there is indeed precedent for this in other genetic conditions (since the model is nevertheless so evolutionarily removed from humans).

      As suggested by the reviewer, we have expanded the discussion in our study to emphasize in greater detail the significance of the fruit fly model and the MeDUsA software we have developed, elaborating on the model's potential applications in clinical science and its precedents in other genetic disorders. Our text is as follows (lines 299–318):

      “We have previously utilized MeDUsA to quantify axonal degeneration, applying this methodology extensively to various neurological disorders. The robust adaptability of this experimental system is demonstrated by its application in exploring a wide spectrum of genetic mutations associated with neurological conditions, highlighting its broad utility in neurogenetic research. We identified a novel de novo variant in Spliceosome Associated Factor 1, Recruiter of U4/U6.U5 Tri-SnRNP (SART1). The patient, born at 37 weeks with a birth weight of 2934g, exhibited significant developmental delays, including an inability to support head movement at 7 months, reliance on tube feeding, unresponsiveness to visual stimuli, and development of infantile spasms with hypsarrhythmia, as evidenced by EEG findings. Profound hearing loss and brain atrophy were confirmed through MRI imaging. To assess the functional impact of this novel human gene variant, we engineered transgenic Drosophila lines expressing both wild type and mutant SART1 under the control of a UAS promoter.

      Our MeDUsA analysis suggested that the variant may confer a gain-of-toxic-function (Nitta et al., 2023). Moreover, we identified heterozygous loss-of-function mutations in DHX9 as potentially causative for a newly characterized neurodevelopmental disorder. We further investigated the pathogenic potential of a novel heterozygous de novo missense mutation in DHX9 in a patient presenting with short stature, intellectual disability, and myocardial compaction. Our findings indicated a loss of function in the G414R and R1052Q variants of DHX9 (Yamada et al., 2023). This experimental framework has been instrumental in elucidating the impact of gene mutations, enhancing our ability to diagnose how novel variants influence gene function.”

      Recommendations for the Authors:

      Reviewer #1 (Recommendations For The Authors):

      Overall I enjoyed reading this paper. It is well presented and represents a significant amount of well-executed study. I feel it further characterizes a poorly understood model of OPA1 variants and one which displays significant differences from the human phenotype. However, I feel the use of this model with the authors' experiments is not enough to validate this model/experiment as a screening tool for dominant negativity. I have therefore suggested the above experiments as a way both to further validate the mitochondrial dysfunction in this model and to ensure that the expressed transcript is able to affect oligomerization, as this is a prerequisite to the authors' conclusions.

      We assessed the extent to which our model reflects mitochondrial dysfunction using COXII, Atg8, and MitoSOX markers. Unfortunately, neither COXII levels nor the ratio of Atg8a-1 to Atg8a-2 showed significant variations across genotypes that would clarify the impact of dominant negative mutations. Nonetheless, MitoSOX and mito-QC results revealed that mitochondrial ROS levels and mitophagy are increased in Drosophila following intrinsic knockdown of dOPA1. These findings are documented in Figures 2, 5, and S6.

      Regarding oligomer formation, the specifics remain elusive in this study. However, the expression of dOPA1K273A, identified as a dominant negative variant in Drosophila, significantly disrupted retinal axon organization, as detailed in Figure S7. From these observations, we hypothesize that oligomerization of wild-type and dominant negative forms in Drosophila results in axonal degeneration. Conversely, co-expression of Drosophila wild-type with human dominant negative forms does not induce degeneration, suggesting that they likely do not interact.

      Reviewer #2 (Recommendations For The Authors):

      Materials and Methods:

      The authors used GMR-Gal4 to express OPA1-RNAi. I) GMR is expressed in most cells in the developing eye behind the morphogenetic furrow. So the defects observed can be due to knock-down in support cells rather than in photoreceptor cells.

      We have added the following sentences to the results (lines 194–196): “The GMR-Gal4 driver does not exclusively target Gal4 expression to photoreceptor cells. Consequently, the observed retinal axonal degeneration could potentially be secondary to abnormalities in support cells external to the photoreceptors.”

      OPA1-RNAi: how complete is the knock-down? Have the authors tested more than one RNAi line?

      We conducted experiments with an additional RNAi line, and similarly observed degeneration in the retinal axons (Figure S2 A and B; lines 178–179).

      The loss-of-function allele, induced by a P-element insertion, gives several eye phenotypes when heterozygous (Yarosh et al., 2008). Does RNAi expression lead to the same phenotypes?

      A previous report indicated that the compound eyes of homozygous mutations of dOPA1 displayed a glossy eye phenotype (Yarosh et al., 2008). Upon knocking down dOPA1 using the GMR-Gal4 driver, we also observed a glossy eye-like rough eye phenotype in the compound eyes. These findings have been added to Figure S3 and lines 192–194.

      There is no description on the way the somatic clones were generated. How were mutant cells in clones distinguished from wild-type cells (e. g. in Fig. 4).

      In the Methods section, we described the procedure for generating clones and their genotypes as follows (lines 502–505): "The dOPA1 clone analysis was performed by inducing flippase expression in the eyes using either ey-Gal4 with UAS-flp or ey3.5-flp, followed by recombination at the chromosomal location FRT42D to generate a mosaic of cells homozygous for dOPA1s3475." Furthermore, we have created a table detailing these genotypes. In these experiments, it was not possible to differentiate between the clone and WT cells. Accordingly, we have noted in the Results section (lines 201–203): “Note that the mutant clone analysis was conducted in a context where mutant and heterozygous cells coexist as a mosaic, and it was not possible to distinguish between them.”

      Why were flies kept at 29°C? This is rather unusual.

      Increased temperature was demonstrated to induce elevated expression of GAL4 (Kramer and Staveley, Genet. Mol. Res., 2003), which in turn led to an enhanced expression of the target genes. Therefore, experiments involving knockdown assays or Western blotting to detect human OPA1 protein were exclusively conducted at 29°C. However, all other experiments were performed at 25°C, as described in the methods sections: “Flies were maintained at 25°C on standard fly food. For knockdown experiments (Figures 1C–E, 1F–H, 2A–H, 3B–K, 5F, S1, S2 A and B, and S6A), flies were kept at 29°C in darkness.” Furthermore, “We regulated protein expression temporally across the whole body using the Tub-Gal4 and Tub-GAL80TS system. Flies harboring each hOPA1 variant were maintained at a permissive temperature of 20°C, and upon emergence, females were transferred to a restrictive temperature of 29°C for subsequent experiments.”

      Legends:

      It would be helpful to have a description of the genotypes of the flies used in the different experiments. This could also be included as a table.

      We have created a table detailing the genotypes. Additionally, in the legend, we have included a note to consult the supplementary table for genotypes.

      Results:

      Line 141: It is not clear what they mean by "degradation", is it axonal degeneration? And if so, what is the argument for this here?

      In the manuscript, we addressed the potential for mitochondrial degradation; however, recognizing that the expression was ambiguous, the following sentence has been omitted: “Nevertheless, the degradation resulting from mitochondrial fragmentation may have decreased the mitochondrial signal.”

      Fig. 2: Axons of which photoreceptors are shown?

      We have added "a set of the R7/8 retinal axons" to the legend of Figure 2.

      Line 167: The authors write that axonal degeneration is more severe after seven days than after eclosion. Is this effect light-dependent? The same question concerns the disappearance of the rhabdomere (Fig. 3G–J).

      We conducted the experiments in darkness, ensuring that the observed degeneration is not light-dependent. This condition has been added to the methods section to clarify the experimental conditions.

      Line 178/179: Based on what results do they conclude that there is degeneration of the "terminals" of the axons?

      Quantification via MeDUsA has enabled us to count the number of axonal terminals, and a noted decrease has led us to conclude axonal terminal degeneration. We have published two papers on these findings. We have added the following description to the results section to clarify how we defined degeneration (lines 174–176): "We have assessed the extent of their reduction from the total axonal terminal count, thereby determining the degree of axonal terminal degeneration (Richard JNS 2022; Nitta HMG 2023)."

      Line 189: They write: "...we observed dOPA1 mutant axons...". How did they distinguish the mutant from the controls?

      Fig. 5 and Fig. 6: How did they distinguish genetically mutant cells from genetically control cells in the somatic clones?

      Mutant clone analysis was conducted in a context where mutant and heterozygous cells coexist as a mosaic, and it was not possible to distinguish between them. Accordingly, this point has been added to lines 201–203, “Note that the mutant clone analysis was conducted in a context where mutant and heterozygous cells coexist as a mosaic, and it was not possible to distinguish between them.” and the text in the results section has been modified as follows: (Before) “To determine if dOPA1 is responsible for axon neurodegeneration, we observed the dOPA1 mutant axons by expressing full-length versions of dOPA1 in the photoreceptors at one day after eclosion and found that dOPA1 expression significantly rescued the axonal degeneration” → (After) “To determine if dOPA1 is responsible for axon neurodegeneration, we quantify the number of the axons in the dOPA1 eye clone fly with the expression of dOPA1 at one day after eclosion and found that dOPA1 expression partially rescued the axonal degeneration”

      Line 225/226: It is not clear to me how their approach "can quantitatively measure the degree of LOF".

      To address the reviewer's question and clarify how our approach quantitatively measures the degree of loss of function (LOF), we revised the statement (lines 238–247):

      “Our methodology distinctively facilitates the quantitative evaluation of LOF severity by comparing the rescue capabilities of various mutations. Notably, the 2708-2711del and I382M mutations demonstrated only partial rescue, indicative of a hypomorphic effect with residual activity. In contrast, the D438V and R445H mutations failed to show significant rescue, suggesting a more profound LOF. The correlation between the partial rescue by the 2708-2711del and I382M mutations and their classification as hypomorphic is significant. Moreover, the observed differences in rescue efficacy correspond to the clinical severities associated with these mutations, namely in DOA and DOA plus disorders. Thus, our results substantiate the model’s ability to quantitatively discriminate among mutations based on their impact on protein functionality, providing an insightful measure of LOF magnitude.”

      Discussion:

      Line 251, 252 and line 358: What is "the optic nerve" in the adult Drosophila?

      In humans, the axons of retinal ganglion cells (RGCs) are referred to as the optic nerve, and we posit that the retinal axons in flies are similar to this structure. In the introduction section, where it is described that the visual systems of flies and humans bear resemblance, we have appended the following definition (lines 107–108): “In this study, we defined the retinal axons of Drosophila as analogous to the human optic nerve.”

      Line 344: These bands appear only upon overexpression of the hOPA1 constructs, so this part of the discussion is very speculative.

      Confirmation was achieved using anti-hOPA1, demonstrating that the myc signal is specific. These results have been added to Figure 5D. Furthermore, the phrase “The upper band was expected as” has been revised to “From a size perspective, the upper band was inferred to represent the full-length hOPA1 including the mitochondria import sequence (MIS).” (lines 464–465)

      I was missing a discussion about the increase of ROS upon loss/reduction of dOPA1 observed by others and described here. Is there an increase of ROS upon expression of any of the constructs used?

      We demonstrated that not only axonal degeneration but also ROS can be suppressed by expressing human OPA1 in the genetic background of dOPA1 knockdown. Additionally, rescue was not possible with any variants except for I382M. Furthermore, we assessed whether there were changes in ROS in the evaluation of dominant negatives, but no significant differences were observed in this experimental system. These findings have been added to the discussion section as follows (lines 318–328): “Our research established that dOPA1 knockdown precipitates axonal degeneration and elevates ROS signals in retinal axons. Expression of human OPA1 within this context effectively mitigated both phenomena; it partially reversed axonal degeneration and nearly completely normalized ROS levels. These results imply that factors other than increased ROS may drive the axonal degeneration observed post-knockdown. Furthermore, while differences between the impacts of DN mutations and loss-of-function mutations were evident in axonal degeneration, they were less apparent when using ROS as a biomarker. The extensive use of transgenes in our experiments might have mitigated the knockdown effects. In a systemic dOPA1 knockdown, assessments of mitochondrial quantity and autophagy activity revealed no significant changes, suggesting that the cellular consequences of reduced OPA1 expression might vary across different cell types.”

      Reviewer #3 (Recommendations For The Authors):

      Consider being more explicit regarding literature that has or has failed to test a direct dominant negative effect by expressing a variant in question in the background of a full OPA1 complement. My understanding is that this is the first direct evidence of this widely held hypothesis. This lends to the main claim promoting the utility of fly as a model in general. The authors might also outline this in the introduction as a knowledge gap they fill through this study.

      In the introduction, we have incorporated a passage that highlights precedents capable of distinguishing between LOF and DN effects, and we note the absence of models capable of dissecting these distinctions within an in vivo organism. This study aims to address this gap, proposing a model that elucidates the differential impacts of LOF and DN within the context of a living model organism, thereby contributing to a deeper understanding of their roles in disease pathology. We added the following sentences in the introduction (lines 71–80).

      “In the quest to differentiate between LOF and DN effects within the context of genetic mutations, precedents exist in simpler systems such as yeast and human fibroblasts. These models have provided valuable insights into the conserved functions of OPA1 across species, as evidenced by studies in yeast models (Del Dotto et al., 2018) and fibroblasts derived from patients harboring OPA1 mutations (Kane et al., 2017). However, the ability to distinguish between LOF and DN effects in an in vivo model organism, particularly at the structural level of retinal axon degeneration, has remained elusive. This gap underscores the necessity for a more complex model that not only facilitates molecular analysis but also enables the examination of structural changes in axons and mitochondria, akin to those observed in the actual disease state.”

      The authors should clarify the language used in the abstract and introduction on the effect of hOPA1 DOA and DOA plus on the dOPA1- phenotype. Currently written as "none of the previously reports mutations known to cause DOA or DOA plus were rescued, their functions seems to be impaired." but presumably the authors mean that these variants failed to rescue the dOPA1-deficient phenotype.

      We thank the reviewer for the constructive feedback. We acknowledge the need for clarity in our description of the effects of hOPA1 DOA and DOA plus mutations on the dOPA1- phenotype in both the abstract and the introduction. The current phrasing, "none of the previously reported mutations known to cause DOA or DOA plus were rescued, their functions seem to be impaired," may indeed be confusing. To address your concern, we have revised this statement to more accurately reflect our findings: "Previously reported mutations failed to rescue the dOPA1 deficiency phenotype." In the Abstract, we changed the wording as follows: “we could not rescue any previously reported mutations known to cause either DOA or DOA plus.” → “mutations previously identified did not ameliorate the dOPA1 deficiency phenotype.”

      DOA plus is associated with a multiple sclerosis-like illness; as written it suggests that the pathogenesis of sporadic multiple sclerosis and that associated with DOA plus share an underlying pathogenic mechanism. Please use the qualifier "-like illness."

      We have added the term “multiple sclerosis-like illness” wherever “multiple sclerosis” is mentioned.

    3. eLife assessment

      This study provides valuable insights into the complex genetics of dominant optic atrophy. Leveraging a fly model, the investigators provide solid evidence, albeit with small effect sizes, for a dominant negative mechanism of certain pathogenic variants that tend to cause more severe phenotypes, a long held hypothesis in the field. The work is of high interest to those in the optic atrophy and degeneration fields.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      [...] Strengths:

      The authors have generated a novel transgenic mouse line to specifically label mature differentiated oligodendrocytes, which is very useful for tracing the final destiny of mature myelinating oligodendrocytes. Also, the authors carefully compared the distribution of three progenitor cre mouse lines and suggested that Gsh-cre also labeled dorsal OLs, contrary to the previous suggestion that it only marks LGE-derived OPCs. In addition, the author also analyzed the relative contributions of OLs derived from three distinct progenitor domains in other forebrain regions (e.g. Pir, ac). Finally, the new transgenic mouse lines and established multiple combinatorial genetic models will facilitate future investigations of the developmental origins of distinct OL populations and their functional and molecular heterogeneity.

      Weaknesses:

      Since OpalinP2A-Flpo-T2A-tTA2 only labels mature oligodendrocytes but not OPCs, the authors cannot suggest that the lack of LGE/CGE-derived-OLs in the neocortex is less likely caused by competitive postnatal elimination, but more likely due to limited production and/or allocation (line 118-9). It remains possible that LGE/CGE-derived OPCs migrate into the cortex but are later eliminated.

      We are glad that the reviewer appreciates our work and are grateful for the positive comments and the constructive suggestion. We agree with the reviewer that our methodology by itself cannot suggest whether the lack of LGE/CGE-derived-OLs in the neocortex is caused by competitive postnatal elimination or not. That is why we cited a parallel work by Li et al. (ref [17] in the original manuscript; ref [19] in the revised manuscript), in which in utero electroporation (IUE) failed to label LGE-derived OL lineage cells in both embryonic and early postnatal brains. Although they did not directly explore CGE using IUE, their fate mapping results using Emx1-Cre; Nkx2.1-Cre; H2B-GFP at P0 and P10 revealed a very low percentage of LGE/CGE-derived OL lineage cells. The lack of adult labeling in our study together with the lack of developmental labeling in the other study prompted us to hypothesize that the lack of LGE/CGE-derived-OLs in the neocortex is less likely caused by competitive postnatal elimination, but more likely due to limited production and/or allocation. In the revised manuscript, we have expanded the discussion to explain this point more clearly.

      Reviewer #2 (Public Review):

      [...] Strengths:

      The strength and novelty of the manuscript lies in the elegant tools generated and used and which have the potential to elegantly and accurately resolve the issue of the contribution of different progenitor zones to telencephalic regions.

      We are glad that the reviewer appreciates our work and are grateful for the overall positive comments.

      Weaknesses:

      (1) Throughout the manuscript (with one exception, lines 76-78), the authors quantified OL densities instead of contributions to the total OL population (as a % of ASPA for example). This means that the reader is left with only a rough estimation of the different contributions.

      We thank the reviewer for this constructive suggestion. We have replaced the density quantification (Figure 2F and 3D in the original manuscript) with contributions to the total OL population (% of ASPA) (Figure 2J and 2N in the revised manuscript).
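
      The distinction between density and contribution discussed above can be illustrated with a minimal sketch. All counts, areas and function names below are hypothetical and chosen only for illustration; they are not data or code from the study. The sketch assumes that "% of ASPA" means the number of reporter-labelled ASPA+ cells divided by the total number of ASPA+ cells in the same region.

```python
def percent_of_aspa(labelled_aspa: int, total_aspa: int) -> float:
    """Contribution of one lineage as a percentage of all ASPA+ mature OLs."""
    if total_aspa <= 0:
        raise ValueError("total_aspa must be positive")
    if not 0 <= labelled_aspa <= total_aspa:
        raise ValueError("labelled_aspa must lie between 0 and total_aspa")
    return 100.0 * labelled_aspa / total_aspa


def density_per_mm2(cells: int, area_mm2: float) -> float:
    """Labelled cells per unit area, the measure used before revision."""
    return cells / area_mm2


# Hypothetical regions: identical labelled-cell density, different total OL content.
region_a = {"rfp_aspa": 30, "aspa": 120, "area_mm2": 0.5}
region_b = {"rfp_aspa": 30, "aspa": 600, "area_mm2": 0.5}

for name, r in (("A", region_a), ("B", region_b)):
    d = density_per_mm2(r["rfp_aspa"], r["area_mm2"])
    p = percent_of_aspa(r["rfp_aspa"], r["aspa"])
    print(f"region {name}: density = {d:.0f} cells/mm^2, contribution = {p:.1f}% of ASPA+")
```

      In this toy case both regions show the same density, yet the lineage accounts for a fivefold different share of the mature OL population, which is why the percentage measure gives a more direct picture of relative contribution.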

      (2) All images and quantifications have been confined to one level of the cortex and the potential of the MGE and the LGE/CGE to produce oligodendrocytes for more anterior and more posterior cortical regions remains unexplored.

      The quantifications were not confined to one level of the cortex but were performed in brain sections ranging from Bregma +1.94 to -2.80 mm, as shown in Supplementary Figure 2A-B in the original manuscript. We apologize for not having stated and presented this information clearly enough, and for the confusion it may have caused. In the revised manuscript, we have added relevant descriptions in the “Material and Methods” section (line 199-200*) and schematics along with representative images of more anterior and more posterior cortical regions (Supplementary Figure 2A-D).

      (3) Hence, the statement that "In summary, our findings significantly revised the canonical model of forebrain OL origins (Figure 4A) and provided a new and more comprehensive view (Figure 4B)." (lines 111, 112) is not really accurate as the findings are neither new nor comprehensive. Published manuscripts have already shown that (a) cortical OLs are mostly generated from the cortex [Tripathi et al 2011 (https://doi.org/10.1523/JNEUROSCI.6474-10.2011), Winkler et al 2018 (https://doi.org/10.1523/JNEUROSCI.3392-17.2018) and Li et al (https://doi.org/10.1101/2023.12.01.569674)] and (b) MGE-derived OLs persist in the cortex [Orduz et al 2019 (https://doi.org/10.1038/s41467-019-11904-4) and Li et al 2024 (https://doi.org/10.1101/2023.12.01.569674)]. Extending the current study to different rostro-caudal regions of the cortex would greatly improve the manuscript.

      As explained in the response to comment (2), our original quantifications included different rostro-caudal regions of the cortex. In the revised manuscript, we have added more schematics and representative images in the Supplementary Figure 2 for better illustration to resolve the concern of comprehensiveness.

      We thank the reviewer for listing and summarizing highly relevant published research along with the parallel study by Li et al. submitted to eLife. We apologize for the omission of the first two references in our original manuscript and have cited them in appropriate places (ref [10] and ref [11] in the revised manuscript). However, we believe these works do not compromise the novelty and significance of our work for the following reasons:

      (1) Tripathi et al. 2011 (ref [10] in the revised manuscript) analyzed OL lineage cells in the corpus callosum and the spinal cord, but not in the cortex and anterior commissure. Their analysis was performed in juvenile mice (P12/13), not in adulthood. Most importantly, their analysis of ventrally derived OL lineage cells relied on lineage tracing using Gsh2Cre, which in fact also labels OLs derived from Gsh2+ dorsal progenitors. In contrast, we analyzed mature OLs in the cortex, corpus callosum and anterior commissure in 2-month-old adult mice. We used intersectional and subtractive strategies to label OLs derived from dorsal, LGE/CGE and MGE/POA origins. Our strategy differentiated the two different ventral lineages (LGE/CGE vs. MGE/POA) and avoided mixed labeling of OLs from ventral and dorsal Gsh2+ progenitors.

      (2) Winkler et al. 2018 (ref [11] in the revised manuscript) analyzed OLs derived from dorsal progenitors but only quantified those in the gray matter and the white matter of somatosensory cortex. Their quantification relied on co-staining with Olig2/Sox10, and thereby included both oligodendrocyte precursors (OPCs) and OLs. In contrast, we analyzed mature OLs from three origins and quantified not only neocortical regions (Mo and SS) but also an archicortical region (Pir). Our analysis revealed that although dorsally derived OLs dominate neocortex, ventrally derived OLs, especially the LGE/CGE-derived ones, dominate piriform cortex.

      (3) Orduz et al. 2019 (ref [7] in the original manuscript and the revised manuscript) mainly focused on POA-derived OLs in the somatosensory cortex. Although they performed limited analysis on MGE/POA-derived OPCs at postnatal day 10 and 19, no quantification of MGE/POA-derived OLs was performed in terms of their density, contribution to the total OL population and spatial distribution in the cortex. In contrast, we performed systematic quantification on these aspects to demonstrate that MGE/POA-derived OLs make a small but sustained contribution to the cortex with a distribution pattern distinct from that of OLs derived from the dorsal origin.

      (4) Li et al. 2024 (ref [17] in the original manuscript and [19] in the revised manuscript) is a parallel study submitted to eLife. Their and our independent discoveries nicely complemented each other. Using different sets of techniques and experiments but some shared genetic mouse models, we both found that LGE/CGE made a minimal contribution to neocortical OLs. Their analysis in the prenatal and early postnatal stages together with our analysis in the adult brain painted a more comprehensive picture of cortical oligodendrogenesis. The uniqueness of our work is that we performed systematic quantification of all three origins and uncovered the differential contributions to neocortex, piriform cortex, corpus callosum and anterior commissure.

      In summary, our work developed novel strategies to faithfully trace OLs from the three different origins and performed systematic analysis in the adult brain. Our data uncovered their differential contributions to neocortex, piriform cortex and the two commissural white matter tracts, which significantly differ not only from the canonical view but also from other previous studies in aspects discussed above. We believe our discoveries did significantly revise the canonical model of forebrain OL origins and provided a new and more comprehensive view.

      Reviewer #3 (Public Review):

      [...] Intriguingly, by using an indirect subtraction approach, they hypothesize that both Emx1-negative and Nkx2.1-negative cells represent the progenitors from lateral/caudal ganglionic eminences (LC), and conclude that neocortical OLs are not derived from the LC region. The authors claim that Gsh2 is not exclusive to progenitor cells in the LC region (PMID: 32234482). However, Gsh2 exhibits high enrichment in the LC during early embryonic development. The presence of a small population of Gsh2-positive cells in the late embryonic cortex could originate/migrate from Gsh2-positive cells in the LC at earlier stages (PMID: 32234482). Consequently, the possibility that cortical OLs derived from Gsh2+ progenitors in LC could not be conclusively ruled out. Notably, a population of OLs migrating from the ventral to the dorsal cortical region was detected after eliminating dorsal progenitor-derived OLs (PMID: 16436615).

      The indirect subtraction data for LC progenitors drawn from the OpalinFlp-tdTOM reporter in Emx1-negative and Nkx2.1-negative cells in the OpalinFlp::Emx1Cre::Nkx2.1Cre::RC::FLTG mouse line present some caveats that could influence their conclusion. The extent of activity from the two Cre lines in the OpalinFlp::Emx1Cre::Nkx2.1Cre::RC::FLTG mice remains uncertain. The OpalinFlp-tdTOM expression could occur in the presence of either Emx1Cre or Nkx2.1Cre, raising questions about the contribution of the individual Cre lines. To clarify, the authors should compare the tdTOM expression from each individual Cre line, OpalinFlp::Emx1Cre::RC::FLTG or OpalinFlp::Nkx2.1Cre::RC::FLTG, with the combined OpalinFlp::Emx1Cre::Nkx2.1Cre::RC::FLTG mouse line. This comparison is crucial as the results from the combined Cre lines could appear similar to only one Cre line active.

      Overall, the authors provided intriguing findings regarding the origin and fate of oligodendrocytes from different progenitor cells in embryonic brain regions. However, further analysis is necessary to substantiate their conclusion about the fate of LC-derived OLs convincingly.

      We thank the reviewer for these thoughtful comments. We agree with the reviewer that the presence of Gsh2-positive cells in the late embryonic cortex by itself could not rule out the possibility that they originate/migrate from Gsh2-positive cells in the LC at earlier stages. Staining dorsal-lineage intermediate progenitors with Gsh2, or performing intersectional lineage tracing using Gsh2Cre along with a dorsal-specific Flp driver, would provide more direct evidence on this issue. Nonetheless, as our lineage tracing of LGE/CGE-derived OLs did not employ Gsh2Cre, the doubt on the identity of Gsh2+ cortical progenitors should not affect the interpretation of our data.

      Regarding the subtractional LCOL labeling strategy used in our study, we wonder if there was any misunderstanding by the reviewer. As stated in our manuscript (line 59-61) and reiterated by the reviewer, OpalinFlp::Emx1Cre::Nkx2.1Cre::RC::FLTG labels OLs derived from progenitors that express neither Emx1Cre nor Nkx2.1Cre. As these two progenitor pools do not overlap with each other, there is a purely additive effect of their actions. If there is any concern about efficiency and specificity, it would be inadequate Cre-mediated recombination that leads to mislabeling of dOLs or MPOLs as LCOLs (i.e., OLs derived from Emx1 or Nkx2.1-expressing progenitors were not successfully “subtracted” and thereby “wrongly” retained RFP expression). Therefore, the bona fide LGE/CGE-derived OLs would only be fewer but not more than RFP+ LCOLs labeled by our subtractional strategy, even if any of the Cre lines did not work efficiently enough. In any case, this would not affect our conclusion that LGE/CGE-derived OLs make a minimal contribution to neocortex, as the “ground truth” contribution by LGE/CGE could only be less but not more than what we have observed using the current strategy.
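
      The upper-bound argument above can be sketched as a small expected-value calculation. This is a simplified illustration under stated assumptions, not the authors' analysis: it assumes an idealized RC::FLTG-style readout (Flp alone gives RFP, Flp plus Cre gives GFP), full Flp activity in every mature OL, and non-overlapping Emx1 and Nkx2.1 lineages. The cell counts and Cre efficiencies are hypothetical.

```python
from typing import Dict


def reporter_color(flp_active: bool, cre_recombined: bool) -> str:
    """Idealized dual-recombinase readout: Flp only -> RFP, Flp + Cre -> GFP."""
    if not flp_active:
        return "none"
    return "GFP" if cre_recombined else "RFP"


def expected_rfp(true_counts: Dict[str, float],
                 cre_efficiency: Dict[str, float]) -> float:
    """Expected number of RFP+ mature OLs under subtractive labelling.

    true_counts:     mature OLs per origin (hypothetical numbers).
    cre_efficiency:  probability that Cre recombined in a given lineage;
                     origins without a Cre driver (here LGE/CGE) default to 0.
    """
    total = 0.0
    for origin, n in true_counts.items():
        p_cre = cre_efficiency.get(origin, 0.0)
        total += n * (1.0 - p_cre)  # cells that escaped subtraction stay RFP+
    return total


# Hypothetical composition of mature OLs in one region.
TRUE_COUNTS = {"dorsal": 800, "MGE/POA": 150, "LGE/CGE": 50}

ideal = expected_rfp(TRUE_COUNTS, {"dorsal": 1.0, "MGE/POA": 1.0})
leaky = expected_rfp(TRUE_COUNTS, {"dorsal": 0.95, "MGE/POA": 0.90})

print(f"true LGE/CGE OLs:           {TRUE_COUNTS['LGE/CGE']}")
print(f"RFP+ with perfect Cre:      {ideal:.1f}")
print(f"RFP+ with incomplete Cre:   {leaky:.1f}")
```

      Under these assumptions, incomplete Cre recombination can only add RFP+ cells from the other two lineages, so the observed RFP+ count is an upper bound on the true LGE/CGE-derived population.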

      In support of our conclusion, a parallel study by Li et al. 2024 (ref [17] in the original manuscript; ref [19] in the revised manuscript) also provided independent experimental evidence that “any contribution of oligodendrocyte precursors to the developing cortex from the lateral ganglionic eminence is minimal in scope (quoted from its eLife assessment).” In addition, in their revision, they performed Gsh2 immunostaining in P0 Emx1Cre::HG-loxP mouse and found nearly all Gsh2+ cells in the cortical SVZ were derived from the Emx1+ lineage. We are glad that this additional piece of evidence further clarified the case, but still want to emphasize that the subtractional strategy we took was designed purposefully to avoid the potential uncertainty of Gsh2Cre and to more faithfully label LGE/CGE-derived OLs. Therefore, the validity of our conclusion about the fate of LC-derived OLs should be independent from the question on the identity of Gsh2+ cortical progenitors and stands well by itself.

      We hope that these explanations have adequately addressed the reviewer’s concerns. 

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      In Figures 2C, 2D, 2E and 3D, the authors should provide counts of labelled cells as a % of ASPA+ cells. This will give an accurate picture of the contribution of the different progenitor regions to OLs.

      The graphs in Figure 2F are unnecessary since they are simply repeats of C-E but re-arranged.

      We thank the reviewer for the valuable suggestions. These two recommendations are related, so we made the following changes. We replaced the density quantification in Figure 2F and 3D with % of ASPA (Figure 2J and 2N in the revised manuscript) to give an accurate picture of the contribution of the different progenitor regions to OLs, as suggested by the reviewer. We still retained the density counts in Figure 2C-E (Figure 2G-I in the revised manuscript). Together with quantifications of rostral-caudal and laminar distributions presented in Supplementary Figure 2, these data demonstrated that OLs from different origins display distinct spatial distribution patterns.

      At what ages were the quantifications performed in all the figures?

      We apologize for the omission of this information in the original manuscript. All quantifications were performed in 2-month-old adult mice. We have added this information in the “Material and Methods” section of the revised manuscript.

      In 2D, and 3B the GFP should have been activated but the authors do not show it or quantify it presumably because GFP would flood the sections in the presence of Emx1Cre. Nevertheless, since eGFP is shown in the diagram in 2B, the authors should mention why they chose not to show it.

      We thank the reviewer for the helpful comment and the suggestion. We have modified the schematic in Figure 2B and added explanation in the figure legend (line 308-313). We also added a schematic in Supplementary Figure 1A along with images of GFP channel in Supplementary Figure 1D (line 338-350).

      All the main figures and supplementary figures are too small to see properly.

      We are sorry that there was severe compression of images in the combined manuscript file at the conversion step during the initial submission. We apologize for the compromised image quality and re-uploaded full-size figures as individual files on bioRxiv soon after receiving the reviews. For the revised manuscript, we have also taken care to upload full-size figures at high resolution as individual files to ensure their quality of presentation.

      Supplementary Figure 2E is unnecessary and perhaps misleading the reader that cortical-derived OLs have a preference for the lower layers whereas the distribution may simply reflect the distribution of OLs in the cortex.

      We thank the reviewer for the helpful comment and the suggestion. We have removed this panel and replaced it with quantifications of relative laminar distributions of the total (ASPA+) OLs along with those from the three different origins (Supplementary Figure 2G in the revised manuscript). Indeed, the preference for the lower layers of dorsally-derived OLs mirrored the distribution of total OLs in the cortex, while the MGE/POA-derived OLs deviate significantly from others and exhibit higher preference towards layer 4.

      Quantification of labelled cells as a % of ASPA should also be performed in Supplementary Figure 3.

      We thank the reviewer for this suggestion. In the revised manuscript, we have included quantifications of labelled cells as % of ASPA for both OpalinFlp::Emx1Cre::Ai65 and OpalinFlp::Nkx2.1Cre::Ai65 (Figure 2J and N). The sum of these two data sets will be equivalent to those of OpalinFlp::Emx1Cre::Nkx2.1Cre::Ai65 shown in Supplementary Figure 3, and thereby we did not perform additional quantifications to avoid redundant efforts.

      Imaging and quantification should be extended to more posterior regions of the cortex to find out whether the contribution is different from the areas already examined.

      We thank the reviewer for the suggestion on imaging and apologize for the confusion about the range of quantification. As explained in the response to comment (2) of weakness, the quantifications were not confined to one level of the cortex but were performed in brain sections ranging from Bregma +1.94 to -2.80 mm, as shown in Supplementary Figure 2A-B in the original manuscript. In the revised manuscript, we have added relevant descriptions in the “Material and Methods” section (line 199-200) and schematics along with representative images of more anterior and more posterior cortical regions (Supplementary Figure 2A-D).

      Reviewer #3 (Recommendations For The Authors):

      (1) The authors should provide Opalin reporter expression data across various brain regions at different developmental stages to clarify the expression pattern of the reporter.

      We appreciate the reviewer’s comment. We chose to perform all quantifications in adult mice as Opalin is a well-established marker for differentiated OLs and the recombinase-dependent reporter expression is accumulative and irreversible. If there is any non-specific labeling in any earlier developmental stage, it would be retained and manifested at the timepoint we examined as well. In other words, the fact that we did not detect any non-specific labeling in the current dataset but only confined labeling in mature OLs ensured that no non-OL labeling was present at earlier timepoints. As shown in Figure 1D-F, reporter expression activated by the Opalin driver shows high OL specificity in all analyzed brain regions. This is further corroborated by results from combinatorially labeled samples (Figure 2 and Supplementary Figure 2), in which only OLs but not any other cell types were labeled in all analyzed brain regions too. Following the reviewers’ suggestions, we have added representative images of more rostral and more caudal cortical regions (Supplementary Figure 2B-D), which also showed highly specific OL labeling.

      (2) In Figure 1D, please specify the developmental stage of the mice used for staining.

      We apologize for the omission of this information in the original manuscript. All quantifications were performed in 2-month-old adult mice. We have added this information in the “Material and Methods” section (line 199-200) of the revised manuscript.

      (3) The authors should clarify whether the Opalin reporter is expressed in OPCs and astrocytes at developmental stages of mice, such as P0, P7, and P30.

      We appreciate the reviewer’s comment, but as explained in response to comment (1), Opalin is a well-established marker for differentiated OLs which is not expressed in OPCs or astrocytes. As shown in Figure 1D-E, reporter expression is confined to CC1+ differentiated OLs with no colocalization with Sox9 (astrocyte marker). In support of this observation, only ASPA+ differentiated OLs but no OPCs or astrocytes were labeled in any of the combinatorial lineage tracing samples generated using this line combined with progenitor-Cre lines. In addition to marker staining, we also did not observe any RFP+ cells with OPC or astrocyte morphology. As the recombinase-dependent reporter expression is accumulative and irreversible, the fact that no non-specific labeling was observed in the adult brain retrospectively proved the specificity of Opalin-Flp in earlier developmental stages.

      (4) In Figure 1E, authors should address why the efficiency of the tdTomato line is notably lower compared to that of H2B-GFP and whether the stability of reporters could impact the conclusions drawn.

      The difference in reporting efficiency is mainly caused by differences inherent to the two reporting systems. The TRE-RFP reporter is derived from Ai62, composed of a Tet response element and tdTomato inserted into the T1 TIGRE locus. The tdTomato expression is driven by tTA-TRE transcriptional activation. The HG-loxP reporter is derived from HG-Dual, composed of a CAG promoter, a frt-flanked STOP cassette, and H2B-GFP inserted into the Rosa26 locus. The H2B-GFP expression is driven by CAG promoter after Flp-mediated removal of the STOP cassette. A Flp-dependent tdTomato reporter designed in the same way as the HG-FRT reporter would have similar efficiency. In fact, the RC::FLTG reporter can be viewed as such a reporter in the absence of Cre, which did show similarly high efficiency as HG-FRT and supported efficient subtractive labeling of LGE/CGE-derived OLs. We apologize for a typo in the title of the Y-axis of the right panel in the original Figure 1F which may have caused potential misunderstanding. The “RFP+CC1+/CC1” should be “XFP+CC1/CC1”. We have corrected this mistake and revised the figure legend for clearer description of the data (Line 293-302 in the revised manuscript).

      (5) In Figure 2, please clarify the developmental stage of the mice used for staining. Authors should present the eGFP image in addition to tdTOM.

      We apologize for the omission of the age information in the original manuscript. All quantifications were performed in 2-month-old adult mice. We have added this information in the “Material and Methods” section (line 199-200) of the revised manuscript. We thank the reviewer for the suggestion on eGFP image and have presented it in Supplementary Figure 1 in the revised manuscript.

      (6) In Figure 2D, authors should display the eGFP image alongside the tdTomato image. It is difficult to assess the efficiency of Emx-Cre and Nkx2.1-Cre.

      We thank the reviewer for the suggestion on eGFP image and have presented eGFP image in Supplementary Figure 1D in the revised manuscript. There are two reasons why we chose to present it in the supplementary figure instead of the main figure. First, we added ASPA staining in the green channel along with quantifications of RFP cells as % of ASPA in Figure 2 in the revised manuscript, following reviewer #2’s suggestion. Second, as pointed out by reviewer #2, GFP would flood the sections in the presence of Emx1Cre and could be quite distracting if it was shown together with RFP.

      We were not entirely sure what exactly the reviewer means by “assess the efficiency of Emx-Cre and Nkx2.1-Cre”, but we believe that the quantifications of RFP cells as % of ASPA clarified the contribution of each origin to the total OLs (Figure 2J and 2N in the revised manuscript).

      (7) Figure 3 depicts the entire brain, replicating the image presented in Figure 2. It would be beneficial to consolidate Figures 2 and 3, as they showcase identical brain scans of different regions.

      We thank the reviewer for the constructive suggestion and have consolidated Figures 2 and 3 in the original manuscript into Figure 2 in the revised manuscript.

    2. Reviewer #2 (Public Review):

      In this manuscript, Cai et al use a combination of mouse transgenic lines to re-examine the question of the embryonic origin of telencephalic oligodendrocytes (OLs). Their tools include a novel Flp mouse for labelling mature oligodendrocytes and a number of pre-existing lines (some previously generated by the last author in Josh Huang's lab) that allowed combinatorial or subtractive labelling of oligodendrocytes with different origins. The conclusion is that cortically-derived OLs are the predominant OL population in the motor and somatosensory cortex and underlying corpus callosum, while the LGE/CGE generates OLs for the piriform cortex and anterior commissure rather than the cerebral cortex. Small numbers of MGE-derived OLs persist long-term in the motor, somatosensory and piriform cortex.

      Strengths:

      The strength and novelty of the manuscript lie in the elegant tools generated and used. These have enabled the resolution of the issue regarding the contribution of different telencephalic progenitor zones to the cortical oligodendrocyte population.

      Comments on latest version:

      The revised manuscript by Cai et al has addressed all the issues raised. I have some minor comments:

      Figure 2: The y axis in figure 2L should be the same as the y axis in 2M to make the contribution to Mo and SS more clear.

      Figure 3: Although this is clear in the figure, A and B should be labelled as classical model and new model to help the reader understand immediately what the two figures show.

      Suppl Fig 2: It is not clear what 1-7 represent. It should be made clear in the legend which areas have been pooled into the different bins. The X axis should be labelled.

    3. eLife assessment

      In this study the authors revisited the question of the embryonic origin of telencephalic oligodendrocytes using some new and powerful genetic tools. There is convincing evidence to support previous suggestions of a predominantly cortical origin of oligodendrocytes in the cerebral cortex, however the new studies suggest that LGE/CGE-derived oligodendrocytes make a modest contribution in some areas, while MGE/POA-derived oligodendrocytes make a small but enduring contribution. The findings are valuable and should be of interest to developmental and myelin biologists.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review): 

      [...] Strengths: 

      The method the authors propose is a straightforward and inexpensive modification of an established split-pool single-cell RNA-seq protocol that greatly increases its utility, and should be of interest to a wide community working in the field of bacterial single-cell RNA-seq. 

      Weaknesses: 

      The manuscript is written in a very compressed style and many technical details of the evaluations conducted are unclear and processed data has not been made available for evaluation, limiting the ability of the reader to independently judge the merits of the method. 

      Thank you for your thoughtful and constructive review of our manuscript. We appreciate your recognition of the strengths of our work and the potential impact of our modified PETRI-seq protocol on the field of bacterial single-cell RNA-seq. We are grateful for the opportunity to address your concerns and improve the clarity and accessibility of our manuscript.

      We acknowledge your feedback regarding the compressed writing style and lack of technical details, which are constrained by the requirements of the Short Report format in eLife. We will address these issues in our revised manuscript as follows:

      (1) Expanded methodology section: We will provide a more comprehensive description of our experimental procedures, including detailed protocols for the ribosomal depletion step and data analysis pipeline. This will enable readers to better understand and potentially replicate our methods.

      (2) Clarification of technical evaluations: We will elaborate on the specifics of our evaluations, including the criteria used for assessing the efficiency of ribosomal depletion and the methods employed for identifying and characterizing subpopulations within the E. coli biofilm model.

      (3) Data availability: We apologize for the oversight in not making our processed data readily available. We have deposited all relevant datasets, including raw and source data, in appropriate public repositories (GEO number: GSE260458) and provide clear instructions for accessing this data in the revised manuscript.

      (4) Supplementary information: To maintain the concise nature of the main text while providing necessary details, we will include additional supplementary information. This will cover extended methodology, detailed statistical analyses, and comprehensive data tables to support our findings.

      (5) Discussion of limitations: We will include a more thorough discussion of the potential limitations of our modified protocol and areas for future improvement.

      We believe these changes will significantly improve the clarity and reproducibility of our work, allowing readers to better evaluate the merits of our method.

      Reviewer #2 (Public Review): 

      [...] Strengths: 

      The introduced rRNA depletion method is highly efficient, with the depletion for E. coli resulting in over 90% of reads containing mRNA. The method is ready to use with existing PETRI-seq libraries, which is a large advantage, given that no other rRNA depletion methods were published for split-pool bacterial scRNA-seq methods. Therefore, the value of the method for the field is high. There is also evidence that a small number of cells at the bottom of a static biofilm express PdeI, which is causing the elevated c-di-GMP levels that are associated with persister formation. Given that PdeI is a phosphodiesterase, which is supposed to promote hydrolysis of c-di-GMP, this finding is unexpected.

      Weaknesses: 

      With the descriptions and writing of the manuscript, it is hard to place the findings about the PdeI into existing context (i.e. it is well known that c-di-GMP is involved in biofilm development and is heterogeneously distributed in several species' biofilms; it is also known that E. coli diesterases regulate this second messenger, e.g. https://journals.asm.org/doi/full/10.1128/jb.00604-15).

      There is also no explanation for the apparently contradictory upregulation of c-di-GMP in cells expressing higher PdeI levels. Perhaps the examination of the rest of the genes in cluster 2 of the biofilm sample could be useful to explain the observed association.

      Thank you for your thoughtful and constructive review of our manuscript. We are pleased that the reviewer recognizes the value and efficiency of our rRNA depletion method for PETRI-seq, as well as its potential impact on the field. We would like to address the points raised by the reviewer and provide additional context and clarification regarding the function of PdeI in c-di-GMP regulation.

      We acknowledge that c-di-GMP’s role in biofilm development and its heterogeneous distribution in bacterial biofilms are well studied. We appreciate the reviewer's observation regarding the seemingly contradictory relationship between increased PdeI expression and elevated c-di-GMP levels. This is indeed an intriguing finding that warrants further explanation.

      PdeI was predicted to be a phosphodiesterase responsible for c-di-GMP degradation. This prediction is based on sequence analysis where PdeI contains an intact EAL domain known for degrading c-di-GMP. However, it is noteworthy that PdeI also contains a divergent GGDEF domain, which is typically associated with c-di-GMP synthesis. This dual-domain architecture suggests a potential for complex regulatory roles. As reported, the knockout of the major phosphodiesterase PdeH in E. coli leads to the accumulation of c-di-GMP. Further, a point mutation on PdeI's divergent GGDEF domain (G412S) in this PdeH knockout strain resulted in decreased c-di-GMP levels, implying that the wild-type GGDEF domain in PdeI has a role in maintaining or increasing c-di-GMP levels in the cell. Additionally, PdeI contains a CHASE (cyclases/histidine kinase-associated sensory) domain. Combined with our experimental results demonstrating that PdeI is a membrane-associated protein, we predict that PdeI functions as a sensor that integrates environmental signals with c-di-GMP production under complex regulatory mechanisms. The experimental evidence, along with domain analysis, suggests that PdeI could contribute to c-di-GMP synthesis, rebutting the notion that it solely functions as a phosphodiesterase. Furthermore, our single-cell experiments showed a positive correlation between PdeI expression levels and c-di-GMP levels (Fig. 2J). HPLC LC-MS/MS analysis further confirmed that PdeI overexpression (induced by arabinose) upregulated c-di-GMP levels (Fig. 2K). Importantly, in our HPLC LC-MS/MS analysis, we compared the PdeI overexpression strain with the wild-type MG1655 strain, thereby excluding the influence of other genes in cluster 2. 

      In summary, while PdeI is predicted to be a phosphodiesterase based on its sequence and the presence of an EAL domain, the additional presence of a divergent GGDEF domain and experimental evidence suggests that PdeI has a function in upregulating c-di-GMP levels. These findings support the hypothesis that PdeI may have both synthetic and regulatory roles in c-di-GMP metabolism.

    2. eLife assessment

      The work introduces a valuable new method for depleting the ribosomal RNA from bacterial single-cell RNA sequencing libraries and shows that this method is applicable to studying the heterogeneity in microbial biofilms. The evidence for a small subpopulation of cells at the bottom of the biofilm which upregulates PdeI expression is solid. However, more investigation into the unresolved functional relationship between PdeI and c-di-GMP levels with the help of other genes co-expressed in the same cluster would have made the conclusions more significant.

    3. Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Yan and colleagues introduce a modification to the previously published PETRI-seq bacterial single-cell protocol to include a ribosomal depletion step based on a DNA probe set that selectively hybridizes with ribosome-derived (rRNA) cDNA fragments. They show that their modification of the PETRI-seq protocol increases the fraction of informative non-rRNA reads from ~4-10% to 54-92%. The authors apply their protocol to investigating heterogeneity in a biofilm model of E. coli, and convincingly show how their technology can detect minority subpopulations within a complex community.
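
      To make the practical effect of these percentages concrete, the following minimal sketch computes how the sequencing depth needed for a fixed number of informative reads changes with the non-rRNA fraction. It assumes that required depth scales inversely with that fraction; the function names are illustrative and are not taken from the manuscript.

```python
def reads_needed(target_informative_reads, informative_fraction):
    """Total reads required to obtain a target number of non-rRNA reads."""
    if not 0 < informative_fraction <= 1:
        raise ValueError("informative_fraction must be in (0, 1]")
    return target_informative_reads / informative_fraction


def fold_reduction(fraction_before, fraction_after):
    """Factor by which the required sequencing depth drops after depletion."""
    return fraction_after / fraction_before


# Fractions quoted in the review: ~4-10% before depletion, 54-92% after.
smallest_gain = fold_reduction(0.10, 0.54)  # best case before, worst case after
largest_gain = fold_reduction(0.04, 0.92)   # worst case before, best case after
print(f"{smallest_gain:.1f}x to {largest_gain:.1f}x fewer reads")
```

      Under this assumption, the quoted ranges correspond to roughly a 5-fold to 23-fold reduction in the reads needed per informative read, depending on which ends of the two ranges are paired.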

      Strengths:

      The method the authors propose is a straightforward and inexpensive modification of an established split-pool single-cell RNA-seq protocol that greatly increases its utility, and should be of interest to a wide community working in the field of bacterial single-cell RNA-seq.

      Weaknesses:

      The manuscript is written in a very compressed style and many technical details of the evaluations conducted are unclear and processed data has not been made available for evaluation, limiting the ability of the reader to independently judge the merits of the method.

    4. Reviewer #2 (Public Review):

      Summary:

      This work introduces a new method of depleting the ribosomal reads from the single-cell RNA sequencing library prepared with one of the prokaryotic scRNA-seq techniques, PETRI-seq. The advance is very useful since it allows broader access to the technology by lowering the cost of sequencing. It also allows more transcript recovery with fewer sequencing reads. The authors demonstrate the utility and performance of the method for three different model species and find a subpopulation of cells in the E. coli biofilm that express a protein, PdeI, which causes elevated c-di-GMP levels. These cells were shown to be in a state that promotes persister formation in response to ampicillin treatment.

      Strengths:

      The introduced rRNA depletion method is highly efficient, with the depletion for E. coli resulting in over 90% of reads containing mRNA. The method is ready to use with existing PETRI-seq libraries, which is a large advantage, given that no other rRNA depletion methods were published for split-pool bacterial scRNA-seq methods. Therefore, the value of the method for the field is high. There is also evidence that a small number of cells at the bottom of a static biofilm express PdeI, which is causing the elevated c-di-GMP levels that are associated with persister formation. Given that PdeI is a phosphodiesterase, which is supposed to promote hydrolysis of c-di-GMP, this finding is unexpected.

      Weaknesses:

      With the descriptions and writing of the manuscript, it is hard to place the findings about the PdeI into existing context (i.e. it is well known that c-di-GMP is involved in biofilm development and is heterogeneously distributed in several species' biofilms; it is also known that E. coli diesterases regulate this second messenger, e.g. https://journals.asm.org/doi/full/10.1128/jb.00604-15).

      There is also no explanation for the apparently contradictory upregulation of c-di-GMP in cells expressing higher PdeI levels. Perhaps the examination of the rest of the genes in cluster 2 of the biofilm sample could be useful to explain the observed association.

    1. Author response

      Reviewer #1 (Public Review):

      […] Weaknesses:

      This work explores an interesting question on regulating myoD+ progenitors and the defects of this process in skeletal muscle differentiation by SRSF2 but spreads out in many directions rather than focusing on the key defects. A number of approaches are used, but they lack the robust mechanistic analysis of the defects that result in muscle differentiation. Specifically, the role of SRSF2 on splicing appears to be a misfit here and does not explain the primary defects in the migration of myoD+ progenitors. There are concerns about the scRNA-seq and many transcripts in muscle biology that are not expressed in muscle cells. Focusing on main defects and additional experimental evidence to clear the fusion vs. precocious differentiation vs. reduced differentiation will strengthen this work.

      (1) The analysis of RNA-seq data (Figure 2) is limited, and it is unclear how it relates to the work presented in this MS. The GO enrichment analysis is combined for both up- and down-regulated DEGs, thus making it difficult to understand the impact in each direction. Stac2 is a predominant neuronal isoform (while Stac3 is the muscle isoform), and the Symm gene is not found in the HGNC or other databases. Could the authors provide the approved name for this gene? The premise of this work is based on defects in ECM processes resulting in the mis-targeting of the muscle progenitors to the nonmuscle regions. Which ECM proteins are differentially expressed?

      The GO enrichment analysis (Figure 2B) indicates that genes involved in skeletal muscle construction and function were significantly dysregulated, with both up-regulated and down-regulated genes observed, consistent with the phenotype analysis presented in Figure 1.

      We agree with the reviewer’s comment that Stac3 is the predominant muscle isoform with high expression in skeletal muscle tissues, while Stac2 is expressed at low levels in these tissues. Therefore, we decided to delete the Stac2 data from Figure 2C and will modify the text accordingly. We apologize for our errors.

      In response to the reviewer's comment regarding the Symm gene not being found in the HGNC or other databases, we carefully re-examined the genes presented in Figure 2C. We discovered that one of the genes is actually Synm, which encodes synemin, an intermediate filament protein. We will correct this in the manuscript.

      scRNA-seq analysis revealed defects in ECM processes in SRSF2-deficient myoblasts, which we believe likely resulted in the mis-targeting of muscle progenitors to non-muscle regions. However, comparing RNA-seq results from whole muscle tissues with scRNA-seq results is challenging.

      (2) Could authors quantify the muscle progenitors dispersed in nonmuscle regions before their differentiation? Which nonmuscle tissues MyoD+ progenitors are seen? Most of the tDT staining in the enlarged sections appears to be punctate without any nuclear staining seen in these cells (Figure 3 B, D E-F). Could authors provide high-resolution images? Also, in the diaphragm cross-sections in mutants, tdT labeling appears to be missing in some areas within the myofibers defined as cavities by the authors (marked by white arrows, Figure 3H). Could this polarized localization of tDT be contributing to specific defects?

      tdT staining revealed a substantial presence of MyoD-derived cells distributed beyond the muscle regions, as shown in Figure 3B. Quantifying the number of MyoD+ progenitors dispersed in non-muscle regions is not meaningful.

      tdT+ cells also include those that previously expressed MyoD but have since differentiated into myotubes and myofibers, which is why much of the tdT+ staining is not nuclear.

      MyoD+ cells deficient in SRSF2 either undergo apoptosis or premature differentiation. Consequently, tdT staining in SRSF2-KO muscles showed many irregularities in the muscle fibers.

      (3) Is there a difference in the levels of tDT in the myoD+ muscle progenitors that are mis-targeted vs the others that are present in the muscle tissues?

      tdT+ cells include those that previously expressed MyoD but have since differentiated into myotubes and myofibers, which are no longer MyoD+ cells. Additionally, tdT+ cells also include those currently expressing MyoD, which are MyoD+ cells.

      The fiber differences between WT and SRSF2-KO mice are easily discernible through tdT staining (Figure 2D and 3D); however, comparing the levels of tdT staining between the two groups is not meaningful.

      (4) scRNA is unsuitable for myotubes and myofibers due to their size exclusion from microfluidics. Could authors explain the basis for scRNA-seq vs SnRNA-seq in this work? How are SKM defined in scRNA-data in Figure 4? As the myofibers are small in KO, could the increased level of late differentiation markers be due to the enrichment of these small myotubes/myofibers in scRNA? A different approach, such as ISH/IF with the myogenic markers at E9.5-10.5, may be able to resolve if these markers are prematurely induced.

      SRSF2 is highly expressed in proliferative myoblasts, but its levels decline once differentiation begins. In our study, we used Myod1-Cre to delete the SRSF2 gene and performed the scRNA-seq analysis to examine the effects of SRSF2 deletion on the proliferation and differentiation of MyoD cells. Our analysis revealed that SRSF2 deletion caused proliferation defects and premature differentiation of MyoD cells (Figure 5G), leading to myofiber abnormalities.

      We determined that snRNA-seq analysis is not suitable for our study.

      Additionally, skeletal muscle cells (SKM) were defined based on the expression of skeletal muscle markers, as shown in Figure 4C.

      (5) TNC is a marker for tenocytes and is absent in skeletal muscle cells. The authors mentioned a downregulation of TNC in the KO SKM derived clusters. This suggests a contamination of the tenocytes in the control cells. In spite of the downregulation of multiple ECM genes showed by scRNA-seq data, the ECM staining by laminin in KO in Figure 3 appears to be similar to controls.

      Tenascin-C (Tnc) is also part of the extracellular matrix (ECM) family. scRNA-seq analysis revealed that multiple ECM genes were downregulated in SRSF2-KO myoblasts; however, this did not indicate that laminin was downregulated in the SRSF2-KO muscles.

      (6) The expression of many fusion genes, such as myomaker and myomerger, is reduced in KO, suggesting a primary fusion defect vs a primary differentiation defect. Many mature myofiber proteins exhibit an increased expression in disease states, suggesting them as a compensatory mechanism. Authors need to provide additional experimental evidence supporting precocious differentiation as the primary defect.

      Our analysis revealed that the deletion of SRSF2 caused premature differentiation of MyoD cells (Figure 5G), leading to abnormalities of myofiber formation. SRSF2 is highly expressed in proliferative myoblasts, but its expression declines quickly in myotubes. Therefore, it is unlikely that the low expression of SRSF2 in myotubes caused the primary fusion defect.

      (7) The fusion defects in KO are also evident in siRNA knockdown for SRSF2 and Aurka in C2C12, which mostly exhibits mononucleated myocytes in knockdowns. Also, a fusion index needs to be provided.

      SRSF2 knockdown and Aurka knockdown caused differentiation defects, including fusion defects. We quantified the percentages of both MyoG+ and MHC+ cells in the differentiation assay.
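
      For reference, the fusion index requested in comment (7) is commonly computed as the percentage of nuclei that sit inside multinucleated cells. The sketch below is a minimal illustration under assumed inputs (a hypothetical list of nucleus counts, one per scored cell); it is not the authors' quantification, and the minimum number of nuclei that counts as "fused" is left as a parameter because the convention varies between studies.

```python
def fusion_index(nuclei_per_cell, min_nuclei=2):
    """Percentage of nuclei residing in cells with at least min_nuclei nuclei.

    nuclei_per_cell: one nucleus count per scored cell (hypothetical input).
    min_nuclei: threshold for calling a cell multinucleated (an assumption).
    """
    total = sum(nuclei_per_cell)
    if total == 0:
        raise ValueError("no nuclei were scored")
    fused = sum(n for n in nuclei_per_cell if n >= min_nuclei)
    return 100.0 * fused / total


# Four mononucleated cells, one binucleated cell, one myotube with 4 nuclei.
example_counts = [1, 1, 1, 1, 2, 4]
print(fusion_index(example_counts))
```

      A culture dominated by mononucleated myocytes, as described for the knockdowns, would give a value near zero under this definition.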

      (8) The last section of the role of SRSF2 on splicing appears to be a misfit in this study. Authors describe the Bin1 isoforms in centronuclear myopathy, but exon17 is not involved in myopathy. Is exon17 exclusion seen in other diseases/ splicing studies?

      Our study is the first to report that exon 17 inclusion of Bin1 is regulated by SRSF2. Specifically, the knockdown of Bin1 exon 17 caused severe differentiation defects in C2C12 myoblasts. The involvement of Bin1 exon 17 in myopathy requires further validation using clinical samples.

      Reviewer #2 (Public Review):

      […] Weaknesses: Although unbiased sequencing methods were used, the findings that SRSF2 serves as a transcriptional regulator and functions in alternative splicing events are not novel. The introduction and discussion are not clearly written. The authors did not raise clear scientific questions in the introduction. The last paragraph is only a copy-paste of the abstract. The discussion mainly repeats the results without clear discussion.

      While the role of SRSF2 as a transcriptional regulator involved in alternative splicing events is not novel, the specific SRSF2-regulated alternative splicing events and targeted genes in skeletal muscle have not been reported in other publications. We believe our interpretation of the data and comparison with related published studies are well presented in the Discussion section.

    1. eLife assessment

      This study presents valuable data on sensory integration in a model pre-motor neuron, the Mauthner cell. The authors use both stimulation of the optic tectum (a proxy for vision) and auditory stimulation to study the integration of these modalities in the Mauthner cell using convincing, technically demanding, and well done experiments. There are, however, concerns about the degree to which the two modalities interact; multisensory integration of subthreshold unisensory stimuli appears uncommon, and not significantly above events observed from single modalities. This work will be of interest to both synaptic physiologists and neurophysiologists working on sensory-motor integration.

    2. Reviewer #1 (Public Review):

      Summary:

      Otero-Coronel et al. address an important question for neuroscience - how does a premotor neuron capable of directly controlling behavior integrate multiple sources of sensory inputs to inform action selection? For this, they focused on the teleost Mauthner cell, long known to be at the core of a fast escape circuit. What is particularly interesting in this work is the naturalistic approach they took. Classically, the M-cell was characterized, both behaviorally and physiologically, using an unimodal sensory space. Here the authors make the effort (substantial!) to study the physiology of the M-cell taking into account both the visual and auditory inputs. They performed well-informed electrophysiological approaches to decipher how the M-cell integrates the information of two sensory modalities depending on the strength and temporal relation between them.

      Strengths:

      The empirical results are convincing and well-supported. The manuscript is well-written and organized. The experimental approaches and the selection of stimulus parameters are clear and informed by the bibliography. The major finding is that multisensory integration increases the certainty of environmental information in an inherently noisy environment.

      Weaknesses:

      Even though the manuscript and figures are well organised, I found myself struggling to understand key points of the figures.

      For example, in Figure 1 it is not clear what are actually the Tonic and Phasic components. The figure will benefit from more details on this matter. Then, in Figure 4 the label for the traces in panel A is needed since I was not able to pick up that they were coming from different sensory pathways.

      In line 338 it should be optic tectum and not "optical tectum".

    3. Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Otero-Coronel and colleagues use a combination of acoustic stimuli and electrical stimulation of the tectum to study MSI in the M-cells of adult goldfish. They first perform a necessary piece of groundwork in calibrating tectal stimulation for maximal M-cell MSI, and then characterize this MSI with slightly varying tectal and acoustic inputs. Next, they quantify the magnitude and timing of FFI that each type of input has on the M-cell, finding that both the tectum and the auditory system drive FFI, but that FFI decays more slowly for auditory signals. These are novel results that would be of interest to a broader sensory neuroscience community. By then providing pairs of stimuli separated by 50ms, they assess the ability of the first stimulus to suppress responses to the second, finding that acoustic stimuli strongly suppress subsequent acoustic responses in the M-cell, that they weakly suppress subsequent tectal stimulation, and that tectal stimulation does not appreciably inhibit subsequent stimuli of either type. Finally, they show that M-cell physiology mirrors previously reported behavioural data in which stronger stimuli underwent less integration.

      The manuscript is generally well-written and clear. The discussion of results is appropriately broad and open-ended. It's a good document. Our major concerns regarding the study's validity are captured in the individual comments below. In terms of impact, the most compelling new observation is the quantification of the FFI from the two sources and the logical extension of these FFI dynamics to M-cell physiology during MSI. It is also nice, but unsurprising, to see that the relationship between stimulus strength and MSI is similar for M-cell physiology to what has previously been shown for behavior. While we find the results interesting, we think that they will be of greatest interest to those specifically interested in M-cell physiology and function.

      Strengths:

      The methods applied are challenging and appropriate and appear to be well executed. Open questions about the physiological underpinnings of M-cell function are addressed using sound experimental design and methodology, and convincing results are provided that advance our understanding of how two streams of sensory information can interact to control behavior.

      Weaknesses:

      Our concerns about the manuscript are captured in the following specific comments, which we hope will provide a useful perspective for readers and actionable suggestions for the authors.

      Comment 1 (Minor):

      Line 124. Direct stimulation of the tectum to drive M-cell-projecting tectal neurons not only bypasses the retina, it also bypasses intra-tectal processing and inputs to the tectum from other sources (notably the thalamus). This is not an issue with the interpretation of the results, but this description gives the (false) impression that bypassing the retina is sufficient to prevent adaptation. Adding a sentence or two to accurately reflect the complexity of the upstream circuitry (beyond the retina) would be welcome.

      Comment 2 (Major):

      The premise is that stimulation of the tectum is a proxy for a visual stimulus, but the tectum also carries the auditory, lateral line, and vestibular information. This seems like a confound in the interpretation of this preparation as a simple audio-visual paradigm. Minimally, this confound should be noted and addressed. The first heading of the Results should not refer to "visual tectal stimuli".

      Comment 3 (Major):

      Figure 1 and associated text.

      It is unclear and not mentioned in the Methods section how phasic and tonic responses were calculated. It is clear from the example traces that there is a change in tonic responses and the accumulation of subthreshold responses. Depending on how tonic responses were calculated, perhaps the authors could overlay a low-passed filtered trace and/or show calculations based on the filtered trace at each tectal train duration.
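
      The low-pass overlay suggested in this comment can be sketched as follows. This is a minimal illustration, assuming a uniformly sampled membrane-potential trace and a simple centered moving average as the low-pass filter; the window length and function names are hypothetical, and this is not the authors' definition of the tonic and phasic components.

```python
from statistics import fmean


def moving_average(trace, window):
    """Centered moving average (crude low-pass filter); window is in samples."""
    half = window // 2
    n = len(trace)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half)          # the window shrinks at the edges
        hi = min(n, i + half + 1)
        smoothed.append(fmean(trace[lo:hi]))
    return smoothed


def split_tonic_phasic(trace, window):
    """Return (tonic, phasic): the slow component and the fast residual."""
    tonic = moving_average(trace, window)
    phasic = [v - t for v, t in zip(trace, tonic)]
    return tonic, phasic


# A single fast deflection riding on a flat baseline.
tonic, phasic = split_tonic_phasic([0, 0, 3, 0, 0], window=3)
print(tonic, phasic)
```

      By construction the two components sum back to the original trace, so the tonic estimate can be overlaid on the raw recording at each train duration and the phasic peaks read off the residual.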

      Comment 4 (Minor):

      Figure 3 and associated text.

      This is a lovely experiment. Although it is not written in text, it provides logic for the next experiment in choosing a 50 ms time interval. It would be great if the authors calculated the first timepoint at which the percentage of shunting inhibition is not significantly different from zero. This would provide a convincing basis for picking 50 ms for the next experiment. That said, I suspect that this time point would be earlier than 50 ms. This may explain and add further complexity to why the authors found mostly linear or sublinear integration, and perhaps the basis for future experiments to test different stimulus time intervals. Please move calculations to Methods.
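
      One way to perform the calculation requested in this comment is sketched below. It is a minimal illustration under assumed inputs: a hypothetical mapping from each tested delay (in ms) to per-cell percentages of shunting inhibition, tested against zero with an exact sign-flip test. The choice of test, the threshold, and the absence of a multiple-comparison correction are assumptions of the sketch, not the authors' analysis.

```python
from itertools import product


def sign_flip_p_value(values):
    """Exact two-sided sign-flip test of mean == 0 (feasible for small n)."""
    observed = abs(sum(values))
    hits = total = 0
    for signs in product((1, -1), repeat=len(values)):
        total += 1
        flipped = abs(sum(s * v for s, v in zip(signs, values)))
        if flipped >= observed - 1e-12:
            hits += 1
    return hits / total


def first_nonsignificant_time(inhibition_by_time, alpha=0.05):
    """Earliest delay at which inhibition is not significantly nonzero.

    inhibition_by_time maps a delay in ms to per-cell % shunting inhibition.
    Returns None if inhibition is significant at every tested delay.
    """
    for delay in sorted(inhibition_by_time):
        if sign_flip_p_value(inhibition_by_time[delay]) >= alpha:
            return delay
    return None


# Made-up numbers for illustration only; not data from the manuscript.
toy = {2: [30, 28, 35, 32, 29, 31], 10: [5, -4, 3, -6, 2, -1]}
print(first_nonsignificant_time(toy))
```

      Two caveats apply. The smallest p-value this exact test can return is 2/2^n, so with fewer than six cells per delay it can never fall below 0.05. A non-significant result is also not proof that inhibition has fully decayed, so the returned delay is best read as a lower bound for choosing the inter-stimulus interval.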

      Comment 5 (Major):

      Figure 4C and lines 398-410.

      These are beautiful examples of M-cell firing, but the text suggests that they occurred rarely and nowhere close to significantly above events observed from single modalities. We do not see this as a valid result to report because there is insufficient evidence that the phenomenon shown is consistent or representative of your data.

    4. Author response:

      Answers to Reviewer #1 (Public Review):

      (1) Tonic and phasic components in Figure 1 are not clear.

      We will reformulate Figure 1A to show how the tonic and phasic components were measured. As this point was also raised by Reviewer #2 (Comment 3), we will explicitly clarify this in the Methods section. We will modify the color scheme to improve clarity.

      (2) Labeling of traces in Figure 4.

      We will add labels to traces informing which sensory pathways were stimulated to produce each response.

      (3) Optic tectum instead of optical tectum.

      We apologize for the error. We will replace “optical tectum” with “optic tectum” as also suggested by Reviewer #2.

      Answers to Reviewer #2 (Public Review):

      (1) Complexity of tectum upstream circuitry (Comments 1 and 2).

      Processing of visual information is certainly a major role of the tectum, but it is true that it also receives sensory inputs from other structures including sensory pathways. We will acknowledge this complexity in our revised manuscript along with suggestions for heading titles.

      (2) Figure 1 and associated text. 

      As mentioned in the provisional answer point 1 to Reviewer #1, we will reformulate Figure 1A and clarify how tonic and phasic responses were calculated.

      (3) Figure 3 and associated text.

      We will perform the analysis suggested by the reviewer and move calculations to the Methods section as requested.

      (4) Figure 5C and lines 398-410.

      We will consider omitting Figure 5C or clearly stating its value in the context of the rest of the data and our previous behavioral experiments.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this paper, Steinemann et al. characterized the nature of stochastic signals underlying the trial-averaged responses observed in the lateral intraparietal cortex (LIP) of non-human primates (NHPs) while they performed the widely used random dot direction discrimination task. Ramp-up dynamics in the trial-averaged LIP responses were reported in numerous papers before. However, the temporal dynamics of these signals at the single-trial level have been subject to debate. Using large-scale neuronal recordings with Neuropixels in NHPs allows the authors to settle this debate rather compellingly. They show that drift-diffusion-like computations account well for the observed dynamics in LIP.

      Strengths:

      This work uses innovative technical approaches (Neuropixels recordings in behaving macaque monkeys). The authors tackle a vexing question that requires measurements of simultaneous neuronal population activity and hence leverage this advanced recording technique in a convincing way.

      They use different population decoding strategies to help interpret the results.

      They also compare how decoders relying on the data-driven approach using dimensionality reduction of the full neural population space compare to decoders relying on more traditional ways to categorize neurons that are based on hypotheses about their function. Intriguingly, although the functionally identified neurons are a modest fraction of the population, decoders that only rely on this fraction achieve comparable decoding performance to those relying on the full population. Moreover, decoding weights for the full population did not allow the authors to reliably identify the functionally identified subpopulation.

      Weaknesses:

      No major weaknesses beyond a few, largely clarification issues, detailed below.

      We thank Reviewer 1 (R1) for this summary. The revised manuscript incorporates R1’s suggestions, as detailed below.

      Reviewer #2 (Public Review):

      Steinemann, Stine, and their co-authors studied the noisy accumulation of sensory evidence during perceptual decision-making using Neuropixels recordings in awake, behaving monkeys. Previous work has largely focused on describing the neural underpinnings through which sensory evidence accumulates to inform decisions, a process which on average resembles the systematic drift of a scalar decision variable toward an evidence threshold. The additional order of magnitude in recording throughput permitted by the methodology adopted in this work offers two opportunities to extend this understanding. First, larger-scale recordings allow for the study of relationships between the population activity state and behavior without averaging across trials. The authors’ observation here of covariation between the trial-to-trial fluctuations of activity and behavior (choice, reaction time) constitutes interesting new evidence for the claim that neural populations in LIP encode the behaviorally-relevant internal decision variable. Second, using Neuropixels allows the authors to sample LIP neurons with more diverse response properties (e.g. spatial RF location, motion direction selectivity), making the important question of how decision-related computations are structured in LIP amenable to study. For these reasons, the dataset collected in this study is unique and potentially quite valuable.

      However, the analyses at present do not convincingly support two of the manuscript’s key claims: (1) that “sophisticated analyses of the full neuronal state space” and “a simple average of Tconin neurons” yield roughly equivalent representations of the decision variable; and (2) that direction-selective units in LIP provide the samples of instantaneous evidence that these Tconin neurons integrate. Supporting claim (1) would require results from sophisticated population analyses leveraging the full neuronal state space; however, the current analyses instead focus almost exclusively on 1D projections of the data. Supporting claim (2) convincingly would require larger samples of units overlapping the motion stimulus, as well as additional control analyses.

      We thank the reviewer (R2) for their careful reading of our paper and the many useful suggestions.

      As detailed below, the revised manuscript incorporates new control analyses, improved quantification, and statistical rigor, which now provide compelling support for key claim #1. We do not regard claim #2 as a key claim of the paper. It is an intriguing finding with solid support, worthy of dissemination and further investigation. We have clarified the writing on this matter.

      Specific shortcomings are addressed in further detail below:

      (1) The key analysis-correlation between trial-by-trial activity fluctuations and behavior, presented in Figure 5 is opaque, and would be more convincing with negative controls. To strengthen the claim that the relationship between fluctuations in (a projection of) activity and fluctuations in behavior is significant/meaningful, some evidence should be brought that this relationship is specific - e.g. do all projections of activity give rise to this relationship (or not), or what level of leverage is achieved with respect to choice/RT when the trial-by-trial correspondence with activity is broken by shuffling.

      We do not understand why R2 finds the analysis opaque, but we are grateful for the lucid recommendations. The relationships between fluctuations in neural activity and behavior are indeed “specific” in the sense that R2 uses this term. In addition to the shuffle control, which destroys both relationships (Reviewer Figure 1), we performed additional control analyses that preserve the correspondence of neural signals and behavior on the same trial. We generated random coding directions (CDs) using weight vectors that were either drawn from a standard normal distribution or obtained by permuting the weights assigned to PC-1 in each session. The latter is the more conservative measure. Projections of the neural responses onto these random coding directions render 𝑆rand(𝑡). For these signals, the degree of leverage is effectively zero or greatly reduced. These analyses are summarized in a new Supplementary Figure S10. The bottom row of Figure S10 also addresses the question, “What degree of leverage and mediation would be expected for a theoretical decision variable?” This is accomplished by simulating decision variables using the drift-diffusion model fits in Figure 1c. The simulation is consistent with the leverage and (incomplete) mediation observed for the populations of Tin^con neurons. For details see Methods, §§Simulated decision variables and Leverage of single-trial activity on behavior.
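
      As an illustration only (not the authors' analysis code), the shuffle control can be sketched as follows. Leverage on RT is simplified here to a Pearson correlation between the single-trial signal sampled at one time point and the reaction time. The function names, the synthetic data, and this simplified definition are assumptions made for the sketch; the full procedure is described in the Methods.

```python
import numpy as np

def leverage_on_rt(signal_at_t, rt):
    """Pearson correlation between a single-trial signal at time t and RT."""
    return float(np.corrcoef(signal_at_t, rt)[0, 1])

def shuffle_null(signal_at_t, rt, n_shuffle=1000, seed=0):
    """Null distribution obtained by breaking the trial-by-trial pairing."""
    rng = np.random.default_rng(seed)
    return np.array([
        leverage_on_rt(rng.permutation(signal_at_t), rt)
        for _ in range(n_shuffle)
    ])

# Synthetic example (hypothetical data): larger early signal -> shorter RT.
rng = np.random.default_rng(1)
signal = rng.normal(size=500)
rt = 1.0 - 0.3 * signal + rng.normal(scale=0.5, size=500)

observed = leverage_on_rt(signal, rt)
null = shuffle_null(signal, rt)
# Fraction of shuffles at least as negative as the observed correlation
# (the +1 terms are an assumed convention that keeps p above zero).
p_value = (1 + np.sum(null <= observed)) / (len(null) + 1)
```

      If the relationship is specific to the trial-by-trial pairing, the observed correlation should fall well outside the shuffled distribution, which is centered near zero.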

      (2) The choice to perform most analysis on 1D projections of population activity is not wholly appropriate for this unique type of dataset, limiting the novelty of the findings, and the interpretation of similarity between results across choices of projection appears circular:

      We disagree with the characterization of our argument as circular, but R2 raises several important points that will probably occur to other careful readers. We address them as subpoints 2.1–2.4, below. Importantly, we are neither claiming nor assuming that the LIP population activity is one-dimensional. We have revised the paper to avoid giving this impression. We are also not claiming that the average of Tin neurons (or the 1D projections) explains all features of the LIP population, nor would we expect it to, given the diversity of response fields across the population. Our objective is to identify the specific dimension within population activity that captures the decision variable (DV), which has been characterized successfully as a one-dimensional stochastic process—that is, a scalar function of time. We have endeavored to clarify our thinking on this point in the revised manuscript (e.g., lines 97–98, 103–104).

      (2.1) The bulk of the analyses (Figure 2, Figure 3, part of Figure 4, Figure 5, Figure 6) operate on one of several 1D projections of simultaneously recorded activity. Unless the embedding dimension of these datasets really does not exceed 1 (dimensionality using e.g. participation ratio in each session is not quantified), it is likely that these projections elide meaningful features of LIP population activity.

      We now report the participation ratio (4.4 ± 0.4, mean ± s.e. across sessions), and we state that the first 3 PCs explain 67.1±3.1% of the variance of time- and coherence-dependent signals used for the PCA. We agree that the 1D projections may elide meaningful features of LIP population activity. Indeed, we make this point through our analysis of the Min neurons. We do not claim that the 1D projections explain all of the meaningful features of LIP population activity. They do, however, reveal the decision variable, which is our main focus. These 1D signals contain features that correlate with events in the superior colliculus, summarized in Stine et al. (2023), attesting to their biological relevance.

      (2.2) Further, the observed similarity of results across these 1D projections may not be meaningful/interpretable. First, the rationale behind deriving Sramp was based on the ramping historically observed in Tin neurons during this task, so should be expected to resemble Tin.

      The Reviewer is correct that we would expect 𝑆ramp to resemble the ramping observed in Tin neurons. We refer to this approach as hypothesis-driven. It captures the drift component of drift-diffusion. It is true that the Tin^con neurons exhibit such ramps in their trial-averaged firing rates, but this does not guarantee that the single-trial population firing rates would manifest as drift-diffusion. Indeed, Latimer et al. (2015) concluded that the ramp-like averages comprise stepping from a low to a high firing rate on each trial at a random time. Therefore, while R2 is right to characterize the similarity of Tin^con to the ramp direction in trial-averaged activity as unsurprising, their similarity on single trials is not guaranteed.

      (2.3) Second, Tin comprises the largest fraction of the neuron groups sampled during most sessions, so SPC1 should resemble Tin too. The finding that decision variables derived from the whole population’s activity reduce essentially to the average of Tin neurons is thus at least in part ’baked in’ to the approach used for deriving the decision variables.

      This is incorrect. The Tin^con neurons constitute only 14.5% of the population, on average, across the sessions (see Table 1). This misunderstanding might contribute to R2’s concern about the importance of these neurons in shaping PC1. It is not simply because they are over-represented. Also, addressing R2’s concern about circularity, we would like to remind R2 that the selection of Tin neurons was based only on their spatial selectivity in the delayed saccade task. We do not see how it could be baked-in/guaranteed that a simple average of these neurons (i.e. zero degrees of freedom) yields dynamics and behavioral correlations that match those produced by dimensionality-reduction techniques that (𝑖) have degrees of freedom equal to the number of neurons and (𝑖𝑖) are blind to the neurons’ spatial selectivity. We have additionally modified what is now Supplementary Figure S13 (old Supplementary Figure S8), which portrays the mean accuracy of choice decoders trained on the neural activity of all neurons, only Tin neurons, all but the Tin neurons, and all but Tin and Min neurons, respectively. Figure S13 now highlights how much more readily choice can be decoded from the small population of Tin neurons than from the remainder of the population.

      (2.4) The analysis presented in Figure S6 looks like an attempt to demonstrate that this isn’t the case, but is opaque. Are the magnitudes of weights assigned to units in Tin larger than in the other groups of units with preselected response properties? What is their mean weighting magnitude, in comparison with the mean weight magnitude assigned to other groups? What is the null level of correspondence observed between weight magnitude and assignment to Tin (e.g. a negative control, where the identities of units are scrambled)?

      The revised Figure S6 (now Figure S9) displays more clearly that the weights assigned to Tin^con and Tin^ips neurons (purple & yellow, respectively) are larger in magnitude than those assigned to other neurons (gray). Author response table 1 shows a more detailed breakdown of the groups. Note that the length of the vector of weights is one. We are unsure what R2 means by “the null level of correspondence.” Perhaps it helps to know that the mean weight of the “other neurons” is close to zero for all four coding directions. However, it is the overlap of the weights and the relative abundance of non-Tin neurons that is more germane to the point we are making. To wit, knowing the weight (or percentile) of a neuron is a poor predictor that it belongs to the Tin category. This point is most clearly supported by the logistic regression (Fig. S9, bottom row). In other words, the large group of non-Tin neurons contributes substantially to all four coding directions examined in Figure S9. Thus, the similarity between Tin neurons and PC1 is not simply due to an over-representation of Tin neurons as suggested in item 2.3.

      Author response table 1.

      Mean weights assigned to neuron classes in four coding directions.

      (3) The principal components analysis normalization procedure is unclear, and potentially incorrect and misleading: Why use the chosen normalization window (±25ms around 100ms after motion stimulus onset) for standardizing activity for PCA, rather than the typical choice of mean/standard deviation of activity in the full data window? This choice would specifically squash responses for units with a strong visual response, which distorts the covariance matrix, and thus the principal components that result. This kind of departure from the standard procedure should be clearly justified: what do the principal components look like when a standard procedure is used, and why was this insufficient/incorrect/unsuitable for this setting?

      We used the early window because it is a robust measure of overall excitability, but we now use a more conventional window that spans the main epoch of our analyses, 200–600 ms after motion onset. This method yields results qualitatively similar to the original method. We are persuaded that this is the more sensible choice. We thank R2 for raising this concern.

      (4) Analysis conclusions would generally be stronger with estimates of variability and control analyses: This applies broadly to Figures 2-6.

      We have added estimates of variability and control analyses where appropriate.

      Figure 2 shows examples of single-trial signals. The variability is addressed in Figure 3a and the new Supplementary Figure S5.

      Figure 3 now contains error bars derived by bootstrapping (see Methods, §Variance and autocorrelation of smoothed diffusion signals). We have also added Supplementary Figure S5, which substantiates the sublinearity claim using simulations.

      Figure 4: (i) We now indicate the s.e.m. of decoding accuracy (across sessions) by the shading in Figure 4a. (ii) The black symbols in new Supplementary Figure S8 show the mean±s.e.m. for all pairwise comparisons shown in Figure 4d & e. (iii) Supplementary Figure S8 also summarizes two control analyses that deploy random coding directions (CDs) in neuronal state space. The upper row of Figure S8 compares the observed cosine similarity (CoSim), between the CD identified by the graph title and the other four CDs labeled along the abscissa, with values obtained with 1000 random CDs established by random permutations of the weight assignments. The brown symbols are the mean±sdev of the CoSim (N=1000). The error bars are smaller than the symbols. We use the cumulative distribution of CoSim under permutation to estimate p-values (p<0.001 for all comparisons). We used a similar approach to estimate the distribution of the analogous correlation statistics between signals rendered by random directions in state space (Figure S8, lower row). For additional details, please see Methods, §Similarity of single-trial signals.
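
      The permutation control described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the weight vectors in the usage lines are synthetic, and the one-sided p-value with a +1 correction is an assumed convention.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two weight vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def permutation_null_cosim(w_ref, w_test, n_perm=1000, seed=0):
    """Compare the observed CoSim with CoSim obtained after randomly
    permuting the weight assignments of w_test across neurons."""
    rng = np.random.default_rng(seed)
    observed = cosine_similarity(w_ref, w_test)
    null = np.array([
        cosine_similarity(w_ref, rng.permutation(w_test))
        for _ in range(n_perm)
    ])
    # One-sided p-value from the empirical null distribution.
    p = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, null, p

# Synthetic usage: two similar coding directions over 200 hypothetical neurons.
rng = np.random.default_rng(2)
w_a = rng.normal(size=200)
w_b = w_a + 0.1 * rng.normal(size=200)
observed, null, p = permutation_null_cosim(w_a, w_b)
```

      Permuting the weights preserves their distribution while destroying their assignment to neurons, which is why it is a more conservative null than drawing fresh random weights.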

      Figure 5: The rigor of all claims associated with this figure is adduced from two control analyses and a simulation. The first control breaks the trial-by-trial correspondence between neural signals and behavior (Reviewer Figure 1). The second control shows that neural activity does not have substantial leverage on behavior when projected onto random directions in state space (Supplementary Figure S10, top). Simulations of decision variables using parameters derived from the fits to the behavioral data (Figure 1) support a degree of leverage and mediation comparable to the values observed for 𝑆Tin^con (Supplementary Figure S10, bottom). For additional details, please see Methods (§Leverage of single-trial activity on behavior) and the reply to item 1, above.

      Figure 6: Panels c&d show estimates of variability across neurons and experimental sessions, respectively. The reported p-value is based on a permutation test (see Methods, §Correlations between Min and Tin^con). The correlations shown in panel e (heatmap) are derived from pooled data across sessions. The reported p-value is based on a permutation test (see Methods, §Correlations between Min and Tin^con).

      Reviewer #3 (Public Review):

      Summary:

      The paper investigates which aspects of neural activity in LIP of the macaque give rise to individual decisions (specificity of choice and reaction times) in single trials, by recording simultaneously from hundreds of neurons. Using a variety of dimensionality reduction and decoding techniques, the authors demonstrate that a population-based drift-diffusion signal, which relies on a small subset of neurons that overlap choice targets, is responsible for the choice and reaction time variability. Analysis of direction-selective neurons in LIP and their correlation with decision-related neurons (Tin^con neurons) suggests that evidence integration occurs within area LIP.

      Strengths:

      This is an important and interesting paper, which resolves conflicting hypotheses regarding the mechanisms that underlie decision-making in single trials. This is made possible by exploiting novel technology (Primatepixels recordings), in conjunction with state-of-the-art analyses and well-established dynamic random dot motion discrimination tasks.

      General recommendations

      (1) Please tone down causal language. You present compelling correlative evidence for the idea that LIP population activity encodes the drift-diffusion DV. We feel that claims beyond that (e.g., “Single-trial drift-diffusion signals control the choice and decision time”) would require direct interventions, and are only partially supported by the current evidence. Further examples are provided in point 1) of Reviewer 1 below.

      We have adopted the recommendation to “tone down the causal language.” Throughout the manuscript, we strive to avoid conveying the false impression that the present findings provide causal support for the decision mechanism. However, other causal studies of LIP support causality in the random dot motion task (Hanks et al., 2006; Jeurissen et al., 2022). It is therefore justifiable to use terms that imply causality in statements intended to convey hypotheses about mechanism. We agree that we should not give the false impression that the present support for said mechanism is adduced from causal perturbations in this study, as there were none.

      (2) Please provide a commonly used, data-driven quantification of the dimensionality of the population activity – for example, using participation ratio or the number of PCs explaining 90 % of the variance. This will help readers evaluate the conclusions about the dimensionality of the data.

      Principal component analysis reveals a participation ratio of 4.4 ± 0.4 (mean ±s.e., across sessions), and the first 3 PCs explain 67.1 ± 3.1 percent of the variance. The dimensionality of the data is low, but greater than one. We state this in Methods (§Principal Component Analysis) and in Results (§Single-trial drift-diffusion signals approximate the decision variable, lines 200–201).
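
      For readers unfamiliar with the measure, the participation ratio is the squared sum of the covariance eigenvalues divided by the sum of their squares, PR = (Σλ)² / Σλ². The sketch below is illustrative and is not the authors' code; the function names and the layout of the input matrix (samples by neurons) are assumptions.

```python
import numpy as np

def covariance_eigenvalues(X):
    """Eigenvalues (descending) of the neuron-by-neuron covariance matrix.
    X has shape (n_samples, n_neurons), with n_neurons >= 2."""
    C = np.cov(np.asarray(X, dtype=float), rowvar=False)
    eig = np.linalg.eigvalsh(C)          # ascending order, real-valued
    eig = np.clip(eig, 0.0, None)        # remove tiny negative round-off
    return eig[::-1]

def participation_ratio(X):
    """(sum of eigenvalues)^2 / (sum of squared eigenvalues)."""
    eig = covariance_eigenvalues(X)
    return float(eig.sum() ** 2 / np.sum(eig ** 2))

def variance_explained(X, k):
    """Fraction of total variance captured by the first k PCs."""
    eig = covariance_eigenvalues(X)
    return float(eig[:k].sum() / eig.sum())
```

      The ratio equals 1 when all variance lies along a single dimension and approaches the number of neurons when variance is spread evenly, so a value of 4.4 indicates low but not one-dimensional activity.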

      (3) Please justify the normalization procedure used for PCA: Why use the chosen normalization window (±25ms around 100ms after motion stimulus onset) for standardizing activity for PCA, rather than the more common quantification of mean/standard deviation across the full data window? What do the first principal components look like when the latter procedure is used?

      We now use a more conventional window that spans the main epoch of our analyses, 200–600 ms after motion onset. This method yields results qualitatively similar to the original method. We are persuaded that this is the more sensible choice.

      (4) Please provide estimates of variability for variance and autocorrelation in Fig. 3 (e.g., through bootstrapping). Further, simulations could substantiate the claim about the expected sub-linearity at later time points (Fig. 3a) due to the upper stopping bound and limited firing rate range.

      We thank the reviewers for these helpful recommendations. The revised Fig. 3 now contains error bars derived by bootstrapping (see Methods, §Variance and autocorrelation of smoothed diffusion signals). We have also added Supplementary Figure S5, which substantiates the sub-linearity claim using simulations.
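
      A generic way to obtain such error bars is to resample trials with replacement and recompute the statistic at each time point. The sketch below is an illustration under assumptions (hypothetical function names, synthetic random-walk data, plain trial resampling); the procedure actually used is the one described in the Methods.

```python
import numpy as np

def variance_over_trials(trials):
    """Across-trial variance at each time point; trials is (n_trials, n_time)."""
    return trials.var(axis=0, ddof=1)

def bootstrap_se(trials, statistic, n_boot=1000, seed=0):
    """Standard error of `statistic` at each time point, estimated by
    resampling trials (rows) with replacement."""
    rng = np.random.default_rng(seed)
    trials = np.asarray(trials, dtype=float)
    n = trials.shape[0]
    boots = np.array([
        statistic(trials[rng.integers(0, n, size=n)])
        for _ in range(n_boot)
    ])
    return boots.std(axis=0, ddof=1)

# Synthetic usage: unbounded random walks, whose variance grows with time.
rng = np.random.default_rng(3)
walks = np.cumsum(rng.normal(size=(400, 20)), axis=1)
var_t = variance_over_trials(walks)
se_t = bootstrap_se(walks, variance_over_trials, n_boot=200)
```

      For an unbounded diffusion the variance grows linearly with time; a stopping bound or a limited firing-rate range would be expected to bend this curve downward at later times.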

      (5) Please add controls and estimates of variability for decoding across sessions in Fig. 4: what are the levels of within-trial correlation/cosine similarity for random coding directions? What is the variability in the estimates of values shown in a/d/e?

      We have addressed each of these items. (1) Figure 4a now shows the s.e.m. of decoding accuracy (across sessions). (2) Regarding the variability of estimates shown in Figure 4d & e, the standard errors are displayed in the new supplementary Figure S8. It makes sense to show them there because there is no natural way to represent error on the heat maps in Figure 4, and Figure S8 concerns the comparison of the values in Figure 4d&e to values derived from random coding directions. (3) Random coding directions lead to values of cosine similarity and within-trial correlation that do not differ significantly from zero. We show this in several ways, summarized in our reply to Public Review item 4. Additional details are in the revised manuscript (Methods, §Similarity of single-trial signals) and the new Supplementary Figure S8.

      (6) Please perform additional analysis to strengthen the claim from Fig. 6, that Min represents the integrand and not the integral. The analysis in Fig. 6d could be repeated with the integral (cumulative sum) of the single-trial Min signals. Does this yield an increase in leverage over time?

      The short answer is, yes in part. Reviewer Figure 2a provides support for leverage of the integral on choice, and this leverage, like that of 𝑆Tin^con(𝑡), increases as a function of time. The effect is present in all seven sessions that have both Min^left and Min^right neurons (all 𝑝 < 1e-10). However, as shown in panel b, the same integral fails to demonstrate more than a hint of leverage on RT. All correlations are barely negative, and the magnitude does not increase as a function of time. We suspect, but cannot prove, that this failure arises because of limited power and the expected weak effect. Recall that the mediation analysis of RT is restricted to longer trials. Moreover, the correlation between the Min difference and the Tin signal is less than 0.1 (heatmap, Fig. 6e), implying that the Min difference explains less than 1% of the variance of 𝑆Tin(𝑡). We considered including Reviewer Figure 2 in the paper, but we feel it would be disingenuous (cherry-picking) to report only the positive outcome of the leverage on choice. If the editors feel strongly about it, we would be open to including it, but leaving these analyses out of the revised manuscript seems more consistent with our effort to deëmphasize this finding. In the future, we plan to record simultaneously from populations of MT and LIP neurons (Min and Tin, of course) and optimize Min neuron yield by placing the RDM stimulus in the periphery.

      (7) Please describe the complete procedure for determining spatially-selective activity. E.g.: What response epoch was used, what was the spatial layout of the response targets, were responses to all ipsi- vs contralateral targets pooled, what was the spatial distribution of response fields relative to the choice targets across the population?

      We thank the reviewers for pointing out this oversight. We now explain this procedure in the Methods (lines 629–644):

      Neurons were classified post hoc as Tin by visual inspection of spatial heatmaps of neural activity acquired in the delayed saccade task. We inspected activity in the visual, delay, and perisaccadic epochs of the task. The distribution of target locations was guided by the spatial selectivity of simultaneously recorded neurons in the superior colliculus (see Stine 2023 for details). Briefly, after identifying the location of the SC response fields, we randomly presented saccade targets within this location and seven other, equally spaced locations at the same eccentricity. In monkey J we also included 1–3 additional eccentricities, spanning 5–16 degrees. Neurons were classified as Tin if they displayed a clear, spatially-selective response in at least one epoch to one of the two locations occupied by the choice targets in the main task. Neurons that switched their spatial selectivity in different epochs were not classified as Tin. The classification was conducted before the analyses of activity in the motion discrimination task. The procedure was meant to mimic those used in earlier single-neuron studies of LIP (e.g., Roitman & Shadlen 2002) in which the location of the choice targets was determined online by the qualitative spatial selectivity of the neuron under study. The Tin^con neurons in the present study were highly selective for either the contralateral or ipsilateral choice target used in the RDM task (AUC = 0.89±0.01; 𝑝 < 0.05 for 97% of neurons, Wilcoxon rank sum test). Given the sparse sampling of saccade target locations, we are unable to supply a quantitative estimate of the center and spatial extent of the RFs.

      (8) Please clarify if a neuron could be classified as both Tin and Min. Or were these categories mutually exclusive?

      These categories are mutually exclusive. If a neuron has spatially-selective persistent activity, as defined by the method described above, it is classified as a Tin neuron and not as an Min neuron even if it also shows motion-selective activity during passive motion viewing. We now specify this in the Methods (lines 831–832).

      Reviewer∗ 1, private recommendations

      𝑅∗1.1a Causal language (Line 23-24): “population activity represents […] drift” and “we provide direct support for the hypothesis that drift-diffusion signal is the quantity responsible for the variability in choice and RT” reads at first sight as if the authors claim that they present evidence for a causal effect of LIP activity on choice. The authors are otherwise nuanced and careful to point out that their evidence is correlational. What seems to be meant is that the population activity/drift-diffusion signal “approximates the DV that gives rise to the choices […]” (cf. line 399). I would recommend using such alternative phrasing to avoid confusion (and the typically strong reactions by readers against misleading causal statements).

      We have adopted the reviewer’s recommendation and have modified the text throughout to reduce causal language. See our response to General Recommendation 1.

      𝑅∗1.1b Relatedly, any discussion about the possibility of LIP being causally involved in evidence integration (e.g. lines 429-445 [Au: now 462–478]) should also comment on the possibility of a distributed representation of the decision variable given that neural correlates of the DV have been reported in several areas including PFC, caudate and FEF.

      We believe this is possible. However, we hope to avoid discussions about causality given that it is not a focus of the paper. Although it is somewhat tangential, we have shown elsewhere that LIP is causal in the sense that causal manipulations affect behavior, but it is also true that causality does not imply necessity, and similarly, lack of necessity does not imply “only correlation.” Regarding distributed representations, it is worth keeping in mind the cautionary counter-example furnished by the SC study (Stine et al., 2023). The firing rates measured by averaging over trials are similar in SC and LIP; both manifest as coherence and direction-dependent ramps, leading to the suggestion that they form a distributed representation of the decision variable. With single-trial resolution, we now know that LIP and SC exhibit distinct dynamics—drift-diffusion and bursting, respectively. It remains to be seen if single-trial resolution achievable by simultaneous Neuropixels recordings from prefrontal areas and LIP reveal shared or distinct dynamics.

      𝑅∗1.2 How was the spatially selective activity determined? The classification of Tin neurons is critical to this study - how was their spatial selectivity determined? Please describe this in similar detail as the description of direction selectivity on lines 681-690 [Au: now 824–832]. E.g.: what response epoch was used, what was the spatial layout of the response targets, were responses to all ipsi- vs contralateral targets pooled, and what was the spatial distribution of response fields relative to the choice targets across the population?

      We now explain the selection procedure in Methods (lines 629–644). Please see our reply to General Recommendation 7, above.

      𝑅∗1.3 Could a neuron be classified as both Tin and Min, or were these categories mutually exclusive? Please clarify. (This goes beyond the scope of the current study: but did the authors find evidence for topographic organization or clustering of these categories of neurons?)

      These categories are mutually exclusive. Please see our response to General Recommendation 8, above.

      𝑅∗1.4 Contrary to the statement on line 121, the trial averages in Fig. 2a, 2b show coherence dependency at the time of the saccade in saccade-aligned traces for the coding strategies, except for STin (fig. 2c). Is this a result of the choice for t1 (= 0.1s)? (The authors may want to change their statement on line 121.) Relatedly, do the population responses for the two coding strategies Sramp and SPC1 depend on the epoch used to derive weights for individual neurons?

      We have revised the description to accommodate R2’s observation. 𝑆ramp retains weak coherence-dependence before saccades towards the choice target contralateral to the recording site. This was true in four of the eight sessions. For 𝑆PC1, there is no longer a coherence dependency for the Tin choices, owing to the change in normalization method (see revised Figure 2b).

      We also corrected an error in the Methods section. Specifically, the ramp ends at 𝑡1 = 0.05 s before the time of the saccade, not 𝑡1 = 0.1 s. While we no longer emphasize the similarity of traces aligned to saccade, it is reasonable to find issue with the observation that they retain a dependency on coherence (𝑆ramp only) because, according to theory, traces associated with Tin choices should reach a common positive threshold at decision termination. That said, for the Ramp direction there may be a reason to expect this discrepancy from theory. The deterministic part of drift-diffusion includes an urgency signal that confers positive convexity to the deterministic drift. This accelerating nonlinearity is not captured by the ramp, and it is more prominent at longer decision times, thus low coherences. We do not share this interpretation in the revised manuscript, in part because retention of coherence dependency is present in only half the sessions (see Reviewer Figure 3).

      The correction to the definition of 𝑡1 also provides an opportunity to address R2’s final question (“Relatedly,…?”). This particular variation in 𝑡1 does not affect 𝑆ramp, and 𝑆PC1 no longer retains coherence dependency for Tin choices. Note that our choice of 𝑡0 and 𝑡1 is based on the empirical observation that the ramping activity in response averages of Tin neurons typically begins 200 ms after motion onset and ends 50–100 ms before initiation of the saccadic choice. The starting time (𝑡0) is also supported by the observation that the decoding accuracy of a choice-decoder begins to diverge from chance at this time (Figure 4a).

      𝑅∗1.5 It is intriguing that Sramp and SPC1 show dynamics that look so similar (fig. 2a, 2b). How do the weights assigned to each neuron in both strategies compare across the population?

      The weights assigned to each neuron are very similar across the two strategies as indicated by a cosine similarity (0.65 ± 0.04, mean ±s.e.m. across sessions).

      𝑅∗1.6 Tin neurons, which show dynamics closely resembling different coding directions (fig. 2) and the decoders do not have weights that can distinguish them from the rest of the population in each of these analyses (fig. S7). Is it fair to interpret these findings as evidence for broad decision-related co-variability in the recorded neural population in LIP?

      Yes, our results are consistent with this interpretation. However, it is worth reiterating that decoding performance drops considerably when Tin neurons are not included (see Supplementary Figure S13). Thus, this broad decision-related co-variability is present but weak.

      𝑅∗1.7 It is intriguing that the decoding weights of the different decoders did not allow the authors to reliably identify Tin neurons. Could this be, in part, due to the low dimensionality of the population activity and task that the animals are presumably overtrained on? Or do the authors expect this finding to hold up if the population activity and task were higher dimensional?

      Great question! We can only speculate, but it seems possible that a more complex, “higher dimensional” task could make it easier to identify Tin neurons. For example, a task with four choices instead of two may decrease correlations among groups of neurons with different response fields. We have added this caveat to the discussion (lines 459–461). One minor semantic objection: The animal has learned to perform a highly contrived task at low signal-to-noise. The animal is well-trained, not over-trained.

      𝑅∗1.8 Lines 135-137 [Au: now 141–142]: The similarity in the single trial traces from different coding strategies (fig. 2a-2c, left) is not as evident to me as the authors suggest. It might be worthwhile computing the correlation coefficients between individual traces for each pair of strategies and reporting the mean correlation to support the author’s point.

      We report the mean correlation between single-trial signals generated by the chosen dimensionality reduction methods in Figure 4e. We show the variability in this measure in Supplementary Figure S8. We have also adjusted the opacity of the single-trial traces in Figure 2, left.

      𝑅∗1.9 Minor/typos:

      -line 74: consider additionally citing Hyafil et al. 2023.

      -line 588: ”that were strongly correlated”?

      -line 615: ”were the actual drift-diffusion process were...”.

      -line 717: ”a causal influence” -> ”no causal influence”.

      Fig. 6: panel labels e vs d are swapped between the figure and caption.

      Fig. 3c: labels r1,3 & r2,3 are flipped.

      We have addressed all of these items. Thank you.

      Reviewer 2∗, private recommendations

      𝑅∗2.1 (Figure 2) Determine whether restricting the analysis to 1D projections of the data is a suitable approach given the actual dimensionality of the datasets being analyzed:

      - Should show some quantification of the dimensionality of the recorded activity; could do this by quantifying the dimensionality of population activity in each session, e.g. with participation ratio or related measures (like # PCs to explain some high proportion of the variance, e.g. 90 %). If much of the variation is not described in 1 dimension, then the paper would benefit from some discussion/analysis of the signals that occupy the other dimensions.

      We now report the participation ratio (4.4 ± 0.4, mean ±s.e. across sessions), and we state that the first 3 PCs explain 67.1 ± 3.1% of the variance of the time- and coherence-dependent signals used for the PCA (mean ±s.e). We agree that the 1D projections may elide meaningful features of LIP population activity. Indeed, we make this point through our analysis of the Min neurons. To reiterate our response above, we do not claim that the 1D projections explain all of the meaningful features of LIP population activity. They do, however, reveal the decision variable, which is our main focus. These 1D signals contain features that correlate with events in the superior colliculus, summarized in Stine et al. (2023), attesting to their biological relevance.

      The Reviewer is correct that our approach presupposes a linear embedding of the 1D decision variable in the population activity. In other words, a nonlinear representation of the 1D decision variable in population activity could have an embedding dimensionality greater than 1, and there may well be a non-linear method that reveals this representation. To test this possibility, we decoded choice on each trial from population activity using (1) a linear decoder (logistic classifier) or (2) a multi-layer neural network, which can exploit non-linearities. We found that, for each session, the two decoders performed similarly: the neural network outperforms the logistic decoder (barely) in just one session. The analysis suggests that the assumption of linear embedding of the decision variable is justified. We hope this analysis convinces the reviewer that “sophisticated analyses of the full neuronal state space” and “a simple average of [Tcon ] neurons” do indeed yield roughly equivalent representations of the decision variable. We have included the results of this analysis in Supplementary Figure S12. See also item 2 of the Public response.
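
      For readers unfamiliar with the dimensionality measures quoted in this response, the following is a minimal sketch of how a participation ratio and a variance-explained fraction can be computed from the eigenvalues of a covariance matrix. It is not the authors' code: the function names and the (samples × neurons) input layout are assumptions made for illustration, and the exact signals entered into the PCA are described in the manuscript, not here.

```python
import numpy as np

def _covariance_eigenvalues(activity):
    """Eigenvalues of the neuron-by-neuron covariance, largest first.

    activity: array of shape (n_samples, n_neurons); hypothetical layout.
    """
    cov = np.cov(activity, rowvar=False)
    eigvals = np.linalg.eigvalsh(cov)          # symmetric matrix, real eigenvalues
    eigvals = np.clip(eigvals, 0.0, None)      # remove tiny negative round-off
    return np.sort(eigvals)[::-1]

def participation_ratio(activity):
    """(sum of eigenvalues)^2 / (sum of squared eigenvalues)."""
    eigvals = _covariance_eigenvalues(activity)
    return eigvals.sum() ** 2 / np.sum(eigvals ** 2)

def variance_explained(activity, n_components):
    """Fraction of total variance captured by the first n_components PCs."""
    eigvals = _covariance_eigenvalues(activity)
    return eigvals[:n_components].sum() / eigvals.sum()
```

      The participation ratio equals 1 when all variance lies along a single axis and equals the number of neurons when variance is spread evenly across orthogonal axes, which is why it is often read as an effective dimensionality.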

      𝑅∗2.2 (Figure 3) Add estimates of variability for variance and autocorrelation through time from single-trial signals:

      –   E.g. by bootstrapping. Would be helpful for making rigorous the discussion of when the deviation from the theory is outside what would be expected by chance, even if it doesn’t change the specific conclusions here.

      –   If possible, it would help (by simulations, or maybe an added reference if it exists) to substantiate the claim about the expected sub-linearity at later time-points (Figure 3a) due to the upper stopping bound and limited firing rate range.

      We thank the reviewer for this helpful comment. The revised Fig. 3 now contains error bars derived by bootstrapping (see Methods, §Variance and autocorrelation of smoothed diffusion signals). We have also added Supplementary Figure S5, which substantiates the sub-linearity claim using simulations.
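
      As an illustration of the kind of bootstrap referred to here, the sketch below resamples trials with replacement and recomputes the across-trial variance at each time point. It is a simplified stand-in under stated assumptions (a rectangular trials × time array, a hypothetical function name, no smoothing correction); the procedure actually used is the one described in the Methods section cited above.

```python
import numpy as np

def bootstrap_variance_se(signals, n_boot=1000, seed=0):
    """Across-trial variance at each time point with a bootstrap standard error.

    signals: array of shape (n_trials, n_times); hypothetical layout.
    Returns (variance, standard_error), each of shape (n_times,).
    """
    rng = np.random.default_rng(seed)
    n_trials, n_times = signals.shape
    variance = signals.var(axis=0, ddof=1)
    boot = np.empty((n_boot, n_times))
    for b in range(n_boot):
        # Resample whole trials so the temporal structure within a trial is kept.
        idx = rng.integers(0, n_trials, size=n_trials)
        boot[b] = signals[idx].var(axis=0, ddof=1)
    return variance, boot.std(axis=0, ddof=1)
```

      For an unbounded diffusion process the variance grows linearly with time, so error bars of this kind indicate whether a departure from linearity exceeds what resampling noise alone would produce.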

      𝑅∗2.3 (Figure 4) Add controls and estimates of variability for decoding across sessions:

      –   As a baseline - what is the level of within-trial correlation/cosine similarity when random coding directions are used?

      –   What is the variability in the estimates of values shown in a/d/e?

      We have addressed each of these items. (1) Figure 4a now shows the s.e.m. of decoding accuracy (across sessions). (2) Regarding the variability of estimates shown in Figure 4d & e, the standard errors are displayed in the new Supplementary Figure S8. It makes sense to show them there because (i) there is no natural way to represent error on the heat maps in Figure 4, and (ii) S8 concerns the comparison of the values in Figure 4d & e to values derived from random coding directions. (3) Random coding directions lead to values of cosine similarity and within-trial correlation that do not differ significantly from zero. We show this in several ways, summarized in our reply to Public Review item 4. Additional details are in the revised manuscript (Methods: Similarity of single-trial signals) and the new Supplementary Figure S8. We also provide this information in response to Recommendation 5, above.

      𝑅∗2.4 (Figure 5) Add negative controls and significance tests to support claims about trends in leverage:

      –   What is the level of increase in leverage attained from random 1D projections of the data, or other projections where the prior would be no leverage?

      –   What is the range of leverage values fit for a simulated signal with a ground-truth of no trend?

      We have added two control analyses. In addition to a shuffle control, which destroys the relationship (Review Figure 1) we performed additional analyses that preserve the correspondence of neural signals and behavior on the same trial. We generated random coding directions (CDs) by establishing weight-vectors that were either chosen from a Normal distribution or by permuting the weights assigned to PC-1 in each session. The latter is the more conservative measure. Projections of the neural responses onto these random coding directions render 𝑆rand(𝑡). Specifically, the degree of leverage is effectively zero or very much reduced. These analyses are summarized in a new Supplementary Figure S10. The distributions of our test statistics (e.g., leverage on choice and RT) under the variants of the null hypothesis also support traditional metrics of statistical significance. Figure S10 (bottom row) also provides an approximate answer to the question: What degree of leverage and mediation would be expected for a theoretical decision variable? Briefly, we simulated 60,000 trials using the race model that best fits the behavioral data of monkey M. For any noise-free representation of a Markovian integration process, the leverage of an early sample of the DV on behavior would be mediated completely by later activity as the latter sample—up to the time of commitment—subsumes all variability captured by the earlier sample. We, therefore, generated 𝑆sim(𝑡) by first subsampling the simulated data to match the trial numbers of each session. To evaluate a DV approximated from the activity of 𝑁 Tconin neurons per session rather than the true DV represented by the entire population, we generated 𝑁 noisy instantiations of the signal for each of the subsampled, simulated trials. The noisy decision variable, 𝑆sim (t) is the mean activity of these 𝑁 noise-corrupted signals. 
      The simulation is consistent with the leverage and incomplete mediation observed for the populations of Tcon neurons. For additional details, see Methods (§Leverage of single-trial activity on behavior) and Supplementary Figure S10, caption. See also our response to item 1 of the Public Response.
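
      The two kinds of random coding direction described in this response (weights drawn from a Normal distribution, and a permutation of the PC-1 weights) can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the function names, the array shapes, and the absence of any weight normalization are assumptions.

```python
import numpy as np

def random_coding_directions(pc1_weights, seed=0):
    """Return two null coding directions for one session.

    pc1_weights: array of shape (n_neurons,), the weights assigned to PC-1.
    """
    rng = np.random.default_rng(seed)
    w_normal = rng.standard_normal(pc1_weights.shape[0])  # weights ~ N(0, 1)
    w_permuted = rng.permutation(pc1_weights)              # same weights, shuffled across neurons
    return w_normal, w_permuted

def project(activity, weights):
    """Project population activity onto a coding direction.

    activity: array of shape (n_trials, n_times, n_neurons).
    Returns a single-trial signal of shape (n_trials, n_times).
    """
    return activity @ weights
```

      Permuting the PC-1 weights keeps the distribution of weight magnitudes intact and only breaks their assignment to neurons, which is consistent with the response calling it the more conservative control.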

      𝑅∗2.5 The analysis is performed across several signed coherence levels, with data detrended for each signed coherence and choice to enable comparison of fluctuations relative to the relevant baseline; are results similar for the different coherences?

      The results are qualitatively similar for individual coherences. There is less power, of course, because there are fewer trials. The analyses cannot be performed for coherences ≥ 12.8% because there are not enough trials that satisfy the inclusion criteria (presence of left and right choice trials with RT ≤ 670 ms). Nonetheless, leverage on choice and RT is statistically significant for 27 of the 30 combinations of motion strengths < 12.8% × three signals (𝑆ramp, 𝑆PC1 and 𝑆Tin) × behavioral measures (RT and choice) (RT: all 𝑝 < 0.008, Fisher-z; choice: all 𝑝 < 0.05, t-test). The three exceptions are trials with 6.4% coherence rightward motion, which do not correlate significantly with RT on leftward choice trials. Reviewer Figure 4 shows the results of the leverage and mediation analyses, using only the 0% coherence trials.

      𝑅∗2.6 (Figure 6) Additional analysis to strengthen the claim that Min represents the integrand and not the integral:

      a. Repeating the analysis in Figure 6d with the integral (cumulative sum) of the single-trial Min signals and instead observing a significant increase in leverage over time would be strong evidence for this interpretation. If you again see no increase, then it suggests that the activity of these units (while direction selective) may not be strongly yoked to behavior. This scenario (no increasing leverage of the integral of Min on behavior through time) also raises an intriguing alternative possibility: that the noise driving the ’diffusion’ of drift-diffusion here may originate in the integrating circuit, rather than just reflecting the complete integration of noise in the stream of evidence itself.

      b. Repeating the analysis in Figure 6d with the projection of the M subspace onto its own first PC (e.g. take the union of units {Mrightin, Mleftin} [our ], do PCA just on those units’ single-trial activities, identify the first PC, and project those activities on that dimension to obtain SPC1-M).

      c. Ameliorating the sample-size limitation by relaxing the criteria for inclusion in Min - performing the same analyses shown, but including all units with visual RFs overlapping the motion stimulus, irrespective of their direction selectivity.

      a. Reviewer Figure 2a provides support for leverage of the integral on choice, and this leverage, like , increases as a function of time. The effect is present in all seven sessions that have both and neurons (all 𝑝 < 1𝑒 − 10). However, as shown in panel b, the same integral fails to demonstrate more than a hint of leverage on RT (all correlations are negative) and the magnitude does not vary as a function of time. We suspect—but cannot prove—that this failure arises because of limited power and the expected weak effect. Recall that the mediation analysis of RT is restricted to longer trials and that the correlation between the Min difference and the signal is less than 0.1 over the heatmap in Fig. 6e, implying that the Min difference explains less than 1% of the variance of 𝑆Tin(𝑡). We considered including Reviewer Figure 2 in the paper, but we feel it would be disingenuous (cherrypicking) to report only the positive outcome of the leverage on choice. If the editors feel strongly about it, we would be open to including it, but leaving these analyses out of the revised manuscript seems more consistent with our effort to deëmphasize this finding. In the future, we plan to record simultaneously from populations of MT and LIP neurons (Min and Tin, of course) and optimize Min neuron yield by placing the RDM stimulus in the periphery. We also provide this information in response to Recommendation (6) above.

      b.  We tried the Reviewer’s suggestion to apply PCA to the union of Min neurons, fully expecting PC1 to comprise weights of opposite sign for the right- and left-preferring neurons, but that is not what we observed. Instead, the direction selectivity is distributed over at least two PCs. We think this is a reflection of the prominence of other signals, such as the strong visual response and normalization signals (see Shushruth et al., 2018). In the spirit of the Reviewer’s suggestion, we also established an “evidence coding direction” using a regression strategy similar to the Ramp CD applied to the union of Min neurons. The strategy produced a coding direction with opposite signed weights dominating the right and left subsets. The projection of the neural data on this evidence CD yields a signal similar to the difference variable used in Fig. 6e (i.e., signals that are approximately constant firing rates vs time and scale as a function of signed coherence). These unintegrated signals exhibit weak leverage on choice and RT, consistent with Figure 6d. However, the integrated signal has leverage on choice but not RT, similar to the integral of the difference signal in Reviewer Figure 2.

      c.   We do not understand the motivation for this analysis. We could apply PCA or dPCA (or the regression approach, described above) to the population of units with RFs that overlap the motion stimulus, but it is hard to see how this would test the hypothesis that direction-selective neurons similar to those in area MT supply the momentary evidence. As mentioned, we have very few Min neurons (as few as two in session 3). Future experiments that place the motion stimulus in the periphery would likely increase the yield of Min neurons and would be better suited to study this question. As such, we do not see the integrand-like responses of Min neurons as a major claim of the paper. Instead, we view it as an intriguing observation that deserves follow-up in future experiments, including simultaneous recordings from populations of MT and LIP neurons (Min and Tin, of course). We have softened the language considerably to make it clear that future work will be needed to make strong claims about the nature of Min neurons.

      𝑅∗2.7 Other questions: Figure 2c is described as showing the average firing rate of units in Tconin on single trials, but must also incorporate some baseline subtraction (as the shown traces dip into negative firing rates). What baseline is subtracted? Are these residual signals, as described for later figures, or is a different method used? (Presumably, a similar procedure is used also for Figure 2a/b, given that all single-trial traces begin at 0.) Is the baseline subtraction justified? If the dataset really does reflect the decision variable with single-trial resolution, eliminating the baseline subtraction when visualizing single-trial activity might actually help to make the point clearer: trials which (for any reason) begin with a higher projection on the particular direction that furnishes the DV would be predicted to reach the decision bound, at any fixed coherence, more quickly than trials with a smaller projection onto this direction.

      We thank the reviewer for this comment. For each trial, the mean activity between 175 ms and 225 ms after motion onset was subtracted when generating the single-trial traces. The baseline subtraction was only applied for visualization to better portray the diffusion component in the signal. Unless otherwise indicated, all analyses are computed on non-baseline corrected data. We now describe in the caption of Figure 2 that “For visualization, single-trial traces were baseline corrected by subtracting the activity in a 50 ms window around 200 ms.” Examples of the raw traces used for all follow-up analyses are displayed in Reviewer Figure 6.
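
      The baseline correction described in this response amounts to subtracting, from each trial, its own mean in the 175 to 225 ms window. A minimal sketch follows, assuming traces stored as a trials × time array with a matching vector of sample times in milliseconds; the function and argument names are hypothetical.

```python
import numpy as np

def baseline_correct(traces, times_ms, window_ms=(175, 225)):
    """Subtract each trial's mean activity in window_ms (inclusive) from that trial.

    traces:   array of shape (n_trials, n_times)
    times_ms: array of shape (n_times,), time relative to motion onset in ms
    """
    in_window = (times_ms >= window_ms[0]) & (times_ms <= window_ms[1])
    baseline = traces[:, in_window].mean(axis=1, keepdims=True)
    return traces - baseline
```

      Because the subtraction removes any trial-specific offset present at about 200 ms, it is suited to visualization only, which matches the statement that the analyses themselves use non-baseline-corrected data.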

      Reviewer 3∗, private recommendations

      I only have a few comments to make the paper more accessible:

      𝑅∗3.1 I struggle to understand how the linear fitting from -1 to 1 was done. More detail about how the single cell single-trial activity was generated to possibly go from -1 to 1 or do I completely misunderstand the approach? I assume the data standardization does that job?

      We have rephrased and added clarifying detail to the section describing the derivation of the ramp signal in the Methods (Ramp direction).

      We applied linear regression to generate a signal that best approximates a linear ramp, on each trial, 𝑖, that terminates with a saccade to the choice-target contralateral to the hemisphere of the LIP recordings. The ramps are defined in the epoch spanning the decision time: each ramp begins at 𝑓𝑖(𝑡0) = −1, where 𝑡0 = 0.2 s after motion onset, and ends at 𝑓𝑖(𝑡1) = 1, where 𝑡1 = 𝑡sac − 0.05 s (i.e., 50 ms before saccade initiation). The ramps are sampled every 25 ms and concatenated using all eligible trials to construct a long saw-tooth function (see Supplementary Figure S2). The regression solves for the weights assigned to each neuron such that the weighted sum of the activity of all neurons best approximates the saw-tooth. We constructed a time series of standardized neural activity, sampled identically to the saw-tooth. The spike times from each neuron are represented as delta functions (rasters) and convolved with a non-causal 25 ms boxcar filter. The mean and standard deviation of all sampled values of activity were used to standardize the activity for each neuron (i.e., Z-transform). The coefficients derived by the regression establish the vector of weights that define 𝑆ramp. The algorithm ensures that the population signal 𝑆ramp(𝑡), but not necessarily individual neurons, has amplitudes ranging from approximately −1 to 1.
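
      The regression described above can be sketched as an ordinary least-squares fit of Z-scored activity to a concatenated saw-tooth target. The code below is a simplified illustration, not the authors' implementation: it assumes each trial has already been binned at 25 ms between 𝑡0 and 𝑡1, and the function names, the intercept term, and the plain least-squares solver are assumptions introduced here.

```python
import numpy as np

def ramp_target(n_samples):
    """One trial's target: a linear ramp from -1 to 1."""
    return np.linspace(-1.0, 1.0, n_samples)

def fit_ramp_weights(trial_activity):
    """Fit per-neuron weights so the weighted sum approximates the saw-tooth.

    trial_activity: list of arrays, one per eligible trial, each of shape
                    (n_samples_i, n_neurons) covering t0 to t1 for that trial.
    Returns (weights, intercept, mean, std); mean and std are the per-neuron
    statistics used for the Z-transform.
    """
    x = np.concatenate(trial_activity, axis=0)
    y = np.concatenate([ramp_target(a.shape[0]) for a in trial_activity])
    mean = x.mean(axis=0)
    std = x.std(axis=0)
    std[std == 0] = 1.0                        # guard against silent neurons
    z = (x - mean) / std                       # Z-transform each neuron
    design = np.column_stack([z, np.ones(len(z))])  # intercept column is an assumption
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[:-1], coef[-1], mean, std

def ramp_signal(activity, weights, intercept, mean, std):
    """Apply fitted weights to one trial's activity, shape (n_samples, n_neurons)."""
    return ((activity - mean) / std) @ weights + intercept
```

      The target, not the neurons, carries the −1 to 1 scaling, so the fitted population signal inherits that range only approximately, as the response notes.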

      𝑅∗3.2 It is difficult to understand how the urgency signal is derived, to then generate fig S4.

      The urgency signal is estimated by averaging 𝑆𝑥(𝑡) at each time point relative to motion onset, using only the 0% coherence trials. We have clarified this in the caption of Supplementary Figure S4.
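
      The averaging step described here can be sketched in a few lines. The example assumes single-trial signals aligned to motion onset and padded with NaN after each trial ends, together with a vector of signed coherences; these conventions and the function name are assumptions for illustration only.

```python
import numpy as np

def urgency_signal(signals, coherence):
    """Average the single-trial signal at each time point over 0% coherence trials.

    signals:   array of shape (n_trials, n_times), NaN where a trial has ended
    coherence: array of shape (n_trials,), signed motion coherence per trial
    """
    zero_coherence = coherence == 0
    # nanmean ignores trials that have already terminated at a given time point.
    return np.nanmean(signals[zero_coherence], axis=0)
```

      Restricting the average to 0% coherence trials removes any net drift due to motion evidence, so what remains is the time-dependent component shared across trials.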

      Author response image 1.

      Shuffle control for Fig. 5. Breaking the within-trial correspondence between neural signal, 𝑆(𝑡), and choice suppresses leverage to near zero.

      Author response image 2.

      Leverage of the integrated difference signal on choice and RT. Traces are the average leverage across seven sessions. Same conventions as in Figure 5.

      Author response image 3.

      Trial-averaged 𝑆ramp activity during individual sessions. Same as ?? for individual sessions for Monkey M (left) and Monkey J (right). The figure is intended to illustrate the consistency and heterogeneity of the averaged signals. For example, the saccade-aligned averages lose their association with motion strength before left (contra) choices in sessions 1, 2, 5, and 6 but retain the association in sessions 3, 4, 7, and 8.

      Author response image 4.

      Drift-diffusion signals have measurable leverage on choice and RT even when only 0%-coherence trials are included in the analysis.

      Author response image 5.

      Raw single-trial activity for three types of population averages. Representative single-trial activity during the first 300 ms of evidence accumulation using two motion strengths: 0% and 25.6% coherence toward the left (contralateral) choice target. Unlike in Figure 2 in the paper, single-trial traces are not baseline corrected by subtracting the activity in a 50 ms window around 200 ms. We highlight a number of trials with thick traces and these are the same trials in each of the rows.

    2. eLife assessment

      This fundamental work quantifies the stochastic dynamics of neural population activity in the lateral intraparietal area (LIP) of the macaque monkey brain during single perceptual decisions. These single-trial dynamics have been subject to intense debate in neuroscience, and they have significant implications for modelling decision-making in various fields including neuroscience and psychology. Through a combination of state-of-the-art recordings from many LIP neurons and theory-driven data analyses, the authors provide convincing evidence for the notion that single-trial neural population dynamics in LIP encode the decision variable postulated by the drift-diffusion model of decision-making.

    3. Reviewer #1 (Public Review):

      Summary:

      In this paper, Steinemann et al. characterized the nature of stochastic signals underlying the trial-averaged responses observed in lateral intraparietal cortex (LIP) of non-human primates (NHPs) while the animals performed the widely used random dot direction discrimination task. Ramp-up dynamics in the trial-averaged LIP responses have been reported in numerous papers before, but the temporal dynamics of these signals at the single-trial level have been subject to debate. Using large-scale neuronal recordings with Neuropixels in NHPs allows the authors to settle this debate rather compellingly. They show that drift-diffusion-like computations account well for the observed dynamics in LIP.

      Strengths:

      This work uses innovative technical approaches (Neuropixel recordings in behaving macaque monkeys). The authors tackle a vexing question that requires measurements of simultaneous neuronal population activity and hence leverage this advanced recording technique in a convincing way.

      They use different population decoding strategies to help interpret the results.

      They also compare decoders that rely on a data-driven approach, using dimensionality reduction of the full neural population space, with decoders that rely on more traditional ways of categorizing neurons based on hypotheses about their function. Intriguingly, although the functionally identified neurons are a modest fraction of the population, decoders that only rely on this fraction achieve comparable decoding performance to those relying on the full population. Moreover, decoding weights for the full population did not allow the authors to reliably identify the functionally identified subpopulation.

      The revision addressed the minor weaknesses to our satisfaction.

    4. Reviewer #2 (Public Review):

      Steinemann, Stine, and their co-authors studied the noisy accumulation of sensory evidence during perceptual decision-making using Neuropixels recordings in awake, behaving monkeys. Previous work has largely focused on describing the neural underpinnings through which sensory evidence accumulates to inform decisions, a process which on average resembles the systematic drift of a scalar decision variable toward an evidence threshold. The additional order of magnitude in recording throughput permitted by the methodology adopted in this work offers two opportunities to extend this understanding. First, larger-scale recordings allow for the study of relationships between the population activity state and behavior without averaging across trials. The authors' observation here of covariation between the trial-to-trial fluctuations of activity and behavior (choice, reaction time) constitutes interesting new evidence for the claim that neural populations in LIP encode the behaviorally-relevant internal decision variable. Second, using Neuropixels allows the authors to sample LIP neurons with more diverse response properties (e.g. spatial RF location, motion direction selectivity), making the important question of how decision-related computations are structured in LIP amenable to study. For these reasons, the dataset collected in this study is unique and potentially quite valuable. This revised manuscript addresses a number of questions regarding analyses which were unclear in the original manuscript, and as a result the study is a strong contribution toward our understanding of neural mechanisms of decision making.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors explore mechanisms through which T-regs attenuate acute pain using a heat sensitivity paradigm. Analysis of available transcriptomic data revealed expression of the proenkephalin (Penk) gene in T-regs. The authors explore the contribution of T-reg Penk in the resolution of heat sensitivity.

      Strengths:

      Investigating the potential role of T-reg Penk in the resolution of acute pain is a strength.

      Weaknesses:

      The overall experimental design is superficial and lacks sufficient rigor to draw any meaningful conclusions.

      We hope that the reviewer will reconsider this severe criticism after examining the updated manuscript and results.

      For instance:

      (1) There were no TAM controls. What is the evidence that TAM does not alter heat-sensitive receptors?

      The impact of TMX on heat perception is not the object of this study. Nevertheless, it appears that heat sensitivity in WT controls (blue dots) is slightly diminished after TMX administration (Figure 5A), suggesting that heat-sensitive receptors are moderately altered by TMX per se. This reduction is much more pronounced for LOX mice. Thus, although it is possible that TMX plays a marginal role in heat sensitivity by itself, the results show a much more pronounced effect of TMX in LOX than in WT mice, in favor of a role for Treg-derived Penk in heat sensitivity.

      (2) There are no controls demonstrating that recombination actually occurred. How do the authors know a single dose of TAM is sufficient?

      These results are now presented in figure S4. A 70% reduction in Penk mRNA is observed in Treg after a single administration of TMX.

      (3) Why was only heat sensitivity assessed? The behavioral tests are inadequate to derive any meaningful conclusions. Further, why wasn't the behavioral data plotted longitudinally?

      The longitudinal data are presented in figure S5A. New behavioral tests have been performed and the results are now shown in figure S5E-H. Importantly, heat sensitivity was observed in two independent laboratories with two different tests.

      Reviewer #2 (Public Review):

      Summary:

      The present study addresses the role of enkephalins, which are specifically expressed by regulatory T cells (Treg), in sensory perception in mice. The authors used a combination of transcriptomic databases available online to characterize the molecular signature of Treg. The proenkephalin gene Penk is among the most enriched transcripts, suggesting that Treg plays an analgesic role through the release of endogenous opioids. In addition, in silico analysis suggests that Penk is regulated by the TNFR superfamily; this being experimentally confirmed. Using flow cytometry analysis, the authors then show that Penk is mostly expressed in Treg of the skin and colon, compared to other immune cells. Finally, genetic conditional excision of Penk, selectively in Treg, results in heat hypersensitivity, as assessed by behavior analysis.

      Strengths:

      The manuscript is clear and reveals a previously unappreciated role of enkephalins, as released by immune cells, in sensory perception. The rationale in this manuscript is easy to follow, and conclusions are well supported by data.

      Weaknesses:

      The sensory deficit of Penk cKO appears to be quite limited compared to control littermates.

      Reviewer #3 (Public Review):

      Summary:

      Aubert et al investigated the role of PENK in regulatory T cells. Through the mining of publicly available transcriptome data, the authors confirmed that PENK expression is selectively enriched in regulatory but not conventional T cells. Further data mining suggested that OX40, 4-1BB as well as BATF, can regulate PENK expression in Tregs. The authors generated fate-mapping mice to confirm selective PENK expression in Tregs and activated effector T cells in the colon and spleen. Interestingly, transgenic mice with conditional deletion of PENK in Tregs resulted in hypersensitivity to heat, which the authors attributed to heat hyperalgesia.

      Strengths:

      The generation of transgenic mice with conditional deletion of PENK in foxp3 and PENK fate-mapping is novel and can potentially yield significant findings. The identification of upstream signals that regulate PENK is interesting but unlikely to be the main reason why PENK is predominantly expressed in Tregs as both BATF and TNFR are expressed in effector T cells.

      Weaknesses:

      There is a lack of direct evidence and detailed analysis of Tregs in the control and transgenic mice to support the authors' hypothesis. PENK was previously reported to be expressed in skin Tregs and play a significant role in regulating skin homeostasis: this should be considered as an alternative mechanism that may explain the changed sensitivity to heat observed in the paper.

      We now provide a detailed analysis of Treg with or without Penk, from their immunosuppressive functions to their colocalization with sensory neurons in the skin, supporting their function as natural analgesics. The alternate hypothesis relative to skin homeostasis is now clearly presented and discussed.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      Most of my comments should be addressable in a revised manuscript but will require additional analysis.

      Major:

      - According to flow cytometry analysis, Penk is expressed mostly in Treg of the skin and colon. What may account for such restricted expression? Where could Treg-released enkephalins act?

      We have now rephrased the paper to emphasize the known role of Batf in tissue Treg differentiation. We believe the Batf dependency of Penk expression is the reason why tissue Treg are more enriched in Penk than Treg from lymphoid organs. This is now clearly discussed.

      We also provide a new figure (Figure S1) showing that Batf and its cofactors AP1 and IRF4 were reported to bind to Penk regulatory regions. Altogether, the role of Batf in tissue Treg differentiation would explain why tissue Treg, such as those of the colon and skin, are particularly enriched in Penk. This is now clearly stated in the revised manuscript.

      As for where Treg-released enkephalins act, we performed immunostainings in the skin and observed that Treg could colocalize with sensory neurons (shown in a new figure 5, panel D). This observation raises the hypothesis that Treg-released enkephalins could act on sensory neurons locally.

      - Which mechanism can underlie heat hypersensitivity in Penk cKO mice? Which sensory neurons are involved? Are other sensory modalities affected, such as mechanical sensitivity?

      As stated above, we show that Treg can be in close contact with CGRP-producing thermal sensor neurons. These data are shown in figure 5D. We have also tested many other nociceptive stimuli (innocuous and noxious) and did not detect significant differences. These data are presented in supplementary figure S5. Whether enkephalins produced by Treg can change the stimulation threshold of various nerve fibers is currently being investigated by electrophysiology.

      - No control is provided to ensure that Penk is selectively excised in Treg cells in cKO mice.

      We have performed additional experiments with fluorescent probes to document Penk mRNA expression in cKO mice. The results on the specific expression of Penk mRNA in various subsets post-TMX are shown in a supplementary figure S4.

      - The authors acknowledge that Penk from Treg was previously studied in an animal model of inflammatory pain. However, which role these endogenous opioids play is unclear, especially since the authors discovered that enkephalins are likely continuously released at steady state. This is not sufficiently discussed in the narrative, which surprisingly does not separate the results from the discussion.

      The results and discussion are now separated into two sections.

      Minors:

      - Replace "Fox3 1" with "Fox31" (line 31), "functions 15" with "functions15" (line 43), "BATF 19" with "BATF19" (line 85).

      - Text mentions Figure S4 (line 125), which is most likely S3.

      Reviewer #3 (Recommendations For The Authors):

      Given the most significant finding of this paper is based on the heat-induced pain model, there is surprisingly little analysis of Tregs in this context. The authors analyzed spleen and colon Tregs at steady state, it is unclear whether any of these Tregs are involved in pain sensitivity directly. Skin Tregs or other relevant Tregs to this model should be analyzed in control and Lox mice. This is particularly relevant as PENK expression was previously reported in skin Tregs and plays a significant role in skin homeostasis (Yamazaki et al 2020 PNAS). Does PENK conditional deletion alter Treg frequencies, numbers, and immune suppressive function? Not even spleen or colon Treg were analyzed comparing control and lox mice.

      We now provide evidence showing unaltered immunosuppressive functions of Treg in the absence of Penk (Figure 4) and, more importantly, unaffected proportions of skin Treg in mice lacking Penk in Treg, at the very site of heat stimulation (Figure 5B-C). We also observed unaffected representation of Treg in the spleen and lymph nodes, but we do not feel that these data are necessary to interpret the results.

      Given the role of PENK in skin Tregs, could the observed effect in Figure 4 be due to altered skin homeostasis rather than sensitivity to pain?

      The reviewer is referring to a paper in which Penk in skin Treg plays a role on UV-damaged keratinocytes in vivo (Shime et al., 2020, PNAS). To our knowledge, a role for Penk produced by skin Treg in keratinocyte homeostasis at the steady state is currently unknown. Nevertheless, this hypothesis is now clearly stated and discussed in the manuscript.

      The authors stated that only after 7 days post tamoxifen treatment was heat hyperalgesia observed: deletion of PENK in Treg but not Tconv should be confirmed: is deletion only complete after 7 days or is the effect observed due to indirect effects of altered "normal" Treg function?

      We have performed a kinetic analysis to document Penk deletion at D3, D7 and D30 post-TMX. The results show a specific deletion of Penk in Treg at all time points, so we combined all the time points for the representation of the results (Figure S4). As for the indirect effects of “altered” normal function, we now provide the reader with a new figure (Figure 4) showing that Penk-deficient Treg are not impaired in their suppressive function in vitro and in vivo.

      Actual protein/peptide production of enkephalins by Tregs should be confirmed. It is also unclear which peptide(s) can be secreted and presumably responsible for the changes in heat sensitivity.

      This is a very interesting question that we addressed with a MENK ELISA but without success at reproducing the results. An ongoing project will use mass spectrometry to fully characterize the peptides produced by Treg and activated Tconv.

      The analysis of PENK regulation by Tregs is interesting despite being entirely based on data mining. BATF is a pioneering factor expressed by all activated effector T cells. While the connection between BATF and PENK may explain why the authors observed PENK expression chiefly in activated effectors and Tregs, BATF cannot be the reason why PENK is "predominantly" expressed by Tregs. Similarly, 4-1BB and OX40 can be induced on effector T cells. Is PENK under the control of Foxp3? There are lots of publicly available datasets on Foxp3/IL-2 dependent Treg signatures through which this can be addressed.

      We now provide a supplementary figure (Figure S1) showing a compilation of ChIP-seq studies for various transcription factors in various T cell subsets. We provide the reader with a list of all the TFs that have been reported to bind in the regulatory regions of Penk. In agreement with our hypothesis, BATF, FOXP3, IRF4 and several others are present in that list. Further work is needed to decipher the exact contribution of each of those TFs to the regulation of Penk in Treg vs activated Tconv, which is beyond the scope of this report.

    2. eLife assessment

      This study presents a valuable finding on a new role of Foxp3+ regulatory T cells in sensory perception, which may have an impact on our understanding of somatosensory perception. The authors identified a previously unappreciated action of enkephalins released by immune cells in the resolution of pain and several upstream signals that can regulate the expression of the proenkephalin gene PENK in Foxp3+ Tregs. The generation of transgenic mice with conditional deletion of PENK in Foxp3+ cells and PENK fate-mapping is novel and generates compelling data; they also show a comprehensive analysis of Tregs in control and transgenic mice, longitudinal data on heat sensitivity and co-localization of PENK+ Tregs with thermal sensory neurons in the skin further supporting their hypothesis. The study would be of interest to the biologists working in the field of neuroimmunology and inflammation.

    3. Public Review:

      The study addresses the role of enkephalins, which are specifically expressed by regulatory T cells (Treg), in sensory perception in mice. The authors used a combination of transcriptomic databases available online to characterize the molecular signature of Treg. The proenkephalin gene Penk is among the most enriched transcripts, suggesting that Treg plays an analgesic role through the release of endogenous opioids. In addition, in silico analysis suggests that Penk is regulated by the TNFR superfamily; this being experimentally confirmed. Using flow cytometry analysis, the authors then show that Penk is mostly expressed in Treg of the skin and colon, compared to other immune cells. Finally, genetic conditional excision of Penk, selectively in Treg, results in heat hypersensitivity, as assessed by behavior analysis.

      Editors' note: The authors accepted most if not all the suggestions given by the reviewers and the revised version of the manuscript is substantially improved.

    1. eLife assessment

      Here, the authors developed a cell-based screening assay for the identification of small molecule inhibitors of nonsense-mediated decay (NMD), and used it to validate KVS0001, a new small molecule SMG1 kinase inhibitor derived from the existing inhibitor SMG1i-11, showing it inhibits NMD in cultured cells leading to expression of neoantigens from NMD-targeted genes and slows tumor growth of cancer cell lines possessing a significant number of out-of-frame indel mutations. The conclusions are supported by convincing evidence, and the significance of this work consists in the development of a new and very promising NMD inhibitor drug that acts as an inhibitor of the SMG1 NMD kinase and is effective in animal tumor studies. This is an important advance for the field, as previous NMD inhibitors were not specific, lacked efficacy, or were very toxic and hence not suitable for animal applications.

    2. Reviewer #1 (Public Review):

      Summary:

      This work identified new NMD inhibitors and tested them for cancer treatment, based on the hypothesis that inhibiting NMD could lead to the production of cancer neoantigens from the stabilized mutant mRNAs, thereby enhancing the immune system's ability to recognize and kill cancer cells. Key points of the study include:

      • Development of an RNA-seq based method for NMD analysis using mixed isogenic cells that express WT or mutant transcripts of STAG2 and TP53 with engineered truncation mutations.

      • Application of this method to a drug screen, which identified several potential NMD inhibitors.

      • Demonstration that one of the identified compounds, LY3023414, inhibits NMD by targeting the SMG1 protein kinase in the NMD pathway in cultured cells and mouse xenografts.

      • Due to the in vivo toxicity observed for LY3023414, the authors developed 11 new SMG1 inhibitors (KVS0001-KVS0011) based on the structures of the known SMG1 inhibitor SMG1i-11 and the SMG1 protein itself.

      • Among these, KVS0001 stood out for its high potency, excellent bioavailability and low toxicity in mice. Treatment with KVS0001 caused NMD inhibition and increased presentation of neoantigens on MHC-I molecules, resulting in the clearance of cancer cells in vitro by co-cultured T cells and cancer xenografts in mice by the immune system.

      These findings support the strategy of targeting the NMD pathway for cancer treatment and provide new research tools and potential lead compounds for further exploration.

      Strengths:

      The RNA-seq based NMD analysis, using isogenic cell lines with specific NMD-inducing mutations, represents a novel approach for the high-throughput identification of potential NMD modulators or genetic regulators. The effectiveness of this method is exemplified by the identification of a new activity of AKT1/mTOR inhibitor LY3023414 in inhibiting NMD.

      The properties of KVS0001 described in the manuscript as a novel SMG1 inhibitor suggest its potential as a lead compound for further testing the NMD-targeting strategies in cancer treatment. Additionally, this compound may serve as a useful research tool.

      The results of the in vitro cell killing assay and in vivo xenograft experiments in both immuno-proficient and immune-deficient mice indicate that inhibiting NMD could be a viable therapeutic strategy for certain cancers.

      Weaknesses:

      The authors did not address the potential effects of NMD/SMG1 inhibitors on RNA splicing. Given that the transcripts of many RNA-binding proteins are natural targets of NMD, inhibiting NMD could significantly alter splicing patterns. This, in turn, might influence the outcomes of the RNA-seq-based method for NMD analysis and result interpretation.

      While the RNA-seq based approach offers several advantages for analyzing NMD, the effects of NMD/SMG1 inhibitors observed through this method should be confirmed using established NMD reporters. This step is crucial to rule out the possibility that mutations in STAG2 or TP53 affect NMD in cells, as well as to address potential clonal variations between different engineered cell lines.

      The results from the SMG1/UPF1 knockdown and SMG1i-11 experiments presented in Figure 3 correlate with the effects seen for LY3023414, but they do not conclusively establish SMG1 as the direct target of LY3023414 in NMD inhibition. An epistatic analysis with LY3023414 and SMG1-knockdown is needed.

      Comment on the revised version:

      Although KVS0001 exhibits promising properties as an SMG1 inhibitor for cancer treatment, it remains unclear if it is superior to existing SMG1 inhibitors, as no direct comparisons have been made.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This work identified new NMD inhibitors and tested them for cancer treatment, based on the hypothesis that inhibiting NMD could lead to the production of cancer neoantigens from the stabilized mutant mRNAs, thereby enhancing the immune system's ability to recognize and kill cancer cells. Key points of the study include:

      • Development of an RNA-seq based method for NMD analysis using mixed isogenic cells that express WT or mutant transcripts of STAG2 and TP53 with engineered truncation mutations.

      • Application of this method to a drug screen, which identified several potential NMD inhibitors.

      • Demonstration that one of the identified compounds, LY3023414, inhibits NMD by targeting the SMG1 protein kinase in the NMD pathway in cultured cells and mouse xenografts.

      • Due to the in vivo toxicity observed for LY3023414, the authors developed 11 new SMG1 inhibitors (KVS0001-KVS0011) based on the structures of the known SMG1 inhibitor SMG1i-11 and the SMG1 protein itself.

      • Among these, KVS0001 stood out for its high potency, excellent bioavailability, and low toxicity in mice. Treatment with KVS0001 caused NMD inhibition and increased presentation of neoantigens on MHC-I molecules, resulting in the clearance of cancer cells in vitro by co-cultured T cells and cancer xenografts in mice by the immune system.

      These findings support the strategy of targeting the NMD pathway for cancer treatment and provide new research tools and potential lead compounds for further exploration.

      Strengths:

      The RNA-seq-based NMD analysis, using isogenic cell lines with specific NMD-inducing mutations, represents a novel approach for the high-throughput identification of potential NMD modulators or genetic regulators. The effectiveness of this method is exemplified by the identification of a new activity of AKT1/mTOR inhibitor LY3023414 in inhibiting NMD.

      The properties of KVS0001 described in the manuscript as a novel SMG1 inhibitor suggest its potential as a lead compound for further testing the NMD-targeting strategies in cancer treatment. Additionally, this compound may serve as a useful research tool.

      The results of the in vitro cell killing assay and in vivo xenograft experiments in both immuno-proficient and immune-deficient mice indicate that inhibiting NMD could be a viable therapeutic strategy for certain cancers.

      Weaknesses:

      The authors did not address the potential effects of NMD/SMG1 inhibitors on RNA splicing. Given that the transcripts of many RNA-binding proteins are natural targets of NMD, inhibiting NMD could significantly alter splicing patterns. This, in turn, might influence the outcomes of the RNA-seq-based method for NMD analysis and result interpretation.

      This is a very important comment that highlights an important aspect of NMD and potential exciting downstream studies. We did not systematically assess RNA splicing in our work as we are not sure if inhibition of NMD would induce cancer specific splicing that would allow for tumor targeting. It is well established that NMD can impact splicing, including modulating cryptic exon expression, but finding and assessing antigenicity of targetable tumor specific antigens constitutes a study in and of its own. Our own data in figure 4C-F supports this, as a point mutation near a splice site in TP53 strongly induced NMD which was subsequently stopped by KVS0001 treatment. Doing a systematic review of this effect we feel is outside the scope of this manuscript. We’ve incorporated a comment into our discussion highlighting this deficiency, but certainly find the idea of mining RNA-splicing changes an exciting next endeavor.

      While the RNA-seq-based approach offers several advantages for analyzing NMD, the effects of NMD/SMG1 inhibitors observed through this method should be confirmed using established NMD reporters. This step is crucial to rule out the possibility that mutations in STAG2 or TP53 affect NMD in cells, as well as to address potential clonal variations between different engineered cell lines.

      This is possible, but we want to highlight that all hits from the screen were confirmed in a separate cell line with different clones. While this will not rule out effects on NMD due to STAG2 and TP53 knockdown, the final lead compound was also tested on different endogenous transcripts, both indel-containing and normal transcripts controlled by NMD (i.e., ATF4), in multiple species (human and mouse). Importantly, many of these assays employed the non-mutated transcripts from heterozygous mutant cells to ensure that cis-acting NMD was being measured and to control for any trans-acting splicing or other unanticipated biochemical effects.

      The results from the SMG1/UPF1 knockdown and SMG1i-11 experiments presented in Figure 3 correlate with the effects seen for LY3023414, but they do not conclusively establish SMG1 as the direct target of LY3023414 in NMD inhibition. An epistatic analysis with LY3023414 and SMG1-knockdown is needed.

      This is a great comment, and is supported by the recent push to confirm drug targets by chemical probes or knockout followed by loss of further effect upon application of the drug in question. We attempted to knock out SMG1 in multiple cell lines used in this study, including RPE1, MCF10A, NCI-H358 and LS180, and were unable to obtain clones that have biallelic out-of-frame indels. We were able to obtain multiple clones with in-frame indels. Based on our results and those in the publicly available database DepMap, we suspect this gene is likely essential, making a simple knockout unfeasible. While this uncertainty is important to keep in mind, we feel it does not detract from the reporting of a novel NMD screen that is mechanistically agnostic and of a novel in vivo active NMD inhibitor.

      Reviewer #2 (Public Review):

      Summary:

      Several publications during the past years provided evidence that NMD protects tumor cells from being recognized by the immune system by suppressing the display of neoantigens, and hence NMD inhibition is emerging as a promising anti-cancer approach. However, the lack of an efficacious and specific small-molecule NMD inhibitor with suitable pharmacological properties is currently a major bottleneck in the development of therapies that rely on NMD inhibition. In this manuscript, the authors describe their screen for identifying NMD inhibitors, which is based on isogenic cell lines that either express wild-type or NMD-sensitive transcript isoforms of p53 and STAG2. Using this setup, they screened a library of 2658 FDA-approved or late-phase clinical trial drugs and had 8 hits. Among them they further characterized LY3023414, showing that it inhibits NMD in cultured cells and in a mouse xenograft model, where it, however, was very toxic. Because LY3023414 was originally developed as a PI3K inhibitor, the authors claim that it inhibits NMD by inhibiting SMG1. While this is most likely true, the authors do not provide experimental evidence for this claim. Instead, they use this statement to switch their attention to another previously developed SMG1 inhibitor (SMG1i-11), of which they design and test several derivatives. Of these derivatives, KVS0001 showed the best pharmacological behavior. It upregulated NMD-sensitive transcripts in cultured cells and the xenograft mouse model and two predicted neoantigens could indeed be detected by mass spectrometry when the respective cells were treated with KVS0001. A bispecific antibody targeting T cells to a specific antigen-HLA complex led to increased IFN-gamma release and killing of cancer cells expressing this antigen-HLA complex when they were treated with KVS0001. 
      Finally, the authors show that renal (RENCA) or lung cancer cells (LLC) were significantly inhibited in tumor growth in immunocompetent mice treated with KVS0001. Overall, this establishes KVS0001 as a novel and promising anti-cancer drug that by inhibiting SMG1 (and therewith NMD) increases the neoantigen production in the cancer cells and reveals them to the body's immune system as "foreign".

      Strengths:

      The novelty and significance of this work consists in the development of a novel and - judging from the presented data - very promising NMD inhibiting drug that is suitable for applications in animals. This is an important advance for the field, as previous NMD inhibitors were not specific, lacked efficacy, or were very toxic and hence not suitable for animal application. It will be still a long way with many challenges ahead towards an efficacious NMD inhibitor that is safe for use in humans, but KVS0001 appears to be a molecule that bears promise for follow-up studies. In addition, while the idea of inhibiting NMD to trigger neoantigen production in cancer cells and so reveal them to the immune system has been around for quite some time, this work provides ample and compelling support for the feasibility of this approach, at least for tumors with a high mutational burden.

      Main weaknesses:

      There is a disconnect between the screen and the KVS0001 compound, that they describe and test in the second part of the manuscript since KVS0001 is a derivative of the SMG1 inhibitors developed by Gopalsamy et al. in 2012 and not of the lead compound identified in the screen (LY3023414). Because of high toxicity in the mouse xenograft experiments, the authors did not follow up LY3023414 but instead switched to the published SMG1i-11 drug of Gopalsamy and colleagues, a molecule that is widely used among NMD researchers for NMD inhibition in cultured cells. Therefore, in my view, the description of the screen is obsolete, and the paper could just start with the optimization of the pharmacological properties of SMG1i-11 and the characterization of KVS0001. Even though the screen is based on an elegant setup and was executed successfully, it was ultimately a failure as it didn't reveal a useful lead compound that could be further optimized.

      This is a helpful observation from an outside perspective. From our point of view, we were only alerted to targeting SMG1 by the previously reported off-target effects of LY3023414 on SMG1 and the lack of a plausible explanation for how PIK3CA inhibition would efficiently inhibit NMD. We do feel that the screen is worth including for two reasons. First, it offers an unbiased approach for querying the entire NMD pathway for vulnerabilities useful to target. The library chosen was quite small, so the screen itself could be useful to others with larger libraries to test. Second, it did help identify SMG1 as the ideal target for NMD disruption. While targeting SMG1 is not novel, we felt it highlighted why we chose to develop KVS0001. To address this reviewer’s comment, we’ve included a couple of sentences in the results and discussion strengthening the point that the screen provided an unbiased approach to finding the best target in the pathway to disrupt NMD and elaborating on the transition from LY3023414 and the screen to the development of KVS0001.

      Additional points:

      - Compared to SMG1i-11, KVS0001 seems less potent in inhibiting SMG1 (higher IC50). It would therefore be important to also compare the specificity of both drugs for SMG1 over other kinases at the applied concentrations (1 uM for SMG1i-11, 5 uM for KVS0001). The Kinativ Assay (Fig. S13) was performed with 100 nM KVS0001, which is 50-fold less than the concentration used for functional assays and hence not really meaningful. In addition, more information on the pharmacokinetic properties and toxicology of KVS0001 would allow a better judgment of the potential of this molecule as a future therapeutic agent.

      We agree that the Kinativ assay may have poorly represented the activity of KVS0001 at the bioactive concentration. We have now added 1 uM Kinativ data, the highest concentration we were able to run, to Figure S13.

      - On many figures, the concentrations of the used drugs are missing. Please ensure that for every experiment that includes drugs, the drug concentration is indicated.

      We apologize for this oversight and have added all drug concentrations on the appropriate plots.

      - Do the authors have an explanation for why LY3023414 has a much stronger effect on the p53 than on the STAG2 nonsense allele (Figure 1B, S8), whereas emetine upregulates the STAG2 nonsense alleles more than the p53 nonsense allele (Figure S5). I find this curious, but the authors do not comment on it.

      This is an interesting observation. The short answer is we’re not sure. The speculative answer is that it is related to the distinctly different mechanisms of actions of the two inhibitors (see comments from reviewing editor below).

      - While it is a strength of the study that the NMD inhibitors were validated on many different truncation mutations in different cell lines, it would help readers if a table or graphic illustration was included that gives an overview of all mutant alleles tested in this study (which gene, type of mutation, in which cell type). In the current version, this information is scattered throughout the manuscript.

      This is an excellent suggestion. We’ve included a new table S1 which incorporates the details of each cell line and the genes used in each for this study.

      - Lines 194 and 302: That SMG1i-11 was highly insoluble in the hands of the authors is surprising. It is unclear why they used variant 11j, since variant 11e of this inhibitor is widely used among NMD researchers and readily dissolves in DMSO.

      As this referee notes, SMG1i-11 is soluble in DMSO in our hands as well, which enabled us to use it for our in vitro work. Unfortunately, the concentrations of DMSO required to dissolve the compound to suitable concentrations for in vivo work were too high to safely use in mice with our animal protocols. We also attempted to use ethanol, which did dissolve SMG1i-11 but led to a significant amount of toxicity in both the drug and vehicle control arms.

      - Line 296: The authors claim that they were able to show that LY3023414 inhibited the SMG1 kinase, which is not true. To show this, they would have for example to show that LY3023414 prevents SMG1-mediated UPF1 phosphorylation, as they did for KVS0001 and SMG1i-11 in Fig. 3F. Unless the authors provide this data, the statement should be deleted or modified.

      We’ve modified this statement as requested by the referee, now saying we suspected SMG1 was the target based on previously published work.

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors):

      Your paper has been assessed by two reviewers with expertise in the NMD field. They both find the identification and characterization of a new potent and selective inhibitor of the SMG1 NMD kinase with in vivo activity to represent a significant advance in the field, and one that could ultimately be of value as the basis for a novel cancer therapy. However, as you will see both reviewers have concerns about whether the SMG1 inhibitor screen you developed belongs in the paper because it was not used to identify the KVS0001 inhibitor, which instead was generated based on a previously published set of SMG1 inhibitors, and because the NMD inhibitor that did emerge from your screen, LY3023414, was not shown to be a direct inhibitor of SMG1 kinase activity. While it is an elegant screen, during the revision of the paper you could consider streamlining the manuscript by emphasizing how the screening assay was used to validate KVS0001, and bolstering the characterization of the new KVS0001 NMD inhibitor by conducting the proposed additional experiments.

      Each of the reviewers raises additional points that should be addressed in a revised version.

      The reviewing editor has two additional points:

      (1) While emetine inhibits NMD, it is not really a direct NMD inhibitor, as implied, but rather a potent protein synthesis elongation inhibitor that acts by binding to the E-site of the 40S ribosomal subunit, and is therefore, like anisomycin, another protein synthesis inhibitor, working indirectly to inhibit NMD. This should be acknowledged in the section where emetine is first used as an "NMD inhibitor".

      This has been included in the indicated section at the referee’s request.  

      (2) To establish that the observed phenotypic effects of KVS0001 are due to on-target inhibition of SMG1, the authors could generate and express an SMG1 point mutant that is resistant to KVS0001 inhibition, which could be based on the SMG1 catalytic domain structure that the authors used originally to design KVS0001. Inhibitor-resistant kinase mutants are the gold standard for demonstrating that the biological consequences of a novel protein kinase inhibitor are due to on-target effects. Admittedly, because SMG1 is such a huge protein, this may be technically challenging and is likely beyond the scope of the present paper.

      We agree with the reviewing editor on all accounts: this would be an ideal experiment to run, but it is also beyond the scope of the present paper. As indicated in our discussion above with reviewer 1, SMG1 knockout was not possible in our hands, and we suspect it may be due to the gene being essential. Creating an inhibitor-resistant mutant could overcome this issue and create an ideal model to test the target for KVS0001. Unfortunately, finding such a mutant would likely require a significant amount of trial and error to create a resistant mutant that did not lose SMG1 function. In addition, SMG1 is huge, which creates technical issues for experimentation. Due to the anticipated amount of work for such a study, we believe this would be better accomplished in future studies.

      Reviewer #1 (Recommendations For The Authors):

      (1) The authors did not mention a new SMG1 inhibitor and its effects described in Cheruiyot et al, Cancer Res 2019 (PMID: 34215620).

      A comment regarding this discovery and its implications for our work was added to the discussion.

      (2) There is an inconsistency between the manuscript text and methods sections regarding the time of drug treatment (16 hours vs 14 hours) in the HTS screen.

      This has been double checked in our notebook and fixed to reflect 16hrs as the correct incubation time. Thank you for identifying that clerical oversight.

      Reviewer #2 (Recommendations For The Authors):

      (1) Line 61: The references to NMD reviews are very old (refs 20 and 21). I suggest citing more recent, up-to-date reviews instead.

      Two additional references, one from 2016 and another from 2023, have been added to increase support for this statement in the introduction.

      (2) Figure S1: Shouldn't the caption of the right panel (TP53 data) say "clone 221" rather than "clone 22"?

      This has been fixed.

      (3) Figure S18: Please indicate on the y-axis that you are displaying RPKM for p53.

      This has been fixed.

      (4) Figures 4D and S19: Please indicate concentrations used for all drugs.

      This has been fixed.

    4. Reviewer #2 (Public Review):

      Summary:

      Several publications during the past years provided evidence that NMD protects tumor cells from being recognized by the immune system by suppressing the display of neoantigens, and hence NMD inhibition is emerging as a promising anti-cancer approach. However, the lack of an efficacious and specific small molecule NMD inhibitor with suitable pharmacological properties is currently a major bottleneck in the development of therapies that rely on NMD inhibition. In this manuscript, the authors describe their screen for identifying NMD inhibitors, which is based on isogenic cell lines that either express wild-type or NMD-sensitive transcript isoforms of p53 and STAG2. Using this setup, they screened a library of 2658 FDA-approved or late-phase clinical trial drugs and had 8 hits. Among them they further characterized LY3023414, showing that it inhibits NMD in cultured cells and in a mouse xenograft model, where it, however, was very toxic. Because LY3023414 was originally developed as a PI3K inhibitor, the authors claim that it inhibits NMD by inhibiting SMG1. While this is most likely true, the authors do not provide experimental evidence for this claim. Instead, they use this statement to switch their attention to another previously developed SMG1 inhibitor (SMG1i-11), of which they design and test several derivatives. Of these derivatives, KVS0001 showed the best pharmacological behavior. It upregulated NMD-sensitive transcripts in cultured cells and the xenograft mouse model, and two predicted neoantigens could indeed be detected by mass spectrometry when the respective cells were treated with KVS0001. A bispecific antibody targeting T cells to a specific antigen-HLA complex led to increased IFN-gamma release and killing of cancer cells expressing this antigen-HLA complex when they were treated with KVS0001. 
      Finally, the authors show that renal (RENCA) or lung cancer cells (LLC) were significantly inhibited in tumor growth in immunocompetent mice treated with KVS0001. Overall, this establishes KVS0001 as a novel and promising anti-cancer drug that by inhibiting SMG1 (and therewith NMD) increases the neoantigen production in the cancer cells and reveals them to the body's immune system as "foreign".

      Strengths:

      The novelty and significance of this work consist in the development of a novel and - judging from the presented data - very promising NMD inhibiting drug that is suitable for applications in animals. This is an important advance for the field, as previous NMD inhibitors were not specific, lacked efficacy, or were very toxic and hence not suitable for animal application. It will be still a long way with many challenges ahead towards an efficacious NMD inhibitor that is safe for use in humans, but KVS0001 appears to be a molecule that bears promise for follow-up studies. In addition, while the idea of inhibiting NMD to trigger neoantigen production in cancer cells and so reveal them to the immune system has been around for quite some time, this work provides ample and compelling support for the feasibility of this approach, at least for tumors with a high mutational burden.

      Main weaknesses:

      There is a disconnect between the screen and the KVS0001 compound that they describe and test in the second part of the manuscript, since KVS0001 is a derivative of the SMG1 inhibitors developed by Gopalsamy et al. in 2012 and not of the lead compound identified in the screen (LY3023414). Because of high toxicity in the mouse xenograft experiments, the authors did not follow up LY3023414 but instead switched to the published SMG1i-11 drug of Gopalsamy and colleagues, a molecule that is widely used among NMD researchers for NMD inhibition in cultured cells. Therefore, in my view, the description of the screen is obsolete, and the paper could just start with the optimization of the pharmacological properties of SMG1i-11 and the characterization of KVS0001. Even though the screen is based on an elegant setup and was executed successfully, it was ultimately a failure as it didn't reveal a useful lead compound that could be further optimized.

      Additional points:

      - Compared to SMG1i-11, KVS0001 seems less potent in inhibiting SMG1 (higher IC50). It would therefore be important to also compare the specificity of both drugs for SMG1 over other kinases at the actually applied concentrations (1 uM for SMG1i-11, 5 uM for KVS0001). The Kinativ Assay (Fig. S13) was performed with 100 nM KVS0001, which is 50-fold less than the concentration used for functional assays and hence not really meaningful. In addition, more information on the pharmacokinetic properties and toxicology of KVS0001 would allow a better judgment of the potential of this molecule as a future therapeutic agent.

      - On many figures, the concentrations of the used drugs are missing. Please ensure that for every experiment that includes drugs, the drug concentration is indicated.

      - Do the authors have an explanation for why LY3023414 has a much stronger effect on the p53 than on the STAG2 nonsense allele (Fig. 1B, S8), whereas emetine upregulates the STAG2 nonsense alleles more than the p53 nonsense allele (Fig. S5). I find this curious, but the authors do not comment on it.

      - While it is a strength of the study that the NMD inhibitors were validated on many different truncation mutations in different cell lines, it would help readers if a table or graphic illustration was included that gives an overview of all mutant alleles tested in this study (which gene, type of mutation, in which cell type). In the current version, this information is scattered throughout the manuscript.

      - Lines 194 and 302: That SMG1i-11 was highly insoluble in the hands of the authors is surprising. It is unclear why they used variant 11j, since variant 11e of this inhibitor is widely used among NMD researchers and readily dissolves in DMSO.

      - Line 296: The authors claim that they were able to show that LY3023414 inhibited the SMG1 kinase, which is not true. To show this, they would have for example to show that LY3023414 prevents SMG1-mediated UPF1 phosphorylation, as they did for KVS0001 and SMG1i-11 in Fig. 3F. Unless the authors provide this data, the statement should be deleted or modified.

      Comments on the revised version:

      - The authors have satisfactorily addressed all my "Additional points" listed above.

      - With the new publishing model of eLife, the authors ultimately decide on whether or not to follow reviewers' suggestions, and in this case, the authors decided (against my suggestion) to leave the screening part in the manuscript, although it did not result in a useful lead compound. They argue it helped them define in an unbiased way SMG1 as the ideal target for NMD disruption. I would counterargue that this has been known in the field for quite a while.

      - One last suggestion I have to the authors would be to modify the statement in the abstract "This led to the design of a novel SMG1 inhibitor", because what they call "novel" is, in reality, a chemical improvement of the pharmacological properties of a previously reported SMG1 inhibitor (Gopalsamy et al., 2012).

    1. eLife assessment

      This important study analyzes in an original way how tension pattern dynamics can reveal the contribution of active versus passive intercalation during tissue elongation. The authors develop a compelling, elegant analytical framework (isogonal tension decomposition) to disentangle the passive (adjacent tissues pulling) and active (local tension anisotropy) contributions to intercalation events. This allows the generation of global maps of tissue mechanics that will be extremely helpful in the field of biomechanics.

    2. Reviewer #3 (Public Review):

      In their article "The Geometric Basis of Epithelial Convergent Extension", Brauns and colleagues present a physical analysis of Drosophila axis extension that couples in toto imaging of cell contours (previously published dataset), force inference, and theory. They seek to disentangle the respective contributions of active vs passive T1 transitions in the convergent extension of the lateral ectoderm (or germband) of the fly embryo.

      The revision made by the authors has greatly improved their work, which was already very interesting, in particular the use of force inference throughout intercalation events to identify geometric signatures of active vs passive T1s, and the tension/isogonal decomposition. The new analysis of the Snail mutant adds a lot to the paper and makes their findings on the criteria for T1s very convincing.

      Regarding the tissue-scale issues raised during the first round of review: although I do not find the new arguments fully convincing (see below), the authors did put a lot of effort into discussing the role of the adjacent posterior midgut (PMG) in extension, which is already great. That will certainly provide the interested readers with enough material and references to dive into that question.

      I still have some issues with the authors' interpretation on the role of the PMG, and on what actually drives the extension. Although it is clear that T1 events in the germ band are driven by active local tension anisotropy (which the authors show but was already well-established), it does not show that the tissue extension itself is powered by these active T1s. Their analysis of "fence" movies from Collinet et al 2015 (Tor mutants and Eve RNAi) is not fully convincing. Indeed, as the authors point out themselves, there is no flow in Tor mutant embryos, even though tension anisotropy is preserved. They argue that in Tor embryos the absence of PMG movement leaves no room for the germband to extend properly, thus impeding the flow. That suggests that the PMG acts as a barrier in Tor mutants - What is it attached to, then? The authors also argue that the posterior flow is reduced in "fenced" Eve RNAi embryos (which have less/no tension anisotropy), to justify their claim that it is the anisotropy that drives extension. However, previous data, including some of the authors' (Irvine and Wieschaus, 1994 - Fig 8), show that the first, rapid phase of germband extension is left completely unaffected in Eve mutants (that lack active tension anisotropy). Although intercalation in Eve mutants is not quantified in that reference, this was later done by others, showing that it is strongly reduced. Similarly, the Cyto-D phenotype from Clement et al 2017, in which intercalation is also strongly reduced, also displays normal extension.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Review:

      Joint Public Review:

      Summary:

      Brauns et al. work to decipher the respective contribution of active versus passive contributions to cell shape changes during germ band elongation. Using a novel quantification tool of local tension, their results suggest that epithelial convergent extension results from internal forces.

      Reading this summary, and the eLife assessment, we realized that we failed to clearly communicate important aspects of our findings in the first version of our manuscript. We therefore decided to largely restructure and rewrite the abstract and introduction to emphasize that:

      ● Our analysis method identifies active vs passive contributions to cell and tissue shape changes during epithelial convergent extension

      ● In the context of Drosophila germ band extension, this analysis provides evidence for a major role for internal driving forces rather than external pulling force from neighboring tissue regions (posterior midgut), thus settling a question that has been debated due to apparently conflicting evidence from different experiments.

      ● Our findings have important implications for local, bottom-up self-organization vs top-down genetic control of tissue behaviors during morphogenesis.

      Strengths:

      The approach developed here, tension isogonal decomposition, is original and the authors made the demonstration that we can extract comprehensive data on tissue mechanics from this type of analysis.

      They present an elegant diagram that quantifies how active and passive forces interact to drive cell intercalations.

      The model qualitatively recapitulates the features of passive and active intercalation for a T1 event.

      Regions of high isogonal strains are consistent with the proximity of known active regions.

      We think this statement is somewhat ambiguous and does not summarize our findings precisely. A more precise statement would be that high isogonal strain identifies regions of passive deformation, which is caused by adjacent active regions.

      They define a parameter (the LTC parameter) which encompasses the geometry of the tension triangles and allows the authors to define a criterion for T1s to occur.

      The data are clearly presented, going from cellular scale to tissue scale, and integrating modeling approaches to complement the thoughtful description of tension patterns.

      Weaknesses:

      The modeling is interesting, with the integration of tension through tension triangulation around vertices and thus integrating force inference directly in the vertex model. However, the authors are not using it to test their hypothesis and support their analysis at the tissue level. Thus, although interesting, the analysis at the tissue level stays mainly descriptive.

      We fully agree that a full tissue scale model is crucial to support the claims about tissue scale self-organization we make in the discussion. However, the full analysis of such a model is beyond the scope of the present manuscript. We have therefore split off that analysis into a companion manuscript (Claussen et al. 2023). In this paper, we show that the key results of the tissue-scale analysis of the Drosophila embryo, in particular the order-to-disorder transition associated with slowdown of tissue flow, are reproduced and rationalized by our model.

      We now refer more closely to this companion paper to point the reader to the results presented there.

      Major points:

      (1) The authors mention that from their analysis, they can predict what is the tension threshold required for intercalations in different conditions and predict that in Snail and Twist mutants the T1 tension threshold would be around √2. Since movies of these mutants are most probably available, it would be nice to confirm these predictions.

      This is an excellent suggestion. We have included an analysis of a recording of a Snail mutant, which is presented in the new Figures 4 and S6. As predicted, we find that isogonal deformations in the ventro-lateral regions are absent when the external pulling force of the VF is abolished. Further, in the absence of isogonal deformation, T1 transitions indeed occur at a critical tension of approx. √2, as predicted by our model. Both of these results provide important experimental evidence for our model and for isogonal strain as a reliable indicator of external forces.
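
      The √2 value quoted above can be illustrated with a short geometric check. The sketch below assumes, as an illustration rather than a statement of the authors' Eq. (1), that in the absence of isogonal deformation the T1 threshold coincides with the Delaunay condition on the "kite" formed by two adjacent tension triangles: the collapsing interface is lost once the two angles opposite the shared edge sum to more than π. For a symmetric quartet with neighbor tensions normalized to 1, each of these angles reaches π/2 when the central tension equals √2. The function names are hypothetical.

```python
import math

def angle_opposite(side, other_a, other_b):
    """Angle (radians) opposite `side` in a triangle, from the law of cosines."""
    cos_angle = (other_a**2 + other_b**2 - side**2) / (2 * other_a * other_b)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos_angle)))

def delaunay_violated(central_tension, neighbor_tension=1.0):
    """Symmetric kite: two identical tension triangles share the central edge.

    Assumption: a T1 is triggered once the two angles opposite the shared
    edge sum to more than pi (Delaunay condition), with no isogonal strain.
    """
    opposite = angle_opposite(central_tension, neighbor_tension, neighbor_tension)
    return 2 * opposite > math.pi

# Relative tensions just below and just above sqrt(2) ~ 1.414
print(delaunay_violated(1.40), delaunay_violated(1.43))
```

      Under these assumptions the criterion switches between relative tensions of 1.40 and 1.43, consistent with the critical value of approx. √2 reported for the Snail mutant.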

      (2) While the formalism is very elegant and convincing, and also convincingly allows making sense of the data presented in the paper, it is not all that clear whether the claims are compatible with previous experimental observations. In particular, it has been reported in different papers (including Collinet et al NCB 2015, Clement et al Curr Biol 2017) that affecting the initial Myosin polarity or the rate of T1s does not affect tissue-scale convergent extension. An analysis/discussion is needed of the Tor phenotype (no extension with myosin anisotropy) and the Eve/Runt phenotype (extension without Myosin anisotropy), which seem to contradict an extension mostly driven by myosin anisotropy.

      We are happy to read that the referees find our approach elegant and convincing. The referees correctly point out that we have failed to clearly communicate how our findings connect to the existing literature on Drosophila GBE. Indeed, the conflicting results reported in the literature on what drives GBE – internal forces (myosin anisotropy) or external forces (pulling by the posterior midgut) – were a motivation for our study. We have extensively rewritten the introduction, results section (“Isogonal strain identifies regions of passive tissue deformation”), and discussion (“Internal and external contributions to germ band extension”) in response to the referee’s request.

      In brief, distinguishing active internal vs passive external driving of tissue flow has been a fundamental open question in the literature on morphogenesis. Our tension-isogonal decomposition now provides a way to answer this question on the cell scale, by identifying regions of passive deformation due to external forces. As we now explain more clearly, our analysis shows that germ band extension is predominantly driven by internal tension dynamics, and not pulling forces from the posterior midgut.

      We put this cell-scale evidence into the context of previous experimental observations on the tissue scale: Genetic mutants (fog, torso-like, scab, corkscrew, ksr), where posterior midgut invagination is disrupted (Muenster et al. 2019, Smits et al. 2023). In these mutants, the germ band buckles forming ectopic folds or twists into a corkscrew shape as it extends, pointing towards a buckling instability characteristic of internally driven extensile flows.

      To address the apparently conflicting evidence from Collinet et al. 2015, we carried out a quantitative re-analysis of the data presented in that reference (see new SI section 3 and Fig. S11). The results support the conclusion that the majority of GBE flow is driven internally, thus resolving the apparent conflict.

      Lastly, as far as we understand, Clement et al. 2017 appears to be compatible with our picture of active T1 transitions. Clement et al. report that the actin cortex, when loaded by external forces, behaves visco-elastically with a relaxation time of the order of minutes, in line with our model for emerging interfaces post T1.

      We again thank the referees for prompting us to address these important issues and believe that including their discussion has significantly strengthened our manuscript.

      Recommendations for the authors:

      Minor points:

      - Fig. 2: the authors should state in the main text at which scale the inverse problem is solved (the intercalating quartet, if I understood correctly from the methods?), and they should explain and justify their choice (why not compute the inverse at a larger scale?).

      We have rephrased the first sentence of the section “Cell scale analysis” to clarify that we use local tension inference. This local inference is informative about the relative tension of one interface to its four neighbors. The focus on this local level is justified because we are interested in local cell behaviors, namely rearrangements. Tension inference is also most robust on the local level, since this is where force balance, the underlying physical determinant of the link between mechanics and geometry, resides. In global tension inference, spurious large scale gradients can appear when small deviations from local force balance accumulate over large distances. We have added a paragraph in SI Sec. 1.4 to explain these points.

      - Fig. 2: how should one interpret that tension after passive intercalation (amnioserosa) is higher than before? In Fig. 2E, tension has not yet converged on the plot; what happens after 20 minutes?

      Recall that the inferred tension is the total tension on an interface. While on contracting interfaces, the majority of this tension will be actively generated by myosin motors, on extending interfaces there is also a contribution carried by passive crosslinkers. The passive tension can be effectively viewed as viscous dissipation on the elongating interface as crosslinkers turn over (Clement et al. 2017). Note that this passive tension is explicitly accounted for in the model presented in Fig. 5. Notably, it is crucial for the T1 process to resolve in a new extending junction. In the amnioserosa, the tension post T1 remains elevated because the amnioserosa is continually stretched by the convergence of the germ band. The tension hence does not necessarily converge back to 1. However, our estimates for the tension after 20 mins post T1 are very noisy because most of the T1s happen relatively late in the movie (past the 25 min mark) and therefore there are only a few T1s where we can track the post-T1 dynamics for more than 20 mins.

      We have added a brief explanation of the high post-T1 tension at the end of the section entitled “Relative tension dynamics distinguishes active and passive intercalations”. Further, we have moved up the section describing the minimal model right after the analysis of the relative tension during intercalations. We believe that this helps the reader better understand these findings before moving on to the tension-isogonal decomposition which generalizes them to the tissue scale.

      Page 7-8 / Figure 3: It is unclear how the decomposition into 1) physical shape 2) tension shape 3) isogonal shape works exactly. A more detailed explanation and more clear illustration of what a quartet is and its labels could help.

      We have added a more detailed explanation in the main text. See our response to the longer question regarding this point below.

      -What exactly defines the boundary curve in figure 3E? How is it computed?

      We have added a sentence in the caption for Fig. 3E explaining that the boundary curve is found by solving Eq. (1) with l set to zero for the case of a symmetric quartet. We have also added a brief explanation immediately below Eq. (1) pointing out that this equation defines the T1 threshold in the space of local tensions T_i in terms of the isogonal length l_iso.

      -The authors should consider incorporating some details described in the SI file to the main text to clarify some points, as long as the accessible style of the manuscript can be kept. The points mentioned below may also be clarified in the SI doc. The specific points that could be elaborated are: Page 7-8 / Figure 3: It is unclear how the decomposition into 1) physical shape 2) tension shape 3) isogonal shape works exactly. A more detailed explanation and more clear illustration of what a quartet is and its labels could help. The mapping to Maxwell-Cremona space is fine, but which subset is the quartet? For a set of 4 cells with two shared vertices and a junction, aren't there 5 different tension vectors? Are we talking two closed force triangles? Separately, how do you exactly decompose the deformation (of 4 full cell shapes or a subset?) into isogonal and non-isogonal parts? What is the least squares fit done over - is this system underdetermined? Is this statistically averaged or computed per quartet and then averaged?

      We thank the referees for pointing us to unclear passages in our presentation. We hope that our revisions have resolved the referee’s questions. As described above, we have clarified the tension-isogonal decomposition in the main text. We have also revised the corresponding SI section (1.5) to address the above questions. A sketch of the quartet with labels is found in SI Fig. S7A which we now refer to explicitly in the main text.

      We always consider force-balance configurations, i.e. closed force triangles. Therefore in the “kite” formed by two adjacent tension triangles, only three tension vectors are independent.

      The decomposition of deformation is performed as follows: For each of the four cells, the center of mass c_i is calculated. Next, tension inference is performed to find the two tension triangles with tension vectors T_ij. Now there are three independent centroidal vectors c_j - c_i and three corresponding independent tension vectors T_ij. We define the isogonal deformation tensor I_quartet as the tensor that maps the centroidal vectors to the tension vectors. In general this is not possible exactly, because I_quartet has only three independent components, but there are six equations.
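
      The overdetermined fit described above (three unknown tensor components, six scalar equations) can be sketched as a least-squares problem. The example below is a minimal illustration, not the authors' implementation: it assumes that I_quartet is a symmetric 2x2 tensor (which gives the three independent components mentioned) and that the mismatch is minimized in the ordinary least-squares sense. The function names and the input vectors are hypothetical.

```python
def solve_linear(matrix, vector):
    """Solve a small square linear system by Gauss-Jordan elimination with pivoting."""
    n = len(vector)
    aug = [row[:] + [v] for row, v in zip(matrix, vector)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: vectors do not constrain the tensor")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def fit_symmetric_tensor(centroid_vecs, tension_vecs):
    """Least-squares fit of I = [[a, b], [b, d]] such that I @ c_k ~ t_k.

    Each vector pair contributes two scalar equations in the unknowns (a, b, d),
    so three pairs give six equations for three unknowns.
    """
    rows, rhs = [], []
    for (cx, cy), (tx, ty) in zip(centroid_vecs, tension_vecs):
        rows.append([cx, cy, 0.0])
        rhs.append(tx)  # a*cx + b*cy = tx
        rows.append([0.0, cx, cy])
        rhs.append(ty)  # b*cx + d*cy = ty
    # Normal equations (A^T A) x = A^T y for the 6x3 system.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * y for r, y in zip(rows, rhs)) for i in range(3)]
    a, b, d = solve_linear(ata, aty)
    return [[a, b], [b, d]]

# Hypothetical centroidal vectors and tension vectors for one quartet.
centroid_vecs = [(1.0, 0.0), (0.0, 1.0), (-1.0, -1.0)]
tension_vecs = [(2.0, 0.5), (0.5, 1.0), (-2.5, -1.5)]
print(fit_symmetric_tensor(centroid_vecs, tension_vecs))
```

      In this toy input the tension vectors were generated from an exactly symmetric tensor, so the fit recovers it without residual; with real data the residual of the fit would be non-zero.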

      The plots in Fig. 3C, C’ are obtained by performing this decomposition for each intercalating quartet individually. The data is then aligned in time and ensemble averages are calculated for each timepoint.

      For tissue-scale analysis in Fig. 6, the decomposition is performed for individual vertices (i.e. the corresponding centroidal and tension triangles) and then averaged locally to find the isogonal strain fields shown in Fig. 6B, B’.

      - Line 468: "Therefore, tissue-scale anisotropy of active tension is central to drive and orient convergent-extension flow [10, 57, 59, 60]." Authors almost never mention the contribution of the PMG to tissue extension. Yet it is known to be crucial (convergent extension in Tor mutants is very much affected). Please discuss this point further.

      The referees raise an important point: as discussed in our response to major point (2), we now explicitly discuss the role of internal (active tension) and external (PMG pulling) forces during germ band extension. Please see our response to major point (2) for the changes we made to the manuscript to address this.

      In particular, we now explain that in mutants where PMG invagination is impaired (fog, torso-like, torso, scab, corkscrew), the germ band buckles out of plane or extends in a twisted, corkscrew fashion (Smits et al. 2023). This shows that the germ band generates extensile forces largely internally. In torso mutants, the now stationary PMG acts as a barrier which blocks GBE extension; the germ band buckles as a response.

      The role of PMG invagination hence lies not in creating pulling forces to extend the germ band, but rather in “making room” to allow for its orderly extension. As shown by the genetic mutants just discussed, the synchronization of PMG invagination and GBE is crucial for successful gastrulation.

      -Typos:

      Line 74: how are intercalations are

      Line 84: vertices vertices

      Line 233: very differently

      Line 236: are can

      Line 390: energy which is the isogonal mode must

      Line 1585: reveals show

      Line 603: area

      Line 618: in terms of on the

      We have fixed these typos.

    4. Reviewer #2 (Public Review):

      Main comment from 1st review:

      Weaknesses:

      The modeling is interesting, with the integration of tension through tension triangulation around vertices and thus integrating force inference directly in the vertex model. However, the authors are not using it to test their hypothesis and support their analysis at the tissue level. Thus, although interesting, the analysis at the tissue level stays mainly descriptive.

      Comments on the revised version:

      My main concern was that the authors did not use the analysis of mutant contexts such as Snail and Twist to confirm their predictions. They made a series of modifications, clarifying their conclusions. In particular, they now include an analysis of a Snail mutant and show that isogonal deformations in the ventro-lateral regions are absent when the external pulling force of the VF is abolished, supporting the idea that isogonal strain could be used as an indicator of external forces (Fig. 7 and S6).

      They further discuss their results in the context of what was published regarding the mutant backgrounds (fog, torso-like, scab, corkscrew, ksr) where midgut invagination is disrupted, and where germ band buckles, and propose that this supports the importance of internal versus external forces driving GBE.

      Overall, these modifications, in addition to clarifications in the text, clearly strengthen the manuscript.

    1. eLife assessment

      This useful study describes a single set of label-chase mass spectrometry experiments to confirm the molecular function of YafK as a peptidoglycan hydrolase, and to describe the timing of its attachment to the peptidoglycan. Confirmation of the molecular function of YafK will be helpful in further studies to examine the function and regulation of the outer membrane-peptidoglycan link in bacteria. The evidence supporting the molecular function of YafK and that Lpp molecules are shuffled on and off the peptidoglycan is solid; however, data supporting conclusions relating to the locations of Lpp-peptidoglycan attachment are incomplete. The work will be of interest to researchers studying lipoproteins in Gram-negative bacteria.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Weaknesses: 

      - Only one mutant (YafK) is used to make the conclusion. 

      The aim of the study is to determine the effect of the hydrolysis of the PG→Lpp bond on the dynamics of the tethering of Lpp to PG. Since YafK is the only enzyme catalyzing this reaction, it is appropriate to compare the wild-type strain to an isogenic yafK deletion mutant. Nonetheless, we carefully consider this comment and will investigate the dynamics of the tethering of Lpp to PG in mutants deficient in the production of the L,D-transpeptidases responsible for tethering Lpp to PG.

      Additional kinetic analyses were performed on strains relying on a single L,D-transpeptidase for Lpp tethering to PG. Escherichia coli produces three L,D-transpeptidases catalyzing the tethering of Lpp to PG (YbiS, YcfS, and ErfK). The corresponding genes were deleted from the chromosome of strain BW25113, thus generating strain BW25113Δ3. Plasmids encoding each one of these three enzymes were independently introduced in BW25113Δ3. Qualitatively, LC-MS analyses revealed similar kinetics for the four Tri-KR isotopologues purified from wild-type strain BW25113 and from the three BW25113Δ3 derivatives producing a single plasmid-encoded L,D-transpeptidase (YbiS, YcfS, or ErfK) under the control of a rhamnose inducible promoter (Prha) of plasmid pHV30 (Voedts et al. EMBO J. 2021 40:e108126, doi: 10.15252/embj.2021108126) (see panel A in figure 1 below). Briefly, and as indicated in the first version of the main text, the old→new Tri→KR isotopologue was first synthesized. The new→new isotopologue was not detected 5 min after the medium switch. These results indicate that the newly-synthesized PG disaccharide-peptide subunits and Lpp are independently incorporated into the expanding PG polymer. The proportion of the new→old isotopologue exceeded that of the old→new isotopologue at around 40 min (for the strain producing ErfK) or 20 min (for the strains producing YbiS or YcfS). This is the hallmark of the activity of the YafK hydrolase that liberates existing (old) Lpp that can be tethered to newly synthesized disaccharide-peptide subunit thereby generating the new→old isotopologue. In the absence of the YafK hydrolase, the relative proportion of the new→old isotopologue is lower since this isotopologue can only result from the tethering of the preexisting free forms of Lpp to newly synthesized disaccharide-peptide units. 
The contribution of YafK to variations in the relative abundance of the four isotopologues was also investigated by combining the relative abundance of isotopologues containing either old versus new KR (panel B) or old versus new PG stem peptide (panel C) moieties. As discussed in the first version of the manuscript for strains BW25113 and BW25113ΔyafK, this analysis revealed that the existing (old) disaccharide-tripeptide moieties in the Tri→KR isotopologues disappear more rapidly than the existing (old) KR moieties due to the hydrolysis of the old→old Tri-KR isotopologue by YafK. These results indicate that the mode of tethering of Lpp to PG and the dynamic equilibrium between the PG-tethered and free forms of Lpp are similar for the YbiS, YcfS, and ErfK L,D-transpeptidases. Quantitatively, we also noticed that the overall decrease in the relative abundance of all Tri→KR isotopologues containing existing (old) moieties was slower for the strains producing only ErfK, YbiS, or YcfS than for the wild type and ΔyafK strains. This could be accounted for by an increase in the generation time of the former group of three strains. This is a limitation of our study because it precludes the comparison of the evolution of a particular isotopologue in several strains, as performed in Fig. 3 for strains BW25113 and BW25113ΔyafK. For this reason, we prefer to present these data in the rebuttal rather than in the manuscript. Indeed, presentation of the data in the main text would require introducing a new mode of presentation of the data (variations in the relative abundance of all four isotopologues in the same strain; see figure below) in addition to variations of the relative abundance of any one of the four isotopologues between strains (Fig. 3). 
Introduction of this additional mode of presentation of the data would complicate the initial manuscript in an unnecessary manner because the data obtained with mutants producing a single L,D-transpeptidase (ErfK, YbiS, or YcfS) confirmed the data obtained with the wild-type strains producing the three L,D-transpeptidases.
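
      The pooling used for panels B and C (combining isotopologues by the label of the KR moiety or of the PG stem peptide) can be sketched as follows. This is a minimal illustration, not the authors' analysis pipeline: the abundance values are invented for the example and are not taken from Table S1, and the function name is hypothetical. Each isotopologue is written as (PG stem label, KR label), matching the stem→KR notation used in the text.

```python
# Hypothetical relative abundances of the four Tri→KR isotopologues at one
# time point, keyed by (PG stem label, KR label). Values are illustrative only.
abundances = {
    ("old", "old"): 0.40,
    ("old", "new"): 0.15,
    ("new", "old"): 0.35,
    ("new", "new"): 0.10,
}

def pooled_fraction(abundances, moiety, label):
    """Fraction of all isotopologues whose `moiety` ("stem" or "kr") carries `label`."""
    index = {"stem": 0, "kr": 1}[moiety]
    total = sum(abundances.values())
    selected = sum(value for key, value in abundances.items() if key[index] == label)
    return selected / total

# Panel B analogue: old KR = old→old + new→old.
old_kr = pooled_fraction(abundances, "kr", "old")
# Panel C analogue: old PG stem = old→old + old→new.
old_stem = pooled_fraction(abundances, "stem", "old")
print(old_kr, old_stem)
```

      With these invented values the pooled old-stem fraction is lower than the pooled old-KR fraction, which is the qualitative pattern the response attributes to YafK releasing old Lpp that is then re-tethered to new PG stems.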

      Author response image 1.

      MS-based kinetic analysis of Lpp tethering to PG.

      -Time points to analyse Tri-KR isotopologues in Wt (0,10,20,40,60 min) and yafK mutant (0,15, 25, 40, 60 min) are not the same. 

      The purpose of the experiments is to compare the kinetics of formation and hydrolysis of the PG→Lpp bond in the WT versus ΔyafK strains. Comparison of the kinetics is therefore possible even though the kinetics are not based on the exact same time points. Nonetheless, we will reproduce the kinetics experiment (see also answers to Reviewer 2) and use the same time points in these additional experiments.

      We have performed additional analyses to provide kinetic data for at least three biological repeats and for the same periods of incubation after the medium switch (0, 10, 20, 40, and 60 min). The full set of data, including means and standard deviations, appear in the additional Table S1. We have also updated Fig. 3 with the means calculated with these additional values. The conclusions of the first version of the manuscript are fully supported by the additional data requested by the reviewer. We have also revised Fig. 4 based on the full set of data appearing in Table S2.

      Reviewer #2 (Public Review): 

      Weaknesses: 

      - However, the authors make a few other conclusions from their data which are harder to understand the logic of, or to feel confident in based on the existing data. They claim that their 5-time point kinetic data indicates that new lpp is not substantially added to lipidII before it is added to the peptidoglycan, and that instead lpp is attached primarily to old peptidoglycan. I believe that this conclusion comes from the comparison of Figs. 3A and 3C, where it appears that new lpp is added to old peptidoglycan a few minutes before new lpp is added to new peptidoglycan. However, the very small difference in the timing of this result, the minimal number of time points and the complete lack of any presentation of calculated error in any of the data make this conclusion very tenuous. In addition, the authors conclude that lpp is not significantly attached to septal peptidoglycan. The logic behind this conclusion appears to be based on the same data, but the authors do not provide a quantitative model to support this idea.

      The reviewer is correct in stating that we claim that Lpp is not substantially added to lipid II before incorporation of the disaccharide-pentapeptide subunit into the expanding PG network. This conclusion is based on the paucity of PG-Lpp covalent adducts containing light PG and Lpp moieties at the earliest time points. To substantiate this finding more thoroughly, we will reproduce the kinetic experiments with more early time points. The paucity of the new→new PG-Lpp isotopologues also implies that Lpp might not be extensively tethered to septal peptidoglycan since the latter is assembled from newly synthesized PG (see our previous publication Atze et al. 2021 and references therein). Quantitatively, septal synthesis accounts for roughly one third of the total PG synthesis. It is therefore expected that tethering of Lpp to septal PG would represent one third of the total number of newly synthesized Lpp molecules tethered to PG. We therefore proposed that the paucity of new→new PG-Lpp isotopologues at early time points of the kinetics implies that Lpp is preferentially tethered to the side wall. This is only one of several conclusions that we reach in the present study and we were very careful in the wording of our results.

      We would first like to stress that our claim that Lpp is primarily attached to old peptidoglycan rather than to lipid II is indeed supported by the results presented in the first version of the manuscript. In fact, the opposite mechanism, i.e. Lpp linking to lipid II, as established for the linking of proteins to PG by sortases in Gram-positive bacteria, would result in the exclusive tethering of newly synthesized Lpp to newly synthesized PG stems (Fig. 3). This is clearly not the case since the new→new isotopologues are present in small amounts 10 min after the medium switch and are not detectable at 5 min (data appearing in Table S1 and new mass spectra added to Supplementary file 1). Instead, our data indicate that newly synthesized Lpp is tethered to existing PG. Thus, the relevant comparison is not the absolute value of the delay in the appearance of isotopologues in Figs 3A and 3C, as suggested by the reviewer. Rather, the relevant comparison should take into consideration the following two modes of Lpp tethering to PG: (i) tethering of Lpp to lipid II versus (ii) tethering of Lpp to existing PG independently from insertion of new subunits into the expanding PG. The former mode implies the exclusive formation of new→new isotopologues, which were not detected at early time points. The latter mode implies the prevalent formation of old→new isotopologues, which were indeed preponderant at early time points. Thus, our analysis clearly eliminates the first mode of Lpp tethering to PG (tethering of Lpp to lipid II) and validates the second one (tethering of Lpp to existing PG). As stated in our answers to reviewer 1, we have generated additional repeats and the full set of data, including means and SD values, appears in the additional Supplementary Tables S1 and S2.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      -All major reactions catalysed by L,D-transpeptidases must be studied using the labeling-mass spec technique and compared with YafK to strengthen the conclusions. 

      As described above (Figure 1), we explored the dynamics of Lpp tethering in mutants producing a single L,D-transpeptidase.

      -Experiments on the effect of YafK on the bacterial envelope and production of vesicles should be concluded to support the claims. 

      We have analyzed the extent of outer membrane vesicle (OMV) formation both in the wild type strain and in each of the mutant strains characterized in this study by using a procedure described in detail in one of our previous publications (Hugonneau-Beaufet et al. Microbiol Spectr. 2023 11:e0521722, doi: 10.1128/spectrum.05217-22). Figure 2 below shows that loss of Lpp or of its tethering to PG, following deletion of genes encoding L,D-transpeptidases ErfK, YbiS, and YcfS, results in the formation of OMVs as revealed by the presence of the maltose-binding protein (MBP, 42 kDa) in the corresponding spent culture medium (as detected by immunoblotting). The RNA polymerase subunit RpoA (36 kDa), used as a control, was not detected in these spent culture media, indicating that loss of either Lpp alone or of ErfK, YbiS, and YcfS together was not associated with bacterial lysis. This analysis also showed that production of ErfK, YbiS, or YcfS alone was sufficient to prevent formation of OMVs. Finally, deletion of YafK, as expected, did not lead to OMV formation. These confirmatory results are outside the scope of the manuscript, which focuses on the dynamics of Lpp tethering to PG rather than on the role of that tethering in envelope stability.

      Author response image 2.

      Figure 2. Immuno-detection of OMV formation.

      Reviewer #2 (Recommendations For The Authors): 

      - Why so much background about previous results in the abstract? Previous results don't seem required for understanding the description of new results here. Maybe put a sentence about importance at the end, instead.

      The background information is important for two reasons. First, it is important to stress that the method used to determine the structure and dynamics of the isotopologues is novel and has been validated in various ways, including the modeling of isotopic clusters, in a previous study (https://doi.org/10.7554/eLife.72863). Since the current study is an extension of this previous report, it is relevant to introduce the type of information that can be obtained by this approach. Second, it is also important to stress that kinetic analyses have been previously reported for the incorporation of disaccharide-peptide units into the expanding peptidoglycan (https://doi.org/10.7554/eLife.72863). In the current study, we focused on the mode of Lpp-to-PG tethering in the context of PG expansion, which thus had to be introduced.

      - Abstract: tethering of lpp to septal pg is limited by what? Limited to what? Wording not clear.

      The unclear sentence has been rephrased. Revised version “Newly synthesized septum PG appears to contain small amounts of tethered Lpp.”  

      - The figure legend for fig 1b - I only see one red double arrow?

      Black double arrows indicate the position of glycosidic bonds cleaved by the muramidases. Their size was increased so that they appear more distinctly in the image.

      - Fig 3 and Fig 4- these should be shown with error. 

      The full set of data with means and standard deviations appears in Supplementary Tables S1 and S2.

      - This new-> old, old-> new annotation is confusing. Is the PG fragment or the lpp old or new? Are you distinguishing between which part is old and new by the ordering? Or, could either the PG fragment or the lpp be old to be annotated as old-> new? I think you are trying to explain it in the figure 3CD legend, but it could be presented more clearly. When you say respectively, do you mean that old->new means old muropeptide, new lpp? And new-> old means new muropeptide and old lpp? Why not just use the same annotation system you use in fig 2? Or, use subscripts to indicate old and new?. 

      The designation of isotopologues is correct and adequate to designate the products of transpeptidation catalyzed both by PBPs and L,D-transpeptidases. This nomenclature of transpeptidation products was introduced in the 1970s (see Schleifer and Kandler 1972 Bacteriological Reviews 36:407-477). In this bond designation, the acyl donor and the acyl acceptor appear left and right, respectively, separated by an arrow to indicate the CO-to-NH polarity of the amide bond. For the Tri→KR isotopologues, the peptide stem acts as the acyl donor whereas Lpp acts as the acyl acceptor. There is therefore no ambiguity in the annotation. This also applies to the old→new-type annotation: old (existing) PG stem linked to new (neosynthesized) Lpp. In the figures, we used a color code to identify old (red) and new (purple) in the Tri→KR moieties. Since a color code cannot be used in the main text, we used the old→new-type of annotation. A sentence has been added at the end of the legend to Fig. 1b to introduce this nomenclature: “Please note that we used the standard nomenclature for transpeptidation products in which the acyl donor and the acyl acceptor appear left and right, respectively, separated by an arrow to indicate the CO-to-NH polarity of the amide bond”.

      - Pg 5 - first paragraph. I'm struggling with the logic of your conclusion that lpp is not attached to lipid II - it seems that this conclusion is based on the timing of the appearance of the hybrid isotopes. You say you would expect the new-new ones to appear quickly, but how quickly would you expect that, and why? You do see new-new ones appearing fairly quickly, in 20 minutes, so I don't understand the logic of why that timing excludes the lipid II modification model. Please elaborate further.

      See answer above to reviewer 2 and analysis of samples collected shortly after the medium switch (Table S1). See also the revised version of Supplementary file 1 that shows mass spectra for peptidoglycan extracted 5 min after the medium switch.

      - The conclusion about tethering of lpp to septal PG also appears to be somewhat tenuous, which the authors concede when they use the word "might" in the section of the results. However, the language in the abstract is more definitive. Please tone down the language in the abstract, or provide more evidence to support this conclusion. At the least, you could add a little discussion of the numbers. At a given time in mixed culture, how much PG is being constructed at the septum? How does that percentage line up with the rate of PG label loss vs the rate of lpp label loss?

      -  Pg 5, bottom paragraph. I don't know what you mean by "there was no loss of old->old in the ∆yafK strains, " when you just a sentence above described the decrease. 

      The data of the MS analyses are presented as the relative abundance of isotopologues. If the old→old Tri→KR isotopologue present at the medium shift were not hydrolyzed by YafK, its absolute amount would remain constant over time. However, the relative abundance of the old→old isotopologue decreases by 50% in one generation because the total amount of the Tri→KR muropeptide doubles in one generation (as any of the bacterial constituents). In Fig. 3B, we indeed observed that the relative amount of old→old isotopologue is about 50% after one generation in the ΔyafK mutant indicating the persistence of the isotopologue. In contrast, production of YafK in the strain BW25113 results in lower abundance of this isotopologue (in the order of 90%). 

      To make the concept more explicit, we expanded the reasoning in the relevant paragraph of the revised version of the manuscript.
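
      The dilution argument above can be illustrated with a minimal sketch. It assumes only what the answer states: the total Tri→KR pool doubles each generation, and the old→old isotopologue is either left intact or partly removed. The function name and the `hydrolyzed_fraction` parameter are hypothetical and introduced for illustration; they are not taken from the manuscript, and the sketch is not a fit to the measured data.

```python
def relative_abundance_old_old(generations, hydrolyzed_fraction=0.0):
    """Relative abundance of the old->old isotopologue after growth.

    Assumptions (illustrative only):
    - at the medium switch the whole Tri->KR pool is old->old (abundance 1.0);
    - the total Tri->KR pool doubles once per generation;
    - `hydrolyzed_fraction` is a hypothetical fraction of the initial
      old->old bonds that has been removed by hydrolysis.
    """
    if generations < 0:
        raise ValueError("generations must be non-negative")
    if not 0.0 <= hydrolyzed_fraction <= 1.0:
        raise ValueError("hydrolyzed_fraction must lie in [0, 1]")
    remaining_old = 1.0 - hydrolyzed_fraction  # absolute amount still old->old
    total_pool = 2.0 ** generations            # pool size relative to the switch
    return remaining_old / total_pool


if __name__ == "__main__":
    # Without hydrolysis, dilution alone halves the relative abundance
    # after one generation.
    for g in (0, 1, 2):
        print(g, relative_abundance_old_old(g))
```

      In this sketch, a value of 0.5 after one generation corresponds to persistence of the old→old bond, and any value below 0.5 requires some loss of that bond in addition to dilution.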

      - Pg 6 - I don't understand how you are drawing a conclusion about the proteolytic degradation of lpp from these data. Please clarify your reasoning.

      In the analysis presented in Fig. 4, we investigated the relative abundance of old and new Lpp based on the relative abundance of old and new KR moieties in all four Tri-KR isotopologues. As stated in the preceding answer, the relative abundance of KR moieties should be 50% after one generation if no degradation of Lpp occurs. This is observed both for BW25113 (Fig. 4A) and for the ΔyafK mutant (Fig. 4B), thus supporting our claim that Lpp is not degraded. In contrast, the relative abundance of the old Tri moiety is lower than 50% for the wild type strain (Fig. 4C) but not for the ΔyafK mutant (Fig. 4D). This reflects the fact that YafK hydrolyzes the PG-Lpp bond and that Lpp released by this reaction can be cross-linked to neo-synthesized PG stems. Please note that, in this reaction, the substrate is a tetrapeptide donor stem (Fig. 1C).

    3. Reviewer #1 (Public Review):

      The authors present data on outer membrane vesicle (OMV) production in different mutants, but they state that this is beyond the scope of the current manuscript, which I disagree with. This data could provide valuable physiological context that is otherwise lacking. The preliminary blots suggest that YafK does not alter OMV biogenesis. I recommend repeating these blots with appropriate controls, such as blotting for proteins in the culture media, an IM protein, periplasmic protein and an OM protein to strengthen the reliability of these findings. Including this data in the manuscript, even if it does not directly support the initial hypothesis, would enhance the physiological relevance of the study. Currently, the manuscript relies completely on the experimental setup (labeling-mass spec) previously developed by the authors, which limits the broader scope and interpretability of this study.

      Additionally, the susceptibility of the strains to detergents such as SDS can be tested to provide much-needed physiological context to the study.

      In summary, the authors should consider revising the manuscript to improve clarity, substantiate their claims with more detailed evidence, and include additional experimental results that provide necessary physiological context to their study.

    4. Reviewer #2 (Public Review):

      Summary:

      The authors of this study have sought to better understand the timing and location of the attachment of the lpp lipoprotein to the peptidoglycan in E. coli, and to determine whether YafK is the hydrolase that cleaves lpp from the peptidoglycan.

      Strengths:

      The method is relatively straightforward. The authors are able to draw some clear conclusions from their results, that lpp molecules get cleaved from the peptidoglycan and then re-attached, and that YafK is important for that cleavage.

      Weaknesses:

      Figure 3 and 4 - why are the data shown here only two biological replicates, when there are 3-5 replicates shown in table S1 and S2? This makes it seem like you are cherry picking your favorite replicates. Please present the data as the mean of all the replicates performed, with error shown on the graph.

      This work will have a moderate impact on the field of research in which the connections between the OM and peptidoglycan are being studied in E. coli. Since lpp is not widely conserved in gram negatives, the impact across species is not clear. The authors do not discuss the impact of their work in depth.

    1. eLife assessment

      This study offers a useful treatment of how the population of excitatory and inhibitory neurons integrates principles of energy efficiency in their coding strategies. The analysis provides a comprehensive characterisation of the model, highlighting the structured connectivity between excitatory and inhibitory neurons. However, the manuscript provides an incomplete motivation for parameter choices. Furthermore, the work is insufficiently contextualized within the literature, and some of the findings appear overlapping and incremental given previous work.

    2. Reviewer #1 (Public Review):

      Koren et al. derive and analyse a spiking network model optimised to represent external signals using the minimum number of spikes. Unlike most prior work using a similar setup, the network includes separate populations of excitatory and inhibitory neurons. The authors show that the optimised connectivity has a like-to-like structure, leading to the experimentally observed phenomenon of feature competition. They also characterise the impact of various (hyper)parameters, such as adaptation timescale, ratio of excitatory to inhibitory cells, regularisation strength, and background current. These results add useful biological realism to a particular model of efficient coding. However, not all claims seem fully supported by the evidence. Specifically, several biological features, such as the ratio of excitatory to inhibitory neurons, which the authors claim to explain through efficient coding, might be contingent on arbitrary modelling choices. In addition, earlier work has already established the importance of structured connectivity for feature competition. A clearer presentation of modelling choices, limitations, and prior work could improve the manuscript.

      Major comments:

      (1) Much is made of the 4:1 ratio between excitatory and inhibitory neurons, which the authors claim to explain through efficient coding. I see two issues with this conclusion: (i) The 4:1 ratio is specific to rodents; humans have an approximate 2:1 ratio (see Fang & Xia et al., Science 2022 and references therein); (ii) the optimal ratio in the model depends on a seemingly arbitrary choice of hyperparameters, particularly the weighting of encoding error versus metabolic cost. This second concern applies to several other results, including the strength of inhibitory versus excitatory synapses. While the model can, therefore, be made consistent with biological data, this requires auxiliary assumptions.

      (2) A growing body of evidence supports the importance of structured E-I and I-E connectivity for feature selectivity and response to perturbations. For example, this is a major conclusion from the Oldenburg paper (reference 62 in the manuscript), which includes extensive modelling work. Similar conclusions can be found in work from Znamenskiy and colleagues (experiments and spiking network model; bioRxiv 2018, Neuron 2023 (ref. 82)), Sadeh & Clopath (rate network; eLife, 2020), and Mackwood et al. (rate network with plasticity; eLife, 2021). The current manuscript adds to this evidence by showing that (a particular implementation of) efficient coding in spiking networks leads to structured connectivity. The fact that this structured connectivity then explains perturbation responses is, in the light of earlier findings, not new.

      (3) The model's limitations are hard to discern, being relegated to the manuscript's last and rather equivocal paragraph. For instance, the lack of recurrent excitation, crucial in neural dynamics and computation, likely influences the results: neuronal time constants must be as large as the target readout (Figure 4), presumably because the network cannot integrate the signal without recurrent excitation. However, this and other results are not presented in tandem with relevant caveats.

      (4) On repeated occasions, results from the model are referred to as predictions claimed to match the data. A prediction is a statement about what will happen in the future - but most of the "predictions" from the model are actually findings that broadly match earlier experimental results, making them "postdictions". This distinction is important: compared to postdictions, predictions are a much stronger test because they are falsifiable. This is especially relevant given (my impression) that key parameters of the model were tweaked to match the data.

    3. Reviewer #2 (Public Review):

      Summary:

      In this work, the authors present a biologically plausible, efficient E-I spiking network model and study various aspects of the model and its relation to experimental observations. This includes a derivation of the network into two (E-I) populations, the study of single-neuron perturbations and lateral-inhibition, the study of the effects of adaptation and metabolic cost, and considerations of optimal parameters. From this, they conclude that their work puts forth a plausible implementation of efficient coding that matches several experimental findings, including feature-specific inhibition, tight instantaneous balance, a 4 to 1 ratio of excitatory to inhibitory neurons, and a 3 to 1 ratio of I-I to E-I connectivity strength. It thus argues that some of these observations may come as a direct consequence of efficient coding.

      Strengths:

      While many network implementations of efficient coding have been developed, such normative models are often abstract and lacking sufficient detail to compare directly to experiments. The intention of this work to produce a more plausible and efficient spiking model and compare it with experimental data is important and necessary in order to test these models.

      In rigorously deriving the model with real physical units, this work maps efficient spiking networks onto other more classical biophysical spiking neuron models. It also attempts to compare the model to recent single-neuron perturbation experiments, as well as some long-standing puzzles about neural circuits, such as the presence of separate excitatory and inhibitory neurons, the ratio of excitatory to inhibitory neurons, and E/I balance. One of the primary goals of this paper, to determine if these are merely biological constraints or come from some normative efficient coding objective, is also important.

      Though several of the observations have been reported and studied before (see below), this work arguably studies them in more depth, which could be useful for comparing more directly to experiments.

      Weaknesses:

      Though the text of the paper may suggest otherwise, many of the modeling choices and observations found in the paper have been introduced in previous work on efficient spiking models, thereby making this work somewhat repetitive and incremental at times. This includes the derivation of the network into separate excitatory and inhibitory populations, discussion of physical units, comparison of voltage versus spike-timing correlations, and instantaneous E/I balance, all of which can be found in one of the first efficient spiking network papers (Boerlin et al. 2013), as well as in subsequent papers. Metabolic cost and slow adaptation currents were also presented in a previous study (Gutierrez & Deneve 2019). Though it is perfectly fine and reasonable to build upon these previous studies, the language of the text gives them insufficient credit.

      Furthermore, the paper makes several claims of optimality that are not convincing enough, as they are only verified by a limited parameter sweep of single parameters at a time, are unintuitive, and may be in conflict with previous findings of efficient spiking networks. This includes the following. Coding error (RMSE) has a minimum at intermediate metabolic cost (Figure 5B), despite the fact that intuitively, zero metabolic cost would indicate that the network is solely minimizing coding error and that previous work has suggested that additional costs bias the output. Coding error also appears to have a minimum at intermediate values of the ratio of E to I neurons (effectively the number of I neurons) and the number of encoded variables (Figures 6D, 7B). These both have to do with the redundancy in the network (number of neurons for each encoded variable), and previous work suggests that networks can code for arbitrary numbers of variables provided the redundancy is high enough (e.g., Calaim et al. 2022). Lastly, the performance of the E-I variant of the network is shown to be better than that of a single cell type (1CT: Figure 7C, D). Given that the E-I network is performing a similar computation to the 1CT model but with more neurons (i.e., instead of an E neuron directly providing lateral inhibition to its neighbor, it goes through an interneuron), this is unintuitive and again not supported by previous work. These may be valid emergent properties of the E-I spiking network derived here, but their presentation and description are not sufficient to determine this.

      Alternatively, the methodology of the model suggests that ad hoc modeling choices may be playing a role. For example, an arbitrary weighting of coding error and metabolic cost of 0.7 to 0.3, respectively, is chosen without mention of how this affects the results. Furthermore, the scaling of synaptic weights appears to be controlled separately for each connection type in the network (Table 1), despite the fact that some of these quantities are likely linked in the optimal network derivation. Finally, the optimal threshold and metabolic constants are an order of magnitude larger than the synaptic weights (Table 1). All of these considerations suggest one of the following two possibilities. One, the model has a substantial number of unconstrained parameters to tune, in which case more parameter sweeps would be necessary to definitively make claims of optimality. Or two, parameters are being decoupled from those constrained by the optimal derivation, and the optima simply correspond to the values that should come out of the derivation.

    4. Reviewer #3 (Public Review):

      Summary:

      In their paper the authors tackle three things at once in a theoretical model: how can spiking neural networks perform efficient coding, how can such networks limit the energy use at the same time, and how can this be done in a more biologically realistic way than previous work?

      They start by working from a long-running theory on how networks operating in a precisely balanced state can perform efficient coding. First, they assume split networks of excitatory (E) and inhibitory (I) neurons. The E neurons have the task to represent some lower dimensional input signal, and the I neurons have the task to represent the signal represented by the E neurons. Additionally, the E and I populations should minimize an energy cost represented by the sum of all spikes. All this results in two loss functions for the E and I populations, and the networks are then derived by assuming E and I neurons should only spike if this improves their respective loss. This results in networks of spiking neurons that live in a balanced state, and can accurately represent the network inputs.

      They then investigate in-depth different aspects of the resulting networks, such as responses to perturbations, the effect of following Dale's law, spiking statistics, the excitation (E)/inhibition (I) balance, optimal E/I cell ratios, and others. Overall, they expand on previous work by taking a more biological angle on the theory and showing the networks can operate in a biologically realistic regime.

      Strengths:

      (1) The authors take a much more biological angle on the efficient spiking networks theory than previous work, which is an essential contribution to the field.

      (2) They make a very extensive investigation of many aspects of the network in this context, and do so thoroughly.

      (3) They put sensible constraints on their networks, while still maintaining the good properties these networks should have.

      Weaknesses:

      (1) The paper has somewhat overstated the significance of their theoretical contributions, and should make much clearer what aspects of the derivations are novel. Large parts were done in very similar ways in previous papers. Specifically: the split into E and I neurons was also done in Boerlin et al (2008) and in Barrett et al (2016). Defining the networks in terms of realistic units was already done by Boerlin et al (2008). It would also be worth it to discuss Barrett et al (2016) specifically more, as there they also use split E/I networks and perform biologically relevant experiments.

      (2) It is not clear from an optimization perspective why the split into E and I neurons and following Dale's law would be beneficial. While the constraints of Dale's law are sensible (splitting the population in E and I neurons, and removing any non-Dalian connection), they are imposed from biology and not from any coding principles. A discussion of how this could be done would be much appreciated, and in the main text, this should be made clear.

      (3) Related to the previous point, the claim that the network with split E and I neurons has a lower average loss than a 1 cell-type (1-CT) network seems incorrect to me. Only the E population coding error should be compared to the 1-CT network loss, or the sum of the E and I populations (not their average). In my author recommendations, I go more in-depth on this point.

      (4) While the paper is supposed to bring the balanced spiking networks they consider in a more experimentally relevant context, for experimental audiences I don't think it is easy to follow how the model works, and I recommend reworking both the main text and methods to improve on that aspect.

      Assessment and context:

      Overall, although much of the underlying theory is not necessarily new, the work provides an important addition to the field. The authors succeeded well in their goal of making the networks more biologically realistic, and incorporating aspects of energy efficiency. For computational neuroscientists, this paper is a good example of how to build models that link well to experimental knowledge and constraints, while still being computationally and mathematically tractable. For experimental readers, the model provides a clearer link between efficient coding spiking networks to known experimental constraints and provides a few predictions.

    5. Author response:

      eLife assessment

      This study offers a useful treatment of how the population of excitatory and inhibitory neurons integrates principles of energy efficiency in their coding strategies. The analysis provides a comprehensive characterisation of the model, highlighting the structured connectivity between excitatory and inhibitory neurons. However, the manuscript provides an incomplete motivation for parameter choices. Furthermore, the work is insufficiently contextualized within the literature, and some of the findings appear overlapping and incremental given previous work.

      We thank the Reviewers and the Reviewing Editor for taking the time to provide extremely valuable suggestions and comments, which will help us to substantially improve our paper. In what follows, we summarize our current plan to improve the paper in line with their suggestions.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary: Koren et al. derive and analyse a spiking network model optimised to represent external signals using the minimum number of spikes. Unlike most prior work using a similar setup, the network includes separate populations of excitatory and inhibitory neurons. The authors show that the optimised connectivity has a like-to-like structure, leading to the experimentally observed phenomenon of feature competition. They also characterise the impact of various (hyper)parameters, such as adaptation timescale, ratio of excitatory to inhibitory cells, regularisation strength, and background current. These results add useful biological realism to a particular model of efficient coding. However, not all claims seem fully supported by the evidence. Specifically, several biological features, such as the ratio of excitatory to inhibitory neurons, which the authors claim to explain through efficient coding, might be contingent on arbitrary modelling choices. In addition, earlier work has already established the importance of structured connectivity for feature competition. A clearer presentation of modelling choices, limitations, and prior work could improve the manuscript.

      Thanks for these insights and for this summary of our work.

      Major comments:

      (1) Much is made of the 4:1 ratio between excitatory and inhibitory neurons, which the authors claim to explain through efficient coding. I see two issues with this conclusion: (i) The 4:1 ratio is specific to rodents; humans have an approximate 2:1 ratio (see Fang & Xia et al., Science 2022 and references therein); (ii) the optimal ratio in the model depends on a seemingly arbitrary choice of hyperparameters, particularly the weighting of encoding error versus metabolic cost. This second concern applies to several other results, including the strength of inhibitory versus excitatory synapses. While the model can, therefore, be made consistent with biological data, this requires auxiliary assumptions.

      We will better describe the ratio of E to I neuron numbers found in real data, as suggested. The first submission already contained an analysis of how this ratio depends on the weighting of the losses of E and I neurons and on the relative weighting of the encoding error versus the metabolic cost in the loss function (see Fig 6E). We will make sure that these results are suitably expanded and better emphasized in revision. We will also include a new analysis of how the optimal values of other parameters (namely: noise intensity, metabolic constant, ratio of mean I-I to E-I connectivity, time constants of single E and I neurons) depend on the relative weighting of encoding error versus metabolic cost in the loss function.

      (2) A growing body of evidence supports the importance of structured E-I and I-E connectivity for feature selectivity and response to perturbations. For example, this is a major conclusion from the Oldenburg paper (reference 62 in the manuscript), which includes extensive modelling work. Similar conclusions can be found in work from Znamenskiy and colleagues (experiments and spiking network model; bioRxiv 2018, Neuron 2023 (ref. 82)), Sadeh & Clopath (rate network; eLife, 2020), and Mackwood et al. (rate network with plasticity; eLife, 2021). The current manuscript adds to this evidence by showing that (a particular implementation of) efficient coding in spiking networks leads to structured connectivity. The fact that this structured connectivity then explains perturbation responses is, in the light of earlier findings, not new.

      We agree that the main contribution of our manuscript in this respect is to show how efficient coding in spiking networks can lead to structured connectivity similar to that proposed in the above papers. We apologize if this was not clear enough in the previous version. We will make it clearer in revision. We nevertheless think it useful to report the effects of perturbations within this network because the structure derived in our network is not identical to those studied in the above papers, and because these results give information about how lateral inhibition works in this network. Thus, we will keep presenting these results in the revised version, although we will de-emphasize and simplify their presentation to give more emphasis to the novelty of deriving this connectivity rule from the principles of efficient coding.

      (3) The model's limitations are hard to discern, being relegated to the manuscript's last and rather equivocal paragraph. For instance, the lack of recurrent excitation, crucial in neural dynamics and computation, likely influences the results: neuronal time constants must be as large as the target readout (Figure 4), presumably because the network cannot integrate the signal without recurrent excitation. However, this and other results are not presented in tandem with relevant caveats.

      We will improve the Limitations paragraph in Discussion, and also anticipate caveats in tandem with results when needed, as suggested.

      (4) On repeated occasions, results from the model are referred to as predictions claimed to match the data. A prediction is a statement about what will happen in the future - but most of the "predictions" from the model are actually findings that broadly match earlier experimental results, making them "postdictions".

      This distinction is important: compared to postdictions, predictions are a much stronger test because they are falsifiable. This is especially relevant given (my impression) that key parameters of the model were tweaked to match the data.

      We will better distinguish between predictions and postdictions in revision.

      Reviewer #2 (Public Review):

      Summary: In this work, the authors present a biologically plausible, efficient E-I spiking network model and study various aspects of the model and its relation to experimental observations. This includes a derivation of the network into two (E-I) populations, the study of single-neuron perturbations and lateral-inhibition, the study of the effects of adaptation and metabolic cost, and considerations of optimal parameters. From this, they conclude that their work puts forth a plausible implementation of efficient coding that matches several experimental findings, including feature-specific inhibition, tight instantaneous balance, a 4 to 1 ratio of excitatory to inhibitory neurons, and a 3 to 1 ratio of I-I to E-I connectivity strength. It thus argues that some of these observations may come as a direct consequence of efficient coding.

      Strengths:

      While many network implementations of efficient coding have been developed, such normative models are often abstract and lacking sufficient detail to compare directly to experiments. The intention of this work to produce a more plausible and efficient spiking model and compare it with experimental data is important and necessary in order to test these models.

      In rigorously deriving the model with real physical units, this work maps efficient spiking networks onto other more classical biophysical spiking neuron models. It also attempts to compare the model to recent single-neuron perturbation experiments, as well as some long-standing puzzles about neural circuits, such as the presence of separate excitatory and inhibitory neurons, the ratio of excitatory to inhibitory neurons, and E/I balance. One of the primary goals of this paper, to determine if these are merely biological constraints or come from some normative efficient coding objective, is also important.

      Though several of the observations have been reported and studied before (see below), this work arguably studies them in more depth, which could be useful for comparing more directly to experiments.

      Thanks for these insights and for the kind words of appreciation of the strengths of our work.

      Weaknesses:

      Though the text of the paper may suggest otherwise, many of the modeling choices and observations found in the paper have been introduced in previous work on efficient spiking models, thereby making this work somewhat repetitive and incremental at times. This includes the derivation of the network into separate excitatory and inhibitory populations, discussion of physical units, comparison of voltage versus spike-timing correlations, and instantaneous E/I balance, all of which can be found in one of the first efficient spiking network papers (Boerlin et al. 2013), as well as in subsequent papers. Metabolic cost and slow adaptation currents were also presented in a previous study (Gutierrez & Deneve 2019). Though it is perfectly fine and reasonable to build upon these previous studies, the language of the text gives them insufficient credit.

      We will improve the text to make sure that credit to previous studies is more precisely and more clearly given.

      Furthermore, the paper makes several claims of optimality that are not convincing enough, as they are only verified by a limited sweep of single parameters at a time, are unintuitive, and may be in conflict with previous findings on efficient spiking networks. This includes the following. Coding error (RMSE) has a minimum at intermediate metabolic cost (Figure 5B), despite the fact that, intuitively, zero metabolic cost would indicate that the network is solely minimizing coding error and that previous work has suggested that additional costs bias the output. Coding error also appears to have a minimum at intermediate values of the ratio of E to I neurons (effectively the number of I neurons) and the number of encoded variables (Figures 6D, 7B). These both have to do with the redundancy in the network (number of neurons for each encoded variable), and previous work suggests that networks can code for arbitrary numbers of variables provided the redundancy is high enough (e.g., Calaim et al. 2022). Lastly, the performance of the E-I variant of the network is shown to be better than that of a single cell type (1CT: Figure 7C, D). Given that the E-I network is performing a computation similar to that of the 1CT model but with more neurons (i.e., instead of an E neuron directly providing lateral inhibition to its neighbor, it goes through an interneuron), this is unintuitive and again not supported by previous work. These may be valid emergent properties of the E-I spiking network derived here, but their presentation and description are not sufficient to determine this.

      We are addressing this issue in two ways. First, we will present results of joint sweeps over pairs of parameters whose joint variation is expected to influence optimality in a way that cannot be understood by varying one parameter at a time. Namely, we plan to vary jointly the noise intensity and the metabolic constant, as well as the ratio of E to I neuron numbers and the ratio of mean I-I to E-I connectivity. Second, we will identify a reasonable/realistic range of possible variations of each individual parameter and then perform a Monte Carlo search for the optimal point within this range, and compare the results so obtained with the understanding gained from varying one or two parameters at a time. We will also add the suggested citation to Calaim et al. 2022 in regard to the points discussed above.
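
      As an illustration only, the Monte Carlo search described in the reply above could be sketched as a uniform random search within per-parameter ranges. Everything in this sketch is an assumption made for the example: the loss function `toy_loss`, its minimum, the parameter names, and the ranges are hypothetical stand-ins and are not taken from the manuscript.

```python
import random


def toy_loss(noise, metabolic_constant):
    # Hypothetical smooth loss with a single minimum at (2.0, 5.0).
    # In a real study this would be the simulated network loss.
    return (noise - 2.0) ** 2 + 0.5 * (metabolic_constant - 5.0) ** 2


def random_search(loss, ranges, n_samples=5000, seed=0):
    """Sample parameters uniformly within `ranges` and keep the best point."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(n_samples):
        candidate = {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}
        value = loss(**candidate)
        if value < best_loss:
            best_params, best_loss = candidate, value
    return best_params, best_loss


# Assumed (hypothetical) plausible ranges for each parameter.
ranges = {"noise": (0.0, 10.0), "metabolic_constant": (0.0, 20.0)}
best_params, best_loss = random_search(toy_loss, ranges)
print(best_params, best_loss)
```

      Comparing the point found this way with the optima from one- or two-parameter sweeps indicates whether interactions between parameters shift the optimum.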

      We will improve the comparison between the Excitatory-Inhibitory and the 1-Cell-Type model (see reply to the suggestions of Referee 3 for more details).

      Alternatively, the methodology of the model suggests that ad hoc modeling choices may be playing a role. For example, an arbitrary weighting of coding error and metabolic cost of 0.7 to 0.3, respectively, is chosen without mention of how this affects the results. Furthermore, the scaling of synaptic weights appears to be controlled separately for each connection type in the network (Table 1), despite the fact that some of these quantities are likely linked in the optimal network derivation. Finally, the optimal threshold and metabolic constants are an order of magnitude larger than the synaptic weights (Table 1). All of these considerations suggest one of the following two possibilities. One, the model has a substantial number of unconstrained parameters to tune, in which case more parameter sweeps would be necessary to definitively make claims of optimality. Or two, parameters are being decoupled from those constrained by the optimal derivation, and the optima simply correspond to the values that should come out of the derivation.

      In the previously submitted manuscript we presented both the encoding error and the metabolic cost separately as a function of the parameters, so that readers could get an understanding of how stable optimal parameters would be to the change of the relative weighting of encoding error and metabolic cost. We will improve this work by adding the suggested calculations to provide quantitative measures of the dependence of the optimal network parameters and configurations on this relative weighting.
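
      To make the concern about the relative weighting concrete, the following minimal sketch shows how the location of an optimum can shift when the weight on encoding error changes. Only the 0.7 to 0.3 weighting comes from the review above; the two curves (`toy_encoding_error`, `toy_metabolic_cost`) and the parameter grid are hypothetical and chosen purely for illustration.

```python
def weighted_loss(encoding_error, metabolic_cost, g_error=0.7):
    # Convex combination of the two terms; 0.7 / 0.3 is the weighting
    # quoted in the review.
    return g_error * encoding_error + (1.0 - g_error) * metabolic_cost


def toy_encoding_error(p):
    # Hypothetical: error falls as the parameter grows.
    return 1.0 / p


def toy_metabolic_cost(p):
    # Hypothetical: cost rises linearly with the parameter.
    return 0.1 * p


def optimal_parameter(g_error, grid):
    """Return the grid value that minimises the weighted loss."""
    return min(
        grid,
        key=lambda p: weighted_loss(toy_encoding_error(p), toy_metabolic_cost(p), g_error),
    )


grid = [0.1 * k for k in range(1, 101)]  # parameter values 0.1 ... 10.0
optima = {g: optimal_parameter(g, grid) for g in (0.3, 0.5, 0.7)}
print(optima)
```

      In this toy setting the optimum moves to larger parameter values as more weight is placed on encoding error, which is the kind of dependence a quantitative sensitivity analysis would need to report.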

      Reviewer #3 (Public Review):

      Summary: In their paper the authors tackle three things at once in a theoretical model: how can spiking neural networks perform efficient coding, how can such networks limit the energy use at the same time, and how can this be done in a more biologically realistic way than previous work?

      They start by working from a long-running theory on how networks operating in a precisely balanced state can perform efficient coding. First, they assume split networks of excitatory (E) and inhibitory (I) neurons. The E neurons have the task to represent some lower dimensional input signal, and the I neurons have the task to represent the signal represented by the E neurons. Additionally, the E and I populations should minimize an energy cost represented by the sum of all spikes. All this results in two loss functions for the E and I populations, and the networks are then derived by assuming E and I neurons should only spike if this improves their respective loss. This results in networks of spiking neurons that live in a balanced state, and can accurately represent the network inputs.

      They then investigate in-depth different aspects of the resulting networks, such as responses to perturbations, the effect of following Dale's law, spiking statistics, the excitation (E)/inhibition (I) balance, optimal E/I cell ratios, and others. Overall, they expand on previous work by taking a more biological angle on the theory and showing the networks can operate in a biologically realistic regime.

      Strengths:

      (1) The authors take a much more biological angle on the efficient spiking networks theory than previous work, which is an essential contribution to the field.

      (2) They make a very extensive investigation of many aspects of the network in this context, and do so thoroughly.

      (3) They put sensible constraints on their networks, while still maintaining the good properties these networks should have.

      Thanks for this summary and for these kind words of appreciation of the strengths of our work.

      Weaknesses:

      (1) The paper has somewhat overstated the significance of their theoretical contributions, and should make much clearer what aspects of the derivations are novel. Large parts were done in very similar ways in previous papers. Specifically: the split into E and I neurons was also done in Boerlin et al (2008) and in Barrett et al (2016). Defining the networks in terms of realistic units was already done by Boerlin et al (2008). It would also be worth it to discuss Barrett et al (2016) specifically more, as there they also use split E/I networks and perform biologically relevant experiments.

      We will improve the text to make sure that credit to previous studies is more precisely and more clearly given.

      (2) It is not clear from an optimization perspective why the split into E and I neurons and following Dale's law would be beneficial. While the constraints of Dale's law are sensible (splitting the population in E and I neurons, and removing any non-Dalian connection), they are imposed from biology and not from any coding principles. A discussion of how this could be done would be much appreciated, and in the main text, this should be made clear.

      We indeed removed non-Dalian connections because having only connections respecting Dale’s law is a major constraint for biological plausibility. Our logic was to consider efficient coding within the space of networks that satisfy this (and other) biological plausibility constraints. We did not intend to claim that removing the non-Dalian connections was the result of an analytical optimization. However, to get better insights into how Dale’s Law constrains or influences the design of efficient networks, we added a comparison of the coding properties of networks that either do or do not satisfy Dale’s law. We apologize if this was not sufficiently clear in the previous version and we will clarify this in revision. 

      (3) Related to the previous point, the claim that the network with split E and I neurons has a lower average loss than a 1 cell-type (1-CT) network seems incorrect to me. Only the E population coding error should be compared to the 1-CT network loss, or the sum of the E and I populations (not their average). In my author recommendations, I go more in-depth on this point.

      We will perform the suggested detailed comparisons between the network loss in the 1CT-model and E-I model and then revise or refine conclusions if and as needed, according to the results we will obtain.

      (4) While the paper is supposed to bring the balanced spiking networks they consider in a more experimentally relevant context, for experimental audiences I don't think it is easy to follow how the model works, and I recommend reworking both the main text and methods to improve on that aspect.

      We will try to make the presentation of the model more accessible to a non-computational audience.

      Assessment and context: Overall, although much of the underlying theory is not necessarily new, the work provides an important addition to the field. The authors succeeded well in their goal of making the networks more biologically realistic, and incorporating aspects of energy efficiency. For computational neuroscientists, this paper is a good example of how to build models that link well to experimental knowledge and constraints, while still being computationally and mathematically tractable. For experimental readers, the model provides a clearer link between efficient coding spiking networks and known experimental constraints and provides a few predictions.

      Thanks for these kind words. We will make sure that these points emerge more clearly and in a more accessible way from the revised paper.

    1. Reviewer #3 (Public Review):

      Summary:

      In this work, Simon et al present a new computational tool to assess non-Brownian single-particle dynamics (aTrack). The authors provide a solid groundwork to determine the motion type of single trajectories via an analytical integration of multiple hidden variables, specifically accounting for localization uncertainty, directed/confined motion parameters, and, very novel, allowing for the evolution of the directed/confined motion parameters over time. This last step is, to the best of my knowledge, conceptually new and could prove very useful for the field in the future. The authors then use this groundwork to determine the motion type and its corresponding parameter values via a series of likelihood tests. This accounts for obtaining the motion type which is statistically most likely to be occurring (with Brownian motion as null hypothesis). Throughout the manuscript, aTrack is rigorously tested, and the limits of the methods are fully explored and clearly visualised. The authors conclude with allowing the characterization of multiple states in a single experiment with good accuracy and explore this in various experimental settings. Overall, the method is fundamentally strong, well-characterised, and tested, and will be of general interest to the single-particle-tracking field.

      Strengths:

      (1) The use of likelihood ratios gives a strong statistical relevance to the methodology. There is a sharp decrease in likelihood ratio between e.g. confinement of 0.00 and 0.05 and velocity of 0.0 and 0.002 (figure 2c), which clearly shows the strength of the method - being able to determine 2nm/timepoint directed movement with 20 nm loc. error and 100 nm/timepoint diffusion is very impressive.

      (2) Allowing the hidden variables of confinement and directed motion to change during a trajectory (i.e. the q factor) is very interesting and allows for new interpretations of data. The quantifications of these variables are, to me, surprisingly accurate, but well-determined.

      (3) The software is well-documented, easy to install, and easy to use.

      Weaknesses:

      (1) The aTrack principle is limited to the motions incorporated by the authors, with, as far as I can see, no way to add new analytical non-Brownian motion. For instance, being able to add a dynamical state-switching model (i.e. quick on/off switching between mobile and non-mobile, for instance, repeatable DNA binding of a protein), could be of interest. I don't believe this necessarily has to be incorporated by the authors, but it might be of interest to provide instructions on how to expand aTrack.

      (2) The experimental data does not very convincingly show the usefulness of aTrack. The authors mention that SPBs are directed in mitosis and not in interphase. This can be quantified and studied by microscopy analysis of individual cells and confirming the aTrack direction model based on this, but this is not performed. Similarly, the size of a confinement spot in optical tweezers can be changed by changing the power of the optical tweezer, and this would far more strongly show the quantitative power of aTrack.

      (3) The software has a very strict limit on the number of data points per trajectory, which is a user input. Shorter trajectories are discarded, while longer trajectories are cut off to the set length. It is not explained why this is necessary, and I feel it deletes a lot of useful data without clear benefit (in experimental conditions).
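
      The data loss raised in point (3) can be illustrated with a short sketch. It does not use aTrack's API: the function names, the fixed length of 20, and the example track lengths are all hypothetical. It compares truncating or discarding tracks with splitting long tracks into fixed-length chunks.

```python
def truncate_or_discard(tracks, length):
    """Keep only tracks with at least `length` points, cut to `length`."""
    return [t[:length] for t in tracks if len(t) >= length]


def split_into_chunks(tracks, length):
    """Split every track into consecutive non-overlapping chunks of `length`."""
    chunks = []
    for t in tracks:
        for start in range(0, len(t) - length + 1, length):
            chunks.append(t[start:start + length])
    return chunks


def retained_fraction(original, processed):
    """Fraction of the original data points that survive processing."""
    total = sum(len(t) for t in original)
    return sum(len(t) for t in processed) / total if total else 0.0


# Hypothetical tracks with 5, 20, 45 and 100 positions.
tracks = [list(range(n)) for n in (5, 20, 45, 100)]
truncated = truncate_or_discard(tracks, 20)
chunks = split_into_chunks(tracks, 20)
print(retained_fraction(tracks, truncated), retained_fraction(tracks, chunks))
```

      Chunking retains more points here, but it treats the pieces of one trajectory as independent, which may not be appropriate when hidden motion parameters evolve along the track.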

    2. eLife assessment

      In this valuable contribution, the authors present a novel and versatile probabilistic tool for classifying tracking behaviors and understanding important parameters for different types of single-particle motion. The tool will be broadly applicable to single-particle tracking studies. While some reviewers feel that the methodology has been convincingly tested by computational comparisons and experimental data, others feel that the mathematical foundation needs to be strengthened and clearly defined.

    3. Reviewer #1 (Public Review):

      Summary:

      Weiss and co-authors present a versatile probabilistic tool, aTrack, which helps classify tracking behaviors and understand important parameters for different types of single-particle motion: Brownian, confined, or directed. The tool can be used further to analyze populations of tracks and the number of motion states. This is a stand-alone software package, making it user-friendly for a broad group of researchers.

      Strengths:

      This manuscript presents a novel method for trajectory analysis.

      Weaknesses:

      (1) In the results section, is there any reason to choose the specific range of track length for determining the type of motion? The starting value is fine, and would be short enough, but do the authors have anything to report about how much is too long for the model?

      (2) Robustness to model mismatches is a very important section that the authors have addressed diligently. Understanding where and how the model is limited is important. For example, the authors mention the limitation of trajectory length; do they have any information on the range of trajectory lengths over which this method works accurately? This would be of interest to readers who would like to apply this method to their own data.

      (3) aTrack extracts certain parameters from the trajectories to determine the motion types. However, it is not very clear how certain parameters are calculated. For example, is the diffusion coefficient D calculated from fitting, and how is the confinement factor defined and estimated, with equations? This information will help the readers to understand the principles of this algorithm.

      (4) The authors mentioned the scenario where a particle may experience several types of motion simultaneously. How are these motions simulated, and what do they mean in terms of motion types? Are they mixed motions (a particle switches motion types within the same trajectory), or do they simply present features of several motion types? It is not intuitive to readers that a particle can be diffusive (Brownian) and directed at the same time.

    4. Reviewer #2 (Public Review):

      Summary:

      The authors present a software package "aTrack" for identification of motion types and parameter estimation in single-particle tracking data. The software is based on maximum likelihood estimation of the time-series data given an assumed motion model and likelihood ratio tests for model selection. They characterized the performance of the software mostly on simulated data and showed that it is applicable to experimental data.

      Strengths:

      A potential advantage of the presented method is its wide applicability to different motion types.

      Weaknesses:

      (1) There has been a lot of similar work in this field. Even though the authors included many relevant citations in the introduction, it is still not clear what this work uniquely offers. Is it the first time that direct MLE of the time-series data was developed? Suggestions to improve would include (a) better wording in the introduction section, (b) comparing to other popular methods (based on MSD, step-size statistics (Spot-On, eLife 2018;7:e33125), for example) using the simulated dataset generated by the authors, (c) comparing to other methods using data set in challenges/competitions (Nat. Comm (2021) 12:6253).

      (2) The hypothesis testing method presented here has a number of issues. First, there is no definition of the test statistic. Usually, the test statistic is defined given a specific (Type I and/or Type II) error rate. There is also no discussion of the specificity and sensitivity of the testing results (i.e., what is the probability of misidentifying a Brownian trajectory as directed?). Relatedly, it is not clear what Figure 2e (and other similar plots) means, as the likelihood ratio is small throughout the parameter space. Also, for likelihood ratio tests, the authors need to discuss how model complexity affects the testing outcome (as more complex models tend to be more "likely" for the data) and also how the likelihood function is normalized (normalization is not an issue for MLE but critical for ratio tests).

      (3) Relating to the mathematical foundation (Figure 1b): the measured positions are drawn as direct arrows from the real position states, which implies instantaneous localization. In reality, there is motion blur, which introduces a correlation between the measured locations. Motion blur is known to introduce bias in SPT analysis; how does it affect the method here?

      (4) The authors did not go through the interpretation of the figure. This may be a matter of style, but I find the figures ambiguous to interpret at times.

      (5) It is not clear to me how the classification of the 5 motion types was accomplished.

      (6) Figure 3, caption: what is ((d_{est}-0.1)/0.1)? Also, the panel labeled "d" should be "e".
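
      The questions in point (2) about a defined test statistic and the rate of misidentifying Brownian trajectories can be illustrated with a generic likelihood ratio test. This is a minimal sketch under strong assumptions and is not aTrack's method: steps are one-dimensional, independent and Gaussian, there is no localization error or motion blur, and the null model (zero drift) is nested in the alternative (free drift), so the statistic is compared with a chi-squared distribution with one degree of freedom (Wilks' theorem). All parameter values are hypothetical.

```python
import math
import random


def gaussian_loglik(steps, mean, var):
    """Log-likelihood of i.i.d. Gaussian steps with given mean and variance."""
    n = len(steps)
    sq = sum((s - mean) ** 2 for s in steps)
    return -0.5 * n * math.log(2.0 * math.pi * var) - sq / (2.0 * var)


def drift_lrt(steps):
    """Likelihood ratio test of zero drift (Brownian) vs free drift (directed).

    Returns the statistic 2 * (logL_alt - logL_null) and its asymptotic
    p-value from a chi-squared distribution with 1 degree of freedom.
    Assumes the steps are not all identical (non-zero variance).
    """
    n = len(steps)
    mean1 = sum(steps) / n
    var0 = sum(s * s for s in steps) / n              # MLE variance, drift fixed to 0
    var1 = sum((s - mean1) ** 2 for s in steps) / n   # MLE variance, drift free
    stat = 2.0 * (gaussian_loglik(steps, mean1, var1) - gaussian_loglik(steps, 0.0, var0))
    stat = max(stat, 0.0)  # guard against tiny negative rounding errors
    p_value = math.erfc(math.sqrt(stat / 2.0))  # survival function of chi2 with 1 dof
    return stat, p_value


def simulate_steps(rng, n, drift, sigma):
    return [rng.gauss(drift, sigma) for _ in range(n)]


rng = random.Random(0)

# One directed trajectory (hypothetical units): drift 0.05, step noise 0.1.
directed = simulate_steps(rng, 200, drift=0.05, sigma=0.1)
stat_directed, p_directed = drift_lrt(directed)

# Empirical Type I error: fraction of Brownian trajectories called directed.
n_null = 1000
false_positives = sum(
    drift_lrt(simulate_steps(rng, 50, 0.0, 0.1))[1] < 0.05 for _ in range(n_null)
)
false_positive_rate = false_positives / n_null
print(stat_directed, p_directed, false_positive_rate)
```

      Reporting the empirical false positive rate alongside the chosen threshold is one way to state the specificity of such a test; the same simulation with non-zero drift would give its sensitivity.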

    1. eLife assessment

      Rademacher and colleagues examined the effect of a chemogenetic approach on the integrity of the dopamine system in mice with chronically stimulated dopamine neurons. These findings are important: 1) This approach led to an axon-first degeneration over a time course of 2-4 weeks; 2) The finding that direct excitation of dopaminergic neurons causes differential degeneration sheds light on dopaminergic neuron selective vulnerability mechanisms. Overall, the strength of the evidence is solid, but the behavior experiments that do not include a CNO control provide incomplete support for the findings.

    2. Reviewer #2 (Public Review):

      Summary:

      Rademacher et al. present a paper showing that chronic chemogenetic excitation of dopaminergic neurons in the mouse midbrain results in differential degeneration of axons and somas across distinct regions (SNc vs VTA). These findings are important. This mouse model also has the advantage of showing an axon-first degeneration over an experimentally useful time course (2-4 weeks). The finding that direct excitation of dopaminergic neurons causes differential degeneration sheds light on the mechanisms of dopaminergic neuron selective vulnerability. The evidence that activation of dopaminergic neurons causes degeneration and alters mRNA expression is convincing, as the authors use both vehicle and CNO control groups, but the evidence that chronic dopaminergic activation alters circadian rhythm and motor behavior is incomplete as the authors did not run a CNO-control condition in these experiments.

      Strengths:

      This is an exciting and important paper.

      The paper compares mouse transcriptomics with human patient data.

      It shows that selective degeneration can occur across the midbrain dopaminergic neurons even in the absence of a genetic, prion, or toxin neurodegeneration mechanism.

      Weaknesses:

      Major concerns:

      (1) The lack of a CNO-positive, DREADD-negative control group in the behavioral experiments is the main limitation in interpreting the behavioral data. Without knowing whether CNO on its own has an impact on circadian rhythm or motor activity, the certainty that dopaminergic hyperactivity is causing these effects is lacking.

      (2) One of the most exciting things about this paper is that the SNc degenerates more strongly than the VTA when both regions are, in theory, excited to the same extent. However, it is not perfectly clear that both regions respond to CNO to the same extent. The electrophysiological data showing CNO responsiveness is only conducted in the SNc. If the VTA response is significantly reduced vs the SNc response, then the selectivity of the SNc degeneration could just be because the SNc was more hyperactive than the VTA. Electrophysiology experiments comparing the VTA and SNc response to CNO could support the idea that the SNc has substantial intrinsic vulnerability factors compared to the VTA.

      (3) The mice have access to a running wheel for the circadian rhythm experiments. Running has been shown to alter the dopaminergic system (Bastioli et al., 2022) and so the authors should clarify whether the histology, electrophysiology, fiber photometry, and transcriptomics data are conducted on mice that have been running or sedentary.

    3. Author response:

      Reviewer #1 (Public Review):

      [...] Strengths:

      This study provides direct evidence that the chronic activation of dopamine neurons is toxic and gives rise to neurodegeneration. In addition, the authors achieved the chronic activation of dopamine neurons using water application of clozapine-N-oxide (CNO), a method not commonly employed by researchers. This approach may offer new insights into pathophysiological alterations of dopamine neurons in Parkinson's disease. The authors also utilized state-of-the-art spatial gene expression analysis, which can provide valuable information for other researchers studying dopamine neurons. Although the authors did not elucidate the mechanisms underlying dopaminergic neuronal and axonal death, they presented a substantial number of intriguing ideas in their discussion, which are worth further investigation.

      We thank the reviewer for these positive comments.

      Weaknesses:

      Many claims raised in this paper are only partially supported by the experimental results. So, additional data are necessary to strengthen the claims. The effects of chronic activation of dopamine neurons are intriguing; however, this paper does not go beyond reporting phenomena. It lacks a comprehensive explanation for the degeneration of dopamine neurons and their axons. While the authors proposed possible mechanisms for the degeneration in their discussion, such as differentially expressed genes, these remain experimentally unexplored.

      We thank the reviewer for this review. We do believe that the manuscript has a mechanistic component, as the central experiments involve direct manipulation of neuronal activity, and we show an increase in calcium levels and gene expression changes in dopamine neurons that coincide with the degeneration. However, we agree that deeper mechanistic investigation would strengthen the conclusions of the paper. We have planned several important revisions, including the addition of CNO behavioral controls, manipulation of intracellular calcium using isradipine, additional transcriptomics experiments and further validation of findings. We anticipate that these additions will significantly bolster the conclusions of the paper.

      Reviewer #2 (Public Review):

      [...] Strengths:

      This is an exciting and important paper.

      The paper compares mouse transcriptomics with human patient data.

      It shows that selective degeneration can occur across the midbrain dopaminergic neurons even in the absence of a genetic, prion, or toxin neurodegeneration mechanism.

      We thank the reviewer for these insightful comments.

      Weaknesses:

      Major concerns:

      (1) The lack of a CNO-positive, DREADD-negative control group in the behavioral experiments is the main limitation in interpreting the behavioral data. Without knowing whether CNO on its own has an impact on circadian rhythm or motor activity, the certainty that dopaminergic hyperactivity is causing these effects is lacking.

      This is an important point. Although we show that CNO does not produce degeneration of DA neuron terminals, we do not exclude a contribution to the behavioral changes. We agree that this behavioral control is necessary, and will address it in revision with a CNO-only running wheel cohort.

      (2) One of the most exciting things about this paper is that the SNc degenerates more strongly than the VTA when both regions are, in theory, excited to the same extent. However, it is not perfectly clear that both regions respond to CNO to the same extent. The electrophysiological data showing CNO responsiveness is only conducted in the SNc. If the VTA response is significantly reduced vs the SNc response, then the selectivity of the SNc degeneration could just be because the SNc was more hyperactive than the VTA. Electrophysiology experiments comparing the VTA and SNc response to CNO could support the idea that the SNc has substantial intrinsic vulnerability factors compared to the VTA.

      We agree that additional electrophysiology conducted in the VTA dopamine neurons would meaningfully add to our understanding of the selective vulnerability in this model, and will complete these experiments in revision.

      (3) The mice have access to a running wheel for the circadian rhythm experiments. Running has been shown to alter the dopaminergic system (Bastioli et al., 2022) and so the authors should clarify whether the histology, electrophysiology, fiber photometry, and transcriptomics data are conducted on mice that have been running or sedentary.

      We will explicitly clarify which mice had access to a running wheel in our revision. Briefly, mice for histology, electrophysiology, and transcriptomics all had access to a running wheel during their treatment. The mice used for photometry underwent about 7 days of running wheel access approximately 3 weeks prior to the beginning of the experiment. The photometry headcaps sterically prevented mice from having access to a running wheel in their home cage.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, Rademacher and colleagues examined the effect on the integrity of the dopamine system in mice of chronically stimulating dopamine neurons using a chemogenetic approach. They find that one to two weeks of constant exposure to the chemogenetic activator CNO leads to a decrease in the density of tyrosine hydroxylase staining in striatal brain sections and to a small reduction of the global population of tyrosine hydroxylase positive neurons in the ventral midbrain. They also report alterations in gene expression in both regions using a spatial transcriptomics approach. Globally, the work is well done and valuable and some of the conclusions are interesting. However, the conceptual advance is perhaps a bit limited in the sense that there is extensive previous work in the literature showing that excessive depolarization of multiple types of neurons associated with intracellular calcium elevations promotes neuronal degeneration. The present work adds to this by showing evidence of a similar phenomenon in dopamine neurons.

      We thank the reviewer for the careful and thoughtful review of our manuscript.

      While extensive depolarization and associated intracellular calcium elevations promote degeneration generally, we emphasize that the process we describe is novel. Indeed, prior studies delivering chronic DREADDs to vulnerable neurons in models of Alzheimer’s disease did not report an increase in neurodegeneration, despite seeing changes in protein aggregation (e.g. Yuan and Grutzendler, J Neurosci 2016, PMID: 26758850; Hussaini et al., PLOS Bio 2020, PMID: 32822389). Further, a critical finding from our study is that in our paradigm, this stressor does not impact all dopamine neurons equally, as the SNc DA neurons are more vulnerable than the VTA, mirroring selective vulnerability characteristic of Parkinson’s disease. This is consistent with a large body of literature that SNc dopamine neurons are less capable of handling large energetic and calcium loads compared to neighboring VTA neurons, and the finding that chronically altered activity is sufficient to drive this preferential loss is novel.

      In addition, we are not aware of prior studies that have chronically activated DREADDs to produce neurodegeneration. Other studies have shown that acute excitotoxic stressors can produce neuronal degeneration, but the chronic increase in activity is central to our approach.

      In terms of the mechanisms explaining the neuronal loss observed after 2 to 4 weeks of chemogenetic activation, it would be important to consider that dopamine neurons are known from a lot of previous literature to undergo a decrease in firing through a depolarization-block mechanism when chronically depolarized. Is it possible that such a phenomenon explains much of the results observed in the present study? It would be important to consider this in the manuscript.

      As discussed in greater detail in the results section below, our data suggest this may not be a prominent feature in our model. However, we cannot rule out a contribution of depolarization block, and will expand on the discussion of this possibility in the revised manuscript.

      The relevance to Parkinson's disease (PD) is also not totally clear because there is not a lot of previous solid evidence showing that the firing of dopamine neurons is increased in PD, either in human subjects or in mouse models of the disease. As such, it is not clear if the present work is really modelling something that could happen in PD in humans.

      We completely agree that evidence of increased dopamine neuron activity from human PD patients is lacking and the existing data are difficult to interpret without human controls. However, as we outline in the manuscript, multiple lines of evidence suggest that the activity level of dopamine neurons almost certainly does change in PD. Therefore, it is very important that we understand how changes in the level of neural activity influence the degeneration of DA neurons. In this paper we examine the impact of increased activity. Increased activity may be compensatory after initial dopamine neuron loss, or may be an initial driver of death (Rademacher & Nakamura, Exp Neurol 2024, PMID: 38092187). Beyond what is already discussed in the manuscript, additional support for increased activity in PD models includes:

      - Elevated firing rates in asymptomatic MitoPark mice (Good et al., FASEB J 2011, PMID: 21233488)

      - Increased frequency of spontaneous firing in patient-derived iPSC dopamine neurons and primary mouse dopamine neurons that overexpress synuclein (Lin et al., Acta Neuropath Comm 2021, PMID: 34099060)

      - Increased spontaneous firing in dopamine neurons of rats injected with synuclein preformed fibrils compared to sham (Tozzi et al., Brain 2021, PMID: 34297092)

      We will include and further discuss these important examples in our revision.

      Similarly, in future studies, it will also be important to study the impact of decreasing DA neuron activity. There will be additional levels of complexity to accurately model changes in PD, which may differ between subtypes of the disease, the disease stage, and the subtype of dopamine neuron. Our study models the possibility of chronically increased pacemaking, and interpretation of our results will be informed as we learn more about how the activity of DA neurons changes in humans in PD. We will discuss and elaborate on these important points in the revision.

      Comments on the introduction:

      The introduction cites a 1990 paper from the lab of Anthony Grace as support of the fact that DA neurons increase their firing rate in PD models. However, in this 1990 paper, the authors stated that: "With respect to DA cell activity, depletions of up to 96% of striatal DA did not result in substantial alterations in the proportion of DA neurons active, their mean firing rate, or their firing pattern. Increases in these parameters only occurred when striatal DA depletions exceeded 96%." Such results argue that an increase in firing rate is most likely to be a consequence of the almost complete loss of dopamine neurons rather than an initial driver of neuronal loss. The present introduction would thus benefit from being revised to clarify the overriding hypothesis and rationale in relation to PD and better represent the findings of the paper by Hollerman and Grace.

      We agree that the findings of Hollerman and Grace support compensatory changes in dopamine neuron activity in response to loss of dopamine neurons, rather than informing whether dopamine neuron loss can also be an initial driver of activity. We will clarify this point in our revision. In addition, the results of other studies on this point are mixed: a 50% reduction in dopamine neurons didn’t alter firing rate or bursting (Harden and Grace, J Neurosci 1995, PMID: 7666198; Bilbao et al, Brain Res 2006, PMID: 16574080), while a 40% loss was found to increase firing rate and bursting (Chen et al, Brain Res 2009, PMID: 19545547) and larger reductions alter burst firing (Hollerman & Grace, Brain Res 1990, PMID: 2126975; Stachowiak et al, J Neurosci 1987, PMID: 3110381). Importantly, even if compensatory, such late-stage increases in dopamine neuron activity may contribute to disease progression and drive a vicious cycle of degeneration in surviving neurons. In addition, we also don’t know how the threshold of dopamine neuron loss and altered activity may differ between mice and humans, and PD patients do not present with clinical symptoms until ~30-60% of nigral neurons are lost (Burke & O’Malley, Exp Neurol 2013, PMID: 22285449; Shulman et al, Annu Rev Pathol 2011, PMID: 21034221).

      Other lines of evidence support the potential role of hyperactivity in disease initiation, including increased activity before dopamine neuron loss in MitoPark mice (Good et al., FASEB J 2011, PMID: 21233488), increased spontaneous firing in patient-derived iPSC dopamine neurons (Lin et al., Acta Neuropath Comm 2021, PMID: 34099060), and increased activity observed in genetic models of PD (Bishop et al., J Neurophysiol 2010, PMID: 20926611; Regoni et al., Cell Death Dis 2020, PMID: 33173027).

      It would be good that the introduction refers to some of the literature on the links between excessive neuronal activity, calcium, and neurodegeneration. There is a large literature on this and referring to it would help frame the work and its novelty in a broader context.

      We agree that a discussion of hyperactivity, calcium, and neurodegeneration would benefit the introduction. While we briefly discuss calcium and neurodegeneration in the discussion, we will expand on this literature in both the introduction and discussion sections. We will carefully review and contextualize our work within existing frameworks of calcium and neurodegeneration (e.g. Surmeier & Schumacker, J Biol Chem 2013, PMID: 23086948; Verma et al., Transl Neurodegener 2022, PMID: 35078537). We believe that the novelty of our study lies in 1) a chronic chemogenetic activation paradigm via drinking water, 2) demonstrating selective vulnerability of dopamine neurons as a result of altering their activity/excitability alone, and 3) comparing mouse and human spatial transcriptomics.

      Comments on the results section:

      The running wheel results of Figure 1 suggest that the CNO treatment caused a brief increase in running on the first day after which there was a strong decrease during the subsequent days in the active phase. This observation is also in line with the appearance of a depolarization block.

      The authors examined many basic electrophysiological parameters of recorded dopamine neurons in acute brain slices. However, it is surprising that they did not report the resting membrane potential, or the input resistance. It would be important that this be added because these two parameters provide key information on the basal excitability of the recorded neurons. They would also allow us to obtain insight into the possibility that the neurons are chronically depolarized and thus in depolarization block.

      We do report the input resistance in Supplemental Figure 1C, which was unchanged in CNO-treated animals compared to controls. We did not report the resting membrane potential because many of the DA neurons were spontaneously firing. However, we will report the initial membrane potential on first breaking into the cell for the whole cell recordings in the revision, which did not vary between groups. This is still influenced by action potential activity, but is the timepoint in the recording least impacted by dialysis of the neuron by the internal solution. We observed increased spontaneous action potential activity ex vivo in slices from CNO-treated mice (Figure 1D), thus at least under these conditions these dopamine neurons are not in depolarization block. We also did not see strong evidence of changes in other intrinsic properties of the neurons with whole cell recordings (e.g. Figure S1C). Overall, our electrophysiology experiments are not consistent with the depolarization block model, at least not due to changes in the intrinsic properties of the neurons. Although our ex vivo findings cannot exclude a contribution of depolarization block in vivo, we do show that CNO-treated mice removed from their cages for open field testing continue to have a strong trend for increased activity for approximately 10 days (Figure S1E). This finding is also consistent with increased activity of the DA neurons. We will add discussion of these important considerations in the revision.

      It is great that the authors quantified not only TH levels but also the levels of mCherry, co-expressed with the chemogenetic receptor. This could in principle help to distinguish between TH downregulation and true loss of dopamine neuron cell bodies. However, the approach used here has a major caveat in that the number of mCherry-positive dopamine neurons depends on the proportion of dopamine neurons that were infected and expressed the DREADD and this could very well vary between different mice. It is very unlikely that the virus injection infected 100% of the neurons in the VTA and SNc. This could for example explain in part the mismatch between the number of VTA dopamine neurons counted in panel 2G when comparing TH and mCherry counts. Also, I see that the mCherry counts were not provided at the 2-week time point. If the mCherry had been expressed genetically by crossing the DAT-Cre mice with floxed fluorescent reporter mice, the interpretation would have been simpler. In this context, I am not convinced of the benefit of the mCherry quantifications. The authors should consider either removing these results from the final manuscript or discussing this important limitation.

      We thank the reviewer for this insightful comment, and we agree that this is a caveat of our mCherry quantification. Quantitation of the number of mCherry+ DA neurons specifically informs the impact on transduced DA neurons, and mCherry appears to be less susceptible to downregulation versus TH. As the reviewer points out, it carries the caveat that there is some variability between injections. Nonetheless, we believe that it conveys useful complementary data. As suggested, we will discuss this caveat in our revision. Note that mCherry was not quantified at the two-week timepoint because there is no loss of TH+ cells at that time.

      Although the authors conclude that there is a global decrease in the number of dopamine neurons after 4 weeks of CNO treatment, the post-hoc tests failed to confirm that the decrease in dopamine number was significant in the SNc, the region most relevant to Parkinson's. This could be due to the fact that only a small number of mice were tested. An "n" of just 4 or 5 mice is very small for a stereological counting experiment. As such, this experiment was clearly underpowered at the statistical level. Also, the choice of the image used to illustrate this in panel 2G should be reconsidered: the image suggests that a very large loss of dopamine neurons occurred in the SNc and this is not what the numbers show. A more representative image should be used.

      We agree that the stereology experiments were performed on relatively small numbers of animals. Combined with the small effect size, this may have contributed to the post-hoc tests showing a trend of p=0.1 for both the TH and mCherry dopamine cell counts in the SN at 4 weeks. As part of the planned experiments for our revision, we will perform an additional stereologic analysis to further assess the loss of SNc dopamine neurons. We will also review and ensure the images are representative.

      In Figure 3, the authors attempt to compare intracellular calcium levels in dopamine neurons using GCaMP6 fluorescence. Because this calcium indicator is not quantitative (unlike ratiometric sensors such as Fura2), it is usually used to quantify relative changes in intracellular calcium. The present use of this probe to compare absolute values is unusual and the validity of this approach is unclear. This limitation needs to be discussed. The authors also need to refer in the text to the difference between panels D and E of this figure. It is surprising that the fluctuations in calcium levels were not quantified. I guess the hypothesis was that there should be more or larger fluctuations in the mice treated with CNO if the CNO treatment led to increased firing. This needs to be clarified.

      We thank the reviewer for this comment. We understand that this method of comparing absolute values is unconventional. However, these animals were tested concurrently on the same system, and a clear effect on the absolute baseline was observed. We will include this caveat in our discussion. Panel D of this figure shows the raw, uncorrected photometry traces, whereas panel E shows the isosbestic-corrected traces for the same recording. In panel E, the traces follow time in ascending order. We will also include frequency and amplitude data for these recordings.

      Although the spatial transcriptomic results are intriguing and certainly a great way to start thinking about how the CNO treatment could lead to the loss of dopamine neurons, the presented results, which focus on some broad classes of differentially expressed genes and on some specific examples, do not really suggest any clear mechanism of neurodegeneration. It would perhaps be useful for the authors to use the obtained data to validate that a state of chronic depolarization was indeed induced by the chronic CNO treatment. Were genes classically linked to increased activity like cfos or bdnf elevated in the SNc or VTA dopamine neurons? In the striatum, the authors report that the levels of DARPP-32, a gene whose levels are linked to dopamine levels, are unchanged. Does this mean that there were no major changes in dopamine levels in the striatum of these mice?

      We will review the expression of activity-related genes in our dataset, although we must keep in mind that these genes may behave differently in the context of chronic activation as opposed to acutely increased activity. We will also include experiments assessing striatal dopamine levels by HPLC in the revision.

      The usefulness of comparing the transcriptome of human PD SNc or VTA sections to that of the present mouse model should be better explained. In the human tissues, the transcriptome reflects the state of the tissue many years after extensive loss of dopamine neurons. It is expected that there will be few if any SNc neurons left in such sections. In comparison, the mice after 7 days of CNO treatment do not appear to have lost any dopamine neurons. As such, how can the two extremely different conditions be reasonably compared?

      Our mouse model and human PD progress over distinct timescales, as is the case with essentially all mouse models of neurodegenerative diseases. Nonetheless, in our view there is still great value in comparing gene expression changes in mouse models with those in human disease. It seems very likely that the same pathologic processes that drive degeneration early in the disease continue to drive degeneration later in the disease. Note that we have tried to address the discrepancy in time scales in part by comparing to early PD samples when there is more limited SNc DA neuron loss. Please note the numbers of DA neurons within the areas we have selected for sampling (Author response image 1). Therefore, we can indeed use spatial transcriptomics to compare dopamine neurons from mice with initial degeneration and patients where degeneration is ongoing during their disease.

      Author response image 1.

      Violin plot of DA neuron proportions sampled within the vulnerable SNV (deconvoluted RCTD method used in unmasked tissue sections of the SNV).

      Control and early PD subjects.

      Comments on the discussion:

      In the discussion, the authors state that their calcium photometry results support a central role of calcium in activity-induced neurodegeneration. This conclusion, although plausible because of the very broad pre-existing literature linking calcium elevation (such as in excitotoxicity) to neuronal loss, should be toned down a bit as no causal relationship was established in the experiments that were carried out in the present study.

      Our model utilizes hM3Dq-DREADDs that function by increasing intracellular calcium to increase neuronal excitability, and our results show increased Ca2+ by fiber photometry and changes to Ca2+-related genes, strongly suggesting a causal relation and crucial role of calcium in the mechanism of degeneration. However, we agree that we have not experimentally proven this point, as we acknowledged in the text. Additionally, we have planned revision experiments involving chronic isradipine treatment to further test the role of calcium in the mechanism of degeneration in this model.

      In the discussion, the authors discuss some of the parallel changes in gene expression detected in the mouse model and in the human tissues. Because few if any dopamine neurons are expected to remain in the SNc of the human tissues used, this sort of comparison has important conceptual limitations and these need to be clearly addressed.

      As discussed, we can sample SN DA neurons in early PD (see Author response image 1), and in our view there is great value for such comparisons. We agree that discussion of appropriate caveats is warranted and this will be clearly addressed in the revision.

      A major limitation of the present discussion is that it does not discuss the possibility that the observed phenotypes are caused by the induction of a chronic state of depolarization block by the chronic CNO treatment. I encourage the authors to consider and discuss this hypothesis.

      As discussed above, our analyses of DA neuron firing in slices and open field testing to date do not support a prominent contribution of depolarization block with chronic CNO treatment. However, we cannot rule out this hypothesis, therefore we will include additional electrophysiology experiments and add discussion of this important consideration.  

      Also, the authors need to discuss the fact that previous work was only able to detect an increase in the firing rate of dopamine neurons after more than 95% loss of dopamine neurons. As such, the authors need to clearly discuss the relevance of the present model to PD. Are changes in firing rate a driver of neuronal loss in PD, as the authors try to make the case here, or are such changes only a secondary consequence of extensive neuronal loss (for example because a major loss of dopamine would lead to reduced D2 autoreceptor activation in the remaining neurons, and to reduced autoreceptor-mediated negative feedback on firing)? This needs to be discussed.

      As discussed above, while increases in dopamine neuron activity may be compensatory after loss of neurons, the precise percentage required to induce such compensatory changes is not defined in mice and varies between paradigms, and the threshold level is not known in humans. We also reiterate that a compensatory increase in activity could still promote the degeneration of critical surviving DA neurons, whose loss underlies the substantial decline in motor function that typically occurs over the course of PD. Moreover, there are also multiple lines of evidence to suggest that changes in activity can initiate and drive dopamine neuron degeneration (Rademacher & Nakamura, Exp Neurol 2024). For example, overexpression of synuclein can increase firing in cultured dopamine neurons (Dagra et al., NPJ Parkinsons Dis 2021, PMID: 34408150) while mice expressing mutant Parkin have higher mean firing rates (Regoni et al., Cell Death Dis 2020, PMID: 33173027). Similarly, an increased firing rate has been reported in the MitoPark mouse model of PD at a time preceding DA neuron degeneration (Good et al., FASEB J 2011, PMID: 21233488). We also acknowledge that alterations to dopamine neuron activity are likely complex in PD, and that dopamine neuron health and function can be impacted not just by simple increases in activity, but also by changes in activity patterns and regularity. We will amend our discussion to include the important caveat of changes in activity occurring as compensation, as well as further evidence of changes in activity preceding dopamine neuron death.

      There is a very large, multi-decade literature on calcium elevation and its effects on neuronal loss in many different types of neurons. The authors should discuss their findings in this context and refer to some of this previous work. In a nutshell, the observations of the present manuscript could be summarized by stating that the chronic membrane depolarization induced by the CNO treatment is likely to induce a chronic elevation of intracellular calcium and this is then likely to activate some of the well-known calcium-dependent cell death mechanisms. Whether such cell death is linked in any way to PD is not really demonstrated by the present results. The authors are encouraged to perform a thorough revision of the discussion to address all of these issues, discuss the major limitations of the present model, and refer to the broad pre-existing literature linking membrane depolarization, calcium, and neuronal loss in many neuronal cell types.

      While our model demonstrates classic excitotoxic cell death pathways, we would like to emphasize both the chronic nature of our manipulation and the progressive changes observed, with increasing degeneration seen at 1, 2, and 4 weeks of hyperactivity in an axon-first manner. This is a unique aspect of our study, in contrast to much of the previous literature which has focused on shorter timescales. Thus, while we will revise the discussion to more comprehensively acknowledge previous studies of calcium-dependent neuron cell death, we believe we have made several new contributions that are not predicted by existing literature. We have shown that this chronic manipulation is specifically toxic to nigral dopamine neurons, and the finding that VTA dopamine neurons continue to be resilient even at 4 weeks is interesting and disease-relevant. We therefore do not want to use findings from other neuron types to draw assumptions about DA neurons, which are a unique and very diverse population. We acknowledge that, as with all preclinical models of PD, we cannot draw definitive conclusions about PD from these data. However, we reiterate that we strongly believe that drawing connections to human disease is important, as dopamine neuron activity is very likely altered in PD and a clearer understanding of how dopamine neuron survival is impacted by activity will provide insight into the mechanisms of PD.

    4. Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors investigated the effect of chronic activation of dopamine neurons using chemogenetics. Using Gq-DREADDs, the authors chronically activated midbrain dopamine neurons and observed that these neurons, particularly their axons, exhibit increased vulnerability and degeneration, resembling the pathological symptoms of Parkinson's disease. Baseline calcium levels in midbrain dopamine neurons were also significantly elevated following the chronic activation. Lastly, to identify cellular and circuit-level changes in response to dopaminergic neuronal degeneration caused by chronic activation, the authors employed spatial genomics (Visium) and revealed comprehensive changes in gene expression in the mouse model subjected to chronic activation. In conclusion, this study presents novel data on the consequences of chronic hyperactivation of midbrain dopamine neurons.

      Strengths:

      This study provides direct evidence that the chronic activation of dopamine neurons is toxic and gives rise to neurodegeneration. In addition, the authors achieved the chronic activation of dopamine neurons using water application of clozapine-N-oxide (CNO), a method not commonly employed by researchers. This approach may offer new insights into pathophysiological alterations of dopamine neurons in Parkinson's disease. The authors also utilized state-of-the-art spatial gene expression analysis, which can provide valuable information for other researchers studying dopamine neurons. Although the authors did not elucidate the mechanisms underlying dopaminergic neuronal and axonal death, they presented a substantial number of intriguing ideas in their discussion, which are worth further investigation.

      Weaknesses:

      Many claims raised in this paper are only partially supported by the experimental results. So, additional data are necessary to strengthen the claims. The effects of chronic activation of dopamine neurons are intriguing; however, this paper does not go beyond reporting phenomena. It lacks a comprehensive explanation for the degeneration of dopamine neurons and their axons. While the authors proposed possible mechanisms for the degeneration in their discussion, such as differentially expressed genes, these remain experimentally unexplored.

    5. Reviewer #3 (Public Review):

      Summary:

      In this manuscript, Rademacher and colleagues examined the effect on the integrity of the dopamine system in mice of chronically stimulating dopamine neurons using a chemogenetic approach. They find that one to two weeks of constant exposure to the chemogenetic activator CNO leads to a decrease in the density of tyrosine hydroxylase staining in striatal brain sections and to a small reduction of the global population of tyrosine hydroxylase positive neurons in the ventral midbrain. They also report alterations in gene expression in both regions using a spatial transcriptomics approach. Globally, the work is well done and valuable and some of the conclusions are interesting. However, the conceptual advance is perhaps a bit limited in the sense that there is extensive previous work in the literature showing that excessive depolarization of multiple types of neurons associated with intracellular calcium elevations promotes neuronal degeneration. The present work adds to this by showing evidence of a similar phenomenon in dopamine neurons. In terms of the mechanisms explaining the neuronal loss observed after 2 to 4 weeks of chemogenetic activation, it would be important to consider that dopamine neurons are known from a lot of previous literature to undergo a decrease in firing through a depolarization-block mechanism when chronically depolarized. Is it possible that such a phenomenon explains much of the results observed in the present study? It would be important to consider this in the manuscript. The relevance to Parkinson's disease (PD) is also not totally clear because there is not a lot of previous solid evidence showing that the firing of dopamine neurons is increased in PD, either in human subjects or in mouse models of the disease. As such, it is not clear if the present work is really modelling something that could happen in PD in humans.

      Comments on the introduction:

      The introduction cites a 1990 paper from the lab of Anthony Grace as support of the fact that DA neurons increase their firing rate in PD models. However, in this 1990 paper, the authors stated that: "With respect to DA cell activity, depletions of up to 96% of striatal DA did not result in substantial alterations in the proportion of DA neurons active, their mean firing rate, or their firing pattern. Increases in these parameters only occurred when striatal DA depletions exceeded 96%." Such results argue that an increase in firing rate is most likely to be a consequence of the almost complete loss of dopamine neurons rather than an initial driver of neuronal loss. The present introduction would thus benefit from being revised to clarify the overriding hypothesis and rationale in relation to PD and better represent the findings of the paper by Hollerman and Grace.

      It would be good for the introduction to refer to some of the literature on the links between excessive neuronal activity, calcium, and neurodegeneration. There is a large literature on this, and referring to it would help frame the work and its novelty in a broader context.

      Comments on the results section:

      The running wheel results of Figure 1 suggest that the CNO treatment caused a brief increase in running on the first day after which there was a strong decrease during the subsequent days in the active phase. This observation is also in line with the appearance of a depolarization block.

      The authors examined many basic electrophysiological parameters of recorded dopamine neurons in acute brain slices. However, it is surprising that they did not report the resting membrane potential, or the input resistance. It would be important that this be added because these two parameters provide key information on the basal excitability of the recorded neurons. They would also allow us to obtain insight into the possibility that the neurons are chronically depolarized and thus in depolarization block.

      It is great that the authors quantified not only TH levels but also the levels of mCherry, co-expressed with the chemogenetic receptor. This could in principle help to distinguish between TH downregulation and true loss of dopamine neuron cell bodies. However, the approach used here has a major caveat in that the number of mCherry-positive dopamine neurons depends on the proportion of dopamine neurons that were infected and expressed the DREADD, and this could very well vary between different mice. It is very unlikely that the virus injection infected 100% of the neurons in the VTA and SNc. This could for example explain in part the mismatch between the number of VTA dopamine neurons counted in panel 2G when comparing TH and mCherry counts. Also, I see that the mCherry counts were not provided at the 2-week time point. If the mCherry had been expressed genetically by crossing the DAT-Cre mice with floxed fluorescent reporter mice, the interpretation would have been simpler. In this context, I am not convinced of the benefit of the mCherry quantifications. The authors should consider either removing these results from the final manuscript or discussing this important limitation.

      Although the authors conclude that there is a global decrease in the number of dopamine neurons after 4 weeks of CNO treatment, the post-hoc tests failed to confirm that the decrease in dopamine neuron number was significant in the SNc, the region most relevant to Parkinson's. This could be due to the fact that only a small number of mice were tested. An "n" of just 4 or 5 mice is very small for a stereological counting experiment. As such, this experiment was clearly underpowered at the statistical level. Also, the choice of the image used to illustrate this in panel 2G should be reconsidered: the image suggests that a very large loss of dopamine neurons occurred in the SNc and this is not what the numbers show. A more representative image should be used.

      In Figure 3, the authors attempt to compare intracellular calcium levels in dopamine neurons using GCaMP6 fluorescence. Because this calcium indicator is not quantitative (unlike ratiometric sensors such as Fura2), it is usually used to quantify relative changes in intracellular calcium. The present use of this probe to compare absolute values is unusual and the validity of this approach is unclear. This limitation needs to be discussed. The authors also need to refer in the text to the difference between panels D and E of this figure. It is surprising that the fluctuations in calcium levels were not quantified. I guess the hypothesis was that there should be more or larger fluctuations in the mice treated with CNO if the CNO treatment led to increased firing. This needs to be clarified.
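
      The distinction between relative and absolute fluorescence can be illustrated with a minimal sketch. It assumes a baseline-normalized measure, ΔF/F0 = (F - F0) / F0, with F0 taken as the mean of an initial baseline window. The function name, the baseline choice, and the numbers are hypothetical and are not taken from the manuscript.

```python
def delta_f_over_f(trace, baseline_samples):
    """Return (F - F0) / F0 for each sample of a fluorescence trace.

    F0 is the mean of the first `baseline_samples` values (an assumed
    baseline definition, chosen only for illustration).
    """
    if baseline_samples <= 0 or baseline_samples > len(trace):
        raise ValueError("baseline_samples must be between 1 and len(trace)")
    f0 = sum(trace[:baseline_samples]) / baseline_samples
    if f0 == 0:
        raise ValueError("baseline fluorescence must be non-zero")
    return [(f - f0) / f0 for f in trace]


# Two hypothetical recordings with identical relative dynamics but a
# two-fold difference in raw fluorescence (e.g., different sensor expression).
low_expression = [100.0, 100.0, 150.0, 200.0]
high_expression = [200.0, 200.0, 300.0, 400.0]

relative_low = delta_f_over_f(low_expression, baseline_samples=2)
relative_high = delta_f_over_f(high_expression, baseline_samples=2)
```

      In this toy case the raw values differ two-fold while the normalized traces are identical, which is why comparing absolute GCaMP6 fluorescence between animals is hard to interpret without additional controls.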

      Although the spatial transcriptomic results are intriguing and certainly a great way to start thinking about how the CNO treatment could lead to the loss of dopamine neurons, the presented results, which focus on some broad classes of differentially expressed genes and on some specific examples, do not really suggest any clear mechanism of neurodegeneration. It would perhaps be useful for the authors to use the obtained data to validate that a state of chronic depolarization was indeed induced by the chronic CNO treatment. Were genes classically linked to increased activity like cfos or bdnf elevated in the SNc or VTA dopamine neurons? In the striatum, the authors report that the levels of DARPP-32, a gene whose levels are linked to dopamine levels, are unchanged. Does this mean that there were no major changes in dopamine levels in the striatum of these mice?

      The usefulness of comparing the transcriptome of human PD SNc or VTA sections to that of the present mouse model should be better explained. In the human tissues, the transcriptome reflects the state of the tissue many years after extensive loss of dopamine neurons. It is expected that there will be few if any SNc neurons left in such sections. In comparison, the mice after 7 days of CNO treatment do not appear to have lost any dopamine neurons. As such, how can the two extremely different conditions be reasonably compared?

      Comments on the discussion:

      In the discussion, the authors state that their calcium photometry results support a central role of calcium in activity-induced neurodegeneration. This conclusion, although plausible because of the very broad pre-existing literature linking calcium elevation (such as in excitotoxicity) to neuronal loss, should be toned down a bit as no causal relationship was established in the experiments that were carried out in the present study.

      In the discussion, the authors discuss some of the parallel changes in gene expression detected in the mouse model and in the human tissues. Because few if any dopamine neurons are expected to remain in the SNc of the human tissues used, this sort of comparison has important conceptual limitations and these need to be clearly addressed.

      A major limitation of the present discussion is that it does not discuss the possibility that the observed phenotypes are caused by the induction of a chronic state of depolarization block by the chronic CNO treatment. I encourage the authors to consider and discuss this hypothesis. Also, the authors need to discuss the fact that previous work was only able to detect an increase in the firing rate of dopamine neurons after more than 95% loss of dopamine neurons. As such, the authors need to clearly discuss the relevance of the present model to PD. Are changes in firing rate a driver of neuronal loss in PD, as the authors try to make the case here, or are such changes only a secondary consequence of extensive neuronal loss (for example because a major loss of dopamine would lead to reduced D2 autoreceptor activation in the remaining neurons, and to reduced autoreceptor-mediated negative feedback on firing)? This needs to be discussed.

      There is a very large, multi-decade literature on calcium elevation and its effects on neuronal loss in many different types of neurons. The authors should discuss their findings in this context and refer to some of this previous work. In a nutshell, the observations of the present manuscript could be summarized by stating that the chronic membrane depolarization induced by the CNO treatment is likely to induce a chronic elevation of intracellular calcium and this is then likely to activate some of the well-known calcium-dependent cell death mechanisms. Whether such cell death is linked in any way to PD is not really demonstrated by the present results.

      The authors are encouraged to perform a thorough revision of the discussion to address all of these issues, discuss the major limitations of the present model, and refer to the broad pre-existing literature linking membrane depolarization, calcium, and neuronal loss in many neuronal cell types.

    1. eLife assessment

      This useful study uses high-field fMRI to test the hypothesized involvement of subcortical structure, particularly the striatum, in WM updating. It overcomes limitations in prior work by applying high-field imaging with a more precise definition of ROIs. Thus, the empirical observations are of use to specialists interested in working memory gating or the reference back task specifically. However, evidence to support the broader implications, including working memory gating as a construct, is incomplete and limited by the ambiguities in this task and its connection to theory.

    2. Reviewer #1 (Public Review):

      Summary:

      Trutti and colleagues used 7T fMRI to identify brain regions involved in subprocesses of updating the content of working memory. Contrary to past theoretical and empirical claims that the striatum serves a gating function when new information is to be entered into working memory, the relevant contrast during a reference-back task did not reveal significant subcortical activation. Instead, the experiment provided support for the role of subcortical (and cortical) regions in other subprocesses.

      Strengths:

      The use of high-field imaging optimized for subcortical regions in conjunction with the theory-driven experimental design mapped well to the focus on a hypothetical striatal gating mechanism.

      Consideration of multiple subprocesses and the transparent way of identifying these, summarized in a table, will make it easy for future studies to replicate and extend the present experiment.

      Weaknesses:

      The reference-back paradigm seems to only require holding a single letter in working memory (X or O; Figure 1). It remains unclear how such low demand on working memory influences associated fMRI updating responses. It is also not clear whether reference-switch trials with 'same' response truly tax working-memory updating (and gate opening), as the working-memory content/representation does not need to be updated in this case. These potential design issues, together with the rather low number of experimental trials, raise concerns about the demonstrated absence of evidence for striatal gate opening.

      The authors provide a motivation for their multi-step approach to fMRI analyses. Still, the three subsections of fMRI results (3.2.1; 3.2.2; 3.3.3) for 4 subprocesses each (gate opening, gate closing, substitution, updating mode) made the Results section complex and it was not always easy to understand why some but not other approaches revealed significant effects (as the midbrain in gate opening).

      The many references to the role of dopamine are interesting, but the discussion of dopaminergic pathways and signals remains speculative and must be confirmed in future studies (e.g., with PET imaging).

    3. Reviewer #2 (Public Review):

      Summary:

      The study reported by Trutti et al. uses high-field fMRI to test the hypothesized involvement of subcortical structure, particularly striatum, in WM updating. Specifically, participants were scanned while performing the Reference Back task (e.g., Rac-Lubashevsky and Kessler, 2016), which tests constructs like working memory gate opening and closing and substitution. While striatal activation was involved in substitution, it was not observed in gate opening. This observation is cited as a challenge to cortico-striatal models of WM gating, like PBWM (Frank and O'Reilly, 2005).

      Strengths:

      While there have been prior fMRI studies of the reference back task (Nir-Cohen et al., 2020), the present study overcomes limitations in prior work, particularly with regard to subcortical structures, by applying high-field imaging with a more precise definition of ROIs. And, the fMRI methods are careful and rigorous, overall. Thus, the empirical observations here are useful and will be of interest to specialists interested in working memory gating or the reference back task specifically.

      Weaknesses:

      I am less persuaded by the more provocative points regarding the challenge it presents to models like PBWM, made in several places by the paper. As detailed below, issues with conceptual clarity of the main constructs and their connection to models, like PBWM, along with some incomplete aspects of the results, make this stronger conclusion less compelling.

      (1) The relationship of the Nir-Cohen et al. (2020) task analysis of the reference back task, with its contrasts like gate opening and closing, and the predictions of PBWM is far from clear to me for several reasons.

      First, contrasts like gate opening and gate closing make strong finite state assumptions. As far as I know, this is not an assumption of PBWM, certainly not for gate opening. At a minimum, PBWM is default closed because of the tonic inhibition of cortico-thalamic dynamics by the globus pallidus. Indeed, this was even noted in the discussion of this paper, which seems to acknowledge this discrepancy, but then goes on to conclude that they have challenged the PBWM model anyway.

      Second, as far as I know, PBWM emphasizes go/no-go processes around constructs of input- and output-gating, rather than state shifts between gate opening and closing. While this relationship is less clear in reference back, substituting task-relevant items into working memory does appear to be an example of input gating, as modeled by PBWM. Thus, it is not clear to me why the substitution contrast would not be more of a test of input gating than the gate opening contrast, which requires assumptions that are not clear are required by the model, as noted above.

      Third, PBWM relies on striatal mechanisms to solve the problem of selective gating, inputting, or outputting items in memory while also holding on to others. Selective gating contrasts with global gating, in which everything in memory is gated or nothing. The reference back task is a test of global gating. It is an important distinction because non-striatal mechanisms that can solve global gating, cannot solve selective gating. Indeed, this limitation of non-striatal mechanisms was the rationale for PBWM adding striatum. The connectivity of the striatum with the cortex permits this selectivity. It is not clear that the reference back task tests these selective demands in the first place. That limitation in this task was the rationale behind the recent Rac-Lubashevsky and Frank (2022) paper using the reference back 2 procedure that modifies the original reference back for selective gating.

      So, if the primary contribution of the paper is to test PBWM, as suggested by the first line of the abstract, then it is not clear that the reference back task in general, or the gate opening contrast in particular, is the best test of these predictions. Other contrasts (substitution), or indeed, tasks (reference back 2) would have been better suited.

      (2) In general, observations of univariate activity in the striatum have been notoriously variable in the context of WM. Indeed, Chatham et al. (2014) who tested working memory output gating - notably in a direct test of the predictions of PBWM - noted this variability. They too did not observe univariate activation in the striatum associated with selective output gating. Rather they found evidence of increased connectivity between the striatum and cortex during selective output gating. They argued that one account of this difference is that striatal gating dynamics emerge from the balance between the firing of both Go and NoGo cell populations that decide whether to gate or not. It is not always clear how this balance should relate to univariate activation in the striatum. Thus, the present study might also test cortico-striatal connectivity, rather than relying exclusively on univariate activation, in their test of striatal involvement in these WM constructs.

      (3) It is concerning that there was no behavioral cost for comparison switch vs. repeat trials. This differs from prior observations from the reference back (e.g., Nir-Cohen et al., 2020), and in general, is odd given the task switch/cue interpretation component. This failure to observe a basic behavioral effect raises a concern about how participants approached this task and how that might differ from prior reports of the reference back. If they were taking an unusual strategy, it further complicates the interpretation of these results and the implications they hold for theory.

      In summary, the present observations are useful, particularly for those interested in the reference back task. For example, they might call into question verbal theories and task analyses of the reference back task that tie constructs like gate-opening to striatal mechanisms. However, given the ambiguities noted above, the broader implications for models like PBWM, or indeed, other models of working memory gating, are less clear.

    1. eLife assessment

      This important work addresses the relationship between the transdiagnostic compulsivity dimension and confidence as well as confidence-related behaviours like reminder setting. The relationship between confidence and compulsive disorders has recently received a lot of attention and has been considered to be a key cognitive change. The authors paired an elegant experimental design and pre-registration to give convincing evidence of the relationship between compulsivity, reminder setting, and confidence. Future work should clarify the link of their findings with prediction error-related processes to test whether they could be causally related to their results, and further clarify some of the implications for their findings and refine hypotheses about confidence-related cognitive changes with compulsivity and OCD.

    2. Reviewer #1 (Public Review):

      Summary:

      Boldt et al. test several possible relationships between transdiagnostically-defined compulsivity and cognitive offloading in a large online sample. To do so, they develop a new and useful cognitive task to jointly estimate biases in confidence and reminder-setting. In doing so, they find that over-confidence is related to less utilization of reminder-setting, which partially mediates the negative relationship between compulsivity and lower reminder-setting. The paper thus establishes that, contrary to the over-use of checking behaviors in patients with OCD, greater levels of transdiagnostically-defined compulsivity predict less deployment of cognitive offloading. The authors offer speculative reasons as to why (perhaps it's perfectionism in less clinically-severe presentations that lowers the cost of expending memory resources), and set an agenda to understand the divergence in cognition between clinical and nonclinical samples. Because only a partial mediation had robust evidence, multiple effects may be at play, whereby compulsivity impacts cognitive offloading via overconfidence and also by other causal pathways.

      Strengths:

      The study develops an easy-to-implement task to jointly measure confidence and replicates several major findings on confidence and cognitive-offloading. The study uses a useful measure of cognitive offloading - the tendency to set reminders to augment accuracy in the presence of experimentally manipulated costs. Moreover, the study utilizes multiple measures of presumed biases - overall tendency to set reminders, the empirically estimated indifference point at which people engage reminders, and a bias measure that compares optimal indifference points to engage reminders relative to the empirically-observed indifference points. That the study observes convergence along all these measures strengthens the inferences made relating compulsivity to the under-use of reminder-setting. Lastly, the study does find evidence for one of several a priori hypotheses and sets a compelling agenda to try to explain why such a finding diverges from an ostensible opposing finding in clinical OCD samples and the over-use of cognitive offloading.

      Weaknesses:

      Although I think this design and study are very helpful for the field, I felt that a feature of the design might reduce the task's sensitivity to measuring dispositional tendencies to engage cognitive offloading. In particular, the design introduces prediction errors (PEs) that could induce learning and interfere with natural tendencies to deploy reminder-setting behavior. These PEs arise from whether a given selected strategy will or will not be allowed to be engaged. We know individuals with compulsivity can learn even when instructed not to learn (e.g., Sharp, Dolan, and Eldar, 2021, Psychological Medicine), and that more generally, they have trouble with structure knowledge (e.g., Seow et al.; Fradkin et al.), and thus might be sensitive to these PEs. Thus, a dispositional tendency to set reminders might be differentially impacted for those with compulsivity after an NPE, where they want to set a reminder, but aren't allowed to. After such an NPE, they may be more inclined to avoid setting reminders. Those with compulsivity likely have superstitious beliefs about how checking behaviors lead to a resolution of catastrophes, which might in part originate from inferring structure in the presence of noise or from purely irrelevant sources of information for a given decision problem.

      It would be good to know if such learning effects exist, if they're modulated by PE (you can imagine PEs are higher if you are more incentivized - e.g., 9 points as opposed to only 3 points - to use reminders, and you are told you cannot use them), and if this learning effect confounds the relationship between compulsivity and reminder-setting.

      A more subtle point: I think this study is better described as an exploration than as a deductive test of a particular model -> hypothesis -> experiment. Typically, when we test a hypothesis, we contrast it with competing models. Here, the tests were two-sided because multiple models, with mutually exclusive predictions (over-use or under-use of reminders), were tested. Moreover, it's unclear exactly how to make sense of what is called the direct mechanism, which is supported by partial (as opposed to complete) mediation.

    3. Reviewer #2 (Public Review):

      Summary:

      Boldt et al. investigated whether previously established relationships between transdiagnostic psychiatric symptom dimensions and confidence distortions would result in downstream influences on the confidence-related behaviour of reminder setting. 600 individuals from the general population completed a battery of psychiatric symptom questionnaires and an online reminder-setting task. In line with previous studies, individuals high in compulsivity (CIT) showed over-confidence in their task performance, whereas individuals high in anxious depression (AD) tended to be under-confident. Crucially, the over-confidence associated with CIT partially mediated a decreased tendency to use external reminders during task performance, whereas the under-confidence associated with AD did not result in any alteration in the external reminder setting. The authors suggest that metacognitive monitoring is impaired in CIT which has a knock-on effect on reminder setting behaviour, but that a direct link also exists between CIT and reduced reminder setting independently of confidence.

      Strengths:

      The study combines the latest advances in transdiagnostic approaches to psychopathology with a cleverly designed external reminder-setting task. The approach allows for investigation of what some of the downstream consequences associated with impaired metacognition in sub-clinical psychopathology may be.

      The experimental design and hypotheses were pre-registered prior to data collection.

      The manuscript is well written and rigorous analysis approaches are used throughout.

      Weaknesses:

      Participants only performed a single task so it remains unclear if the observed effects would generalise to reminder-setting in other cognitive domains.

      The sample consisted of participants recruited from the general population. Future studies should investigate whether the effects observed extend to individuals with the highest levels of symptoms (including clinical samples).

    1. Reviewer #2 (Public Review):

      Summary:

      In this study, the authors designed an EEG experiment to investigate how listeners use temporal structure to optimise sensory detection. Listeners heard 2 seconds of noise and had to detect a faint tone in one of 3 temporal locations (equally spaced in time). In a minority of trials, no tone was presented. Focussing on these 'no tone' trials, the authors show that the EEG 'temporally tracks' the expected tone locations. This temporal tracking behaviour is also shown in a recurrent neural network trained on the same task. The authors interpret these findings as evidence of neural gain control in the service of sequential temporal anticipation.

      Strengths:

      The study uses an elegant experimental design and sophisticated EEG analyses. It is striking how clear the neural signatures are (of sequential expectation in the absence of sensory input). A further strength is the use of neural network modelling to elucidate the possible neural computations.

      Weaknesses:

      My first major comment concerns the theoretical implications of the study. An account based on gain control and temporal anticipation seems highly plausible. But are there other plausible accounts that the current data argue against? Or are there specific versions of gain control / temporal anticipation theories that the data supports and others that the data doesn't support? To develop the manuscript, I think the authors could relate their results in a more specific way to existing accounts, outlining not only what accounts their results favor but also which accounts their data falsify. In doing so I think the study will have a stronger influence on shaping the field.

      My second major comment concerns the consistent lag that is observed between tone location and neural/model responses. This would seem to be inconsistent with an anticipation account, which would instead predict zero or a negative lag. This should be discussed. While I agree the decrease in response magnitude that occurs with tone location is inconsistent with expectation violation, the positive lag that is observed seems more consistent with expectation violation than temporal anticipation/gain control.

      My third major comment is a suggestion to present some further analyses that I think will be informative. First is reporting more extensively the ERP results. This currently appears in one of the panels but there are no statistical tests reported in the main text and only the tone present data is shown. Given that expectation violation has been observed most consistently with ERPs, is there evidence of this in the 'no tone' trials and if so, does it correlate over participants with the power modulation effect or rate of false alarms? Doing this analysis will possibly be informative for assessing the plausibility of different functional accounts of the data e.g. expectation violation/prediction error. My second suggestion is to report the tone present trial data. When the tone is for example presented in the first location, does the response during tone locations 2 and 3 get suppressed? And does the same occur in the neural network model? If so, this would speak to a highly dynamic form of gain control (if the gain control account is correct).

    2. Reviewer #3 (Public Review):

      Summary:

      The study designs an EEG experiment to study how the brain better detects targets by exploiting information about when the target may appear. The study finds that the power fluctuations of alpha and beta oscillations can indicate the time intervals in which the target may appear. Furthermore, an RNN trained on the same task can also exploit such temporal information to better detect targets at the expected time intervals.

      Strengths:

      (1) The design of the experiment is elegant.

      (2) The EEG analysis approach is highly advanced.

      (3) The study combines human EEG experiments and computational modeling to address potential computational neural mechanisms.

      Weaknesses:

      The RNN is used both for modeling, which is commendable, and for simulating new psychophysics experiments, which can be problematic. In other words, it is very dangerous to predict human performance in a novel condition using an RNN and assume that prediction is the same as the actual human performance. Comparing the RNN performance in two different noise conditions cannot directly "suggest that the 2 Hz neural modulation observed in Corrected Cluster 234 served to enhance sensory sensitivity to the target tone at the anticipated temporal locations, while selectively suppressing sensory noise during irrelevant noise periods." Here, much stronger evidence would come from actually running the behavioral tests in two noise conditions in humans, but even that behavioral experiment cannot directly indicate the function of a neural response. In other words, the conclusion "additional analyses and perturbations on the RNNs indicated that the neural power modulations in the alpha-beta band resulted from selective suppression of irrelevant noise periods and heightened sensitivity to anticipated temporal locations" is not supported. The model does not have alpha or beta oscillations at all, which is OK, but directly concluding the function of alpha/beta oscillations based on the behavior of a model that does not have these oscillations is not appropriate.

      Relatedly, better detection of a target may reflect a change either in sensory processing or in decision-making, while the second possibility seems to be ignored.

      The results section has a lot of discussions, which should be moved to the discussion section.

    3. eLife assessment

      This valuable study provides insights into how the brain learns to better detect a target by predicting when the target may appear. Overall, solid evidence is provided that the power fluctuations of alpha- and beta-band oscillations can reflect the predicted occurrence time of the target, but some conclusions, especially ones related to the neural-network model and temporal gain control account, need further consideration. The study highlights an advanced EEG analysis approach as well as a close combination of human EEG analysis and computational modeling using recurrent neural networks.

    4. Reviewer #1 (Public Review):

      Summary:

      In this article, the authors investigated how the brain anticipates sequences of potential sensory events, using temporal predictability to enhance perception. To do so, they combined a tone detection task, electrophysiological recordings, and recurrent neural network models. The stimuli consisted of continuous white noise embedded with either a single tone presented at one of 3 equidistant (500ms) temporal locations, or no tone. The main analyses were carried out on no-tone trials, in which subjects only anticipated future events. First, a modulation power spectrum analysis revealed 4 frequency clusters, and a coupling analysis allowed the authors to group 3 of them together into cluster 234. The time course of the latter aligned with the temporal locations, reaching a local maximum following each of them. The power of cluster 234 during no-tone trials was positively correlated with behavioral performance (d') during tone trials, but not with false alarm rate. Then, the authors trained several continuous-time recurrent neural networks to model the experimental paradigm. After the networks were tuned to reflect the average d' of human subjects, a neural network analogue of EEG was extracted from the activity of neurons. The latter displayed a peak at 2Hz, its time course aligned with the temporal locations, reaching a local maximum both before and after each of them, and its d' score was higher for tones located at one of the temporal locations. A network trained with randomly occurring tones displayed no 2Hz activity and d' independent from tone location. Finally, the authors perturbed the excitatory/inhibitory ratio of neurons within the network, finding that more inhibition resulted in earlier peaks in the neural network activity.
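
      For readers less familiar with the sensitivity index d' referred to above, a minimal sketch of the standard signal-detection definition, d' = z(hit rate) - z(false-alarm rate), is given here. The log-linear correction and the trial counts are assumptions made for illustration; the manuscript's exact computation may differ.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    A log-linear correction (add 0.5 to each count) is assumed here so
    that rates of exactly 0 or 1 do not produce infinite z-scores.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts: chance-level responding versus clear sensitivity.
chance = d_prime(hits=50, misses=50, false_alarms=50, correct_rejections=50)
sensitive = d_prime(hits=80, misses=20, false_alarms=20, correct_rejections=80)
```

      Because d' combines hits and false alarms, a correlation with d' but not with false alarm rate, as summarized above, points toward a change in sensitivity rather than a simple shift in response bias.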

      Strengths:

      (1) The experimental paradigm introduced in this study is original and well-built, allowing for the study of the targeted phenomenon. The fact that relevant neural signals were found despite the absence of sensory cues proves the setup is promising, opening the way for future work that varies different parameters: number of tones, time between tones, complexity of the sequence of temporal locations, sequence of events, etc.

      (2) The statistical analysis was exhaustive: the authors consistently introduced controls for different conditions and alternative hypotheses, thoroughly explaining each step of the analysis as well as the choices behind them. The supplementary figures further helped in understanding the data and answered questions one might have. This comprehensive approach was well-appreciated.

      (3) The use of more biologically plausible networks, compared to traditional RNNs, to model the response of subjects is a promising approach, which can give clues as to the mechanism at play, but also make predictions that can then be proven (or disproven) by future experiments.

      The authors provided a work of good technical quality and reported their methods and findings transparently, making for good reproducibility and evaluation.

      Weaknesses:

      (1) The most glaring weakness of the paper lies in its interpretation of the different results. Conclusions are scattered around the paper, mostly unclear, and do not always make much sense with regard to the data. For example, the authors never address the absence of a peak before the first temporal location: why would subjects not "suppress" noise before the first temporal location given its (strong) predictability? Moreover, they immediately assume a functional role for the neural signature they found, as well as a direct link between the mechanisms at play in their RNN and the human brain, thus jumping to hasty and unreliable conclusions. The authors seemed to have a strong bias towards a hypothesis (predictive gain control) and tried to fit their data into it.

      - The authors cited very few relevant papers on related fields, notably on omission, and therefore did not build efficiently on previous works (e.g., Yabe, Raij, Schröger, Bekinschtein, Chait, Auksztulewicz...). Moreover, at several points in the paper, they make choices about their analysis or model without proper justification or cited sources, even when explicitly pointing to the existence of research supporting said choices.

      - Only a single electrode (out of 64) was used (Cz) to carry out every analysis. Without proper justification, this choice could be misinterpreted. Moreover, adopting instead a multivariate approach (incorporating all channels) would give more strength to the paper.

      - Overall, the observed electrophysiological results could be more simply explained by a mechanism akin to a go/no-go (a tone/no-tone) or omission response happening after each temporal location, as subjects have learned when to make that inference. The delay of the response with regards to temporal location would change due to error accumulation in time perception, rather than "the anticipation of the first temporal location facilitating the anticipation of the second", which makes little sense. Moreover, a response in Cz could be expected.

      - As for the results of the RNN, not only is the analogy with actual neurophysiological activity limited, both in principle (simple E/I dynamics) and in implementation (inference is only done at the end of each trial), but the authors do not address the activity before the first temporal location, which is a major difference from the human data. Their assumption that both the RNN and cluster 234 are functionally related to gain control is thus further flawed. Moreover, the analysis of the RNN is lacking: for example, the authors did not compare false positives/negatives across different delays, or analyze Wout.

      - The phrasing and introduction of the paper are misleading, as confusion can arise between predicting a sequence of events (several events in a row) and predicting a single event appearing at different potential locations. It should be clarified that the paper does not address sequences of events at any point.

      It seems the authors already drew their conclusion beforehand and fit the data to match this bias. As such, the interpretation of the data is messy, flawed, and often hasty, drawing erroneous conclusions and parallels.

      Overall, the manuscript is of good technical quality and communicated results very transparently, but the authors seem to have a strong confirmation bias towards temporal anticipation and gain control, thus leading to flawed interpretations.

    1. eLife assessment

      This study presents useful, yet preliminary findings on the transcriptomic changes in cardiac lymphatic cells after myocardial infarction in mice. The conclusions of the authors remain uncertain as sample sizes for lymphatic endothelial cells are very low. The single-cell transcriptomic data were analyzed using solid advanced methodology and may be used as a starting point for future studies of the impact of lymphatic cells on heart disease.

    2. Reviewer #1 (Public Review):

      Summary:

      Assessment of cardiac LEC transcriptomes post-MI may yield new targets to improve lymphatic function. scRNAseq is a valid approach as cardiac LECs are rare compared to blood vessel endothelial cells.

      Strengths:

      Extensive bioinformatics approaches employed by the group.

      Weaknesses:

      Too few cells are included in the scRNAseq data set, and the spatial transcriptomics data that was exploited has little relevance, or rather specificity, for cardiac lymphatics. This study seems more like a collection of preliminary transcriptomic data than a conclusive scientific report to help advance the field.

    3. Reviewer #2 (Public Review):

      Summary:

      This study integrated single-cell sequencing and spatial transcriptome data from mouse heart tissue at different time points post-MI. They identified four transcriptionally distinct subtypes of lymphatic endothelial cells and localized them in space. They observed that LEC subgroups are localized in different zones of the infarcted heart and have distinct functions. Specifically, they demonstrated that LEC ca III may be involved in directly regulating myocardial injuries in the infarcted zone concerning metabolic stress, while LEC ca II may be related to the rapid immune inflammatory responses of the border zone in the early stage of MI. LEC ca I and LEC collection mainly participate in regulating myocardial tissue edema resolution in the middle and late stages post-MI. Finally, cell trajectory and Cell-Chat analyses further identified that LECs may regulate myocardial edema through Aqp1, and likely affect macrophage infiltration through the galectin9-CD44 pathway. The authors concluded that their study revealed the dynamic transcriptional heterogeneity distribution of LECs in different regions of the infarcted heart and that LECs formed different functional subgroups that may exert different bioeffects in myocardial tissue post-MI.

      Strengths:

      The study addresses a significant clinical challenge, and the results are of great translational value. All experiments were carefully performed, and their data support the conclusion.

      Weaknesses:

      (1) Language expression must be improved. Many incomplete sentences exist throughout the manuscript. A few examples: Lines 70-71: In order to further elucidate the effects and regulatory mechanisms of the lymphatic vessels in the repair process of myocardial injury following MI. Lines 71-73: This study, integrated single-cell sequencing and spatial transcriptome data from mouse heart tissue at different time points after MI from publicly available data (E-MTAB-7895, GSE214611) in the ArrayExpress and gene expression omnibus (GEO) databases. Line 88-89: Since the membrane protein LYVE1 can present lymphatic vessel morphology more clearly than PROX1.

      (2) The type of animal model (i.e., permanent MI or MI plus reperfusion) included in the ArrayExpress and Gene Expression Omnibus (GEO) databases must be clearly defined, as these two models may have completely different effects on lymphatic vessel development during post-MI remodeling.

      (3) Lines 119-120: Caution must be taken regarding Cav1 as a lymphatic marker because Cav1 is expressed in all endothelial cells, not only in LECs.

      (4) Figure 1 legend needs to be improved. RZ, BZ, and IZ need to be labeled in all IF images. Day 0 images suggest that RZ is the tissue section from the right ventricle. Was RZ for all other time points sampled from the right ventricular tissue section?

      (5) The discussion section needs to be improved and better focused on the findings from the current study.

    4. Reviewer #3 (Public Review):

      Summary:

      It has been demonstrated that cardiac lymphatics are essential for cardiac health and function. Moreover, post-myocardial infarction, targeting lymphatics by stimulating lymphangiogenesis has been shown to improve cardiac inflammation, fibrosis, and function. Then, the aim of this study was to evaluate the transcriptomic changes of cardiac lymphatic endothelial cells (LECs) after a myocardial infarction, which could reveal new therapeutic targets targeting lymphatic function. Moreover, investigating the cell-cell communication between lymphatic and immune cells would give critical information for a better understanding of the disease.

      Strengths:

      The use of scRNAseq data to evaluate LECs is an effective strategy considering the small proportion of LECs compared to blood endothelial cells. The authors applied extensive bioinformatic analysis to three different data sets.

      Weaknesses:

      Among a total of 44,860 cells, only 242 LECs and 5,688 endothelial cells were identified. This small number of LECs is not representative and is insufficient to reliably distinguish four different clusters. The bioinformatic analysis is not supported by significant results in their in vivo and in vitro experiments.

    1. eLife assessment

      This study provides a valuable contribution to the development of small molecules that inhibit the aggregation of tau, a protein involved in several neurodegenerative diseases. The authors present convincing evidence that analogs of the plant alkaloid tryptanthrin can prevent the formation of larger aggregates by targeting the early stages of tau oligomerization. Nevertheless, further studies are needed to elucidate the precise mechanisms of action and to provide a detailed kinetic analysis. This work will be of interest to biochemists and biophysicists focused on designing small molecules to inhibit fibril formation.

    2. Reviewer #1 (Public Review):

      Summary:

      This paper presents a class of small molecule inhibitors of tau aggregation which was discovered through a computational screen. Analogs were generated and tested for their ability to inhibit fibril formation.

      Strengths:

      A few of the analogs were found to have sub-stoichiometric activity. A comparison of unseeded and seeded aggregation kinetics suggests that these compounds preferentially target early-stage aggregation.

      Weaknesses:

      The authors state their interest is in finding compounds that target monomeric states of tau, but their only detection method is late-stage fibril formation. In this respect, they have not really defined a mechanism of action. They state their plan to use hydrogen-exchange mass spectrometry, but there are other techniques, such as single-molecule FRET and measurement of intramolecular reconfiguration. Additionally, there is information that can be gleaned from detailed kinetic modeling of the ThT kinetics to include monomer dynamics, formation of oligomers, and secondary nucleation of fibrils.

    3. Reviewer #2 (Public Review):

      Summary:

      James et al, in this study, build on their previous work investigating tau as a drug target. The authors identify tryptanthrin (TA) and its analogs as powerful inhibitors of tau4RD aggregation, even at low concentrations (nanomolar range). Interestingly, these analogs specifically target the initial stages of aggregation, where tau self-association first begins. This targeted approach effectively explains why such small amounts of tryptanthrin analogs are sufficient for inhibition. The study further shows that slight modifications to the structure of these molecules can significantly impact their effectiveness.

      Strengths:

      The experiments are well-designed and executed. The reviewer, in particular, appreciates the authors for the simple yet intelligent study design to understand the mechanism of aggregation inhibition by TA analogs.

      Weaknesses:

      Certain areas in the manuscript need clarifications, revisions, or additional supporting studies to strengthen the outcomes. For example, the authors mostly apply a single approach to assess tau aggregation or aggregation inhibition. Using additional techniques as suggested below will be helpful.

    1. eLife assessment

      This paper presents a valuable pipeline based on state-of-the-art analytical software that was used to study genetic pleiotropy between neuropsychiatric disorders. The presented evidence supporting the claims is convincing and now includes an appropriate comparison to previously published methods as well as a detailed exploration of the findings. The created pipeline can thus be used by researchers from diverse fields to study different combinations of diseases and traits.

    2. Reviewer #1 (Public Review):

      The authors investigate pleiotropy in the genetic loci previously associated with a range of neuropsychiatric disorders: Alzheimer's disease, amyotrophic lateral sclerosis (ALS), frontotemporal dementia, Parkinson's disease, and schizophrenia. The local statistical fine-mapping and variant colocalisation approaches they use have the potential to uncover not only shared loci but also shared causal variants between these disorders. There is existing literature describing the pleiotropy between ALS and these other disorders but here the authors apply state-of-the-art, local genetic correlation approaches to further refine any relationships.

      Complex disease and GWAS is not my area of expertise but the authors managed to present their methods and results in a clear, easy-to-follow manner. Their results statistically support several correlations between the disorders and, for ALS and AD, a shared variant in the vicinity of the lead SNP from the original ALS GWAS. Such findings could have important implications for our understanding of the mechanisms of such disorders and eventually the possibility of managing and treating them.

      The authors have built a useful pipeline that plugs together all the gold-standard, existing software to perform this analysis and made it openly available which is commendable. However, there is little discussion of what software is available to perform global and local correlation analysis and, if there are multiple tools available, why they consider the ones they selected to be the gold-standard.

      There is some mention of previous findings of genetic pleiotropy between ALS and these other disorders in the introduction, and discussion of their improved ALS-AD evidence relative to previous work. However, detailed comparisons of their other correlations to what was described before for the same pairs of disorders (if any) is missing. Adding this would strengthen the impact of this paper.

      Finally, being new to this approach I found the abstract a little confusing. Initially, the shared causal variant between ALS and AD is mentioned but immediately in the following sentence they describe how their study "suggested that disease-implicated variants in these loci often differ between traits". After reading the whole paper I understood that the ALS-AD shared variant was the exception but it may be best to restructure this part of the abstract. Additionally, in the abstract the authors state that different variants "suggests the role of distinct mechanisms across diseases despite shared loci". Is it not possible that different variants in the same regulatory region or protein-coding parts of a gene could be having the same effect and mechanism? Or does the methodology to establish that different variants are involved automatically mean that the variants are too distant for this to be possible?

      These concerns were addressed in the revised version of this manuscript.

    3. Reviewer #2 (Public Review):

      Summary:

      Spargo and colleagues present an analysis of the shared genetic architectures of schizophrenia and several late-onset neurological disorders. In contrast to many polygenic traits for which global genetic correlation estimates are substantial, global genetic correlation estimates for neurological conditions are relatively small, likely for several reasons. One is that assortative mating, which will spuriously inflate genetic correlation estimates, is likely to be less salient for late-onset conditions. Another, which the authors explore in the current manuscript, is that some loci affecting two or more conditions (i.e., pleiotropic loci) may have effects in opposite directions, or shared loci are sparse, such that the global genetic correlation signal washes out.

      The authors apply a local genetic correlation approach that assesses the presence and direction of pleiotropy in much smaller spatial windows across the genome. Then, within regions evidencing local genetic correlations for a given trait pair, they apply fine-mapping and colocalization methods to attempt to differentiate between two scenarios: that the two traits share the same causal variant in the region or that distinct loci within the region influence the traits. Interestingly, the authors only discover one instance of the former: an SNP in the HLA region appearing to confer risk for both AD and ALS. This is in contrast to six regions with distinct causal loci, and twenty regions with no clear shared loci.

      Finally, the authors have published their analysis pipeline such that other researchers might easily apply the same techniques to other collections of traits.

      Strengths:

      - All such analysis pipelines involve many decision points where there is often no clear correct option. Nonetheless, the authors clearly present their reasoning behind each such decision.

      - The authors have published their analytic pipeline such that future researchers might easily replicate and extend their findings.

      Weaknesses:

      - The majority of regions display no clear candidate causal variants for the traits, whether shared or distinct. Further, despite the potential of local genetic correlation analysis to identify regions with effects in opposing directions, all of the regions where causal variants were identified for both traits evidenced positive correlations. The reasons for this aren't clear and the authors would do well to explore this in greater detail.

      - The authors very briefly discuss how their findings differ from previous analyses because of their strict inclusion criteria for "high-quality" variants. This might be the case, but the authors do not attempt to demonstrate this via simulation or otherwise, making it difficult to evaluate their explanation.

      These concerns were addressed in the revised version of this manuscript.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      The authors investigate pleiotropy in the genetic loci previously associated with a range of neuropsychiatric disorders: Alzheimer's disease, amyotrophic lateral sclerosis (ALS), frontotemporal dementia, Parkinson's disease, and schizophrenia. The local statistical fine-mapping and variant colocalisation approaches they use have the potential to uncover not only shared loci but also shared causal variants between these disorders. There is existing literature describing the pleiotropy between ALS and these other disorders but here the authors apply state-of-the-art, local genetic correlation approaches to further refine any relationships.

      Complex disease and GWAS is not my area of expertise but the authors managed to present their methods and results in a clear, easy to follow manner. Their results statistically support several correlations between the disorders and, for ALS and AD, a shared variant in the vicinity of the lead SNP from the original ALS GWAS. Such findings could have important implications for our understanding of the mechanisms of such disorders and eventually the possibility of managing and treating them. 

      The authors have built a useful pipeline that plugs together all the gold-standard, existing software to perform this analysis and made it openly available which is commendable. However, there is little discussion of what software is available to perform global and local correlation analysis and, if there are multiple tools available, why they consider the ones they selected to be the gold-standard. 

      There is some mention of previous findings of genetic pleiotropy between ALS and these other disorders in the introduction, and discussion of their improved ALS-AD evidence relative to previous work. However, detailed comparisons of their other correlations to what was described before for the same pairs of disorders (if any) is missing. Adding this would strengthen the impact of this paper. 

      Finally, being new to this approach I found the abstract a little confusing. Initially, the shared causal variant between ALS and AD is mentioned but immediately in the following sentence they describe how their study "suggested that disease-implicated variants in these loci often differ between traits". After reading the whole paper I understood that the ALS-AD shared variant was the exception but it may be best to restructure this part of the abstract. Additionally, in the abstract the authors state that different variants "suggests the role of distinct mechanisms across diseases despite shared loci". Is it not possible that different variants in the same regulatory region or protein-coding parts of a gene could be having the same effect and mechanism? Or does the methodology to establish that different variants are involved automatically mean that the variants are too distant for this to be possible?

      We thank reviewer one for their considered review of this manuscript and for highlighting points that would benefit from further exploration. Itemised responses are provided below.

      (1) The reviewer noted that we did not adequately explain our choice of software for global and local genetic correlation analysis, and why we consider the techniques chosen as gold standard. We agree that the paper would benefit from clarification around this aspect of the study.

      Briefly, we firstly selected LAVA for the local genetic correlation analysis because it offers several advantages above competing software and was developed by a reputable team previously known for developing MAGMA, which is well-established in the statistical genetics field. In the manuscript (page 8), we added the following clarification: “LAVA was the most appropriate local genetic correlation approach for this study for several reasons. First, unlike SUPERGNOVA and rho-HESS, LAVA makes specific accommodations for analysis of binary traits. Second, other tools focus on bivariate correlation between traits whilst LAVA offers this alongside multivariate tests such as multiple regression and partial correlation, enabling rigorous testing of pleiotropic effects. Lastly, LAVA is shown to provide results which are less biased than those from other tools.”

      LDSC was selected for the global genetic correlation analysis because the software is well-established and likely the most widely adopted global genetic correlation tool. Reflecting its prevalence, the software is also compatible with LAVA, which adjusts for sample overlap based on the bivariate intercept estimate returned by LDSC. Since global genetic correlations were not the primary focus of this study, having been tested across several previous investigations (see response 2), we did not prioritise comparison of correlation estimates from LDSC against other available software. In the manuscript (pages 7-8) we now include the following statement: “[LDSC] was also applied to derive ‘global’ (i.e., genome-wide) genetic correlation estimates between trait pairs and estimate sample overlap from the bivariate intercept. The latter of these outputs was taken forward as an input for the local genetic correlation analysis using LAVA (see 2.2.2.2). Since global genetic correlation analysis across the traits studied here is not novel and associations reported in past studies are congruent across different tools, the compatibility between LDSC and LAVA motivated our use of LDSC for this analysis”.

      (2) The second comment was that the paper would be strengthened by contextualising our study with detail around what is previously known about associations between the studied traits. Accordingly, we have added clarifying text at the end of the introduction, stating: “although previous studies have performed global genetic correlation analyses between various combinations of these traits {references}, this is the first to compare them at a genome-wide scale using a local genetic correlation approach”. In the discussion, we link back to these studies, stating that “Through genetic correlation analysis, we replicated genome-wide correlations previously described between the studied traits {references}”.

      (3) The reviewer highlighted that the abstract as originally written may mislead or confuse the reader and we agree that clarity could be improved with some restructuring. This has now been revised and should read more logically.

      (4) They also enquired about our reasons for suggesting that the implication of distinct variants for each trait from a colocalisation analysis suggests a distinct causal mechanism. We thank them for this question as it encouraged us to reconsider how best to present the results of this analysis. To answer their question:

      It is certainly true that nearby but distinct variants can confer the same effect. In a scenario where multiple distinct variants result in the same effect and thus increase susceptibility towards two or more related phenotypes, you would expect to find evidence of association to each relevant variant in GWAS across these related traits (even if the magnitude of the associations differ). Where biological mechanisms are shared, post-GWAS finemapping analysis would be expected to yield credible sets overlapping across the traits, and likewise, colocalisation analysis should converge on a set of credible SNPs that are candidates for the shared effect. Where multiple distinct variants confer the same effect, you would expect to see separate fine-mapping credible sets for these distinct variants that colocalise pairwise between the jointly-affected traits. Generally, therefore, evidence supporting the two distinct variants hypothesis would suggest the role of two distinct mechanisms except when certain credible sets identified through fine-mapping converge on a colocalised effect.

      There is a further caveat which we also explored in response to Reviewer two: if a region includes long-spanning LD (and hence a larger number of variants are considered in the analysis), then the colocalisation analysis is more likely to favour the two distinct variants hypothesis since the probability of the variants implicated in both traits being shared decreases. It is likely that support for the two independent variants hypothesis is correct in most of the comparisons from this study that favour this conclusion. This is because, generally, the fine-mapping credible sets do not overlap across trait pairs (Figure S4) and consequently the colocalisation analysis does not find any support for the shared variant hypothesis. An exception is the analysis of PD and schizophrenia at the MAPT locus on chromosome 17. We have accordingly added the following clarification to the manuscript (page 18): “However, the colocalisation analysis will increasingly favour the two independent variants hypothesis as the number of analysed variants increases. Hence, the wide-spanning LD of this region may have obstructed identification of variants and mechanisms shared between the traits.”

      Reviewer #2 (Public Review): 

      Summary: 

      Spargo and colleagues present an analysis of the shared genetic architectures of schizophrenia and several late-onset neurological disorders. In contrast to many polygenic traits for which global genetic correlation estimates are substantial, global genetic correlation estimates for neurological conditions are relatively small, likely for several reasons. One is that assortative mating, which will spuriously inflate genetic correlation estimates, is likely to be less salient for late-onset conditions. Another, which the authors explore in the current manuscript, is that some loci affecting two or more conditions (i.e., pleiotropic loci) may have effects in opposite directions, or shared loci are sparse, such that the global genetic correlation signal washes out.

      The authors apply a local genetic correlation approach that assesses the presence and direction of pleiotropy in much smaller spatial windows across the genome. Then, within regions evidencing local genetic correlations for a given trait pair, they apply fine-mapping and colocalization methods to attempt to differentiate between two scenarios: that the two traits share the same causal variant in the region or that distinct loci within the region influence the traits. Interestingly, the authors only discover one instance of the former: an SNP in the HLA region appearing to confer risk for both AD and ALS. This is in contrast to six regions with distinct causal loci, and twenty regions with no clear shared loci. 

      Finally, the authors have published their analysis pipeline such that other researchers might easily apply the same techniques to other collections of traits. 

      Strengths: 

      - All such analysis pipelines involve many decision points where there is often no clear correct option. Nonetheless, the authors clearly present their reasoning behind each such decision.

      - The authors have published their analytic pipeline such that future researchers might easily replicate and extend their findings.

      Weaknesses:

      - The majority of regions display no clear candidate causal variants for the traits, whether shared or distinct. Further, despite the potential of local genetic correlation analysis to identify regions with effects in opposing directions, all of the regions where causal variants were identified for both traits evidenced positive correlations. The reasons for this aren't clear and the authors would do well to explore this in greater detail.

      - The authors very briefly discuss how their findings differ from previous analyses because of their strict inclusion for "high-quality" variants. This might be the case, but the authors do not attempt to demonstrate this via simulation or otherwise, making it difficult to evaluate their explanation. 

      We thank Reviewer two for their appraisal of this manuscript and kind comments regarding its strengths. We will now aim to address the identified weaknesses.

      (1) The reviewer comments that we did not adequately investigate why loci with causal variants identified in both traits all had positive local genetic correlations. We agree that it would be helpful to better understand the underlying reasons. To address this issue, we have added a new supplementary figure to compare the positive and negative local genetic correlation results (see Figure S2). In the main text we add the following clarification: “Although both positive and negative local genetic correlations passed the FDR-adjusted significance threshold, we observed only positive local genetic correlations in loci where fine-mapping credible sets were identified for both traits in the pair. This reflects that the correlation coefficients and variant associations from the analysed GWAS studies were generally stronger in the positively correlated loci (see Figure S2).”

      (2) The reviewer rightly suggests that the manuscript would benefit from an improved explanation of the somewhat inconsistent results for the colocalisation analysis of ALS and AD at the locus around the rs9275477 SNP from this work and a previous study. We have now further investigated this and believe that the discrepancy results partly from an inherent empirical characteristic of the colocalisation analysis. We have explained this in the manuscript (page 22) as follows: “The previous study analysed a 200kb window of over 2,000 SNPs around the lead genome-wide significant SNP from the ALS GWAS, rs9275477, and found ~0.50 posterior probability for each of the shared and two independent variant(s) hypotheses. The current analysis used 475 SNPs occurring within a semi-independent LD block of ~50kb in this locus. Since the posterior probability of the two independent variants hypothesis (H3) increases exponentially with the number of variants in the region whilst the shared variant hypothesis (H4) scales linearly, it is expected that our analysis would give stronger support for the latter. Given that the previous study defined regions for analysis based on an arbitrary window of ±100kb around each lead genome-wide significant SNP from the ALS GWAS and we defined each analysis region based on patterns of LD in European ancestry populations, it is reasonable to favour the current finding.”
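
      The dependence on region size can be illustrated with a minimal sketch. It assumes a colocalisation model with fixed per-SNP priors, in which H3 receives one prior weight per pair of distinct SNPs and H4 one weight per SNP. The function name, the prior values (`p1`, `p2`, `p12`) and the round SNP counts are illustrative assumptions, not values taken from the manuscript, and the sketch covers only the prior weights, not the full posterior calculation.

```python
def hypothesis_prior_weights(n_snps, p1=1e-4, p2=1e-4, p12=1e-5):
    """Return unnormalised prior weights (H3, H4) for a region of n_snps variants.

    Assumed model (hypothetical, for illustration only): each SNP is causal
    for trait 1 alone with prior p1, for trait 2 alone with prior p2, and
    for both traits with prior p12.
    """
    # H3: two distinct causal variants, so one weight per ordered SNP pair
    h3 = p1 * p2 * n_snps * (n_snps - 1)
    # H4: a single shared causal variant, so one weight per SNP
    h4 = p12 * n_snps
    return h3, h4


# Compare a narrow LD-block-sized region with a wide fixed window
for n in (475, 2000):
    h3, h4 = hypothesis_prior_weights(n)
    print(f"{n} SNPs: prior odds H3:H4 = {h3 / h4:.3f}")
```

      Under these assumed priors, the prior odds of H3 relative to H4 rise as more SNPs are included, so a wider window favours the two-variant hypothesis before any data are considered.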

    1. eLife assessment

      This study presents high-quality experiments and data analysis of C. elegans locomotion for spontaneous exploration as well as in the presence of an aversive stimulus. This important work shows that the activation of distinct turn types enhances escape performance as well as exploration. The strength of the evidence is still incomplete, particularly regarding optimal exploration and the identification of the range of the aversive stimulus at the boundary of the arena. The work will be of interest to a broad audience extending from movement ecology to the biology of Caenorhabditis elegans.

    2. Reviewer #1 (Public Review):

      This is an interesting and thorough paper describing the modes of locomotion of the nematode C. elegans in the context of random exploration or response to an aversive stimulus. The authors collect extensive statistics on various locomotor states and compare findings to a minimal mathematical model inspired by the data. Their data reveal biases in two modes of turning, gradual and sharp, which define the path structure of the animal moving on an agar plate. The authors also find that animals tend to overcome inherent anatomical/physiological biases to locomotion when escaping aversive stimuli.

      Understanding animal navigation is a window for revealing efficient algorithms for exploration of space, and also allows testing of the extent to which we understand how the nervous system produces specific behaviors. This paper adds important analysis towards these goals. I have a couple of comments that may be worth considering:

      (1) The authors place a circular barrier of SDS near the edges of their plates and assume that this aversive stimulus is only sensed when the animal is near the barrier. However, it is possible that the SDS diffuses enough into the interior of the plate to affect the navigation statistics. In this case, the data they have accumulated may in fact be some sort of combination of exploratory locomotion and a general background SDS aversive stimulus. Can the authors control for this? Perhaps test the plates at different distances and times for SDS diffusion? Or replace the barrier with a physical one and not a chemical one?

      (2) The authors do not look at mutants or perturb the physiology in defined ways relevant to the locomotion being studied to test their model. Specifically, it would be of interest to identify neural circuits that govern some of the parameters in the model. Although the authors bring this up in their Discussion section, it seems appropriate for this paper, as it would considerably bolster the impact of the work.

    3. Reviewer #2 (Public Review):

      Summary:

      Turning behavior plays a crucial role in animal exploration and escape responses, regardless of the presence or absence of environmental cues. These turns can be broadly divided into two categories: strong reorientations, characterized by sudden changes in path directionality, and smooth turns, which involve gradual changes in the direction of motion, leading to sinuosity and looping patterns. One of the key model animals to study these behaviors is the nematode Caenorhabditis elegans, in which the role of strong reorientations has been thoroughly studied. Despite their impact on trajectories, smooth turns have received less attention and remain poorly understood. This study addresses this gap in the literature, by studying the interplay between smooth turns and strong reorientations in nematodes moving in a uniform environment, surrounded by an aversive barrier. The authors use this set-up to study both exploration behavior (when the worm is far from the aversive barrier) and avoidance behavior (when the worm senses the aversive barrier). The main claims of the paper are that (1) during exploratory behavior, the parameters governing strong reorientations are optimized to compensate for the effect of smooth turns, increasing exploration efficiency, and (2) during avoidance, strong reorientations are biased towards the side that maximizes escape success. To support these two claims, the paper presents a detailed quantitative characterization of the statistics of smooth turns and strong reorientations. These results offer insights that may interest a diverse audience, including those in movement ecology, animal search behavior, and the study of Caenorhabditis elegans. In our opinion, the experimental work and data analysis are of the highest quality, resulting in a very clean characterization of C. elegans' turning behavior. However, the experimental design and data analyses presented are not fully aligned with some of the central conclusions drawn, and in particular, we believe that further work is needed to fully support the claim that strong reorientations are optimized to increase exploration efficiency.

      Strengths:

      The authors have addressed important questions in movement ecology through hypothesis-driven experiments. The choice of C. elegans as a model organism to investigate the impact of turning dynamics on escape and exploration is well-justified by its limited repertoire of strong reorientation behaviors and consistent turning bias across strains and individuals. The quality of the experimental data is very high, using state-of-the-art techniques, and a set-up where a robust and reproducible avoidance response can be studied. The data analysis benefits from state-of-the-art techniques and a deep understanding of C. elegans' behavior, resulting in a very clean and very clear set of results. We particularly appreciated the use of a ventral/dorsal reference system (rather than a left/right one), which is more natural and insightful. As a result, the paper presents one of the best characterizations of C. elegans sharp turning behavior published to date. We find that the claim that strong reorientations are chosen in a way that optimizes avoidance behavior is solid and well-supported. The manuscript is well-written and maintains a coherent line of reasoning throughout.

      Weaknesses:

      Our primary concerns revolve around the significance and rigor of the research on exploratory behavior. First, we believe that the experimental arena was too small for accurately observing the unfolding of exploration. The movement of assayed animals was clearly impaired by boundary effects, which obscured key elements of C. elegans exploratory behavior such as the mean square displacement or large-scale trajectory structures emerging from curvature bias. Second, we think that the proof that strong reorientations are optimized to maximize exploration performance is too indirect: it relies on a particular model with some unrealistic assumptions and lacks a quantification of the gains provided by the optimization to the individuals. We believe that a more thorough and direct analysis would be needed to fully support the claim.

    1. eLife assessment

      The work provides valuable genomic resources to address the endocrine control of a life cycle transition in the Malabar grouper fish. The revised manuscript is more solid and the resources and experimental data help to build up a meaningful biological understanding of thyroid signaling in grouper fish.

    2. Reviewer #1 (Public Review):

      Summary and strength:

      The authors undertook to assemble and annotate the genome sequence of the Malabar grouper fish, with the aim of providing molecular resources for fundamental and applied research. Even though this is more mainstream, the task is still daunting and labor intensive. Currently, high quality and fully annotated genome sequences are of strategic importance in modern biology. The authors make use of the resource to address the endocrine control of an ecologically and developmentally relevant life cycle transition, metamorphosis. In contrast to amphibians and flatfish, where the body plan changes, fish metamorphosis is anatomically more subtle and much less known, although it is clear that thyroid hormone (TH) signaling is a key player. The authors thus provide a repertoire of TH-relevant gene expression changes during development and across post-embryonic transitions and correlate developmental stages with changes of gene expression. Overall, this work represents a significant advance in the field.

      Fish 'metamorphosis' is not well known because it is not as spectacular as that of amphibians. This work clearly provides technical and theoretical resources to address in a more systematic manner the molecular changes occurring during development and post-embryonic transitions. Heterochrony is a major source of functional and life cycle diversity in fish, which blurs our anatomy-based understanding of fish biology, and has a direct impact on the protocols and rearing procedures used to produce live stocks. This work illustrates how, by using genomics coupled to simple experimental endocrinology, one directly addresses these challenges.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Responses to recommendations

      Reviewer #1 (Recommendations For The Authors):

      Describe more precisely how gene expression graphs are built (tissues, read counts). For example, how were read counts normalized? Were they from DESeq2 data, which only works by comparing two samples? If so, all samples should be independently compared to a reference and the normalized expression value of the reference will change from sample to sample... thus introducing a pure technical artifact.

      We have added additional information about the normalisation method to the Material and Methods section (Lines 597-598: “Lastly, expression levels shown in figures 2-5 are normalised gene counts produced by DESeq2.”) and figure legends (lines 247, 286, 372, 404: “Gene expression data was generated from whole fish. Expression levels were derived from DESeq2 normalised gene counts.”) to address this recommendation.

      DESeq2 provides a reference-independent normalisation through a median of ratios method (a good explanation can be found here: https://hbctraining.github.io/DGE_workshop/lessons/02_DGE_count_normalization.html). The normalised expression values are independent of any reference, and therefore will not change from sample to sample as suggested in this comment. In contrast, the pairwise comparisons are done when analysing significantly differentially expressed genes between two treatments using a Wald test, which is done against a reference and generates log2 fold change information and p-values; however, this is different to the normalisation we described above.
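
      The idea behind the median of ratios method can be sketched as follows. This is a simplified, standard-library Python illustration and not DESeq2's own implementation; the function names and the toy count matrix are hypothetical. Each sample receives a single size factor computed against a per-gene geometric mean across all samples, so no individual sample serves as the reference.

```python
import math
from statistics import median


def size_factors(counts):
    """Estimate one size factor per sample by the median of ratios approach.

    counts: list of genes, each a list of raw counts (one entry per sample).
    """
    n_samples = len(counts[0])
    # Use only genes with non-zero counts in every sample, so logs are defined
    log_rows = [[math.log(c) for c in row]
                for row in counts if all(c > 0 for c in row)]
    # Log of the per-gene geometric mean across samples (the pseudo-reference)
    log_geo_means = [sum(row) / n_samples for row in log_rows]
    factors = []
    for j in range(n_samples):
        # Ratio of each gene in sample j to the pseudo-reference, on the log scale
        log_ratios = [row[j] - g for row, g in zip(log_rows, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalise(counts):
    """Divide every raw count by the size factor of its sample."""
    sf = size_factors(counts)
    return [[c / s for c, s in zip(row, sf)] for row in counts]


# Toy example: sample 2 was sequenced twice as deeply as sample 1
toy_counts = [[10, 20], [100, 200], [5, 10], [0, 3]]
print(size_factors(toy_counts))
print(normalise(toy_counts))
```

      In the toy data, the second sample's size factor is twice the first's, and after division the two samples show the same normalised value for every gene expressed in both.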

      Provide bioinformatics workflows and, if possible, the set of parameters used, the computing resources, etc. Were some assembly finishing steps carried out (by long-range PCR?) and experimental validations (especially for allele-specific transcripts, by conventional RT-PCR based on diagnostic mutations)?

      We have added additional information on the bioinformatics workflows where required, including parameters used (Lines 530, 536, 549-551, and 574-583.). No finishing steps other than HiC scaffolding were performed. No allele-specific analysis was done as part of this manuscript.

      To further improve transparency, we have also uploaded all the scripts used for this study to https://github.com/R-Huerlimann/Malabar_grouper_genome and the gene models and functional annotation to https://figshare.com/projects/Malabar_grouper_Epinephelus_malabaricus_genome_annotation/199909. This information has been added to the manuscript in lines 600-601 and 609-611.

      Reviewer #3 (Recommendations For The Authors):

      General author response:

      All the recommendations of this reviewer are very relevant and would certainly provide a lot of information, but they would constitute a full project in themselves, as they would imply establishing this grouper species as an experimental model in our lab. Currently we only have access to the larval and juvenile stages via a collaboration with the Okinawa Prefectural Sea Farming Center, which is an hour's drive from our lab, and this access is limited to the grouper spawning season. To do all that is suggested, we would need regular and easy access to the fish. This would require establishing this model in our marine station, which is not possible due to space and time issues. These groupers grow to a very large size (1-2 m in length, and up to 150 kg in weight) and only mature into males after > 6 years.

      First and foremost, I would advise the authors to extend their TH and cortisol levels measurements to the entire developmental time considered in their analysis.

      For the reasons stated above we could not perform these experiments. We must emphasize that the data regarding TH are available for a closely related species (e.g., Epinephelus coioides, de Jesus et al. 1998) and there is no reason to think that the situation will be drastically different in E. malabaricus. In addition, given that we have now studied several coral reef fish species in the same context (clownfish, surgeonfish, damselfish, gobies) we observed that the transcriptomic data are more robust, more sensitive, and more precise than hormone measurements. 

      Consider carrying out in situ hybridisation of TSH with putative CRH receptors to determine if thyrotrophin could be competent to respond to HPA axis signals.

      We agree that studying the interplay between corticoids and thyroid hormones at the neuroendocrine level would be desirable, and we fully agree with the experiment suggested by the reviewer, but this is impossible in our current situation. We are not working with an established animal model like zebrafish or Xenopus, but with a large, long-lived marine fish that reproduces in spawning aggregations and whose husbandry is notoriously difficult.

      Consider conducting cortisol treatment experiments to functionally determine if indeed cortisol is involved in grouper metamorphosis.

      We tried to do TH and cortisol treatments specifically on the early larval stages corresponding to the early TH peak to see how this would impact the development of the fin spines, but our trials were unsuccessful. The larvae at that stage are extremely fragile and even putting them into small volumes of treatment drugs induced massive mortalities. Again, this would mean establishing this grouper species as a model organism and would require a massive effort to improve larval rearing as discussed above. We feel that our data stands on its own in the meantime and adds valuable information to the existing literature by studying a rarely investigated species.

      Responses to comments

      Reviewer #1 (Public Review):

      Weaknesses:

      The manuscript needs proper editing and is not complete. Some wordings lack precision and make it difficult to follow (e.g. line 98 "we assembled a chromosome-scale genome of ..." should read instead "we assembled a chromosome-scale genome sequence of ..."). Also, panel Figure 2E is missing.

      We made the suggested change of adding “sequence” in lines 32 and 121. Concerning additional changes, we have carefully edited our manuscript and looked for any incomplete sections. Unfortunately, it is difficult to see what other issues are being raised here without any further information. 

      As for panel E of figure 2, it is not missing. The panel is located to the right, just below “Target Cells”.

      The shortcomings of the manuscript are not limited to the writing style, and important technical and technological information is missing or not clear enough, thereby preventing a proper evaluation of the resolution of the genomic resources provided:

      Several RNASeq libraries from different tissues have been built to help annotate the genome and identify transcribed regions. This is fine. But all along the manuscript, gene expression changes are summarized into a single panel where it is not clear at all which tissue this comes from (whole embryo or a specific tissue?), or whether it is a cumulative expression level computed across several tissues (and how it was computed) etc. This is essential information needed for data interpretation.

      No fertilised eggs or embryos have been sequenced. The individual tissues derived from juvenile fish were used for the genome annotation only, using ISOseq. The whole larval fish were used for the developmental analysis using RNAseq, as well as the genome annotation. We have added additional information in the figures and text that the results shown are from whole larvae, and added more detail to the material and methods section about which type of sample was analysed in which way.

      Specifically, we have added “Lastly, expression levels shown in figures 2-5 are normalised gene counts produced by DESeq2.” to lines 597-598 in the Material and Methods section, “Gene expression data was generated from whole larvae.” to line 191, and “Gene expression data was generated from whole fish. Expression levels were derived from DESeq2 normalised gene counts.” to the figure legends in lines 247, 286, 372, 404). Additionally, we have added clarifications in lines 489, 497, 530, and 536. 

      The bioinformatic processing, especially of the assembly and annotation, is very poorly described. This is also a sensitive topic, as illustrated by the numerous "assemblathon" and "annotathon" initiatives to evaluate tools and workflows. Importantly, providing configuration files and in-depth description of workflows and parameter settings is highly recommended. This can be made available through data store services and documents even benefit from DOIs. This provides others with more information to evaluate the resolution of this work. No doubt that it is well done, but especially in the field of genome assembly and annotation, high resolution is VERY cost and time-intensive. Not surprisingly, most projects are conditioned by trade-offs between cost, time, and labor. The authors should provide others with the information needed to evaluate this.

      We have added additional information on parameters used in the genome assembly, annotation and transcriptome analysis in lines 549-551, 577, 579, 580, and 582. Additionally, we have uploaded all scripts to github as outlined in the Code and Data Availability section (lines 599-614).

      The genome assembly did not use a specific workflow (e.g., nextflow), but was done with a simple command and standard parameters in IPA. Scaffolding was carried out by Phase Genomics using their standardised proprietary workflow, of which a detailed description provided by Phase Genomics can be found in the supplementary material.

      Quantifications of T3 and T4 levels look fairly low and not so convincing. The work would clearly benefit from a discussion about why the signal is so low and what are the current technological limitations of these quantifications. This would really help (general) readers.

      The T3/T4 levels are consistent with other published work in fish. In the present manuscript for grouper we have a peak level of 1.2 ng/g (1,200 pg/g) of T4 and 0.06 ng/g (60 pg/g) of T3. This is a higher level of T4 and comparable level of T3 to what was found in convict tang (Holzer et al. 2017; Figure 2) with 30 pg/g of T4 and 100 pg/g of T3. Of course, there are also examples with higher levels, such as clownfish (Roux et al. 2023; Figure 1), with 10 ng/g (10,000 pg/g) of T4 and 2 ng/g (2,000 pg/g) of T3.

      The differences could be due to different structure of fish tissues and therefore different hormone extraction efficiency, different hormone measurement protocols, different fish physiology, or different fish size (e.g., the weighing of tiny grouper larvae is difficult and less precise than in convict tang). What is important is not the absolute level but the relative level, which shows the change across larval stages of a species measured with identical extraction and measurement protocols. This means that our data are internally consistent and coherent with the grouper literature.

      Holzer, Guillaume, et al. "Fish larval recruitment to reefs is a thyroid hormone-mediated metamorphosis sensitive to the pesticide chlorpyrifos." Elife 6 (2017): e27595.

      Roux, Natacha, et al. "The multi-level regulation of clownfish metamorphosis by thyroid hormones." Cell Reports 42.7 (2023).

      Differential analysis highlights up to ~ 15,000 differentially expressed genes (DEG), out of a predicted 26k genes. This corresponds to more than half of all genes. ANOVA-based differential analysis relies on the simple fact that only a minority of genes are DEG. Having >50% DEG is well beyond the validity of the method. This should be addressed, or at least discussed.

      The large number of differentially expressed genes is due to the fact that this is coming from a larval developmental transcriptome going from one day old larva to fully metamorphosed juveniles at around day 60. 

      While DESeq2 indeed works on an assumption that most genes are not differentially expressed, this affects normalization but not hypothesis testing (Wald-test, LRT tests or ANOVA). However, normalisation in DESeq2 is fairly robust to this assumption. According to the author of DESeq2, Michael Love, DESeq2 uses the median ratio for normalisation, and as long as the number of up and down regulated genes is relatively even, DESeq2 will be able to handle the data. As part of our general quality control for this project we consulted the MA plots, which do not show any overrepresented up or down expression patterns. Additionally, see Michael Love's comment on comparing different tissues, which is also applicable here when comparing vastly different larval stages (https://support.bioconductor.org/p/63630/):

      “For experiments where all genes increase in expression across conditions, the median ratio method will not be able to capture this difference, but this is typically not the case for a tissue comparison, as there are many "housekeeping" genes with relatively similar expression pattern across tissues.”

      Reviewer #3 (Public Review):

      Weaknesses:

      However, the authors make substantial considerations that are not proven by experimental or functional data. In fact, this is a descriptive study that does not provide any functional evidence to support the claims made.

      We agree with the reviewer that our paper lacks functional experiments but despite that, the transcriptomic data clearly show the activation of TH and corticoid pathways during two distinct periods: an early activation between D1 and D10, and a second one between D32 and juvenile stage. These data are interesting as they call for further examination of 1) the existence of an early larval developmental step also involving TH and corticosteroids and 2) the possible interaction of corticoids and TH during metamorphosis. This is a question that is certainly not settled yet in teleost fishes and which is of great interest.

      Especially 1) is of interest and importance, since this early activation (unique to our knowledge in any teleost fish studied so far) raises a lot of new questions and once again will certainly be scrutinised by other groups in the years to come, therefore ensuring a good citation impact of this study. We hope that the reviewer, while disagreeing with some of our statements, will recognize that our study will be stimulating at that level and that this is what scientific studies should do.

      We acknowledge the descriptive nature of the data and the lack of functional experiments in the Discussion in lines 443 to 445: “This may suggest that in some aspect, cortisol synthesis could work in concert with TH, as has been shown in several different contexts in amphibians, but functional experiments need to be conducted to confirm this hypothesis.” As stated above, performing such functional experiments would require establishing the grouper as an experimental model in our husbandry, which currently is not possible due to the large size of the adult fish.

      The consideration that cortisol is involved in metamorphosis in teleosts has never been shown, and the only example cited by the authors (REF 20) clearly states that cortisol alone does not induce flatfish metamorphosis. In that work, the authors clearly state that in vivo cortisol treatment had no synergistic effect with TH in inducing metamorphosis. Moreover, in Senegalensis, the sole pre-otic CRH neuron number decreases during metamorphosis, further arguing that, at least in flatfish, cortisol is not involved in flatfish metamorphosis (PMID: 25575457).  

      We will do our best to improve the clarity of the revised manuscript to avoid any misunderstanding about our claims. However, we would like to point out the semantic shift in the reviewer's first sentence: indeed, “being involved” is not the same as “cortisol alone does not induce”. In ref 20 the authors explicitly wrote that “Cortisol further enhanced the effects of both T4 and T3, but was ineffective in the absence of thyroid hormones” and in our view this indeed corresponds to “being involved in metamorphosis”.

      We are not claiming that cortisol alone is involved in metamorphosis as the reviewer suggests, but simply that there is a possible involvement of cortisol together with TH in metamorphosis. We stand by this claim, as we indeed observed an activation of corticoid pathway genes around D32, which is sufficient to say it is involved. We do agree that functional experiments will be needed to properly demonstrate the involvement of corticoids in grouper metamorphosis, but this was not possible in the current study as it would require setting up a full grouper life cycle in lab conditions, which is beyond the scope of this manuscript.

      We also mentioned in the discussion that the role of corticoids in fish larval development is still debated, and we agree that this remains a contentious issue. We have clarified the Discussion on this point (lines 375-376, lines 439-464).

      We wrote that “There is contrasting evidence of communication between these two pathways during teleost fish larval development with some data suggesting a synergic and other an antagonistic relationship. In terms of synergy, an increase in cortisol level concomitantly with an increase in TH levels has been observed in flatfish [26], golden sea bream [64] and silver sea bream [65]. Cortisol was also shown to enhance in vitro the action of TH on fin ray resorption (phenomenon occurring during flatfish metamorphosis) in flounder[27]. It has also been shown that cortisol regulates local T3 bioavailability in the juvenile sole via regulation of deiodinase 2 in an organ-specific manner [66]. On the antagonistic side, it has been shown that experimentally induced hyperthyroidism in common carp decreases cortisol levels[67], whereas cortisol exposure decreases TH levels in European eel [68]. Given this scattered evidence, the existence of a crosstalk active during teleost larval development and metamorphosis has never been formally demonstrated. The results we obtained in grouper are clearly indicating that HPI axis is activated during both early development and metamorphosis and that cortisol synthesis is activated during early development. This may suggest that in some aspect, cortisol synthesis could work in concert with TH, as has been shown in several different contexts in amphibians [25], but functional experiments need to be conducted to confirm this hypothesis.” In the revised manuscript, we have also added the interesting case of the Senegal sole mentioned by the reviewer.

      In the last revision, we had also added that our results “brought a first insight into the potential role of corticoids in the metamorphosis of E. malabaricus and call for functional experiments directly testing a possible synergy” meaning that we clearly acknowledge that we are only revealing a hypothesis that remains to be tested. We later follow up with a discussion about the most novel observation and focus of our study, the increase in THs and cortisol during early development, which was unexpected and very intriguing. Again, these results suggest that there might be a link between the two, as has been shown in amphibians. This is typically the kind of results that should encourage more investigations into other fish species. Indeed, this has been pointed out by other authors and in particular by Bob Denver (probably the foremost expert on this topic) in Crespi and Denver 2012: “Elevation in HPA/I axis activity has been described prior to Metamorphosis in amphibians and fish, birth in mammals (reviewed in Crespi & Denver 2005a; Wada 2008)”. B. Denver also adds that: “Experiments in which GCs were elevated prior to metamorphosis or prior to hatching or birth (e.g. Weiss, Johnston & Moore 2007) or inhibited by treatments with GC synthesis blockers (e.g. metyrapone) or receptor antagonists (e.g. RU486, Glennemeir & Denver 2002) demonstrate that GCs play a causal role in precipitating these life-history transitions (also reviewed in Crespi & Denver 2005a; Wada 2008).” We believe the reviewer will be convinced by these elements coming from a colleague unanimously respected in the field. 

      Furthermore, the authors need to recognise that the transcriptomic analysis is whole-body and that HPA axis genes are upregulated, which does not mean they are involved in regulating the HPT axis. The authors do not show that in thyrotrophs, any CRH receptor is expressed or in any other HPT axis-relevant cells and that changes in these genes correlate with changes in TSH expression. An in-situ hybridisation experiment showing co-expression on thyrotrophs of HPA genes and TSH could be a good start. However, the best scenario would be conducting cortisol treatment experiments to see if this hormone affects grouper metamorphosis.

      We agree that functional experiments are needed to validate our hypothesis. As the early peaks of expression levels observed for many genes were very intriguing for us, we did carry out thyroid hormone and goitrogen treatments on young grouper larvae to test their effect on the morphological changes. Unfortunately, such experiments, already tricky on metamorphosing larvae, are even more risky on such tiny individuals just after hatching, and we encountered high mortality rates. We must add that because we cannot establish a full grouper life cycle under lab conditions, we have done these experiments in the context of a commercial husbandry system in Japan, which, while excellent, limits the scope of possible experiments. We were thus not able to provide functional validation of our hypothesis. Such experiments will be a full project in themselves, requiring a rearing system compatible with both larval survival and the economic constraints related to drug treatments. We were further limited by the spawning times of the grouper in the operational aquaculture farm, which are limited to a short time during each year. So even if we strongly agree with the necessity of conducting such experiments, we think that this is not in the scope of the present paper, but something future research can explore.

      High TSH and Tg levels usually parallel whole-body TH levels during teleost metamorphosis. However, in this study, high Tg expression levels are only achieved at the juvenile stage, whereas high TSH is achieved at D32, and at the juvenile stage, they are already at their lowest levels.

      This is exactly our point. We observe two peaks in TSH expression, one at D3 and one at D32. The peak at D3 coincides with high thyroid hormone levels on the same day, and while we have not measured TH at D32, existing literature shows that there is a peak in TH during that time (e.g., de Jesus et al., 1998). Similarly, there is a small peak of Tg at D3. Our manuscript focused more on the upregulation of these genes at D3, which has not been reported before in the literature and raised the question of the role of TH so early in the larval development, outside of the metamorphosis period. 

      Regarding the respective levels of TSH and Tg, we first would like to add that their respective order of appearance before metamorphosis (TSH at D32, Tg after) is consistent with what we would expect. We agree however that the strong increase of Tg and TPO expression is later than expected. Therefore, we have added the following sentence in lines 212 to 216: “The respective order of appearance of TSH and Tg (TSH at D32, Tg after) is consistent with what we would expect but a bit later than expected given the morphological transformation. It would be interesting to revisit this in a future series of experiments, with tighter temporal sampling to study how gene expression and morphological transformation aligned.”

      It is very difficult to conclude anything with the TH and cortisol levels measurements. The authors only measured up until D10, whereas they argue that metamorphosis occurs at D32. In this way, these measurements could be more helpful if they focus on the correct developmental time. The data is irrelevant to their hypothesis.

      We respectfully disagree with the reviewer, considering that 1) TH levels have already been investigated in groupers coinciding with pigmentation changes and fin ray resorption (Figure 4 in de Jesus et al, 1998), 2) there is also evidence in numerous fish species that TH level increase is concomitant with increase of TH-related genes, and 3) we observed in our data an increase in the expression of TH-related genes as well as pigmentation changes and fin ray resorption. Based on our experience in fish metamorphosis and the literature, we can say confidently that those observations indicate that metamorphosis is occurring between D32 and the juvenile stage. This clearly shows that our inference is correct. Additionally, we would like to reemphasize that, from our experience in several fish species, transcriptomic data are more robust and precise than hormone measurements.

      However, as we were surprised by the activation of TH and corticoid pathway genes very early in the larval development (at D3), which is clearly outside of the metamorphosis period, we decided to measure TH and cortisol levels during this period to determine whether this surprising early activation indeed corresponded to an increase in both TH and cortisol. As such an observation has never been made in other teleost species (to our knowledge), and as we were wondering if gene activation was accompanied by hormonal increase, the measurements we did for TH and cortisol between D1 and D10 are relevant. In order to clarify our message further, we have changed some of the mentions of “metamorphosis” to “larval development” throughout the manuscript and added other improvements to avoid any confusion between the two periods we are studying: early larval development (between D1 and D10) and metamorphosis (between D32 and juvenile stage).

      Moreover, as stated in the previous review, a classical sign of teleost metamorphosis is the upregulation of TSHb and Tg, which does not occur at D32; therefore, it is very hard for me to accept that this is the metamorphic stage. With the lack of TH measurements, I cannot agree with the authors. I think this has to be toned down and made clear in the manuscript that D32 might be a putative metamorphic climax but that several aspects of biology work against it. Moreover, in D10, the authors show the highest cortisol level and lowest T4 and T3 levels. These observations are irreconcilable with cortisol enhancing or participating in TH-driven metamorphosis.

      We thank the reviewer for this comment, but we think that there might be a misunderstanding here. 

      (1) We clearly observed an increase of TSHb (that occurs between D18 and juvenile stage) and an increase of tg from D32, which coincide with the activation of other genes involved in the TH pathway (dio2, dio3, and also a strong increase of TRb). All this, put in the context of what we know from previous grouper studies, clearly supports our conclusion that TH-regulated metamorphosis is starting at around D32 in grouper. We also observed morphological changes such as fin ray resorption and pigmentation changes between D32 and juvenile stage. Such morphological changes have already been associated with metamorphosis in groupers (De Jesus et al 1998) as they occur during TH level increase, and they also happen to be under the control of TH in grouper (De Jesus et al 1998). Based on this study but also on studies (conducted on many other teleost species) showing that the increase of TH levels is always associated with an activation of TH pathway genes and morphological and pigmentation changes, we concluded that metamorphosis of E. malabaricus occurs between D32 and juvenile stage. We have improved the clarity of the manuscript in several places to make sure that our conclusion is based on our transcriptomic and morphological data plus the available literature.

      (2) We clearly observed another activation of TH-related genes earlier in the development (between D1 and D10, with a surge of trhrs, tg and tpo at D3). As this activation was very unexpected for us, we decided to focus the analysis of TH levels between D1 and D10 and, very interestingly, we observed a high level of T4 at D3, indicating that THs are instrumental very precociously in the larval development of the malabar grouper, which has never been shown before. We stated in lines 224-225 that our “data reinforce the existence of two distinct periods of TH signalling activity, one early on at D3 and one late corresponding to classic metamorphosis at D32”. However, we agree that we could have been clearer and explained that this early activation was very intriguing for us and that we wanted to investigate hormonal levels around that period. However, we never claimed anywhere in the manuscript that this early developmental period corresponds to metamorphosis. Something else is occurring and both TH and cortisol seem to be involved, but further experiments need to be conducted to understand their role and their possible interaction. We have added corresponding statements in the abstract (lines 39-43) and discussion (lines 447 to 449).

      (3) Finally, regarding the comment about cortisol enhancing or participating in TH-driven metamorphosis, our data clearly showed an activation of the corticoid pathway genes around metamorphosis (between D32 and juvenile stage), suggesting a potential implication of corticoids in metamorphosis, but we agree with the reviewer that further experiments are needed to test that. We never claimed that cortisol was enhancing or participating in metamorphosis; on the contrary, we are “suggesting a possible interaction between TH and corticoid pathway during metamorphosis”. And we also say that our “results brought a first insight into the potential role of corticoids in the metamorphosis of E. malabaricus and call for functional experiments directly testing a possible synergy.” Nonetheless, we agree that some parts of our manuscript can be confusing with regard to cortisol synthesis during metamorphosis, as we did not measure cortisol levels between D32 and juvenile stage. We have therefore made changes throughout the Introduction and Discussion to make this clearer.

      Given this, the authors should quantify whole-body TH levels throughout the entire developmental window considered to determine where the peak is observed and how it correlates with the other hormonal genes/systems in the analysis.

      We did not measure TH levels at later stages as they have already been measured during Epinephelus coioides metamorphosis, and the morphological changes observed in this species around the TH peak correspond to what we observed in Epinephelus malabaricus around the peak of expression of TH pathway genes (see De Jesus et al., 1998 General and Comparative Endocrinology, 112:10-16). The main focus of this manuscript is the novel observation of the existence of an early activation period observed at D3, and for which we needed TH levels to determine if they were involved in another early developmental process (not related to metamorphosis). Our hypothesis is that this early activation might be related to the growth of fin rays necessary to enhance floatability during the oceanic larval dispersal. As we may have arrived at the explanation of this hypothesis too rapidly without setting up the context well enough, we have made changes to the introduction and discussion.

      Even though this is a solid technical paper and the data obtained is excellent, the conclusions drawn by the authors are not supported by their data, and at least hormonal levels should be present in parallel to the transcriptomic data. Furthermore, toning down some affirmations or even considering the different hypotheses available that are different from the ones suggested would be very positive.

      We thank the reviewer for acknowledging the solidity of the method of our paper and the quality of the results. We agree that there were several parts where our message was unclear. We have addressed these points in the revised version of the manuscript to make sure there is no more confusion between the two distinct periods we studied in this paper (early larval development and metamorphosis). We also made sure that our claims about TH/corticoid interaction during both periods remain hypothetical as we cannot yet, despite trials, sustain them with functional experiments.

    1. eLife assessment

      This important study presents the structure of human heparan-alpha-glucosaminide N-acetyltransferase (HGSNAT) in the acetyl-CoA bound state, providing the first description of the architecture of this family of integral membrane enzymes, and revealing the mode of acetyl-CoA binding. The structural work is convincing, with a high resolution and isotropic single-particle cryoEM map and an atomic model that is well-justified by the density map, with strong and convincing density for the acetyl-CoA ligand. However, experimental support for the molecular mechanism of the HS acetylation reaction and the impact of disease-causing mutations is incomplete. This work will be of interest to biochemists and structural biologists studying the structure and function of integral membrane enzymes, as well as those interested in genetic diseases resulting from mutations in this family of enzymes, such as mucopolysaccharidosis IIIC (MPS III-C).

    2. Reviewer #1 (Public Review):

      This article by Navratna et al. reports the first structure of human HGSNAT in an acetyl-CoA-bound state. Through careful structural analysis, the authors propose potential reasons why certain human mutations lead to lysosomal storage disorders and outline a catalytic mechanism. The structural data are of good quality, and the manuscript is clearly written. This study represents an important step toward understanding the mechanism of HGSNAT and is valuable to the field. I have the following suggestions:

      (1) The authors should characterize whether the purified protein is active. Otherwise, how does one know if the detergent used maintains the protein in a biologically relevant state? The authors should at least attempt to do so. If these prove to be challenging, at the very least, the authors should try a cell-based assay to demonstrate that the GFP tag does not interfere with the function.

      (2) In Figure 5, the authors present a detailed schematic of the catalytic cycle, which I find to be too speculative. There is no evidence to suggest that this enzyme undergoes isomerization, similar to a transporter, between open-to-lumen and open-to-cytosol states. Could it not simply involve some movements of side chains to complete the acetyl transfer?

    3. Reviewer #2 (Public Review):

      Summary:

      This work describes the structure of Heparan-alpha-glucosaminide N-acetyltransferase (HGSNAT), a lysosomal membrane protein that catalyzes the acetylation reaction of the terminal alpha-D-glucosamine group required for degradation of heparan sulfate (HS). HS degradation takes place during the degradation of the extracellular matrix, a process required for restructuring tissue architecture, regulation of cellular function and differentiation. During this process, HS is degraded into monosaccharides and free sulfate in lysosomes.

      HGSNAT catalyzes the transfer of the acetyl group from acetyl-CoA to the terminal non-reducing amino group of alpha-D-glucosamine. The molecular mechanism by which this process occurs has not been described so far. One of the main reasons to study the mechanism of HGSNAT is that multiple mutations spanning the entire sequence of the protein, such as nonsense mutations, splice-site variants, and missense mutations, lead to dysfunction that causes abnormal accumulation of HS within the lysosomes. This accumulation is a cause of mucopolysaccharidosis IIIC (MPS IIIC), an autosomal recessive neurodegenerative lysosomal storage disorder, for which there are no approved drugs or treatment strategies.

      This paper provides a 3.26 Å structure of HGSNAT, determined by single-particle cryo-EM. The structure reveals that HGSNAT is a dimer in detergent micelles, and shows a density assigned to acetyl-CoA. The authors speculate about the molecular mechanism of the acetylation reaction, map the mutations known to cause MPS IIIC on the structure and speculate about the nature of the HGSNAT dysfunction caused by such mutations.

      Strengths:

      The paper describes a structure of HGSNAT, a member of the transmembrane acyl transferase (TmAT) superfamily. The high-resolution structure of HGSNAT bound to acetyl-CoA is important for our understanding of the HGSNAT mechanism. The density map is of high quality, except for the luminal domain. The location of the acetyl-CoA allows speculation about the mechanistic role of multiple residues surrounding this molecule. The authors thoroughly describe the architecture of HGSNAT and map the mutations leading to MPS IIIC.

    4. Reviewer #3 (Public Review):

      Summary:

      Navratna et al. have solved the first structure of a transmembrane N-acetyltransferase (TNAT), resolving the architecture of human heparan-alpha-glucosaminide N-acetyltransferase (HGSNAT) in the acetyl-CoA bound state using single particle cryo-electron microscopy (cryoEM). They show that the protein is a dimer, and define the architecture of the alpha- and beta-GSNAT fragments, as well as convincingly characterizing the binding site of acetyl-CoA.

      Strengths:

      This is the first structure of any member of the transmembrane acyl transferase superfamily, and as such it provides important insights into the architecture and acetyl-CoA binding site of this class of enzymes.

      The structural data is of a high quality, with an isotropic cryoEM density map at 3.3Å facilitating building of a high-confidence atomic model. Importantly, the density for the acetyl-CoA ligand is particularly well-defined, as are the contacting residues within the transmembrane domain.

      The structure of HGSNAT presented here will undoubtedly lay the groundwork for future structural and functional characterization of the reaction cycle of this class of enzymes.

      Weaknesses:

      While the structural data for the state presented in this work is very convincing, and clearly defines the binding site of acetyl-CoA, to get a complete picture of the enzymatic mechanism of this family, additional structures of other states will be required.

      A weakness of the study is the lack of functional validation. The enzymatic activity of the enzyme characterized was not measured, and the enzyme lacks native proteolytic processing, so it is a little unclear whether the structure represents an active enzyme.

    5. Author response:

      The following is the authors’ response to the original reviews.

      (1) The authors should characterize whether the purified protein is active. Otherwise, how does one know if the detergent used maintains the protein in a biologically relevant state? The authors should at least attempt to do so. If these prove to be challenging, at the very least, the authors should try a cell-based assay to demonstrate that the GFP tag does not interfere with the function. The authors would need to establish an in vitro assay using purified protein and assess the level of Acetyl-CoA in the reaction (there are commercial kits and a long list of literature showing how to measure this). They could also follow the HS acetylation reaction by e.g. HPLC-MS or NMR (among other methods).

      The cryo-EM sample was prepared without the exogenous addition of ligand, as noted in the manuscript. However, we see that acetyl-CoA was intrinsically bound to the protein, indicating the ability of GFP-tagged HGSNAT protein to bind the ligand. Upon dialysis, we see release of acetyl-CoA from the protein, which we have confirmed by LC-MS analysis (Fig S9). We purified the protein at a pH optimal for acetyl-CoA binding, as suggested by Bame, K. J. and Rome, L. H. (1985) and Meikle, P. J. et al. (1995). Because we see acetyl-CoA in a structure obtained using a GFP fusion, we argue that GFP does not interfere with protein stability and ability to bind to the co-substrate. As demonstrated by existing literature, the HGSNAT-catalyzed reaction is compartmentalized spatially and conditionally. The binding of acetyl-CoA happens towards the cytosol and is optimal at pH 7.0-8.0, while the transfer of the acetyl group to heparan sulfate occurs towards the luminal side and is optimal at pH 5.0-6.0. We attempted measuring the HGSNAT-catalyzed reaction by monitoring the decrease in acetyl-CoA in the presence of D-glucosamine (acetyl group acceptor) using a coupled enzyme acetyl-CoA assay kit from SIGMA (MAK039) that converts acetyl-CoA to a fluorescent product measurable at Ex/Em of 535/587 nm. We noticed a decrease in the level of acetyl-CoA in the presence of HGSNAT-ACO complex (blue) and apo HGSNAT (red); the difference compared to the ACO standard (gray) was not significant. While we were optimizing the assay, Xu et al. (2024, Nat Struct Mol Biol) published structural and biochemical characterization of HGSNAT, showing that detergent-purified HGSNAT is active.

      Author response image 1.

      Acetyl-CoA levels in absence and presence of HGSNAT purified in digitonin. Decrease in the levels of 10 mM acetyl-CoA was measured in presence of 10 mM D-glucosamine and 30 nM HGSNAT at pH 7.5.

      (2) In Figure 5, the authors present a detailed schematic of the catalytic cycle, which I find to be too speculative. There is no evidence to suggest that this enzyme undergoes isomerization, similar to a transporter, between open-to-lumen and open-to-cytosol states. Could it not simply involve some movements of side chains to complete the acetyl transfer? The speculative nature of this assumption needs to be clearly acknowledged throughout the manuscript and discussed in more detail. The authors could use HDX-MS or introduce cysteine residues in the hypothetical inward- and outward-facing cavities and test accessibility by incubating the purified protein with maleimides or other agents reacting with free cysteine.

      We thank the reviewers for this insightful critique. Yes, the enzyme could likely achieve catalysis by simple side chain movements without undergoing extensive isomerization steps, as depicted in Figure 5. We also agree with the reviewer that HDX-MS could be the best way to monitor the substrate-induced conformational dynamics within HGSNAT experimentally. In the absence of data supporting large movements during the acetyl transfer reaction, Figure 5 is speculative. We have now edited Figure 5 in the revised version of the manuscript based on the observations we made in this study.

      (3) The acetyl-CoA-bound state is described as the open-to-lumen state. Indeed, from Figure 1C, the lumen opening appears much larger than the cytosol opening. Is there any small tunnel that connects the substrate site to the cytosol? In other words, is this state accessible to both the lumen and the cytosol, albeit with a larger opening toward the lumen? This question arises because, in Figure S5, the tunnel calculated by MOLE seems to also connect to the cytosol.

      Yes, it is likely that the ACOS is accessible via lumen and cytosol to varying degrees, as evidenced by MOLE prediction. However, binding of the bulky nucleoside head group of acetyl-CoA at ACOS blocks the cytosolic entrance in the conformation discussed in this manuscript. MOLE prediction was performed on a structure devoid of acetyl-CoA, and it is possible that the protein does not necessarily undergo isomerization between open-to-lumen and open-to-cytosol conformations during acetyl transfer. Likely, ACOS is always accessible from both the lumen and cytosol, but depending on the substrates or products bound, the accessibility could be limited to either the lysosomal lumen or cytosol. We have rewritten all the statements mentioning an open-to-lumen conformation to reflect this argument.

      (4) The authors state, "Interestingly, in most of the detergent conditions we tested, HGSNAT was predominantly dimeric (Fig S1C-H)," and also mention, "In all the detergents we tested, HGSNAT eluted as a dimer, a testament to the extensive side-chain interaction network." The dimerization is said to be mediated by a disulfide bond. I would be surprised if the detergents the authors tested could break a disulfide bond. Therefore, can this observation truly serve as a testament to an "extensive" side-chain interaction network?

      We agree with the reviewer that detergents are unlikely to break a disulfide bond. To address this comment, we generated a C334A mutant of HGSNAT and extracted it from cells in 1% digitonin. It is still expressed as a dimer (Fig S8E). However, upon heating the detergent-solubilized protein, we noticed that the FSEC peak for C334A shows a monomeric HGSNAT (Fig S8I and S8K). We hypothesize that in the absence of the C334 disulfide, the extensive hydrophobic side-chain interaction network displayed in Figure 2C is responsible for maintaining the integrity of the dimer. Heating disturbs these non-disulfide interactions, thereby rendering the protein monomeric.

      (5) Apart from the cryo-EM structure, the article does not provide any other experimental evidence to support or explain a molecular mechanism. Due to the complete absence of functional assays, mutagenesis analysis, or other structures such as a ternary complex or an acetylated enzyme intermediate, the mechanistic model depicted in Figure 5 should be taken with caution. This uncertainty needs to be clearly described in the manuscript text. Performing additional mutagenesis experiments to test key hypotheses, or further discussing relevant data from the literature, would strengthen the manuscript.

      We agree with the reviewer on the lack of supporting evidence for the mechanistic models proposed in Fig 5. They were made based on previously reported biochemical characterization of HGSNAT by Rome & Crain (1981), Rome et al. (1983), Meikle et al. (1995), and Fan et al. (2011). However, we agree with the reviewer that this schematic is not experimentally proven and is speculative at best. We have edited Figure 5 in the revised version of the manuscript. In addition, we have also performed mutagenesis analysis to study the stability of mutants (Fig S8) and performed LC-MS analysis to identify endogenously bound acetyl-CoA (Fig S9) to strengthen parts of the manuscript. We have discussed our findings in the results and modified the discussion according to these suggestions.

      (6) It is discussed that H269 is an essential residue that participates in the acetylation reaction, possibly becoming acetylated during the process. However, there is no solid experimental evidence, e.g. mutagenesis analysis or structural analysis, in this or previous articles, that demonstrates this to be the case. Providing more information, ideally involving additional experimental work, would strengthen this aspect of the mechanism that is proposed. This would require establishing an in vitro assay, as described in 1).

      H269 was suggested to be a crucial catalytic residue by Bame, K. J. and Rome, L. H. (1986), who monitored the effect of chemical modifications of amino acids on acetylation of HGSNAT membranes. We generated N258I and H269A mutants of HGSNAT and analyzed their stability. We noticed a greater destabilization in N258I compared to H269A (Fig S8). We believe this is because of the loss of ability to bind acetyl-CoA, as the TMs around the catalytic core of the protein in our cryo-EM structure were stabilized by interactions with acetyl-CoA. Recently, Xu et al. (2024, Nat Struct Mol Biol) reported that they do not observe acetylated histidine in their structure. However, our structure and that reported by Xu et al. (2024) were obtained at cytosolic pH. Perhaps acetylation of H269 occurs at acidic lysosomal pH. Extensive structural and catalytic investigation of HGSNAT at low pH is required to rule out H269 acetylation as a step in the HGSNAT-catalyzed reaction.

      (7) In the discussion part, the authors mention previous studies in which it was postulated that the catalytic reaction can be described by a random order mechanistic model or a Ping Pong Bi Bi model. However, the authors leave open the question of which of these mechanisms best describes the acetylation reaction. The structure presented here does not provide evidence that could support one mechanism or the other. The authors could explore if an in vitro experimental measurement of protein activity would provide any information in this regard.

      We agree with the reviewer that a more detailed kinetic analysis is necessary to define the bisubstrate reaction mechanism of HGSNAT. All the existing structural data on two isoforms of HGSNAT were obtained at basic pH. As a result, the existing structures do not unambiguously demonstrate the bisubstrate mechanism of HGSNAT. We believe low pH structural characterization and a detailed kinetic and structural characterization of HGSNAT in membrane mimetics like nanodiscs could provide more insights into the mechanism. However, these studies are a future undertaking and are not a part of this manuscript.

      (8) Although the authors map the mutations leading to MPS IIIC on the structure and use FoldX software to predict the impact of these mutations on folding and fold stability, there is no experimental evidence to support FoldX's predictions. It would be ideal if an additional test for these predictions were included in the manuscript. The authors could follow the unfolding of purified mutants by SEC, FSEC, or changes in intrinsic fluorescence to assess protein stability.

      As suggested here, we prepared HGSNAT MPSIIIC variants and tested their expression and stability (please see Fig S8). These results have been included in the revised version of the manuscript.

      (9) Some sidechains that have quite strong sidechain density are missing atoms. I would be particularly careful with omitting sidechains that pack in the hydrophobic core, as this can tend to artificially reduce the clash score. Check F81, L62, P91 and V87, for example.

      We have revisited the modeling of these regions and deposited new coordinates.

      (10) W316 seems to have the wrong rotamer.

      This has been corrected in the new coordinate file that has been released.

      (11) N134 and N433 seem to have extra density. Are these known glycosylation sites?

      As per Hrebicek M. et al., 2006 and Feldhammer M. et al., 2009, there are five predicted glycosylation sites: N66, N114, N134, N433, and N602. However, we see evidence for NAG density at N114, N134, and N433. These have now been modeled in the structure.

      (12) At the C-terminal residue (Ile-635), the very C-terminal carboxylate is modeled pointing to a hydrophobic environment. It seems more likely to me that the Ile sidechain is packing here, with the C-terminal carboxylate facing the solvent.

      Thank you for pointing this out. We have edited the orientation of the Ile sidechain accordingly.

      Presentation and wording of results/methods:

      - Figure S3 legend "At places with missing density, the side chains were trimmed to C- alpha" - this is incorrect, I think the authors mean C-beta.

      We have corrected this error in the revised version of the manuscript.

      - Figure S3 legend - the authors refer to a gray mesh, where a transparent surface is displayed.

      Thanks for pointing this error out. We have corrected this in the revised version.

      - Some colloquial/vague wording in the main text (a lot of sentences starting with "Interestingly, ..."). Making the wording more specific would help the reader, I think.

      We have edited out ‘interestingly’ from the document and have re-written parts of the manuscript, per reviewers’ suggestion, for brevity.

      - Figure S2 legend, "throughout the processing workflow the resolution of luminal domain was used as a guidepost" - it is not entirely clear to me what this means in this context, perhaps revise the wording?

      We have rephrased this line in the revised draft of the manuscript.

      - Figure S2 and methods, Local refinements of LD and TMD are mentioned, but not indicated on the processing workflow.

      We have included a new Fig S2 & edited the legend, including these changes, per the reviewers’ suggestions.

    1. eLife assessment

      This meta-analysis presents valuable findings that reexamine the function of butterfly eyespots in predator avoidance and report support for conspicuousness over mimicry. The analysis is robust, but the evidence supporting the importance of conspicuousness is incomplete due to the limitations of the literature, and this debate would benefit from additional experiments that would strengthen these claims. This paper is of interest to evolutionary biologists and ecologists working on the evolution of morphology and predator-prey interactions.

    2. Reviewer #1 (Public Review):

      Summary:

      The question of whether eyespots mimic eyes has certainly been around for a very long time and led to a good deal of debate and contention. This isn't purely an issue of how eyespots work either, but more widely an example of the potential pitfalls of adopting 'just-so-stories' in biology before conducting the appropriate experiments. Recent years have seen a range of studies testing eye mimicry, often purporting to find evidence for or against it, and not always entirely objectively. Thus, the current study is very welcome, rigorously analysing the findings across a suite of papers based on evidence/effect sizes in a meta-analysis.

      Strengths:

      The work is very well conducted, robust, objective, and makes a range of valuable contributions and conclusions, with an extensive use of literature for the research. I have no issues with the analysis undertaken, just some minor comments on the manuscript. The results and conclusions are compelling. It's probably fair to say that the topic needs more experiments to really reach firm conclusions but the authors do a good job of acknowledging this and highlighting where that future work would be best placed.

      Weaknesses:

      There are few weaknesses in this work, just some minor amendments to the text for clarity and information.

    3. Reviewer #2 (Public Review):

      Many prey animals have eyespot-like markings (called eyespots) which have been shown in experiments to hinder predation. However, why eyespots are effective against predation has been debated. The authors attempt to use a meta-analytical approach to address the issue of whether eye-mimicry or conspicuousness makes eyespots effective against predation. They state that their results support the importance of conspicuousness. However, I am not convinced by this.

      There have been many experimental studies that have weighed in on the debate. Experiments have included manipulating target eyespot properties to make them more or less conspicuous, or to make them more or less similar to eyes. Each study has used its own set of protocols. Experiments have been done indoors with a single predator species, and outdoors where, presumably, a large number of predator species predated upon targets. The targets (i.e., prey with eyespot-like markings) have varied from simple triangular paper pieces with circles printed on them to real lepidopteran wings. Some studies have suggested that conspicuousness is important and eye-mimicry is ineffective, while other studies have suggested that more eye-like targets are better protected. Therefore, there is no consensus across experiments on the eye-mimicry versus conspicuousness debate.

      The authors enter the picture with their meta-analysis. The manuscript is well-written and easy to follow. The meta-analysis appears to be well carried out statistically. Their results suggest that conspicuousness is effective, while eye-mimicry is not. I am not convinced that their meta-analysis provides strong enough evidence for this conclusion. The studies that are part of the meta-analysis are varied in terms of protocols, and no single protocol is necessarily better than another. Support for conspicuousness has come primarily from one research group (as acknowledged by the authors), based on a particular set of protocols.

      Furthermore, although conspicuousness is amenable to being quantified, e.g., using contrast or size of stimuli, assessment of 'similarity to eyes' is inherently subjective. Therefore, manipulation of 'similarity to eyes' in some studies may have been subtle enough that there was no effect.

      There are a few experiments that have indeed supported eye-mimicry. The results from experiments so far suggest that both eye-mimicry and conspicuousness are effective, possibly depending on the predator(s). Importantly, conspicuousness can benefit from eye-mimicry, while eye-mimicry can benefit from conspicuousness.

      Therefore, I argue that generalizing based on a meta-analysis of a small number of studies that conspicuousness is more important than eye-mimicry is not justified. To summarize, I am not convinced that the current study rules out the importance of eye-mimicry in the evolution of eyespots, although I agree with the authors that conspicuousness is important.

    1. eLife assessment

      This important study utilizes humanized mice, in which human immune cells are introduced into immune-deficient mice, to provide solid evidence that two helper CD4 T-cell subsets, T-follicular helper (Tfh) and T-peripheral helper (Tph) cells, are able to drive both autoantibody production and induction of autoimmunity. The work will be of broad interest to medical scientists engaged in deciphering how human immune cells mediate immune responses and contribute to the development of autoimmune diseases.

    2. Reviewer #1 (Public Review):

      Summary:

      As our understanding of the immune system increases, it becomes clear that murine models of immunity cannot always provide an accurate model system for human immunity. However, mechanistic studies in humans are necessarily limited. To bridge this gap, many groups have worked on developing humanised mouse models in which human immune cells are introduced into mice, allowing their fine manipulation. However, since human immune cells will attack murine tissues, it has proven complex to establish a human-like immune system in mice. To help address this, Vecchione et al have previously developed several models using human cell transfer into mice with or without human thymic fragments that allow negative selection of autoreactive cells. In this report they focus on the examination of the function of the B-helper CD4 T-cell subsets T-follicular helper (Tfh) and T-peripheral helper (Tph) cells. They demonstrate that these cells are able both to drive autoantibody production and to induce B-cell independent autoimmunity.

      Strengths:

      A strength of this paper is that there is currently no well-established model for Tfh or Tph in HIS mice and no clear murine Tph equivalent, making new models for the study of this cell type of value. Equally, since many HIS mice struggle to maintain effective follicular structures, Tfh models in HIS mice are not well established, giving additional value to this model.

      Weaknesses:

      A weakness of the paper is that the models seem to lack a clear ability to generate germinal centres. For Tfh it is unclear how we can interpret their function without the structure where they have the greatest influence. In some cases, the definition of Tph does not seem to differentiate well between Tph and highly activated CD4 T-cells in general.

    3. Reviewer #2 (Public Review):

      Summary:

      Humanized mice, developed by transplanting human cells into immunodeficient NSG mice to recapitulate the human immune system, are utilized in basic life science research and preclinical trials of pharmaceuticals in fields such as oncology, immunology, and regenerative medicine. However, there are limitations to using humanized mice for mechanistic analysis as models of autoimmune diseases due to the unnatural T cell selection, antigen presentation/recognition process, and immune system disruption due to xenogeneic GVHD onset.

      In the present study, Vecchione et al. detailed the mechanisms of autoimmune disease-like pathologies observed in a humanized mouse (Human immune system; HIS mouse) model, demonstrating the importance of CD4+ Tfh and Tph cells for the disease onset. They clarified the conditions under which these T cells become reactive using techniques involving the human thymus engraftment and mouse thymectomy, showing their ability to trigger B cell responses, although this was not a major factor in the mouse pathology. These valuable findings provide an essential basis for interpreting past and future autoimmune disease research conducted using HIS mice.

      Strengths:

      (1) Experiments with mice transplanted with human thymus and HSCs were repeated with sufficient reproducibility, with each experiment sometimes taking over 30 weeks and requiring considerable effort. While the interpretation of the results is still debatable, these descriptions are valuable knowledge for this field of research.

      (2) Mechanistic analysis of T-B interaction in humanized mice, which has not been extensively addressed before, suggests part of the activation mechanism of autoreactive B cells. Additionally, the differences in pathogenicity due to T cell selection by either the mouse or human thymus are emphasized, which encompasses the essential mechanisms of immune tolerance and activation in both central and peripheral systems.

      Weaknesses:

      (1) In this manuscript, for example in Figure 2, the proportion of suppressive cells like regulatory T cells is not clarified, making it unclear to what extent the percentages of Tph or Tfh cells reflect immune activation. It would have been preferable to distinguish follicular regulatory T cells, at least. While Figure 3 shows Tregs are gated out using CD25- cells, it is unclear how the presence of Treg cells affects the overall immunogenic function of the cell population.

      (2) The definition of "Disease" discussed after Figure 6 should be explicitly described in the Methods section. It seems to follow Khosravi-Maharlooei et al. 2021. If the disease onset determination aligns with GVHD scoring, generally an indicator of T cell response, it is unsurprising that B cell contribution is negligible. The accelerated disease onset by B cell depletion likely results from lymphopenia-induced T cell activation. However, this result does not prove that these mice avoid organ-specific autoimmune diseases mediated by auto-antibodies, and the authors' current conclusion may overlook significant changes. For instance, would defining Disease Onset by the appearance of circulating autoantibodies alter the Disease-Free curve? Are there possibly histological findings at the endpoint of the experiment suggesting tissue damage by autoantibodies?

      (3) Helper functions, such as differentiating B cells into CXCR5+, were demonstrated for both Hu/Hu and Mu/Hu-derived T cells. This function seemed higher in Hu/Hu than in Mu/Hu. From the results in Figures 7-8, Hu/Hu Tph/Tfh cells have a stronger T cell identity and higher activation capacity in vivo on a per-cell basis than those from Mu/Hu mice. However, Hu/Hu T cells lacked the ability to induce class-switching, in contrast to Mu/Hu T cells. The mechanisms causing these functional differences were not fully discussed. Discussions touching on possible changes in TCR repertoire diversity between Mu/Hu and Hu/Hu T cells would have been beneficial.

    1. eLife assessment

      This valuable and well-executed study describes how deletion of the autism spectrum disorder risk gene CNTNAP2 in mice increases dorsolateral striatal projection neuron excitability and promotes repetitive behaviors and cognitive inflexibility. The evidence supporting this claim is solid, although additional experimental evidence would strengthen claims of how corticostriatal activity is altered and linked to behavioral changes. The study provides a potential cellular explanation for the repetitive and inflexible behavior in Cntnap2 knockout mice and CNTNAP2 disorder in humans, which would interest both basic and translational neuroscientists.

    2. Reviewer #1 (Public Review):

      Summary:

      Cording et al. investigated how deletion of CNTNAP2, a gene associated with autism spectrum disorder, alters corticostriatal engagement and behavior. Specifically, the authors present slice electrophysiology data showing that striatal projection neurons (SPNs) are more readily driven to fire action potentials in response to stimulation of corticostriatal afferents, and this is due to increases in SPN intrinsic excitability rather than changes in excitatory or inhibitory synaptic inputs. The authors show that CNTNAP2 mice display repetitive behaviors, enhanced motor learning, and cognitive inflexibility. Overall the authors' conclusions are supported by their data, but a few claims could use some more evidence to be convincing.

      Strengths:

      The use of multiple behavioral techniques, both traditional and cutting-edge machine learning-based analyses, provides a powerful means of assessing repetitive behaviors and behavioral transitions/rigidity. Characterization of both excitatory and inhibitory synaptic responses in slice electrophysiology experiments offers a broad survey of the synaptic alterations that may lead to increased corticostriatal engagement of SPNs.

      Weaknesses:

      (1) The authors conclude that increased cortical engagement of SPNs is due to changes in SPN intrinsic excitability rather than synaptic strength (either excitatory or inhibitory). One weakness is that only AMPA receptor-mediated responses were measured. Though the holding potential used for experiments in Figure 1F-I wasn't clear, recordings were presumably performed at a hyperpolarized potential that limits NMDA receptor-mediated responses. Because the input-output experiments used to conclude that corticostriatal engagement of SPNs is elevated (Figure 1B-E) were conducted in current clamp, it is possible that enhanced NMDA receptor engagement contributed to increased SPN responses to cortical stimulation. Confirming that NMDA receptor-mediated EPSC components are not altered would strengthen the main conclusion.

      (2) Data clearly show that SPN intrinsic excitability is increased in knockout mice. Given that CNTNAP2 has been linked to potassium channel regulation, it would be helpful to show and quantify additional related electrophysiology data such as negative IV curve responses and action potential hyperpolarization.

      (3) As it stands, the reported changes in dorsolateral striatum SPN excitability are only correlative with reported changes in repetitive behaviors, motor learning, and cognitive flexibility.

    3. Reviewer #2 (Public Review):

      Summary:

      This is an important study characterizing striatal dysfunction and behavioral deficits in Cntnap2-/- mice. There is growing evidence suggesting that striatal dysfunction underlies core symptoms of ASD but the specific cellular and circuit level abnormalities disrupted by different risk genes remain unclear. This study addresses how the deletion of Cntnap2 affects the intrinsic properties and synaptic connectivity of striatal spiny projection neurons (SPN) of the direct (dSPN) and indirect (iSPN) pathways. Using Thy1-ChR2 mice and optogenetics the authors found increased firing of both types of SPNs in response to cortical afferent stimulation. However, there was no significant difference in the amplitude of optically-evoked excitatory postsynaptic currents (EPSCs) or spine density between Cntnap2-/- and WT SPNs, suggesting that the increased corticostriatal coupling might be due to changes in intrinsic excitability. Indeed, the authors found Cntnap2-/- SPNs, particularly dSPNs, exhibited higher intrinsic excitability, reduced rheobase current, and increased membrane resistance compared to WT SPNs. The enhanced spiking probability in Cntnap2-/- SPNs is not due to reduced inhibition. Despite previous reports of decreased parvalbumin-expressing (PV) interneurons in various brain regions of Cntnap2-/- mice, the number and function (IPSC amplitude and intrinsic excitability) of these interneurons in the striatum were comparable to WT controls.

      This study also includes a comprehensive behavioral analysis of striatal-related behaviors. Cntnap2-/- mice demonstrated increased repetitive behaviors (RRBs), including more grooming bouts, increased marble burying, and increased nose poking in the holeboard assay. MoSeq analysis of behavior further showed signs of altered grooming behaviors and sequencing of behavioral syllables. Cntnap2-/- mice also displayed cognitive inflexibility in a four-choice odor-based reversal learning assay. While they performed similarly to WT controls during acquisition and recall phases, they required significantly more trials to learn a new odor-reward association during reversal, consistent with potential deficits in corticostriatal function.

      Strengths:

      This study provides significant contributions to the field. The finding of altered SPN excitability, the detailed characterization of striatal inhibition, and the comprehensive behavioral analysis are novel and valuable to understanding the pathophysiology of Cntnap2-/- mice.

      Weaknesses:

      (1) The approach based on Thy1-ChR2 mice has the advantage of overcoming issues caused by injection efficiency and targeting variability. However, the spread of oEPSC amplitudes across mice shown in panels G and I of Figure 1 is very high, with almost one order of magnitude difference between some mice. Given this is one of the most important points of the study, it will be important to further analyze and discuss what this variability might be due to. Typically, in acute slice recordings, the within-animal variability is larger than the variability across animals. From the sample sizes reported it seems the authors sampled a large number of animals, but with a relatively low number of neurons per animal (per condition). Could this be one of the reasons for this variability?

      (2) This is particularly important because the analysis of corticostriatal evoked APs in panels C and E is performed on pooled data without considering the variability in evoked current amplitudes across animals shown in G and I. Were the neurons in panels C/E recorded from the same mice as shown in G/I? If so, it would be informative to regress AP firing data (say at 20% LED) to the average oEPSC amplitude recorded on those mice at the same light intensity. However, if the low number of neurons recorded per mouse is due to technical limitations, then increasing the sample size of these experiments would strengthen the study.

      (3) On a similar note, there is no discussion of why iSPNs also show increased corticostriatal evoked firing in Figure 1E, despite the difference in intrinsic excitability shown in Figure 3. This suggests other potential mechanisms that might underlie altered corticostriatal responses. Given the role of Caspr2 in clustering K channels in axons, altered presynaptic function or excitability could also contribute to this phenotype, but potential changes in PPR have not been explored in this study.

      (4) Male and female SPNs have different intrinsic properties but the number and/or balance of M/F mice used for each experiment is not reported.

      (5) There is no mention of how membrane resistance was calculated, and no I/V plots are shown.

      (6) It would be interesting to see which behavior transitions most contribute to the decrease in entropy. Are these caused by repeated or perseverative grooming bouts? Or is this inflexibility also observed across other behaviors? The transition map in Figure S5 shows the overall number of syllables and transitions but not their sequence during behavior. Can this be analyzed by calculating the ratio of individual $u_i \times p_{i,j} \times \log_2 p_{i,j}$ factors across genotypes?
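
      The decomposition suggested in point (6) could be sketched as follows. This is a minimal illustration, not the authors' analysis: it assumes that the entropy is the sum of terms -u_i * p_ij * log2(p_ij), that u_i is the usage frequency of syllable i, and that p_ij is the probability of a transition from syllable i to syllable j. The syllable names, the shared usage vector, and all numbers are hypothetical.

```python
from math import log2

def entropy_terms(usage, transitions):
    """Per-transition contributions -u_i * p_ij * log2(p_ij).

    usage: {syllable i: usage frequency u_i}
    transitions: {syllable i: {syllable j: transition probability p_ij}}
    Zero-probability transitions contribute 0 by convention.
    """
    terms = {}
    for i, row in transitions.items():
        for j, p in row.items():
            terms[(i, j)] = -usage[i] * p * log2(p) if p > 0 else 0.0
    return terms

def term_ratios(terms_ko, terms_wt):
    """KO/WT ratio for each transition whose WT term is non-zero."""
    return {
        key: terms_ko[key] / wt_term
        for key, wt_term in terms_wt.items()
        if wt_term > 0 and key in terms_ko
    }

# Hypothetical two-syllable example (not data from the study).
# For simplicity the same usage vector is assumed for both genotypes.
usage = {"groom": 0.5, "walk": 0.5}
wt = {"groom": {"groom": 0.5, "walk": 0.5}, "walk": {"groom": 0.5, "walk": 0.5}}
ko = {"groom": {"groom": 0.75, "walk": 0.25}, "walk": {"groom": 0.5, "walk": 0.5}}

terms_wt = entropy_terms(usage, wt)
terms_ko = entropy_terms(usage, ko)
entropy_wt = sum(terms_wt.values())
entropy_ko = sum(terms_ko.values())
ratios = term_ratios(terms_ko, terms_wt)
```

      In this toy case the lower KO entropy comes entirely from the groom-to-groom term, whose ratio falls below 1 while the other ratios stay at 1. Ranking transitions by such ratios would be one way to see which behaviors drive the genotype difference.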

    4. Reviewer #3 (Public Review):

      Summary:

      The authors analyzed Cntnap2 KO mice to determine whether loss of the ASD risk gene CNTNAP2 alters the dorsal striatum's function.

      Strengths:

      The results demonstrate that loss of Cntnap2 results in increased excitability of striatal projection neurons (SPNs) and altered striatal-dependent behaviors, such as repetitive, inflexible behaviors. Unlike other brain areas and cell types, synaptic inputs onto SPNs were normal in Cntnap2 KO mice. The experiments are well-designed, and the results support the authors' conclusions.

      Weaknesses:

      The mechanism underlying SPN hyperexcitability was not explored, and it is unclear whether this cellular phenotype alone can account for the behavioral alterations in Cntnap2 KO mice. No clear explanation emerges for the variable phenotype in different brain areas and cell types.

    1. eLife assessment

      This study represents an important contribution to the study of decision-making under risk, bringing an interdisciplinary approach spanning economic theory, behavioral neuroscience, and computational modeling to test how choice preference is influenced by rare and extreme events. The authors present evidence that rats are indeed sensitive to these rare and extreme events despite their infrequent occurrence, driven primarily by an almost complete avoidance of "Black Swans" - rare and extreme losses. The evidence for specific sensitivity to rare and extreme events however remains incomplete, owing in part to the difficulty of isolating the effect of these events beyond that arising from risk preferences more generally in both task design and in the computational modeling of the choice behavior. Given the approach here brings a relatively novel perspective, with a more detailed treatment of these confounds this paper will be of broad interest to those seeking to understand animal behavior through the lens of economic choice.

    2. Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors investigate the impact of rare and extreme events on rodents' decision-making under risk, in gain and loss contexts. They describe the behavior of 20 rats performing a four-armed bandit task, where probabilistic gains (sugar pellets) and losses (time-out punishments) can - in some arms - incorporate extremely large - but rare - outcomes. They report that most rats are sensitive to rare and extreme outcomes despite their infrequent occurrence, and that this sensitivity is primarily driven by extreme loss events which they try to avoid, rather than extreme gains that they seek to obtain.

      They finally propose a modification of standard reinforcement-learning, which features a specific sensitivity to rare and extreme outcomes and can account for the observed behavior.

      Strengths:

      The manuscript really taps into a surprisingly neglected but very relevant aspect of decision-making: the effect of rare and extreme events (REE). The authors have developed an experimental setup that seemingly allows investigation of this aspect, which is not trivial given the idiosyncratic properties of rare and extreme events.

      The parameters of the experimental setup also seem to be well thought out: basically, in the absence of REE, some options are objectively better than others (because, in expectation, they overall deliver more food, or minimize time-out punishments), but this ordering reverses if REE are taken into account. This allows for a clean test of the integration of REE in the rodent's decision-making model.

      The data is presented and analyzed in a very descriptive but exhaustive and transparent way, down to the description of individual rodent's behavior.

      Weaknesses:

      While the description and analyses of the behavioral patterns are rigorously done under the economic lens of risky decision-making, the authors' interpretation heavily relies on the assumption that rodents have built the correct model of the task during the training. Extensive details are provided about the training procedure, and the observed behavior at the end of the training, but it remains virtually impossible to disambiguate choices due to imperfect learning from choices due to intrinsic preferences for risk or REE.

      By nature, gains (food pellets) and losses (time-out punishments) are somewhat incommensurable, so the asymmetry due to outcome valence is also open to interpretation. There might be some additional subtleties due to, e.g., satiety that could come from gaining REE (i.e. the delivery of 80 pellets from the Jackpot).

      In its current form, the paper is quite hard to digest. This is naturally the case with interdisciplinary work (here mixing economists and neurobiologists). But I am afraid that with the current frame, the paper is going to miss its target, in terms of audience.

      The proposed model seems somewhat disconnected from the behavioral patterns: while the model suggests an effect of REE at the decision stage (i.e. with specific decision weights for those rare events), this formalism seems at odds with the observation that REE (notably in the loss domain) has an impact on subsequent behavior (Black Swans tend to reinforce Total Sensitivity to REE), which rather suggests an effect at the learning stage.
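
      The distinction between a decision-stage and a learning-stage effect can be illustrated with a minimal sketch. Neither function below is the authors' model: the additive bonus, the two learning rates, and all numerical values are hypothetical and only serve to show where each account would place the REE parameter.

```python
from math import exp

def softmax(values, beta=1.0):
    """Numerically stable softmax with inverse temperature beta."""
    m = max(values)
    weights = [exp(beta * (v - m)) for v in values]
    total = sum(weights)
    return [w / total for w in weights]

def decision_stage_probs(q, has_ree, bonus, beta=1.0):
    """Decision-stage account: a fixed bonus is added to every option that
    can produce a rare extreme event, whether or not one was experienced."""
    adjusted = [qi + (bonus if flag else 0.0) for qi, flag in zip(q, has_ree)]
    return softmax(adjusted, beta)

def learning_stage_update(q_value, outcome, alpha, alpha_extreme, is_extreme):
    """Learning-stage account: the delta rule uses a different learning rate
    on trials where an extreme outcome was actually experienced."""
    rate = alpha_extreme if is_extreme else alpha
    return q_value + rate * (outcome - q_value)

# Hypothetical values, for illustration only.
q = [1.0, 1.0, 1.0, 1.0]
has_ree = [False, False, True, True]
probs = decision_stage_probs(q, has_ree, bonus=-1.0)

q_after_ordinary = learning_stage_update(1.0, 0.0, alpha=0.1, alpha_extreme=0.5, is_extreme=False)
q_after_extreme = learning_stage_update(1.0, -10.0, alpha=0.1, alpha_extreme=0.5, is_extreme=True)
```

      Under the decision-stage form, choice probabilities shift even if no extreme event ever occurs. Under the learning-stage form, behavior changes only after an extreme outcome has been experienced, which is closer to the pattern described above.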

      Discussion:

      This study convincingly demonstrates that REEs are processed rather uniquely, which makes sense given their evolutionary relevance. REE has indeed been somewhat neglected in previous research, and this study therefore opens an interesting new front on the fundamental aspects of decision under risk. The authors have devised an original theoretical and empirical framework that will be useful for the community, and the combination of economics analysis and rodent behavior constitutes a thought-provoking ground to think about the nature of risk preferences. The interpretation and mechanistic account of these aspects, as well as their generalizability outside the specific context of this study, remain to be strengthened.

    3. Reviewer #2 (Public Review):

      Summary:

      This paper attempts to examine how rare, extreme events impact decision-making in rats. The paper used an extensive behavioural study with rats to evaluate how the probability and magnitude of outcomes impact preference. The paper, however, provides limited evidence for the conclusions because the design did not allow for the isolation of the rare, extreme events in choice. There are many confounding factors, including the outcome variance and presence of less-rare, and less-extreme outcomes in the same conditions.

      Strengths:

      (1) The major strength of the paper is the significant volume of behavioural data with a reasonable sample size of 20 rats.

      (2) The paper attempts to examine losses with rats (a notoriously tricky problem with non-human animals) by substituting time-outs as a proxy for losses. This allows for mixed gambles that have both gain and loss possible outcomes.

      (3) The paper integrates both a behavioural and a modelling approach to get at the factors that drive decision-making.

      (4) The paper takes seriously the question of what it means for an event to be rare, pushing to less frequent outcomes than usually used with non-human animals.

      Weaknesses:

      (1) The primary issue with this work is that the primary experimental manipulation fails to isolate the rare, extreme events in choice. As I understand the task, in all the conditions with a rare extreme event (e.g., 80 pellets with probability epsilon), there is also a less-rare, less-extreme event (e.g., 12 pellets with probability 5). In addition, the variance differs between the two conditions. So, any impact attributable to the rare, extreme event could be due to the less rare event or due to the difference in the variance. The design does not support the conclusions. Finally, by deliberately confounding rarity and extremity, the design does not allow for assessing the impact of either aspect.

      (2) The RL-modelling work also fails to show a specific impact of the rare extreme event. As best as I can understand Eq 2, the model provides a free parameter that adds a bonus to the value of either the two options with high-variance gains (A and V in the paper) or to the two options with high-variance losses (F and V in the paper). This parameter only depends on whether this option could have possibly yielded the rare, extreme outcome (i.e., based on the generative probability) and was not connected to its actual appearance. That makes it a free parameter that just bumps up (or down) the probability of selecting a pair of options. In the case of the "black swan" or high-variance loss conditions, this seems very much like a loss aversion parameter, but an additive one instead of a multiplicative one.

      (3) The paper presented the methods and results with lots of neologisms and fairly obscure jargon (e.g., fragility, total REE sensitivity). That made it very hard to decipher exactly what was done and what was found. For example, on p. 4, the use of concave and convex was very hard to decipher; the text even has to repeat itself 3 times (i.e., "to repeat" and "in other words") and is still not clear. It would be much clearer (and probably accurate) to say that the options varied along the variance dimension, separately for gains and losses. Option A was low-variance gains and losses. Option B was low-variance losses and high-variance gains. Option C was high-variance losses and low-variance gains, and Option D was high-variance losses and gains. That tells much more clearly what the animals experienced without the reader having to master a set of new terminologies around fragility and robustness, which brings a set of theoretical assumptions unnecessarily into the description of the experimental design. In terms of results, "Black Swan" avoidance is more simply known as risk aversion for losses.

      (4) Were the probabilities shuffled or truly random (they seem to be fixed sequences, so neither)? What were the experienced probabilities? Given the fixed sequences, these experienced ("ex post") probabilities could differ tremendously from the scheduled ("ex ante") probabilities. It's quite possible that an animal never experienced the rare, extreme event for a specific option. It's even possible (if they only picked it on the 10th/60th choices by chance) that they only ever experienced that rare extreme event. This cannot be known given the information provided. The Supplemental info on p.55 only gives gross overall numbers but does not indicate what the rats experienced for each choice/option, which is what matters here. A simple table that indicates, for each of the 4 options, how often they were selected and how often the animals experienced each of the 6-8 possible outcomes would make it much clearer how closely the experience matched the planned outcomes. In addition, by restricting the rare outcome to either the 10th or 60th activations in a session, these are not random. Did the animals learn this association?

      (5) The choice data are only presented in an overprocessed fashion with a sum and a difference (in both figures and tables). The basic datum (probability/frequency of selecting each of the 4 options) is not provided directly, even if it can theoretically be inferred from the sum and the difference. To understand what the rats actually do, we first need to see how often they select each option, without these transformations.

      (6) There is insufficient detail provided on the inferential statistical tests (e.g., no degrees of freedom or effect sizes), and only limited information on exactly what tests were run and how (bootstrapping, but little detail). Without code or data (only summary information is provided in the supplement), this is difficult to evaluate. In addition, the studies seem not to be pre-registered in any way, leaving many researcher degrees of freedom. Were any alternative analysis pipelines attempted? Similarly, there were many sub-groupings of the animals, and then comparisons between them - were these post-hoc?

      (7) On p. 17, there is an attempt to look at the impact of a rare, extreme event by plotting a measure of preference for the 10 trials before/after the rare, extreme event. In the human literature, the main impact of experiencing a rare, extreme event is what is known as the wavy recency effect (See Plonsky et al. 2015 in Psych Review for example). What this means is that there tends to be some immediate negative recency (e.g., avoiding a rare gain) followed by positive recency (e.g., chasing the rare gain). Using a 10-trial window would thus obscure any impact of this rare, extreme event. An analysis that looks at a time course trial-by-trial could reveal any impact.

      (8) As I understood the method (p. 31), the assignment of options to physical locations was not random or counterbalanced, but deliberately biased to have one of the options in the preferred location. This would seem to create a bias towards a particular option and a bias away from the other options, which confounds the preference data in subsequent analyses.

      (9) Are delays really losses? This is a big assumption. Magnitude and delay are different aspects of experience, which are not necessarily commensurable and can be manipulated independently. And, for the model, how were these delays transformed into outcomes for the model? Eq 1 skips over that. Is there an assumption of linearity? In addition, I was not wholly clear if the delays meant fewer trials in a session or if the delays merely extended the session and meant longer delays until the next choice period.

      (10) The paper does not represent the existing literature on human risky decision-making (with and without rare events) accurately enough. Here are a few examples of misrepresented and/or missing literature:

      - Most studies on decision-making do not rely only on p > 10% (as per p. 2). Maybe that is true with animals, but it is not a fair statement generally. Some do, and some don't. There is substantial literature looking at rarer events both in description (most famously with Kahneman & Tversky's work) and in experience (which is alluded to in reference 19). That reference is not only about the situation when choices are not repeated (e.g. the sampling paradigm), but also about partial-feedback and full-feedback situations.

      The literature on learning from rewarding experiences in humans is obliquely referenced but not really incorporated. In short, there are two main findings: first, people underweight rare events in experience; second, people overweight extreme outcomes in experience (both contrary to description). Some related papers are cited, but their content is not used or incorporated into the logic of the manuscript.

      One recent study systematically examined rarity and extremity in human risky decision-making, which seems very relevant here: Mason et al. (2024). Rare and extreme outcomes in risky choice. Psychonomic Bulletin & Review, 31, 1301-1308.

      There is a fair bit of research on the human perception of the risk of rare events (including from experience) and important events like climate. One notable paper is Newell et al (2015) in Nature Climate Change.
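
      The trial-by-trial time course suggested in point (7) could be computed along the following lines. This is a minimal sketch rather than the authors' pipeline: the function name, the binary per-trial lists `chose_risky` and `rare_event`, and the `max_lag` window are all hypothetical, and overlapping post-event windows are simply pooled.

```python
from collections import defaultdict


def event_triggered_choice_rate(chose_risky, rare_event, max_lag=10):
    """Mean rate of choosing the risky option at each lag (1..max_lag)
    after a trial on which the rare, extreme outcome occurred.

    chose_risky: list of 0/1, one entry per trial (hypothetical coding).
    rare_event:  list of 0/1, 1 where the rare outcome was experienced.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    n_trials = len(chose_risky)
    for t, is_rare in enumerate(rare_event):
        if not is_rare:
            continue
        for lag in range(1, max_lag + 1):
            if t + lag >= n_trials:
                break  # window runs past the end of the session
            sums[lag] += chose_risky[t + lag]
            counts[lag] += 1
    # One value per lag, instead of a single average over the whole window
    return {lag: sums[lag] / counts[lag] for lag in sorted(counts)}
```

      Plotting the returned rate against lag, one trial at a time, would show whether an initial dip is followed by a rebound, a pattern that a single 10-trial average would flatten.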

    1. eLife assessment

      Lloyd et al. used an evolutionary comparative approach to study DNA damage repair in response to sleep deprivation in Astyanax mexicanus, highlighting how the cavefish population has evolved a reduced DNA damage response compared to the surface-dwelling population. The cavefish have elevated expression of signals commonly associated with aging but do not show evidence of reduced life span nor increased aged-linked pathology, a potentially valuable finding for the field of aging research. A link to alterations in sleep behaviour is outlined, but the evidence for such a link is incomplete.

    2. Reviewer #1 (Public Review):

      Summary:

      Lloyd et al. employ an evolutionary comparative approach to study how sleep deprivation affects DNA damage repair in Astyanax mexicanus, using the cave vs surface species evolution as a playground. The work shows, convincingly, that the cavefish population has evolved an impaired DNA damage response following both sleep deprivation and a classical paradigm of DNA damage (UV).

      Strengths:

      The study employs a thorough multidisciplinary approach. The experiments are well conducted and generally well presented.

      Weaknesses:

      Having a second experimental means of inducing DNA damage would strengthen and generalise the findings.

      Overall, the study represents a very important addition to the field. The model employed underlines once more the importance of using an evolutionary approach to study sleep and provides context and caveats to statements that perhaps were taken a bit too much for granted before. At the same time, the paper manages to have an extremely constructive approach, presenting the platform as a clear useful tool to explore the molecular aspects behind sleep and cellular damage in general. The discussion is fair, highlighting the strengths and weaknesses of the work and its implications.

    3. Reviewer #2 (Public Review):

      The manuscript investigates the relationship between sleep, DNA damage, and aging in the Mexican cavefish (Astyanax mexicanus), a species that exhibits significant differences in sleep patterns between surface-dwelling and cave-dwelling populations. The authors aim to understand whether these evolved sleep differences influence the DNA damage response (DDR) and oxidative stress levels in the brain and gut of the fish.

      Summary of the Study:

      The primary objective of the study is to determine if the reduced sleep observed in cave-dwelling populations is associated with increased DNA damage and altered DDR. The authors compared levels of DNA damage markers and oxidative stress in the brains and guts of surface and cavefish. They also analyzed the transcriptional response to UV-induced DNA damage and evaluated the DDR in embryonic fibroblast cell lines derived from both populations.

      Strengths of the Study:

      Comparative Approach:

      The study leverages the unique evolutionary divergence between surface and cave populations of A. mexicanus to explore fundamental biological questions about sleep and DNA repair.

      Multifaceted Methodology:

      The authors employ a variety of methods, including immunohistochemistry, RNA sequencing, and in vitro cell line experiments, providing a comprehensive examination of DDR and oxidative stress.

      Interesting Findings:

      The study presents intriguing results showing elevated DNA damage markers in cavefish brains and increased oxidative stress in cavefish guts, alongside a reduced transcriptional response to UV-induced DNA damage.

      Weaknesses of the Study:

      Link to Sleep Physiology:

      The evidence connecting the observed differences in DNA damage and DDR directly to sleep physiology is not convincingly established. While the study shows distinct DDR patterns, it does not robustly demonstrate that these are a direct result of sleep differences.

      Causal Directionality:

      The study fails to establish a clear causal relationship between sleep and DNA damage. It is possible that both sleep patterns and DDR responses are downstream effects of a common cause or independent adaptations to the cave environment.

      Environmental Considerations:

      The lab conditions may not fully replicate the natural environments of the cavefish, potentially influencing the results. The impact of these conditions on the study's findings needs further consideration.

      Photoreactivity in Albino Fish:

      The use of UV-induced DNA damage as a primary stressor may not be entirely appropriate for albino, blind cavefish. Alternative sources of genotoxic stress should be explored to validate the findings.

      Assessment of the Study's Achievements:

      The authors partially achieve their aims by demonstrating differences in DNA damage and DDR between surface and cavefish. However, the results do not conclusively support the claim that these differences are driven by or directly related to the evolved sleep patterns in cavefish. The study's primary claims are only partially supported by the data.

      Impact and Utility:

      The findings contribute valuable insights into the relationship between sleep and DNA repair mechanisms, highlighting potential areas of resilience to DNA damage in cavefish. While the direct link to sleep physiology remains unsubstantiated, the study's data and methods will be useful to researchers investigating evolutionary biology, stress resilience, and the molecular basis of sleep.

    4. Reviewer #3 (Public Review):

      Lloyd, Xia, et al. utilised the existence of surface-dwelling and cave-dwelling morphs of Astyanax mexicanus to explore a proposed link between DNA damage, aging, and the evolution of sleep. Key to this exploration is the behavioural and physiological differences between cavefish and surface fish, with cavefish having been previously shown to have low levels of sleep behaviour, along with metabolic alterations (for example chronically elevated blood glucose levels) in comparison to fish from surface populations. Sleep deprivation, metabolic dysfunction, and DNA damage are thought to be linked and to contribute to aging processes. Given that cavefish seem to show no apparent health consequences of low sleep levels, the authors suggest that they have evolved resilience to sleep loss. Furthermore, as extended wake and loss of sleep are associated with increased rates of damage to DNA (mainly double-strand breaks) and sleep is linked to repair of damaged DNA, the authors propose that changes in DNA damage and repair might underlie the reduced need for sleep in the cavefish morphs relative to their surface-dwelling conspecifics.

      To fulfill their aim of exploring links between DNA damage, aging, and the evolution of sleep, the authors employ methods that are largely appropriate, and comparison of cavefish and surface fish morphs from the same species certainly provides a lens by which cellular, physiological and behavioural adaptations can be interrogated. Fluorescence and immunofluorescence are used to measure gut reactive oxygen species and markers of DNA damage and repair processes in the different fish morphs, and measurements of gene expression and protein levels are appropriately used. However, although the sleep tracking and quantification employed are quite well established, issues with the experimental design relate to attempts to link induced DNA damage to sleep regulation (outlined below). Moreover, although the methods used are appropriate for the study of the questions at hand, there are issues with the interpretation of the data and with these results being over-interpreted as evidence to support the paper's conclusions.

      This study shows that a marker of DNA repair molecular machinery that is recruited to DNA double-strand breaks (γH2AX) is elevated in brain cells of the cavefish relative to the surface fish and that reactive oxygen species are higher in most areas of the digestive tract of the cavefish than in that of the surface fish. As sleep deprivation has been previously linked to increases in both these parameters in other organisms (both vertebrates and invertebrates), their elevation in the cavefish morph is taken to indicate that the cavefish show signs of the physiological effects of chronic sleep deprivation.

      It has been suggested that induction of DNA damage can directly drive sleep behaviour, with a notable study describing both the induction of DNA damage and an increase in sleep/immobility in zebrafish (Danio rerio) larvae by exposure to UV radiation (Zada et al. 2021 doi:10.1016/j.molcel.2021.10.026). In the present study, an increase in sleep/immobility is induced in surface fish larvae by exposure to UV light, but there is no effect on behaviour in cavefish larvae. This finding is interpreted as representing a loss of a sleep-promoting response to DNA damage in the cavefish morph. However, induction of DNA damage is not measured in this experiment, so it is not certain if similar levels of DNA damage are induced in each group of intact larvae, nor how the amount of damage induced compares to the pre-existing levels of DNA damage in the cavefish versus the surface fish larvae. In both this study with A. mexicanus surface morphs and the previous experiments from Zada et al. in zebrafish, observed increases in immobility following UV radiation exposure are interpreted as following from UV-induced DNA damage. However, in interpreting these experiments it is important to note that the cavefish morphs are eyeless and blind. Intense UV radiation is aversive to fish, and it has previously been shown in zebrafish larvae that (at least some) behavioural responses to UV exposure depend on the presence of an intact retina and UV-sensitive cone photoreceptors (Guggiana-Nilo and Engert, 2016, doi:10.3389/fnbeh.2016.00160). It is premature to conclude that the lack of behavioural response to UV exposure in the cavefish is due to a different response to DNA damage, as their lack of eyes will likely inhibit a response to the UV stimulus. Indeed, were the equivalent zebrafish experiment from Zada et al. 
to be repeated with mutant larval fish lacking the retinal basis for UV detection, it might be found that in this case too, the effects of UV on behaviour are dependent on visual function. Such a finding should prompt a reappraisal of the interpretation that UV exposure's effects on fish sleep/locomotor behaviour are mediated by DNA damage. An additional note, relating to both Lloyd, Xia, et al., and Zada et al., is that though increases in immobility are induced following UV exposure, in neither study have assays of sensory responsiveness been performed during this period. As a decrease in sensory responsiveness is a key behavioural criterion for defining sleep, it is therefore unclear whether this post-UV behaviour is genuinely increased sleep as opposed to a stress-linked suppression of locomotion due to the intensely aversive UV stimulus.

      The effects of UV exposure, in terms of causing damage to DNA, inducing DNA damage response and repair mechanisms, and in causing broader changes in gene expression are assessed in both surface and cavefish larvae, as well as in cell lines derived from these different morphs. Differences in the suite of DNA damage response mechanisms that are upregulated are shown to exist between surface fish and cavefish larvae, though at least some of this difference is likely to be due to differences in gene expression that may exist even without UV exposure (this is discussed further below).

      UV exposure induced DNA damage (as measured by levels of cyclobutane pyrimidine dimers) to a similar degree in cell lines derived from both surface fish and cavefish. However, γH2AX shows increased expression only in cells from the surface fish, suggesting induction of an increased DNA repair response in these surface morphs, corroborated by their cells' increased ability to repair damaged DNA constructs experimentally introduced to the cells in a subsequent experiment. This "host cell reactivation assay" is a very interesting assay for measuring DNA repair in cell lines, but the power of this approach might be enhanced by introducing these DNA constructs into larval neurons in vivo (perhaps by electroporation) and by tracking DNA repair in living animals. Indeed, in such a preparation, the relationship between DNA repair and sleep/wake state could be assayed.

      Comparing gene expression in tissues from young (here 1 year) and older (here 7-8 years) fish from both cavefish and surface fish morphs, the authors found that there are significant differences in the transcriptional profiles in brain and gut between young and old surface fish, but that for cavefish being 1 year old versus being 7-8 years old did not have a major effect on transcriptional profile. The authors take this as suggesting that there is a reduced transcriptional change occurring during aging and that the transcriptome of the cavefish is resistant to age-linked changes. This seems to be only one of the equally plausible interpretations of the results; it could also be the case that alterations in metabolic cellular and molecular mechanisms, and particularly in responses to DNA damage, in the cavefish mean that these fish adopt their "aged" transcriptome within the first year of life.

      A major weakness of the study in its current form is the absence of sleep deprivation experiments to assay the effects of sleep loss on the cellular and molecular parameters in question. Without such experiments, the supposed link of sleep to the molecular, cellular, and "aging" phenotypes remains tenuous. Although the argument might be made that the cavefish represent a naturally "sleep-deprived" population, the cavefish in this study are not sleep-deprived, rather they are adapted to a condition of reduced sleep relative to fish from surface populations. Comparing the effects of depriving fish from each morph on markers of DNA damage and repair, gut reactive oxygen species, and gene expression will be necessary to solidify any proposed link of these phenotypes to sleep.

      A second important aspect that limits the interpretability and impact of this study is the absence of information about circadian variations in the parameters measured. A relationship between circadian phase, light exposure, and DNA damage/repair mechanisms is known to exist in A. mexicanus and other teleosts, and differences exist between the cave and surface morphs in these phenomena (Beale et al. 2013, doi: 10.1038/ncomms3769). Although the authors mention that their results do not align with these previous findings, they do not perform the appropriate experiments to determine if such a misalignment is genuine. Specifically, Beale et al. 2013 showed that white light exposure drove enhanced expression of DNA repair genes (including cpdp, which is prominent in the current study) in both surface fish and cavefish morphs, but that the magnitude of this change was smaller in the cavefish because they maintained an elevated expression of these genes in the dark, whereas darkness suppressed the expression of these genes in the surface fish. If such a phenomenon is present in the setting of the current study, this would likely be a significant confound for the UV-induced gene expression experiments in intact larvae, and undermine the interpretation of the results derived from these experiments: as samples are collected 90 minutes after the dark-light transition (ZT 1.5), it would be expected that both cavefish and surface fish larvae should have a clear induction of DNA repair genes (including cpdp) regardless of 90 s of UV exposure. The data in Supplementary Figure 3 are not sufficient to discount this potentially serious confound, as for larvae there is only gene expression data for time points from ZT 2 to ZT 14, with all of these time points being in the light phase and not capturing any dynamics that would occur at the most important time points from ZT 0 to ZT 1.5, in the relevant period after the dark-light transition. 
Indeed, an appropriate control for this experiment would involve frequent sampling at least across 48 hours to assess light-linked and developmentally-related changes in gene expression that would occur in 5-6dpf larvae of each morph independently of the exposure to UV.

      On a broader point, given the effects of both circadian rhythm and lighting conditions that are thought to exist in A. mexicanus (e.g. Beale et al. 2013), experiments involving measurements of DNA damage and repair, gene expression, reactive oxygen species, etc. at multiple times across more than one 24-hour cycle, in both light-dark and constant conditions (e.g. constant dark), would be needed to substantiate the authors' interpretation that their findings indicate consistently altered levels of these parameters in the cavefish relative to the surface fish. Most of the data in this study are taken at only single time points.

      In summary, the authors show that there are differences in gene expression, activity of DNA damage response and repair pathways, response to UV radiation, and gut reactive oxygen species between the Pachón cavefish morph and the surface morph of Astyanax mexicanus. However, the data presented does not make the precise nature of these differences very clear, and the interpretation of the results appears to be overly strong. Furthermore, the evidence of a link between these morph-specific differences and sleep is unconvincing.

    1. eLife assessment

      This is an important characterization of mouse auditory cortex receptive field organization, using two-photon imaging of specific subpopulations. The authors demonstrate a degradation of tonotopic organization from the input to output neurons. The strength of the evidence is solid, but some controls are needed to further strengthen the conclusion.

    2. Reviewer #1 (Public Review):

      Summary:

      In this study, Gu et al. employed novel viral strategies, combined with in vivo two-photon imaging, to map the tone response properties of two groups of cortical neurons in A1: the thalamocortical recipient (TR) neurons and the corticothalamic (CT) neurons. They observed a clear tonotopic gradient among TR neurons but not in CT neurons. Moreover, CT neurons exhibited high heterogeneity in their frequency tuning and broader bandwidth, suggesting increased synaptic integration in these neurons. By parsing out different projection-specific neurons within A1, this study provides insight into how neurons with different connectivity can exhibit different frequency response-related topographic organization.

      Strengths:

      This study reveals the importance of studying neurons with projection specificity rather than layer specificity since neurons within the same layer have very diverse molecular, morphological, physiological, and connectional features. By utilizing a newly developed rabies virus CSN-N2c GCaMP-expressing vector, the authors can label and image specifically the neurons (CT neurons) in A1 that project to the MGB. To compare, they used an anterograde trans-synaptic tracing strategy to label and image neurons in A1 that receive input from MGB (TR neurons).

      Weaknesses:

      - As cited in the introduction, it is well known that the tonotopic gradient is well preserved across all layers within A1, but I feel that if the authors want to highlight the specificity of their virus tracing strategy and the populations that they imaged in L2/3 (TR neurons) and L6 (CT neurons), they should include control groups in which they image general excitatory neurons at the two depths and compare them to TR and CT neurons, respectively. This will show that it's not their imaging/analysis or behavioral paradigms that are different from other labs.

      - Figures 1D and G, the y-axis is Distance from pia (%). I'm not exactly sure what this means. How does % translate to real cortical thickness? 

      - For Figures 2G and 2H, is each circle a neuron or an animal? Why are they stacked on top of each other on the x-axis? If the x-axis is the distance from caudal to rostral, shouldn't each neuron have a different distance? Also, it seems that Figure 2H has more circles, which may be why it shows more variation and thus a non-significant result (for example, at 600 or 900 µm, 2G seems to have fewer circles than 2H).

      - Similarly, in Figures 2J and L, why are the circles staggered on the y-axis now? And is each circle now a neuron or a trial? It seems they have many more circles than Figure 2G and 2H. Also, I don't think doing a correlation is the proper stats for this type of plot (this point applies to Figures 3H and 3J).

      - What does the inter-quartile range of BF (IQRBF, in octaves) imply? What's the interpretation of this analysis? I am confused as to why TR neurons show high IQR in HF areas compared to LF areas, which means homogeneity among TR neurons (lines 213 - 216). On the same note, how is this different from the BF variability?  Isn't higher IQR equal to higher variability?

      - Figure 4A-B, there are no clear criteria on how the authors categorize V, I, and O shapes. The descriptions in the Methods (lines 721 - 725) are also very vague.
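
      One way to make the IQR question above concrete is to compute the measure on toy data. The sketch below assumes, without confirmation from the manuscript, that the measure is the inter-quartile range of log2-transformed best frequencies within a region; the function name and the example values are hypothetical.

```python
import math
import statistics


def iqr_bf_octaves(best_freqs_khz):
    """Inter-quartile range of best frequencies, expressed in octaves.

    Taking log2 first means a difference of 1.0 corresponds to one octave.
    """
    log2_bf = [math.log2(f) for f in best_freqs_khz]
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3
    q1, _, q3 = statistics.quantiles(log2_bf, n=4, method="inclusive")
    return q3 - q1


# Hypothetical regions: identical BFs vs BFs spread over four octaves
homogeneous = [8, 8, 8, 8, 8]
heterogeneous = [2, 4, 8, 16, 32]
```

      Under this reading, a region whose neurons share one best frequency gives an IQR of zero, and a larger IQR indicates a wider spread of best frequencies, which is why the relation between this measure and the reported BF variability needs to be stated explicitly.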

    3. Reviewer #2 (Public Review):

      Summary:

      Gu, Liang, et al. investigated how auditory information is mapped and transformed as it enters and exits the auditory cortex. They use anterograde transsynaptic tracers to label and perform calcium imaging of thalamorecipient neurons in A1 and retrograde tracers to label and perform calcium imaging of corticothalamic output neurons. They demonstrate a degradation of tonotopic organization from the input to output neurons.

      Strengths:

      The experiments appear well executed, well described, and analyzed.

      Weaknesses:

      (1) Given that the CT and TR neurons were imaged at different depths, the question as to whether or not these differences could otherwise be explained by layer-specific differences is still not 100% resolved. Control measurements would be needed either by recording (1) CT neurons in upper layers, (2) TR in deeper layers, (3) non-CT in deeper layers and/or (4) non-TR in upper layers.

      (2) What percent of the neurons at the depths are CT neurons? Similar questions for TR neurons?

      (3) V-shaped, I-shaped, and O-shaped are not intuitively understood terms; consider changing the nomenclature. Further, the x and y axes of Figure 4A are not labeled, so it's not clear what the heat maps are supposed to represent.

      (4) Many references about projection neurons and cortical circuits are based on studies from visual or somatosensory cortex. Auditory cortex organization is not necessarily the same as that of other sensory areas. Auditory cortex references should be used specifically, rather than sources reporting on S1 and V1.

    4. Reviewer #3 (Public Review):

      Summary:

      The authors performed wide-field and 2-photon imaging in vivo in awake head-fixed mice, to compare receptive fields and tonotopic organization in thalamocortical recipient (TR) neurons vs corticothalamic (CT) neurons of mouse auditory cortex. TR neurons were found in all cortical layers while CT neurons were restricted to layer 6. The TR neurons at nominal depths of 200-400 microns have a remarkable degree of tonotopy (as good as, if not better than, tonotopic maps reported by multiunit recordings). In contrast, CT neurons were very heterogeneous in terms of their best frequency (BF), even when focusing on the low- vs high-frequency regions of the primary auditory cortex. CT neurons also had wider tuning.

      Strengths:

      This is a thorough examination using modern methods, helping to resolve a question in the field with projection-specific mapping.

      Weaknesses:

      There are some limitations due to the methods, and it's unclear what the importance of these responses is outside of behavioral context or when measured at single time points, given the plasticity, context-dependence, and receptive field 'drift' that can occur in the cortex.

      (1) Probably the biggest conceptual difficulty I have with the paper is comparing these results to past studies mapping auditory cortex topography, mainly due to differences in methods. Conventionally, the tonotopic organization is observed for characteristic frequency maps (not best frequency maps), as tuning precision degrades and the best frequency can shift as sound intensity increases. The authors used six attenuation levels (30-80 dB SPL) and reported that the background noise of the 2-photon scope is <30 dB SPL, which seems very quiet. The authors should at least describe the sound-proofing they used to get the noise level that low, and some sense of noise across the 2-40 kHz frequency range would be nice as a supplementary figure. It also remains unclear just what the 2-photon dF/F response represents in terms of spikes. Classic mapping using single-unit or multi-unit electrodes might be sensitive to single spikes (as might be emitted at characteristic frequency), but this might not be as obvious for Ca2+ imaging. This isn't a concern for the internal comparison here between TR and CT cells as conditions are similar, but is a concern for relating the tonotopy or lack thereof reported here to other studies.

      (2) It seems a bit peculiar that while 2721 CT neurons (N=10 mice) were imaged, fewer than half as many TR cells were imaged (n=1041 cells from N=5 mice). I would have expected there to be many more TR neurons even mouse for mouse (normalizing by number of neurons per mouse), but perhaps the authors were just interested in a comparison data set and were not being as thorough or complete with the TR imaging?

      (3) The authors' definitions of neuronal response type in the methods need more quantitative detail. The authors state: "'Irregular' neurons exhibited spontaneous activity with highly variable responses to sound stimulation. 'Tuned' neurons were responsive neurons that demonstrated significant selectivity for certain stimuli. 'Silent' neurons were defined as those that remained completely inactive during our recording period (> 30 min). For tuned neurons, the best frequency (BF) was defined as the sound frequency associated with the highest response averaged across all sound levels." The authors need to define their thresholds for 'highly variable', 'significant', and 'completely inactive'. Is the best frequency the most significant response, the global maximum (even if another stimulus evokes a response of very similar amplitude), or something else?
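
      The distinction between best frequency and characteristic frequency raised in points (1) and (3) can be illustrated with a toy frequency-by-level response matrix. This is a minimal sketch under stated assumptions, not the authors' analysis: the function names, the response values, and the `threshold` are hypothetical, BF follows the quoted definition (highest response averaged across all sound levels), and CF is taken as the frequency with the strongest response at the quietest level that evokes a supra-threshold response.

```python
def best_frequency(responses, freqs):
    """BF: frequency with the highest response averaged across all levels.

    responses: list of rows, one per sound level; each row holds one
    response value per frequency in `freqs`.
    """
    n_levels = len(responses)
    mean_by_freq = [
        sum(row[i] for row in responses) / n_levels for i in range(len(freqs))
    ]
    # index() returns the first maximum, so near-ties are resolved arbitrarily
    return freqs[mean_by_freq.index(max(mean_by_freq))]


def characteristic_frequency(responses, freqs, threshold):
    """CF: strongest frequency at the lowest level with a supra-threshold
    response. Rows of `responses` must be ordered from quiet to loud."""
    for row in responses:
        if max(row) > threshold:
            return freqs[row.index(max(row))]
    return None  # no level evoked a supra-threshold response


# Hypothetical neuron: frequencies in kHz, rows are 30, 50, and 70 dB SPL
freqs_khz = [4, 8, 16]
toy_responses = [
    [0.0, 0.3, 0.0],
    [0.2, 0.5, 0.6],
    [0.4, 0.6, 1.2],
]
```

      In this toy case the two measures disagree (CF at 8 kHz, BF at 16 kHz) because the strongest response shifts as sound level increases, which is the reason the choice of measure and the tie-breaking rule matter when comparing maps across studies.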

    1. eLife assessment

      This study resolves a cryo-EM structure of the GPCR, human GPR30, which responds to bicarbonate and regulates cellular responses to pH and ion homeostasis. Understanding the ligand and the mechanism of activation is important to the field of receptor signaling and potentially facilitates drug development targeting this receptor. While the overall structures are solid, the identification of the bicarbonate binding site is only partly supported by the structural data and cell-based functional assays, leaving a major aim of the study incomplete.

    2. Reviewer #1 (Public Review):

      Summary:

      This study resolves a cryo-EM structure of the GPCR, GPR30, which was recently identified as a bicarbonate receptor by the authors' lab. Understanding the ligand and the mechanism of activation is of fundamental importance to the field of receptor signaling. However, the main claim of the paper, the identification of the bicarbonate binding site, is only partly supported by the structural and functional data, leaving the study incomplete.

      Strengths:

      The overall structure, and proposed mechanism of G-protein coupling seem solid. The authors perform fairly extensive unbiased mutagenesis to identify a host of positions that are important to G-protein signaling. To my knowledge, bicarbonate is the only physiological ligand that has been identified for GPR30, making this study a particularly important contribution to the field.

      Weaknesses:

      Without higher resolution structures and/or additional experimental assessment of the binding pocket, the assignment of the bicarbonate remains highly speculative. The local resolution is especially poor in the ECL loop region where the ligand is proposed to bind (4.3-4.8 Å range). Of course, sometimes it is difficult to achieve high structural resolution, but in these cases, the assignment of ligands should be backed up by even more rigorous experimental validation.

      The functional assay monitors activation of GPR30, and thus reports on not only bicarbonate binding, but also the integrity of the allosteric network that transduces the binding signal across the membrane. Thus, disruption of bicarbonate signaling by mutagenesis of the putative coordinating residues does not necessarily mean that bicarbonate binding has been disrupted. Moreover, the mutagenesis was apparently done prior to structure determination, meaning that residues proposed to directly surround bicarbonate binding, such as E218, were not experimentally validated. Targeted mutagenesis based on the structure would strengthen the story.

      Moreover, the proposed bicarbonate binding site is surprising in a chemical sense, as it is located within an acidic pocket. The authors cite several other structural studies to support the surprising observation of anionic bicarbonate surrounded by glutamate residues in an acidic pocket (references 31-34). However, it should be noted that in general, these other structures also possess a metal ion (sodium or calcium) and/or a basic sidechain (arginine or lysine) in the coordination sphere, forming a tight ion pair. Thus, the assigned bicarbonate binding site in GPR30 remains an anomaly in terms of the chemical properties of the proposed binding site.

    3. Reviewer #2 (Public Review):

      Summary:

      In this manuscript, "Cryo-EM structure of the bicarbonate receptor GPR30," the authors aimed to enrich our understanding of the role of GPR30 in pH homeostasis by combining structural analysis with a receptor function assay. This work is a natural development and extension of their previous work (PMID: 38413581). In the current body of work, they solved the first cryo-EM structure of the human GPR30-G-protein (mini-Gsqi) complex in the presence of bicarbonate ions at 3.21 Å resolution. From the atomic model built based on this map, they observed the overall canonical architecture of class A GPCR and also identified 4 extracellular pockets created by extracellular loops (ECLs) (Pockets A-D). Based on the polarity, location, and charge of each pocket, the authors hypothesized that pocket D is a good candidate for the bicarbonate binding site. To verify their structural observation, on top of the 10 mutations they generated in the previous work, the authors introduced another 11 mutations to map out the essential residues for the bicarbonate response on hGPR30. In addition, the human GPR30-G-protein complex model also allowed the authors to untangle the G-protein coupling mechanism of this special class A GPCR that plays an important role in pH homeostasis.

      Strengths:

      As a continuation of their recent Nature Communications publication (PMID: 38413581), this study was carefully designed, and the authors used mutagenesis and functional studies to confirm their structural observations. This work provided high-resolution structural observations for the receptor in complex with G-protein, allowing us to explore its mechanism of action, and will further facilitate drug development targeting GPR30. There were 4 extracellular pockets created by ECLs (Pockets A-D). The authors were able to filter out 3 of them and identified that pocket D was a good candidate for the bicarbonate binding site based on the polarity, location, and charge of each pocket. From there, the authors identified the key residues on GPR30 for its interaction with the substrate, bicarbonate. Together with their previous work, they carefully mapped out nine amino acids that are critical for receptor reactivity.

      Weaknesses:

      It is unclear how novel the aspects presented in the new paper are compared to the most recent Nature Communications publication (PMID: 38413581). Some areas of the manuscript appear to be mixed with the previous publication. The work is still impactful to the field. The new and novel aspects of this manuscript could be better highlighted.

      I also have some concerns about the TGFα shedding assay the authors used to verify their structural observation. I understand that this assay was also used in the authors' previous work published in Nature Communications. However, there are still several things in the current data that raised concerns:

      (1) The authors confirmed the "similar expression levels of HA-tagged hGPR30" mutants by WB in Supplemental Figure 1A and B. However, compared to the hGPR30-HA (~6.5 when normalized to the housekeeping gene, Na-K-ATPase), several mutants of the key amino acids had much lower surface expression: S134A, D210A, C207A had ~50% reduction, D125A had ~30% reduction, and Q215A and P71A had ~20% reduction. This weakens the receptor reactivity measured by the TGFα shedding assay.

      (2) In the previous work, the authors demonstrated that hGPR30 signals through the Gq signaling pathway and can trigger calcium mobilization. Given that calcium mobilization is a more direct measurement for the downstream signaling of hGPR30 than the TGFα shedding assay, pairing the mutagenesis study with the calcium assay will be a better functional validation to confirm the disruption of bicarbonate signaling.

      (3) It was quite confusing for Figure 4B that all statistical analyses were done by comparing to the mock group. It would be clearer to compare the activity of the mutants to the wild-type cell line.

      Additional concerns about the structural data include:

      (1) E218 was in close contact with bicarbonate in Figure 4D. However, there is no functional validation for this observation. Including the mutagenesis study of this site in the cell-based functional assay will strengthen this structural observation.

      (2) For the flow chart of the cryo-EM data processing in Supplemental data 2, the authors started with 10,148,422 particles after template picking, then had 441,348 particles left after 2D classification/heterogeneous refinement, and finally ended with 148,600 particles for the local refinement for the final map. There seems to be a lot of heterogeneity in this purified sample. GPCRs usually have flexible and dynamic loop regions, which explains the poor resolution of the ECLs in this case. Thus, a solid cell-based functional validation is a must to assign the bicarbonate binding pocket to support their hypothesis.

    4. Reviewer #3 (Public Review):

      Summary:

      GPR30 responds to bicarbonate and regulates cellular responses to pH and ion homeostasis. However, it remains unclear how GPR30 recognizes bicarbonate ions. This paper presents the cryo-EM structure of GPR30 bound to a chimeric mini-Gq in the presence of bicarbonate. The structure together with functional studies aims to provide mechanistic insights into bicarbonate recognition and G protein coupling.

      Strengths:

      The authors performed comprehensive mutagenesis studies to map the possible binding site of bicarbonate.

      Weaknesses:

      Owing to the poor resolution of the structure, some structural findings may be overclaimed.

      Based on EM maps shown in Figure 1a and Figure Supplement 2, densities for side chains in the receptor, particularly in ECLs (around 4 Å), are poorly defined. At this resolution, it is unlikely to observe a disulfide bond (C130ECL1-C207ECL2) and bicarbonate ions. Moreover, the disulfide between ECL1 and ECL2 has not been observed in other GPCRs and the published structure of GPR30 (PMID: 38744981). The density of this disulfide bond could be noise.

      The authors observed a weak density in pocket D, which is accounted for by the bicarbonate ions. This ion is mainly coordinated by Q215 and Q138. However, the Q215A mutation only reduced but did not completely abolish the bicarbonate response, and the authors did not present data for the Q138A mutation. Therefore, Q215 and Q138 could not be bicarbonate binding sites. While H307A completely abolished the bicarbonate response, the authors proposed that this residue plays a structural role. Nevertheless, based on the structure, H307 is exposed and may be involved in binding bicarbonate. The assignment of bicarbonate in the structure is not supported by the data.

    1. eLife assessment

      This convincing study advances our understanding of the physiological consequences of the strong overexpression of non-toxic proteins in baker's yeast. The findings suggest that a massive protein burden results in nitrogen starvation and a shift in metabolism likely regulated via the TORC1 pathway, as well as defects in ribosome biogenesis in the nucleolus. The study presents findings and tools that are important for the cell biology and protein homeostasis fields.

    2. Reviewer #1 (Public Review):

      Summary:

      The study "Impact of Maximal Overexpression of a Non-toxic Protein on Yeast Cell Physiology" by Fujita et al. aims to elucidate the physiological impacts of overexpressing non-toxic proteins in yeast cells. By identifying model proteins with minimal cytotoxicity, the authors claim to provide insights into cellular stress responses and metabolic shifts induced by protein overexpression.

      Strengths:

      The study introduces a neutrality index to quantify cytotoxicity and investigates the effects of protein burden on yeast cell physiology. The study identifies mox-YG (a non-fluorescent fluorescent protein) and Gpm1-CCmut (an inactive glycolytic enzyme) as proteins with the lowest cytotoxicity, capable of being overexpressed to more than 40% of total cellular protein while maintaining yeast growth. Overexpression of mox-YG leads to a state resembling nitrogen starvation probably due to TORC1 inactivation, increased mitochondrial function, and decreased ribosomal abundance, indicating a metabolic shift towards more energy-efficient respiration and defects in nucleolar formation.

      Weaknesses:

      While the introduction of the neutrality index seems useful to differentiate between cytotoxicity and protein burden, the biological relevance of the effects of overexpression of the model proteins is unclear.

    3. Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Fujita et al. characterized the neutrality indexes of several protein mutants in S. cerevisiae and uncovered that mox-YG and Gpm1-CCmut can be expressed at levels as high as 40% of total protein without causing severe growth defects. The authors then looked at the transcriptome and proteome of cells expressing excess mox-YG to investigate how protein burden affects yeast cells. Based on RNA-seq and mass-spectrometry results, the authors uncover that cells with excess mox-YG exhibit nitrogen starvation, respiration increase, inactivated TORC1 response, and decreased ribosomal abundance. The authors further showed that the decreased ribosomal amount is likely due to nucleoli defects, which can be partially rescued by nuclear exosome mutations.

      Strengths:

      Overall, this is a well-written manuscript that provides many valuable resources for the field, including the neutrality analysis on various fluorescent proteins and glycolytic enzymes, as well as the RNA-seq and proteomics results of cells overexpressing mox-YG. Their model on how mox-YG overexpression impairs the nucleolus and thus leads to ribosomal abundance decline will also raise many interesting questions for the field.

      Weaknesses:

      The authors concluded from their RNA-seq and proteomics results that cells with excess mox-YG expression showed increased respiration and TORC1 inactivation. I think it will be more convincing if the authors can show some characterization of mitochondrial respiration/membrane potential and the TOR responses to further verify their -omic results.

      In addition, the authors only investigated how overexpression of mox-YG affects cells. It would be interesting to see whether overexpressing other non-toxic proteins causes similar effects, or if there are protein-specific effects. It would be good if the authors could at least discuss this point, considering that the workload of doing another RNA-seq or mass-spectrometry analysis might be too heavy.

    4. Reviewer #3 (Public Review):

      Summary:

      Protein overexpression is widely used in experimental systems to study the function of the protein, assess its (beneficial or detrimental) effects in disease models, or challenge cellular systems involved in synthesis, folding, transport, or degradation of proteins in general. Especially at very high expression levels, protein-specific effects and general effects of a high protein load can be hard to distinguish. To overcome this issue, Fujita et al. use the previously established genetic tug-of-war system to identify proteins that can be expressed at extremely high levels in yeast cells with minimal protein-specific cytotoxicity (high 'neutrality'). They focus on two versions of the protein mox-GFP, the fluorescent version and a point mutation that is non-fluorescent (mox-YG) and is the most 'neutral' protein on their screen. They find that massive protein expression (up to 40% of the total proteome) results in a nitrogen starvation phenotype, likely inactivation of the TORC1 pathway, and defects in ribosome biogenesis in the nucleolus.

      Strengths:

      This work uses an elegant approach and succeeds in identifying proteins that can be expressed at surprisingly high levels with little cytotoxicity. Many of the changes they see have been observed before under protein burden conditions, but some are new and interesting. This work solidifies previous hypotheses about the general effects of protein overexpression and provides a set of interesting observations about the toxicity of fluorescent proteins (that is alleviated by mutations that render them non-fluorescent) and metabolic enzymes (that are less toxic when mutated into inactive versions).

      Weaknesses:

      The data are generally convincing; however, to back up the major claim of this work - that the observed changes are due to general protein burden and not to the specific protein or condition - a broader analysis of different conditions would be highly beneficial.

      Major points:

      (1) The authors identify several proteins with high neutrality scores but only analyze the effects of mox/mox-YG overexpression in depth. Hence, it remains unclear which molecular phenotypes they observe are general effects of protein burden or more specific effects of these specific proteins. To address this point, a proteome (and/or transcriptome) of at least a Gpm1-CCmut expressing strain should be obtained and compared to the mox-YG proteome. Ideally, this analysis should be done simultaneously on all strains to achieve a good comparability of samples, e.g. using TMT multiplexing (for a proteome) or multiplexed sequencing (for a transcriptome). If feasible, the more strains that can be included in this comparison, the more powerful this analysis will be and can be prioritized over depth of sequencing/proteome coverage.

      (2) The genetic tug-of-war system is elegant but comes at the cost of requiring specific media conditions (synthetic minimal media lacking uracil and leucine), which could be a potential confound, given that metabolic rewiring and, especially, nitrogen starvation are among the observed phenotypes. I wonder if some of the changes might be specific to these conditions. The authors should corroborate their findings under different conditions. Ideally, this would be done using an orthogonal expression system that does not rely on auxotrophy (e.g. using antibiotic resistance instead) and can be used in rich, complex media like YPD. Minimally, using different conditions (media with excess or more limited nitrogen source, amino acids, different carbon source, etc.) would be useful to test the robustness of the findings towards changes in media composition.

      (3) The authors suggest that the TORC1 pathway is involved in regulating some of the changes they observed. This is likely true, but it would be great if the hypothesis could be directly tested using an established TORC1 assay.

      (4) The finding that the nucleolus appears to be virtually missing in mox-YG-expressing cells (Figure 6B) is surprising and interesting. The authors suggest possible mechanisms to explain this and partially rescue the phenotype by a reduction-of-function mutation in an exosome subunit. I wonder if this is specific to the mox-YG protein or a general protein burden effect, which the experiments suggested in point 1 should address. Additionally, could a mox-YG variant with a nuclear export signal be expressed that stays exclusively in the cytosol to rule out that mox-YG itself interferes with phase separation in the nucleus?

      Minor points:

      (5) It would be great if the authors could directly compare the changes they observed at the transcriptome and proteome levels. This can help distinguish between changes that are transcriptionally regulated versus more downstream processes (like protein degradation, as proposed for ribosome components).

    1. eLife assessment

      This paper reports important findings on giant organelle complexes containing endosomes and lysosomes (termed endosomal-lysosomal organelles form assembly structures [ELYSAs]) present in mouse oocytes and 1- to 2-cell embryos. The data showing the localization and dynamics of ELYSAs during oocyte/embryo maturation are convincing. This work will be of interest to general cell biologists and developmental biologists.

    2. Reviewer #1 (Public Review):

      In this manuscript, Satouh et al. report giant organelle complexes in oocytes and early embryos. Although these structures have often been observed in oocytes and early embryos, their exact nature has not been characterized. The authors named these structures "endosomal-lysosomal organelles form assembly structures (ELYSAs)". ELYSAs contain organelles such as endosomes, lysosomes, and probably autophagic structures. ELYSAs are initially formed in the perinuclear region and then migrate to the periphery in an actin-dependent manner. When ELYSAs are disassembled after the 2-cell stage, the V-ATPase V1 subunit is recruited to make lysosomes more acidic and active. The ELYSAs are most likely the same as the "endolysosomal vesicular assemblies (ELVAs)", reported by Elvan Böke's group earlier this year (Zaffagnini et al. doi.org/10.1016/j.cell.2024.01.031). However, it is clear that Satouh et al. identified and characterized these structures independently. These two studies could be complementary. Although the nature of the present study is generally descriptive, this paper provides valuable information about these giant structures. The data are mostly convincing, and only some minor modifications are needed for clarification and further explanation to fully understand the results.

    3. Reviewer #2 (Public Review):

      Satouh et al report the presence of spherical structures composed of endosomes, lysosomes, and autophagosomes within immature mouse oocytes. These endolysosomal compartments have been named as Endosomal-LYSosomal organellar Assembly (ELYSA). ELYSAs increase in size as the oocytes undergo maturation. ELYSAs are distributed throughout the oocyte cytoplasm of GV stage immature oocytes but these structures become mostly cortical in the mature oocytes. Interestingly, they tend to avoid the region which contains metaphase II spindle and chromosomes. They show that the endolysosomal compartments in oocytes are less acidic and therefore non-degradative but their pH decreases and becomes degradative as the ELYSAs begin to disassemble in the embryos post-fertilization. This manuscript shows that lysosomal switching does not happen during oocyte development, and the formation of ELYSAs prevents lysosomes from being activated. Structures similar to these ELYSAs have been previously described in mouse oocytes (Zaffagnini et al, 2024) and these vesicular assemblies are important for sequestering protein aggregates in the oocytes but facilitate proteolysis after fertilization. The current manuscript, however, provides further details of endolysosomal disassembly post-fertilization. Specifically, the V1-subunit of V-ATPase targeting the ELYSAs increases the acidity of lysosomal compartments in the embryos. This is a well-conducted study and their model is supported by experimental evidence and data analyses.

    4. Reviewer #3 (Public Review):

      Fertilization converts a cell defined as an egg to a cell defined as an embryo. An essential component of this switch in cell fate is the degradation (autophagy) of cellular elements that serve a function in the development of the egg but could impede the development of the embryo. Here, the authors have focused on the behavior during the egg-to-embryo transition of endosomes and lysosomes, which are cytoplasmic structures that mediate autophagy. By carefully mapping and tracking the intracellular location of well-established marker proteins, the authors show that in oocytes endosomes and lysosomes aggregate into giant structures that they term Endosomal LYSosomal organellar Assembl[ies] (ELYSA). Both the size distribution of the ELYSAs and their position within the cell change during oocyte meiotic maturation and after fertilization. Notably, during maturation, there is a net actin-dependent movement towards the periphery of the oocyte. By the late 2-cell stage, the ELYSAs are beginning to disintegrate. At this stage, the endo-lysosomes become acidified, likely reflecting the activation of their function to degrade cellular components.

      This is a carefully performed and quantified study. The fluorescent images obtained using well-known markers, using both antibodies and tagged proteins, support the interpretations, and the quantification method is sophisticated and clearly explained. Notably, this type of quantification of confocal z-stack images is rarely performed and so represents a real strength of the study. It provides sound support for the conclusions regarding changes in the size and position of the ELYSAs. Another strength is the use of multiple markers, including those that indicate the activity state of the endo-lysosomes. Altogether, the manuscript provides convincing evidence for the existence of ELYSAs and also for regulated changes in their location and properties during oocyte maturation and the first few embryonic cell cycles following fertilization.

      At present, precisely how the changes in the location and properties of the ELYSAs affect the function of the endo-lysosomal system is not known. While the authors' proposal that they are stored in an inactive state is plausible, it remains speculative. Nonetheless, this study lays the foundation for future work to address this question.

      Minor point: l. 299. If I am not mistaken, there is a typo. It should read that the inhibitors of actin polymerization prevent redistribution from the cytoplasm to the cortex during maturation.

      Minor point: A few statements in the Introduction would benefit from clarification. These are noted in the comments to the authors.

    1. eLife assessment

      The study describes a valuable new technology in the field of targeted protein degradation that allows identification of E3-ubiquitin ligases that target a protein of interest. The presented data are convincing, however, it is unclear whether the proposed system can be successfully used in high throughput applications. This technology will serve the community in the initial stages of developing targeted protein degraders.

    2. Reviewer #1 (Public Review):

      Summary:

      PROTACs are heterobifunctional molecules that utilize the Ubiquitin Proteasome System to selectively degrade target proteins within cells. Upon introduction to the cells, PROTACs capture the activity of the E3 ubiquitin ligases for ubiquitination of the targeted protein, leading to its subsequent degradation by the proteasome. The main benefit of PROTAC technology is that it expands the "druggable proteome" and provides numerous possibilities for therapeutic use. However, there are also some difficulties, including the one addressed in this manuscript: identifying suitable target-E3 ligase pairs for successful degradation. Currently, only a few out of about 600 E3 ligases are used to develop PROTAC compounds, which creates the need to identify other E3 ligases that could be used in PROTAC synthesis. Testing the efficacy of PROTAC compounds has been limited to empirical tests, leading to lengthy and often failure-prone processes. This manuscript addressed the need for faster and more reliable assays to identify the compatible pairs of E3 ligases-target proteins. The authors propose using the RiPA assay, which depends on rapamycin-induced dimerization of FKBP12 protein with FRB domain. The PROTAC technology is advancing rapidly, making this manuscript both timely and essential. The RiPA assay might be useful in identifying novel E3 ligases that could be utilized in PROTAC technology. Additionally, it could be used at the initial stages of PROTAC development, looking for the best E3 ligase for the specific target.

      The authors described an elegant assay that is scalable, easy-to-use, and applicable to a wide range of cellular models. This method allows for the quantitative validation of the degradation efficacy of a given pair of E3 ligase-target proteins, using luciferase activity as a measure. Importantly, the assay also enables the measurement of kinetics in living cells, enhancing its practicality.

      Strengths:

      (1) The authors have addressed the crucial needs that arise during PROTAC development. In the introduction, they nicely describe the advantages and disadvantages of the PROTAC technology and explain why such an assay is needed.

      (2) The study includes essential controls in experiments (important for generating new assay), such as using the FRB vector without E3 ligase as a negative control, testing different linkers (which may influence the efficacy of the degradation), and creating and testing K-less vectors to exclude the possibility of luciferase or FKBP12 ubiquitination instead of WDR5 (the target protein). Additionally, the position of the luc in the FKBP12 vector and the position of VHL in the FRB vector are tested. Different E3 ligases are tested using previously identified target proteins, confirming the assay's utility and accuracy.

      (3) The study identified a "new" E3 ligase that is suitable for PROTAC technology (FBXL).

      Weaknesses:

      It is not clear how feasible it would be to adapt the assay for high-throughput screens. In some experiments, the efficacy of WDR5 degradation tested by immunoblotting appears to be lower than luciferase activity (e.g., Figure 2G and H).

    3. Reviewer #2 (Public Review):

      Summary:

      Adhikari and colleagues developed a new technique, rapamycin-induced proximity assay (RiPA), to identify E3-ubiquitin (ub) ligases of a protein target, aiming at identifying additional E3 ligases that could be targeted for PROTAC generation or ligases that may degrade a protein target. The study is timely, as expanding the landscape of E3-ub ligases for developing targeted degraders is a primary direction in the field.

      Strengths:

      The study's strength lies in its practical application of the FRB:FKBP12 system. This system is used to identify E3-ub ligases that would degrade a target of interest, as evidenced by the reduction in luminescence upon the addition of rapamycin. This approach effectively mimics the potential action of a PROTAC.

      Weaknesses:

      (1) While the technique shows promise, its application in a discovery setting, particularly for high-throughput or unbiased E3-ub ligase identification, may pose challenges. The authors should provide more detailed insights into these potential difficulties to foster a more comprehensive understanding of RiPA's limitations.

      (2) While RiPA will help identify E3 ligases, PROTAC design would still be empirical. The authors should discuss this limitation. Could the technology be applied to molecular glue generation?

      (3) Controls to verify the intended mechanism of action are missing, such as using a proteasome inhibitor or VHL inhibitors/siRNA to verify on-target effects. Verification of the target E3 ligase complex after rapamycin addition via orthogonal approaches, such as IP, should be considered.

      Minor concern:

      The graphs in Figure 1E are missing.

    1. eLife assessment

      This study combines extensive published and new datasets to provide a useful single-cell multi-omics analysis of early cardiac lineage segregation, highlighting the mutual regulation of key regulators for cardiac specification. While the data presentation is robust, the computational methods for delineating cardiac lineage trajectories and the functional analyses are incomplete and require further clarification and additional experiments. If validated, these findings will be of significant interest to researchers in the fields of cardiac development and congenital heart disease.

    2. Reviewer #1 (Public Review):

      Summary:

      In this study, the authors identified and described the transcriptional trajectories leading to CMs during early mouse development, and characterized the epigenetic landscapes that underlie early mesodermal lineage specification.

      The authors identified two transcriptomic trajectories from a mesodermal population to cardiomyocytes, the MJH and PSH trajectories. These trajectories are relevant to the current model for the First Heart Field (FHF) and the Second Heart Field (SHF) differentiation. Then, the authors characterized both gene expression and enhancer activity of the MJH and PSH trajectories, using a multiomics analysis. They highlighted the role of Gata4, Hand1, Foxf1, and Tead4 in the specification of the MJH trajectory. Finally, they performed a focused analysis of the role of Hand1 and Foxf1 in the MJH trajectory, showing their mutual regulation and their requirement for cardiac lineage specification.

      Strengths:

      The authors performed an extensive transcriptional and epigenetic analysis of early cardiac lineage specification and differentiation which will be of interest to investigators in the field of cardiac development and congenital heart disease. The authors considered the impact of the loss of Hand1 and Foxf1 in-vitro and Hand1 in-vivo.

      Weaknesses:

      The authors used previously published scRNA-seq data to generate two described transcriptomic trajectories.

      (1) Details of the re-analysis step should be added, including a careful characterization of the different clusters and marker genes, more details on the WOT analysis, and details on the time stamp distribution along the different pseudotimes. These details would be important to allow readers to gain confidence that the two major trajectories identified are realistic interpretations of the input data.

      The authors have also renamed the cardiac trajectories/lineages, departing from the convention applied in hundreds of papers, making the interpretation of their results challenging.

      (2) The concept of "reverse reasoning" applied to the Waddington-OT package for directional mass transfer is not adequately explained. While the authors correctly acknowledged Waddington-OT's ability to model cell transitions from ancestors to descendants (using optimal transport theory), the justification for using a "reverse reasoning" approach is missing. Clarifying the rationale behind this strategy would be beneficial.

      (3) As the authors used the EEM cell cluster as a starting point to build the MJH trajectory, it's unclear whether this trajectory truly represents the cardiac differentiation trajectory of the FHF progenitors:

      - This strategy infers that the FHF progenitors are mixed in the same cluster as the extra-embryonic mesoderm, but no specific characterization of potential different cell populations included in this cluster was performed to confirm this.

      - The authors identified the EEM cluster as a Juxta-cardiac field, without showing the expression of the principal marker Mab21l2 per cluster and/or on UMAPs.

      - As the FHF progenitors arise earlier than the Juxta-cardiac field cells, it must be possible to identify an early FHF progenitor population (Nkx2-5+; Mab21l2-) using the time stamp. It would be more accurate to use this FHF cluster as a starting point than the EEM cluster to infer the FHF cardiac differentiation trajectory.

      These concerns call into question the overall veracity of the trajectory analysis, and in fact, the discrepancies with prior published heart field trajectories are noted but the authors fail to validate their new interpretation. Because their trajectories are followed for the remainder of the paper, many of the interpretations and claims in the paper may be misleading. For example, these trajectories are used subsequently for annotation of the multiomic data, but any errors in the initial trajectories could result in errors in multiomic annotation, etc, etc.

      (4) As mentioned in the discussion, the authors identified the MJH and PSH trajectories as non-overlapping. But, the authors did not discuss major previously published data showing that both FHF and SHF arise from a common transcriptomic progenitor state in the primitive streak (DOI: 10.1126/science.aao4174; DOI: 10.1007/s11886-022-01681-w). The authors should consider and discuss the specifics of why they obtained two completely separate trajectories from the beginning, how these observations conflict with prior published work, and what efforts they have made at validation.

      (5) Figures 1D and E are confusing, as it's unclear why the authors selected only cells at E7.0. Also, panels 1D 'Trajectory' and 'Pseudotime' suggest that the CM trajectory moves from the PSH cells to the MJH. This result is confusing, and the authors should explain this observation.

      (6) Regarding the PSH trajectory, it's unclear how the authors can obtain a full cardiac differentiation trajectory from the SHF progenitors as the SHF-derived cardiomyocytes are just starting to invade the heart tube at E8.5 (DOI: 10.7554/eLife.30668).

      The above notes some of the discrepancies between the authors' trajectory analysis and the historical cardiac development literature. Overall, these discrepancies are glossed over and not adequately validated.

      (7) The authors mention analyzing "activated/inhibited genes" from Peng et al. 2019 but didn't specify when Peng's data was collected. Is it temporally relevant to the current study? How can "later stage" pathway enrichment be interpreted in the context of early-stage gene expression?

      (8) Motif enrichment: cluster-specific DAEs were analyzed for motifs, but the authors list specific TFs rather than TF families, which is all that motif enrichment can provide. The authors should either list TF families or state clearly that the specific TFs they list were not validated beyond motifs.

      (9) The core regulatory network is purely predictive. The authors again should refrain from language implying that the TFs in the CRN have any validated role.

      Regarding the in vivo analysis of Hand1 CKO embryos, Figures 6 and 7:

      (10) How can the authors explain the presence of a heart tube in the E9.5 Hand1 CKO embryos (Figure 6B) if, following the authors' model, the FHF/Juxta-cardiac field trajectory is disrupted by Hand1 CKO? A more detailed analysis of the cardiac phenotype of Hand1 CKO embryos would help to assess this question.

      (11) The cell proportion differences observed between Ctrl and Hand1 CKO in Figure 6D need to be replicated, and an appropriate statistical analysis must be performed to definitively conclude the impact of Hand1 CKO on cell proportions.

      (12) The in vitro cell differentiations are unlikely to recapitulate the complexity of the heart fields in vivo, but they are analyzed and interpreted as if they do.

      (13) The schematic summary of Figure 7F is confusing and should be adjusted based on the following considerations:

      (a) the 'Wild-type' side presents 3 main trajectories (SHF, Early HT and JCF), but uses a 2-color code and the authors described only two trajectories everywhere else in the article (aka MJH and PSH). It's unclear how the SHF trajectory (blue line) can contribute to the Early HT, when the Early HT is supposed to be FHF-associated only (DOI: 10.7554/eLife.30668). As mentioned previously in Major comment 3., this model suggests a distinction between FHF and JCF trajectories, which is not investigated in the article.

      (b) the color code suggests that the MJH (FHF-related) trajectory will give rise to the right ventricle and outflow tract (green line), which is contrary to current knowledge.

      Minor comments:

      (1) How were genes selected to generate Figure 1F? Is this a list of top differentially expressed genes over each pseudotime and/or between pseudotimes?

      (2) Regarding Figure 1G, it's unclear how inhibited signaling can have an increased expression of underlying genes over pseudotimes. Can the authors give more details about this analysis and results?

      (3) How do the authors explain the visible Hand1 expression in Hand1 CKO in Figure S7C 'EEM markers'? Is this expected, i.e., RNA that is expressed but not translated into protein?

      (4) The authors do not address the potential presence of doublets (merged cells) within their newly generated dataset. While they mention using "SCTransform" for normalization and artifact removal, it's unclear if doublet removal was explicitly performed.

    3. Reviewer #2 (Public Review):

      Summary of goals:

      The aims of the study were to identify new lineage trajectories for the cardiac lineages of the heart, and to use computational and cell and animal studies to identify and validate new gene regulatory mechanisms involved in these trajectories.

      Strengths:

      The study addresses the long-standing yet still not fully answered questions of what drives the earliest specification mechanisms of the heart lineages. The introduction demonstrates a good understanding of the relevant lineage trajectories that have been previously established, and the significance of the work is well described. The study takes advantage of several recently published data sets and attempts to use these in combination to uncover any new mechanisms underlying early mesoderm/cardiac specification mechanisms. A strength of the study is the use of an in vitro model system (mESCs) to assess the functional relevance of the key players identified in the computational analysis, including innovative technology such as CRISPR-guided enhancer modulations. Lastly, the study generates mesoderm-specific Hand1 LOF embryos and assesses the differentiation trajectories in these animals, which represents a strong complementary approach to the in vitro and computational analysis earlier in the paper. The manuscript is clearly written and the methods section is detailed and comprehensive.

      Comments and Weaknesses:

      Overall: The computational analysis presented here integrates a large number of published data sets with one new data point (E7.0 single cell ATAC and RNA sequencing). This represents an elegant approach to identifying new information using available data. However, the data presentation at times becomes rather confusing, and relatively strong statements and conclusions are made based on trajectory analysis or other inferred mechanisms while jumping from one data set to another. The cell and in vivo work on Hand1 and Foxf1 is an important part of the study. Some additional experiments in both of these model systems could strongly support the novel aspects that were identified by the computational studies leading into the work.

      (1) Definition of MJH and PSH trajectory:

      The study uses previously published data sets to identify two main new differentiation trajectories: the MJH and the PSH trajectory (Figure 1). A large majority of subsequent conclusions are based on in-depth analysis of these two trajectories. For this reason, the method used to identify these trajectories (WTO, which seems a highly biased analysis with many manually chosen set points) should be supported by other commonly used methods, such as RNA velocity analysis. This would inspire some additional confidence that the MJH and PSH trajectories were chosen in as unbiased and rigorous a manner as possible and that any follow-up analysis is biologically relevant.

      (2) Identification of MJH and PSH trajectory progenitors:

      The study defines various mesoderm populations from the published data set (Figure 1A-E), including nascent mesoderm, mixed mesoderm, and extraembryonic mesoderm. It further assigns these mesoderm populations to the newly identified MJH/PSH trajectories. Based on the trajectory definition in Figure 1A, it appears that both trajectories include all 3 mesoderm populations, albeit at different proportions, and it thus seems challenging to assign these as unique progenitor populations for a distinct trajectory, as is done in the epigenetic study by comparing clusters 8 (MJH) and 2 (PSH) (Figure 2). Along similar lines, the epigenetic analysis of clusters 2 and 8 did not reveal any distinct differences in H3K4me1, H3K27ac, or H3K4me3 at any of the time points analyzed (Figure 2F). While conceptually very interesting, the data presented do not seem to identify any distinct temporal patterns or differences in clusters 2 and 8 (Figure 2H), and thus don't support the conclusion as stated: "the combined transcriptome and chromatin accessibility analysis further supported the early lineage segregation of MJH and the epigenetic priming at gastrulation stage for early cardiac genes".

      (3) Function of Hand1 and Foxf1 during early cardiac differentiation:

      The study incorporated some functional studies by generating Hand1 and Foxf1 KO mESCs and differentiated them into mesoderm cells for RNA sequencing. These lines would present relevant tools to assess the role of Hand1 and Foxf1 in mesoderm formation, and a number of experiments would further support the conclusions, which are made for the most part on transcriptional analysis. For example, the study would benefit from quantification of mesoderm cells and subsequent cardiomyocytes during differentiation (via IF, or more quantitatively, via flow cytometry analysis). These data would help interpret any of the findings in the bulk RNAseq data, and help to assess the function of Hand1 and Foxf1 in generating the cardiac lineages. Conclusions such as "the analysis indicated that HAND1 and FOXF1 could dually regulate MJH specification through directly activating the MJH specific genes and inhibiting PSH specific genes" seem rather strong given the data currently provided.

      (4) Analysis of Hand1 cKO embryos:

      Adding a mouse model to support the computational analysis is a strong way to conclude the study. Given the availability of these early embryos, some of the findings could be strengthened by performing a similar analysis to Figure 7B&C and by including some of the specific EEM markers found to be differentially regulated to complement the structural analysis of the embryos.

      (5) Current findings in the context of previous findings:

      The introduction carefully introduces the concept of lineage specification and different progenitor pools. Given the enormous amount of knowledge already available on Hand1 and Foxf1, and their role in specific lineages of the early heart, some of this information should be added, ideally to the discussion where it can be put into context of what the present findings add to the existing understanding of these transcription factors and their role in early cardiac specification.

    4. Reviewer #3 (Public Review):

      (1) In Figure 1A, could the authors justify using E8.5 CMs as the endpoint for the second lineage and better clarify the chamber identities of the E8.5 CMs analysed? Why are the atrial genes in Figure 1C of the PSH trajectory not present in Table S1.1, which lists pseudotime-dependent genes for the MJH/PSH trajectories from Figure 1F?

      (2) Could the authors increase the resolution of their trajectory and genomic analyses to distinguish between the FHF (Tbx5+ HCN4+) and the JCF (Mab21l2+/ Hand1+) within the MJH lineage? Also, clarify if the early extraembryonic mesoderm contributes to the FHF.

      (3) The authors strongly assume that the juxta-cardiac field (JCF), defined by Mab21l2 expression at E7.5 in the extraembryonic mesoderm, contributes to CMs. Could the authors explain the evidence for this? Could the authors identify Mab21l2 expression in the left ventricle (LV) myocardium and septum transversum at E8.5 (see Saito et al., 2013, Biol Open, 2(8): 779-788)? If such a JCF contribution to CMs exists, the extent to which it influences heart development should be clarified or discussed.

      (4) Could the authors distinguish the Hand1+ pericardium from JCF progenitors in their single-cell data and explain why they excluded other cell types, such as the endocardium/endothelium and pericardium, or even the endoderm, as endpoints of their trajectory analysis? At the NM and MM mesoderm stages, how did the authors distinguish the earliest cardiac cells from the surrounding developing mesoderm?

      (5) Could the authors contrast their trajectory analysis with those of Lescroart et al. (2018), Zhang et al., Tyser et al., and Krup et al.?

      (6) Previous studies suggest that Mesp2 expression starts at E8 in the presomitic mesoderm (Saga et al., 1997). Could the authors provide in situ hybridization or HCR staining to confirm the early E7 Mesp2 expression suggested by the pseudo-time analysis of the second lineage?

      (7) Could the authors also confirm the complementary Hand1 and Lefty2 expression patterns at E7 using HCR or in situ hybridization? Hand1 expression in the first lineage is plausible, considering lineage tracing results from Zhang et al.

      (8) Could the authors explain why Hand1+ and Lefty2+ cells are more likely to be multipotent progenitors, as mentioned in the text?

      (9) Could the authors comment on the low Mesp1 expression in the mesodermal cells (MM) of the MJH trajectory at E7 (Figure 1D)? Is Mesp1 transiently expressed early in MJH progenitors and then turned off by E7? Have all FHF/JCF/SHF cells expressed Mesp1?

      (10) Could the authors clarify if their analysis at E7 comprises a mixture of embryonic stages or a precisely defined embryonic stage for both the trajectory and epigenetic analyses? How do the authors know that cells of the second lineage are readily present in the E7 mesoderm they analysed (clusters 0, 1, and 2 for the multiomic analysis)?

      (11) Could the authors further comment on the active Notch signaling observed in the first and second lineages, considering that Notch's role in the early steps of endocardial lineage commitment, but not of CMs, during gastrulation has been previously described by Lescroart et al. (2018)?

      (12) In Figure 2D, levels of accessibility in cluster 8 seem relatively high for genes associated with endothelium/endocardium development in addition to MJH genes. Could the authors comment and/or provide further analysis?

      (13) Can the authors clarify why they state that cluster 8 DAEs are primed before the full activation of their target genes, considering that Bmp4 and Hand1 peak activities seem to coincide with their gene expression in Figure 2G?

      (14) Did the authors extend the multiomic analysis to Nanog+ epiblast cells at E7 and investigate if cardiac/mesodermal priming exists before mesodermal induction (defined by T/Mesp1 onset of expression)?

      (15) In the absence of duplicates, it is impossible to statistically compare the proportions of mesodermal cell populations in Hand1 wild-type and knockout (KO) embryos or to assess for abnormal accumulation of PS, NM, and MM cells. Could the authors analyse the proportions of cells by careful imaging of Hand1 wild-type and KO embryos instead?

      (16) Could the authors provide high-resolution images for Figure 7 B-C-D as they are currently hard to interpret?

    1. eLife assessment

      This study represents a valuable addition to the catalog of mitochondrial proteins. With the use of methodology based on the bi-genomic split-GFP technology, the authors generate convincing data, including dually localized proteins and topological information, under various growth conditions in yeast. The study represents a starting point for further functional and/or mechanistic studies on mitochondrial protein biogenesis.

    2. Reviewer #1 (Public Review):

      Summary:

      The study conducted by the Schuldiner group advances the understanding of mitochondrial biology through the utilization of their bi-genomic (BiG) split-GFP assay, which they had previously developed and reported. This research endeavors to consolidate the catalog of matrix and inner membrane mitochondrial proteins. In their approach, a genetic framework was employed wherein a GFP fragment (GFP1-10) is encoded within the mitochondrial genome. Subsequently, a collection of strains was created, with each strain expressing a distinct protein tagged with the GFP11 fragment. The reconstitution of GFP fluorescence occurs upon the import of the protein under examination into the mitochondria.

      Strengths:

      Notably, this assay was executed under six distinct conditions, facilitating the visualization of approximately 400 mitochondrial proteins. Remarkably, 50 proteins were conclusively assigned to mitochondria for the first time through this methodology. The strains developed and the extensive dataset generated in this study serve as a valuable resource for the comprehensive study of mitochondrial biology. Specifically, it provides a list of 50 "eclipsed" proteins whose role in mitochondria remains to be characterized.

      Weaknesses:

      The work could include some functional studies of at least one of the newly identified 50 proteins.

    3. Reviewer #2 (Public Review):

      The authors addressed the question of how mitochondrial proteins that are dually localized, or of which only a minor fraction is localized to mitochondria, can be visualized on the whole-genome scale. For this, they used an established and previously published method called BiG split-GFP, in which GFP strands 1-10 are encoded in the mitochondrial DNA, and fused the GFP11 strand C-terminally to the yeast ORFs using the C-SWAT library. The generated library was imaged under different growth and stress conditions and yielded positive mitochondrial localization for approximately 400 proteins. The strength of this method is the detection of proteins that are dually localized with only a minor fraction within mitochondria, which so far has hampered their visualization due to strong fluorescent signals from other cellular localizations. The weakness of this method is that due to the localization of the GFP1-10 in the mitochondrial matrix, only matrix proteins and IM proteins with their C-termini facing the matrix can be detected. Also, proteins that are assembled into multimeric complexes (which will be the case for probably a high number of matrix and inner membrane-localized proteins) resulting in the C-terminal GFP11 being buried are likely not detected as positive hits in this approach. Taking these limitations into consideration, the authors provide a new library that can help in the identification of eclipsed protein distribution within mitochondria, thus further increasing our knowledge of the complete mitochondrial proteome. The approach of global tagging of the yeast genome is the logical consequence after the successful establishment of the BiG split-GFP for mitochondria. The authors also propose that their approach can be applied to investigate the topology of inner membrane proteins; however, the inherent issue remains that it cannot be excluded that even the small GFP11 tag can affect protein biogenesis and topology. Thus, the approach will not overcome the need to assess protein topology via biochemical approaches on endogenous untagged proteins.

    4. Reviewer #3 (Public Review):

      Summary:

      Here, Bykov et al. move the bi-genomic split-GFP system they previously established to the genome-wide level in order to obtain a more comprehensive list of mitochondrial matrix and inner membrane proteins. In this very elegant split-GFP system, the longer GFP fragment, GFP1-10, is encoded in the mitochondrial genome and the shorter one, GFP11, is C-terminally attached to every protein encoded in the genome of the yeast Saccharomyces cerevisiae. GFP fluorescence can therefore only be reconstituted if the C-terminus of the protein is present in the mitochondrial matrix, either as part of a soluble protein, a peripheral membrane protein, or an integral inner membrane protein. The system, combined with high-throughput fluorescence microscopy of yeast cells grown under six different conditions, enabled the authors to visualize ca. 400 mitochondrial proteins, 50 of which were not visualised before and 8 of which were not shown to be mitochondrial before. The system appears to be particularly well suited for analysis of dually localized proteins and could potentially be used to study sorting pathways of mitochondrial inner membrane proteins.

      Strengths:

      Many fluorescence-based genome-wide screens were previously performed in yeast and were central to revealing the subcellular location of a large fraction of the yeast proteome. Nonetheless, these screens also showed that tagging with full-length fluorescent proteins (FP) can affect both the function and targeting of proteins. The strength of the system used in the current manuscript is that the shorter tag is beneficial for the detection of a number of proteins whose targeting and/or function is affected by tagging with full-length FPs.

      Furthermore, the system used here can nicely detect mitochondrial pools of dually localized proteins. It is especially useful when these pools are minor and their signals are therefore easily masked by the strong signals coming from the major, nonmitochondrial pools of the proteins.

      Weaknesses:

      My only concern is that the biological significance of the screen performed appears limited. The dataset obtained is largely in agreement with several previous proteomic screens but it is, unfortunately, not more comprehensive than them, rather the opposite. For proteins that were identified inside mitochondria for the first time here or were identified in an unexpected location within the organelle, it remains unclear whether these localizations represent some minor, missorted pools of proteins or are indeed functionally important fractions and/or productive translocation intermediates. The authors also allude to several potential applications of the system but do little to explore any of these directions.

    1. eLife assessment

      This important study examines the extent to which distinct developmental pathways that result in alternative morphs correlate with transcriptome differences in a marine annelid, Streblospio benedicti. The strengths of the study include the experimental design and dense temporal sampling, which together provide convincing evidence that the two morphs can be clearly distinguished at the transcriptome level, despite relatively modest overall differences. The work will be of particular interest to students of the evolution of development.

    2. Reviewer #1 (Public Review):

      Summary:

      Overall, this study provides a meticulous comparison of developmental transcriptomes between two sub-species of the annelid Streblospio benedicti. Different lineages of S. benedicti maintain one of two genetically programmed alternative life histories, the ancestral planktotrophic or derived lecithotrophic forms of development. This contrast is also seen at the inter-species level in many marine invertebrate taxa, such as echinoderms and molluscs. The authors report relatively (surprisingly?) modest differences in transcriptomes overall, but also find some genes whose expression is essentially morph-specific (which they term "exclusive").

      Strengths:

      The study is based on dense and appropriately replicated sampling of early development. The tight clustering of each stage/morph combination in PCA space suggests the specimens were accurately categorized. The similar overall trajectories of the two morphs were surprising to me for two stages: 1) the earliest stage (16-cell), at which we might expect maternal differences due to the several-fold difference in zygote size, and 2) the latest stage (1-week), where there appears to be the most obvious morphological difference. This is why we need to do experiments!

      The examination of F1 hybrids was another major strength of the study. It also produced one of the most surprising results: though intermediate in phenotype, F1 embryos have the most distinct transcriptomes, and reveal a range of fixed, compensatory differences in the parental lines. Further, the F1 lack expression of nearly all transcripts identified as morph-specific in the pure parental lines. Since the F1 larvae present intermediate traits combining the features of both morphs, this implies that morph-specific transcripts are not actually necessary for morph-specific traits. This is interesting and somewhat counter to what one might naively expect.

      Weaknesses:

      Overall I really enjoyed this paper, and in its revised form it addresses some concerns I had in the first version. I still see a few places where it can be tightened and made more insightful.

    3. Reviewer #2 (Public Review):

      The manuscript by Harry and Zakas determined the extent to which gene expression differences contribute to developmental divergence by using a model that has two distinct developmental morphs within a single species. Although the authors did collect a valuable dataset and trends in differential expression between the two morphs of S. benedicti were presented, we found limitations in the methods, system, and resources that the authors should address.

      We have two major points:

      (1) Background information about the biological system needs to be clarified in the introduction of this manuscript. The authors stated that F1 offspring can have intermediate larval traits compared to the parents (Line 81). However, the authors collected F1 offspring at the same time as the mother in the cross. If offspring have intermediate larval traits, their developmental timeline might be different than both parents and necessitate the collection of offspring at different times to obtain the same stages as the parents. Could the authors (1) explain why they collected offspring at the same time as parents given that other literature and Line 81 state these F1 offspring develop at intermediate rates, and (2) add the F1 offspring to Figure 1 to show morphological and timeline differences in development?

      Additionally, the authors state (Lines 83-85) that they detail the full time course of embryogenesis for both the parents and the F1 crosses. However, we do not see where the authors have reported the full time course for embryogenesis of the F1 offspring. Providing this information would shape the remaining results of the manuscript.

      (2) We have several concerns about the S. benedicti genome and steps regarding the read mapping for RNA-seq:

      The S. benedicti genome used (Zakas et al. 2022) was generated using the PP morph. The largest scaffolds of this assembly correspond to linkage groups, showing the quality of this genome. The authors should point out in the Methods and/or Results sections that the quality of this genome means that PP-specific gene expression can be quantified well. However, the challenges and limitations of mapping LL-specific expression data to the PP genome should be discussed.

      It is possible that the authors did not find exclusive gene expression in the LL morph because they require at least one gene to be turned on in one morph as part of the data-cleaning criteria. Because the authors are comparing all genes to the PP morph, they could be missing true exclusive genes responsible for the biological differences between the two morphs. Did they make the decision to only count genes expressed in one stage of the other morph because the gene models and mapping quality led to too much noise?

      The authors state that the mapping rates between the two morphs are comparable (Supplementary Figure 1). However, there is a lot of variation in mapping the LL individuals (~20% to 43%) compared to the PP individuals. What is the level of differentiation within the two morphs in the species (pi and theta)? The statistical tests for this comparison should be added and the associated p-value should be reported. The statistical test used to compare mapping rates between the two morphs may be inappropriate. The authors used Salmon for their RNA alignment and differential expression analysis, but it is possible that a different method would be more appropriate. For example, Salmon has some limitations as compared to Kallisto as others have noted. The chosen statistical test should be explained, as well as how RNA-seq data are processed and interpreted.

      What about the read mapping rate and details for the F1 LP and PL individuals? How did the offspring map to the PP genome? These details should be included in Supplementary Figure 1. Could the authors also provide information about the number of genes expressed at each stage in the F1 LP and PL samples in Supplementary Figure 2? How many genes went into the PCA? Many of these details are necessary to evaluate the F1 RNA-seq analyses.

      Generally, the authors need to report the statistics used in data processing more thoroughly. The authors need to report the statistics used to (1) process and evaluate the RNA-seq data and (2) determine the significance between the two morphs (Supplementary Figures 1 and 2).

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Overall, this study provides a meticulous comparison of developmental transcriptomes between two sub-species of the annelid Streblospio benedicti. Different lineages of S. benedicti maintain one of two genetically programmed alternative life histories, the ancestral planktotrophic or derived lecithotrophic forms of development. This contrast is also seen at the inter-species level in many marine invertebrate taxa, such as echinoderms and molluscs. The authors report relatively (surprisingly?) modest differences in transcriptomes overall but also find some genes whose expression is essentially morph-specific (which they term "exclusive").

      Strengths:

      The study is based on a dense and appropriately replicated sampling of early development. The tight clustering of each stage/morph combination in PCA space suggests the specimens were accurately categorized. The similar overall trajectories of the two morphs were surprising to me for two stages: 1) the earliest stage (16-cell), at which we might expect maternal differences due to the several-fold difference in zygote size, and 2) the latest stage (1-week), where there appears to be the most obvious morphological difference. This is why we need to do experiments!

      The examination of F1 hybrids was another major strength of the study. It also produced one of the most surprising results: though intermediate in phenotype, F1 embryos have the most distinct transcriptomes, and reveal a range of fixed, compensatory differences in the parental lines.

      Weaknesses:

      Overall I really enjoyed this paper, but I see a few places where it can be tightened and made more insightful. These relate to better defining the basis for "exclusive" expression (regulation or gene presence/absence?), providing more examples of how specific genes related to trophic mode behave, and placing the study in the context of similar work in other phyla.

      As suggested, we changed the term “exclusive expression” to “morph-specific” expression throughout the paper to clarify which genes are only expressed in one morph. We also added references to similar work in other phyla such as recent work on lecithotrophic and planktotrophic development in species of Heliocidaris sea urchins in the 4th paragraph of the discussion. We added additional data about the F1 hybrids in “Gene expression of Genetic Crosses” section and the new Figure 8B. We find that gene expression in F1 offspring is divided between matching the maternal and paternal gene expression patterns, with slightly more genes matching paternal expression.

      Reviewer #2 (Public Review):

      The manuscript by Harry and Zakas determined the extent to which gene expression differences contribute to developmental divergence by using a model that has two distinct developmental morphs within a single species. Although the authors did collect a valuable dataset and trends in differential expression between the two morphs of S. benedicti were presented, we found limitations in the methods, system, and resources that the authors should address.

      We have two major points:

      (1) Background information about the biological system needs to be clarified in the introduction of this manuscript. The authors stated that F1 offspring can have intermediate larval traits compared to the parents (Line 81). However, the authors collected F1 offspring at the same time as the mother in the cross. If offspring have intermediate larval traits, their developmental timeline might be different than both parents and necessitate the collection of offspring at different times to obtain the same stages as the parents. Could the authors (1) explain why they collected offspring at the same time as parents given that other literature and Line 81 state these F1 offspring develop at intermediate rates, and (2) add the F1 offspring to Figure 1 to show morphological and timeline differences in development?

      Additionally, the authors state (Lines 83-85) that they detail the full-time course of embryogenesis for both the parents and the F1 crosses. However, we do not see where the authors have reported the full-time course for embryogenesis of the F1 offspring. Providing this information would shape the remaining results of the manuscript.

      (2) We have several concerns about the S. benedicti genome and steps regarding the read mapping for RNA-seq:

      The S. benedicti genome used (Zakas et al. 2022) was generated using the PP morph. The largest scaffolds of this assembly correspond to linkage groups, showing the quality of this genome. The authors should point out in the Methods and/or Results sections that the quality of this genome means that PP-specific gene expression can be quantified well. However, the challenges and limitations of mapping LL-specific expression data to the PP genome should be discussed.

      It is possible that the authors did not find exclusive gene expression in the LL morph because they require at least one gene to be turned on in one morph as part of the data-cleaning criteria. Because the authors are comparing all genes to the PP morph, they could be missing true exclusive genes responsible for the biological differences between the two morphs. Did they make the decision to only count genes expressed in one stage of the other morph because the gene models and mapping quality led to too much noise?

      The authors state that the mapping rates between the two morphs are comparable (Supplementary Figure 1). However, there is a lot of variation in mapping the LL individuals (~20% to 43%) compared to the PP individuals. What is the level of differentiation within the two morphs in the species (pi and theta)? The statistical tests for this comparison should be added and the associated p-value should be reported. The statistical test used to compare mapping rates between the two morphs may be inappropriate. The authors used Salmon for their RNA alignment and differential expression analysis, but it is possible that a different method would be more appropriate. For example, Salmon has some limitations as compared to Kallisto as others have noted. The chosen statistical test should be explained, as well as how RNA-seq data are processed and interpreted.

      What about the read mapping rate and details for the F1 LP and PL individuals? How did the offspring map to the P genome? These details should be included in Supplementary Figure 1. Could the authors also provide information about the number of genes expressed at each stage in the F1 LP and PL samples in S Figure 2? How many genes went into the PCA? Many of these details are necessary to evaluate the F1 RNA-seq analyses.

      Generally, the authors need to report the statistics used in data processing more thoroughly. The authors need to report the statistics used to (1) process and evaluate the RNA-seq data and (2) determine the significance between the two morphs (Supplementary Figures 1 and 2).

      (1) We clarified in the methods that F1 embryos are collected at the same stage (not absolute time) as the parental types. So the “16-cell” stage is comparable across planktotrophic, lecithotrophic, and F1 offspring regardless of the absolute time taken to reach that stage (which differs by ~3 hours; Figure 1).

      Figure 2A details every time point collected for all crosses. As mentioned in the methods, we were unable to collect two timepoints for one set of crosses (LP) due to limited tissue. However, we still cover the full development time from “16 cell” through “swimming larvae” stages, which is the full larval development time.

      (2) We appreciate the reviewer's concerns regarding the mapping to the reference genome. The S. benedicti genome is a largely complete and contiguous chromosome-length genome which we have now highlighted in the manuscript. However, the reference is only for the planktotrophic morph. So it is certainly possible that there could be mapping bias for lecithotrophic reads or F1 reads, as we point out in the discussion. While some bias is certainly possible, it is unlikely to be driving major differences in the results. We performed several tests to demonstrate this:

      (1) We conducted two-sided t-tests of the mapping rates between all sample groups in our dataset (PP, LL, PL, LP) to determine if there were significant differences in mapping rates among the populations. No significant differences were found. The specific results of these statistical tests are included in the updated manuscript in Supplementary Figure 1 and are as follows:

      Author response table 1.
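
      The pairwise comparison described above can be sketched as follows. This is a minimal illustration, not the authors' analysis: the response does not say which t-test variant was used, so Welch's unequal-variance form is an assumption, the function names are hypothetical, and the mapping rates are made-up placeholders rather than the study's data.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    # Squared standard errors of each group mean (sample variance / n).
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def pairwise_welch(groups):
    """Run welch_t on every unordered pair of groups."""
    return {
        (g1, g2): welch_t(groups[g1], groups[g2])
        for g1, g2 in combinations(groups, 2)
    }


# Hypothetical mapping rates (%), NOT the values from the study.
mapping_rates = {
    "PP": [41.0, 39.5, 42.3, 40.1],
    "LL": [38.2, 43.0, 29.7, 36.4],
    "PL": [40.2, 37.9, 41.1],
    "LP": [39.0, 36.5, 42.0],
}

results = pairwise_welch(mapping_rates)
for (g1, g2), (t, df) in results.items():
    print(f"{g1} vs {g2}: t = {t:.2f}, df = {df:.1f}")
```

      A two-sided p-value would then be read from the t distribution with the reported degrees of freedom; `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test and returns the p-value directly. With four groups there are six pairwise comparisons, so a multiple-testing correction may be worth considering.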

      (2) In response to the comment about sequence level divergence affecting mapping rate, we estimated pi (nucleotide diversity within a population) and dxy (genomic divergence between two populations) based on the sampled transcriptomic data of our Planktotrophic and Lecithotrophic populations. We used PIXY (Korunes, K.L. and Samuk, K., 2021) with its standard settings to estimate these values, with variant call files in bcf format produced with bcftools - one for all planktotrophic samples and one for all lecithotrophic samples in our dataset. We found that across regions of the transcriptome, the difference in pi between Planktotrophs and Lecithotrophs was between 0.11% and 4.2%. Genomic divergence across the transcriptome is also relatively minor: estimates of dxy ranged from 0.0049 to 0.0076. Given that these estimates show relatively modest differences in nucleotide diversity and overall sequence divergence, we maintain that it is unlikely that they significantly impact the results described in this study. From what we have seen in the literature, these values are not outside of other population studies that are mapping to a species reference derived from one population.
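
      For readers unfamiliar with the two statistics, the definitions can be sketched on toy data. This is a simplified illustration under stated assumptions: the sequences are invented, the alignment is assumed complete with no missing sites, and the function names are hypothetical. pixy itself works from variant call files and accounts for missing and invariant sites, which this sketch does not.

```python
from itertools import combinations, product


def hamming(s1, s2):
    """Number of sites at which two aligned sequences differ."""
    return sum(c1 != c2 for c1, c2 in zip(s1, s2))


def pi(seqs):
    """Nucleotide diversity: mean per-site differences over all pairs within one population."""
    pairs = list(combinations(seqs, 2))
    length = len(seqs[0])
    return sum(hamming(a, b) for a, b in pairs) / (len(pairs) * length)


def dxy(seqs_x, seqs_y):
    """Absolute divergence: mean per-site differences over all between-population pairs."""
    pairs = list(product(seqs_x, seqs_y))
    length = len(seqs_x[0])
    return sum(hamming(a, b) for a, b in pairs) / (len(pairs) * length)


# Invented 8-bp alignments, NOT data from the study.
planktotrophs = ["ACGTACGT", "ACGTACGA", "ACGTACGT"]
lecithotrophs = ["ACGAACGT", "ACGAACGT", "ACGAACGA"]

print("pi (P): ", pi(planktotrophs))
print("pi (L): ", pi(lecithotrophs))
print("dxy:    ", dxy(planktotrophs, lecithotrophs))
```

      In this toy case dxy exceeds either within-population pi because one site is fixed for different bases in the two groups. The argument in the response runs the other way: low dxy between morphs implies few fixed differences that could bias read mapping.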

      We added the mapping rates of all samples in the Supplement (SFig. 1) as requested. We added the number of genes expressed at each stage in the Supplement (SFig. 2) as requested. We have also provided further details and figures (Fig 8B) on read mapping rates and statistics used in data processing, including those for F1 RNA-seq data.

    1. eLife assessment

      This study presents valuable findings on how the endocannabinoid system is involved in endometriosis progression using CNR1 and CNR2 knockout (KO) mouse models. The evidence supporting the authors' claims is incomplete; including bulk RNA-seq, flow cytometry, and imaging mass cytometry would have strengthened the study. This work might be of interest to medical scientists working on endometriosis.

    2. Reviewer #1 (Public Review):

      Summary:

      The endocannabinoid system (ECS) components are dysregulated within the lesion microenvironment and systemic circulation of endometriosis patients. Using endometriosis mouse models and genetic loss of function approaches, Lingegowda et al. report that canonical ECS receptors, CNR1 and CNR2, are required for disease initiation, progression, and T-cell dysfunction.

      Strengths:

      The approach uses genetic approaches to establish in vivo causal relationships between dysregulated ECS and endometriosis pathogenesis. The experimental design incorporates both bulk and single-cell RNAseq approaches, as well as imaging mass spectrometry to characterize the mouse lesions. The identification of immune-related and T-cell-specific changes in the lesion microenvironment of CNR1 and CNR2 knockout (KO) mice represents a significant advance.

      Weaknesses:

      Although the mouse phenotypic analyses involve a detailed molecular characterization of the lesion microenvironment using genomic approaches, detailed measurements of lesion size/burden and histopathology would provide a better understanding of how CNR1 or CNR2 loss contributes to endometriosis initiation and progression. The cell or tissue-specific effects of the CNR1 and CNR2 are not incorporated into the experimental design of the studies. Although this aspect of the approach is recognized as a major limitation, global CNR1 and CNR2 KO may affect normal female reproductive tract function, ovarian steroid hormone levels, decidualization response, or lead to preexisting alterations in host or donor tissues, which could affect lesion establishment and development in the surgically induced, syngeneic mouse model of endometriosis.

    3. Reviewer #2 (Public Review):

      Summary:

      The endocannabinoid system (ECS) regulates many critical functions, including reproductive function. Recent evidence indicates that dysregulated ECS contributes to endometriosis pathophysiology and microenvironment. Therefore, the authors further examined the dysregulated ECS and its mechanisms in endometriosis lesion establishment and progression using two different endometrial sources of mouse models of endometriosis with CNR1 and CNR2 knockout mice. The authors presented differential gene expressions and altered pathways, especially those related to the adaptive immune response in CNR1 and CNR2 ko lesions. Interestingly, the T-cell population was dramatically reduced in the peritoneal cavity lacking CNR2, accompanied by a loss of proliferative activity of CD4+ T helper cells. Imaging mass cytometry analysis provided spatial profiling of cell populations and potential relationships among immune cells and other cell types. This study provided fundamental knowledge of the endocannabinoid system in endometriosis pathophysiology.

      Strengths:

      Dysregulated ECS and its mechanisms in endometriosis pathogenesis were assessed using two different endometrial sources of mouse models of endometriosis with CNR1 and CNR2 knockout mice. Not only endometriotic lesions but also peritoneal exudate (and splenic) cells were analyzed to understand the specific local disease environment under the dysregulated ECS.

      Providing the results of transcriptional profiles and pathways, immune cell profiles, and spatial profiles of cell populations support altered immune cell population and their disrupted functions in endometriosis pathogenesis via dysregulation of ECS.

      L386: Role of CNR2 in T cells: Finding nearly absent CD3+ T cells in the peritoneal cavity of CNR2 ko mice is intriguing.

      Interpretation of the results is well-described in discussion.

      Weaknesses:

      The study was terminated and characterized 7 days after EM induction surgery without the details for selecting the time point to perform the experiments.

      The authors also mentioned that altered eutopic endometrium contributes to the establishment and progression of endometriosis. This reviewer agrees L324-325. If so, DEGs are likely identified between eutopic endometrium (with/without endometriosis lesion induction) and ectopic lesions. It would be nice to see the data (even though using publicly available data sets).

      Figure 7 CDEF. Please add the results of the statistical analyses and analyzed sample numbers. L444-450 cannot be reviewed without them.

      This reviewer agrees L498-500. In contrast, retrograded menstrual debris is not decidualized. The section could be modified to avoid misunderstanding.

      The authors addressed all my concerns. I do not have any comments.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The endocannabinoid system (ECS) components are dysregulated within the lesion microenvironment and systemic circulation of endometriosis patients. Using endometriosis mouse models and genetic loss of function approaches, Lingegowda et al. report that canonical ECS receptors, CNR1 and CNR2, are required for disease initiation, progression, and T-cell dysfunction.

      Strengths:

      The approach uses genetic approaches to establish in vivo causal relationships between dysregulated ECS and endometriosis pathogenesis. The experimental design incorporates both bulk and single-cell RNAseq approaches, as well as imaging mass spectrometry to characterize the mouse lesions. The identification of immune-related and T-cell-specific changes in the lesion microenvironment of CNR1 and CNR2 knockout (KO) mice represents a significant advance.

      Weaknesses:

      Although the mouse phenotypic analyses involve a detailed molecular characterization of the lesion microenvironment using genomic approaches, detailed measurements of lesion size/burden and histopathology would provide a better understanding of how CNR1 or CNR2 loss contributes to endometriosis initiation and progression. The cell or tissue-specific effects of the CNR1 and CNR2 are not incorporated into the experimental design of the studies. Although this aspect of the approach is recognized as a major limitation, global CNR1 and CNR2 KO may affect normal female reproductive tract function, ovarian steroid hormone levels, decidualization response, or lead to preexisting alterations in host or donor tissues, which could affect lesion establishment and development in the surgically induced, syngeneic mouse model of endometriosis.

      We appreciate the reviewer's thoughtful and constructive feedback. We agree that the additional measurements of lesion size/burden and histopathology would provide valuable insights into the specific contributions of CNR1 and CNR2 to endometriosis progression. However, the focus of this study was on assessing the alterations in the complex immune microenvironment due to the absence of CNR1 and CNR2, given their close relation in regulating immune cell populations. We plan to incorporate these measurements in future studies to further strengthen the understanding of the disease pathogenesis. Regarding the potential effects of global knockout, the reviewer raises a valid concern. To address this, we will explore cell- and/or tissue-specific knockout models in future experiments to better isolate the direct effects of CNR1 and CNR2 on the disease process, while minimizing potential confounding factors from systemic alterations.

      Reviewer #2 (Public Review):

      Summary:

      The endocannabinoid system (ECS) regulates many critical functions, including reproductive function. Recent evidence indicates that dysregulated ECS contributes to endometriosis pathophysiology and the microenvironment. Therefore, the authors further examined the dysregulated ECS and its mechanisms in endometriosis lesion establishment and progression using two different endometrial sources of mouse models of endometriosis with CNR1 and CNR2 knockout mice. The authors presented differential gene expressions and altered pathways, especially those related to the adaptive immune response in CNR1 and CNR2 ko lesions. Interestingly, the T-cell population was dramatically reduced in the peritoneal cavity lacking CNR2, accompanied by a loss of proliferative activity of CD4+ T helper cells. Imaging mass cytometry analysis provided spatial profiling of cell populations and potential relationships among immune cells and other cell types. This study provided fundamental knowledge of the endocannabinoid system in endometriosis pathophysiology.

      Strengths:

      Dysregulated ECS and its mechanisms in endometriosis pathogenesis were assessed using two different endometrial sources of mouse models of endometriosis with CNR1 and CNR2 knockout mice. Not only endometriotic lesions, but also peritoneal exudate (and splenic) cells were analyzed to understand the specific local disease environment under the dysregulated ECS.

      Providing the results of transcriptional profiles and pathways, immune cell profiles, and spatial profiles of cell populations support altered immune cell population and their disrupted functions in endometriosis pathogenesis via dysregulation of ECS.

      In line 386: Role of CNR2 in T cells. The finding of nearly absent CD3+ T cells in the peritoneal cavity of CNR2 ko mice is intriguing.

      The interpretation of the results is well-described in the Discussion.

      Weaknesses:

      The study was terminated and characterized 7 days after EM induction surgery without the details for selecting the time point to perform the experiments.

      The authors also mentioned that altered eutopic endometrium contributes to the establishment and progression of endometriosis. This reviewer agrees with lines 324-325. If so, DEGs are likely identified between eutopic endometrium (with/without endometriosis lesion induction) and ectopic lesions. It would be nice to see the data (even though using publicly available data sets).

      Figure 7 CDEF. The results of the statistical analyses and analyzed sample numbers should be added. Lines 444-450 cannot be reviewed without them.

      This reviewer agrees with lines 498-500. In contrast, retrograded menstrual debris is not decidualized. The section could be modified to avoid misunderstanding.

      We would like to thank the reviewer for insightful comments, suggestions and acknowledging the importance of the work presented in this manuscript.

      Regarding the 7-day time point, we have provided the rationale in lines 479-481, but agree that it isn’t sufficient, and hence we have provided additional details on the selection of the 7-day time point for the experiments in the Methods section (Mouse model of EM). We have also noted the suggestion on providing a comparison of differentially expressed genes in the eutopic endometrium vs ectopic lesions. Since there are publications comparing the eutopic vs ectopic gene expression patterns (PMIDs: 33868805 and 18818281), including a study exploring the ECS genes in the endometrium throughout different menstrual cycles (PMID: 35672435), we believe additional analysis using the same dataset may not yield new information. However, we see the value in the reviewer’s comment, and we will look at the gene expression patterns in the uterine vs endometriosis-like lesions in our future studies with tissue- or cell-specific CNR1 and CNR2 knockout models to understand the functional relevance of ECS in endometriosis initiation.

      Since the IMC study was exploratory for proof of concept, we did not have enough biological replicates for meaningful statistical validation (n = 2-3). We have clarified this information in the methods, results, and figure legends for appropriately representing the limitations of the current setup.

      Finally, we appreciate the feedback on the section discussing retrograded menstrual debris. Even though the menstrual debris may not be decidualized, some endometriotic lesions have the ability to decidualize based on their response to estrogen and progesterone in a cycling manner (PMID: 26450609), similar to the endometrium in the uterine cavity. We have clarified this in the revised MS.

      Recommendations for the Authors:

      Reviewer #1 (Recommendations For The Authors):

      The mechanism of how alterations in ECS contribute to the observed cellular and molecular changes is unclear. Connecting CNR1 or CNR2 function to a specific cell type or cellular process would provide a more detailed understanding of how dysregulated ECS contributes to endometriosis pathogenesis.

      We agree that integrating the functions of CNR1 or CNR2 to specific cell types or cellular processes would strengthen the mechanistic insights presented in our study. This would help elucidate specific pathways by which dysregulated ECS leads to the alterations in immune cell populations, gene expression profiles, and other key aspects of endometriosis development and progression. This is a rapidly evolving field and at this stage, we do not have published information to reflect on this aspect in the revised manuscript.

      (1) As mentioned in the text, the ECS components being studied are widely expressed and may affect multiple aspects of endometriosis pathogenesis and symptomatology. However, the cell or tissue-specific effects of the CNR1 and CNR2 are not incorporated into the experimental design of the studies. Although these limitations are mentioned in the discussion, it is important to know if global CNR1 and CNR2 KO affect normal female reproductive tract function, ovarian steroid hormone levels, decidualization response, or if preexisting alterations in host or donor tissues affect lesion development in the surgically induced, syngeneic mouse model of endometriosis. This would also be the case in studies on immune system dysfunction or lesion microenvironment, as it is possible preexisting immune system dysfunction following CNR1 or CNR2 loss could alter the disease trajectory and lead to a misinterpretation of the findings. Some of these potential confounders could be addressed using crossover approaches in Figure 1A experimental design, but the donor tissues are reported to be matched to the recipients based on genotype.

      The reviewer raised an excellent point that the widespread expression of the ECS components studied in our manuscript may affect multiple aspects of endometriosis pathogenesis and symptomatology. Indeed, the cell or tissue-specific effects of CNR1 and CNR2 knockout are not fully incorporated into our experimental design, which could lead to potential confounding factors that may affect the interpretation of some of our findings. However, as outlined in our previous comments, we will incorporate the tissue/cell-specific knockout, as well as the crossover approaches, to elucidate if the loss of CNR1 and CNR2 function is lesion-driven in future studies. We agree that it is important to understand the impact of global CNR1 and CNR2 knockout on normal female reproductive tract function, ovarian steroid hormone levels, decidualization response, and other potential preexisting alterations in the host or donor tissues that could influence lesion development in the syngeneic mouse model of endometriosis. As outlined in the MS (lines 59-62), there are studies highlighting pregnancy-specific impacts, including on implantation and impaired primary decidual zone formation. We did not find any baseline alterations in the systemic immune profiles between the CNR1 and CNR2 knockout mice and the WT mice without EM induction. However, the uterine environment has not been assessed to understand the baseline immune profile between the knockout mice and WT mice. We agree with the reviewer that preexisting immune system dysfunction following CNR1 or CNR2 loss could alter the disease trajectory related to immune system dysfunction or lesion microenvironment. We have highlighted this in the limitations section.

      (2) The phenotypic characterization of the endometriosis mouse model with or without CNR1 or CNR2 KO is very limited. To better understand how the observed cellular and molecular alterations correlate with endometriosis pathogenesis and severity CNR1 and CNR2 K/O mice, a detailed characterization of lesion size differences and histopathology should be made. Importantly, the histopathological characterization of the lesions would complement the imaging mass spectrometry findings.

      We agree that more detailed characterization of the endometriosis lesions in our CNR1 and CNR2 knockout mouse models is required. As evident from several of our previous publications, we have focused on detailed histopathological characterization of endometriotic lesions in our syngeneic mouse model of endometriosis, including a multiple time course study (Symons et al, 2020, FASEB). In the present investigation, we focused on cataloging spatial and transcriptomic changes, as we do not currently have any information on the global influence of CNR1 and CNR2 knockout on the endometriosis lesion microenvironment. Because we prioritized this aspect, we were not able to provide a detailed histological assessment of lesions. However, the IMC analysis provides a detailed, spatially resolved profile of the cellular composition and interactions within the endometriotic lesions, which we believe offers valuable insights into the mechanisms by which the dysregulated ECS may contribute to endometriosis pathogenesis. This quantitative, high-dimensional approach complements the transcriptional profiling and other analyses we have performed.

      (3) Given the effect sizes and variance observed with the ECS ligand measurements, an N = 4-5 biological samples for mouse phenotypic studies seems too low.

      The reviewer raises a valid point about the low sample size. As elaborated earlier, this was a proof-of-principle study to capture biologically significant alterations within the lesion and surrounding peritoneal microenvironment in the absence of the CNR1 and CNR2 receptors. This information is crucial for establishing the potential mechanisms by which the dysregulated ECS may contribute to the pathogenesis of endometriosis. Now that we have established the framework and baseline understanding of immune-inflammatory alterations, we will refine our future experimental approaches and include more samples if it becomes necessary.

      Reviewer #2 (Recommendations For The Authors):

      It is hard to read the labeling of figures. Please increase the font size of each figure.

      We have increased the font size of the labels where necessary to improve the readability.

      Supplementary Data 1, Table 1 seems like Supplementary Table 1. Please use the same labeling of the Supplementary tables and figures to avoid confusion.

      We have updated the labeling accordingly and ensured that all supplementary tables and figures are consistently labeled.

      This reviewer suggests depositing RNA-seq and IMC data to NCBI etc. and listing the accession number in the MS.

      Thank you for your recommendation to deposit the RNA-seq and imaging mass cytometry (IMC) data from our study in public repositories such as NCBI. We appreciate your suggestion, as data sharing is an important aspect of scientific transparency and reproducibility. Bulk mRNA sequencing data has been attached as a supplementary file and IMC data has been deposited on Mendeley Data (DOI: 10.17632/2ptns5yhzh.1).

      Please clarify L363.

      We have clarified this in the revised MS. The revised text now reads: “However, we did not find the same differences (T cell-related genes) in the UnD lesions of CNR2 k/o mice. Moreover, UnD lesions of CNR2 k/o mice showed significantly low number of DEGs (11 compared to 65 in the DD lesions from CNR2 k/o mice) suggesting a decidualization dependent response (Supplementary Data 3).”

      Figure 7B: It is hard to see/understand the results in L438-440. It might be helpful if % is added to the figure.

      We have added more tick marks to the y-axis of Figure 7B to make it easier for the reader to interpret the percentages of the different cell types.

      Figure 7 legend: 2nd D should be G.

      We have revised the legend accordingly.

      Supplementary Figure 6: It seems immune cells are clustered in CN1, which is different from Figure 7. To easily understand Suppl Fig 6AB, please add some details in the legend.

      We have revised the legend as suggested.

      The revised legend now reads: “A, B Representative image of 8 distinct cell types from CN analysis of DD and UnD lesions from WT, CNR1 k/o, and CNR2 k/o mice, respectively. C Heatmap representation of CN analysis shows distinct clustering patterns observed in the UnD lesions among the different genotypes. The clustering reveals distinct spatial patterns of immune cell populations within the UnD lesions, which appear to differ from the observations in Figure 7G. This suggests potential spatial heterogeneity in the immune landscape of EM like lesions under conditions of decidualization.”

    1. eLife assessment

      This valuable study details an aspect of plant immunity where ATG6 was not previously known to have a role. The results suggest a direct relationship between ATG6 and NPR1, a well-studied salicylic acid receptor protein, which could be of interest to researchers studying the regulation of plant immunity. While the data presented are compelling, there are concerns about the interpretation of results, particularly regarding discrepancies in fluorescence and protein blot data. Addressing these issues would improve the overall impact of the work and consistency with prior studies.

    2. Reviewer #1 (Public Review):

      The authors showed that autophagy-related genes are involved in plant immunity by regulating the protein level of the salicylic acid receptor, NPR1.

      The experiments are carefully designed and the data is convincing. The authors did a good job of understanding the relationship between ATG6 and NPR1.

      The authors have addressed most of my previous concerns.

    3. Reviewer #2 (Public Review):

      The manuscript by Zhang et al. explores the effect of autophagy regulator ATG6 on NPR1-mediated immunity. The authors propose that ATG6 directly interacts with NPR1 in the nucleus to increase its stability and promote NPR1-dependent immune gene expression and pathogen resistance. This novel role of ATG6 is proposed to be independent of its role in autophagy in the cytoplasm. The authors demonstrate through biochemical analysis that ATG6 interacts with NPR1 in yeast and very weakly in vitro. They further demonstrate using overexpression transgenic plants that in the presence of ATG6-mcherry the stability of NPR1-GFP and its nuclear pool is increased.

      Comments on revised version:

      The authors demonstrate the correlation between overexpression of atg6 and higher stability and activity of npr1. They claim a novel activity of atg6 in the nucleus.

      Overall, the experimental scope of the study is solid; however, the over-interpretation of the results substantially reduces the significance and value of this study for the target plant immunity readership.

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      The study reports on a previously unrecognized function of ATG6 in plant immunity. The work is valuable because it proposes a direct interaction between ATG6 and a well-studied salicylic acid receptor protein, NPR1, which may interest researchers investigating plant immunity regulation. While the data presented are compelling, more information regarding the specificity of ATG6's role would improve the overall impact of the study, especially with an eye towards consistency with prior work.

      We also genuinely thank the editor and reviewers for the constructive and helpful suggestions and comments. These comments have greatly improved the quality and thoroughness of our manuscript. We have carefully studied these comments and have made the appropriate changes as far as possible. Additionally, some minor errors were also corrected during the revision process. New text is shown in blue in the revised manuscript. Our responses to the reviewer's comments are provided below each respective comment.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors showed that autophagy-related genes are involved in plant immunity by regulating the protein level of the salicylic acid receptor, NPR1.

      Strengths:

      The experiments are carefully designed and the data is convincing. The authors did a good job of understanding the relationship between ATG6 and NPR1.

      Thank you very much for recognizing our research.

      Weaknesses:

      - The authors can do a few additional experiments to test the role of ATG6 in plant immunity. I recommend that the authors test the interaction between ATGs and other NPR1 homologs (such as NPR2).

      Thank you for your valuable feedback. The Arabidopsis NPR family comprises six members: NPR1, NPR2, NPR3, NPR4, NPR5/BLADE-ON-PETIOLE 1 (BOP1), and NPR6/BOP2. NPR3/4 function in tandem as negative regulators to modulate SA signaling and plant immune responses (Ding et al., 2018). Similar to NPR1, NPR2 acts as a positive regulator of SA signaling (Castello et al., 2018). NPR5/BOP1 and NPR6/BOP2 primarily participate in the regulation of plant growth and development (McKim et al., 2008). This study specifically investigates the correlation between ATG6 and NPRs in plant resistance to pathogenic bacteria. Consequently, we experimentally confirmed the interaction between ATG6 and NPR1, NPR3, and NPR4 (Fig. 1 and Fig. S1 in the revised manuscript). It would be intriguing to further explore the interactions between ATG6 and other NPRs in the context of regulating plant growth and development in future research.

      -The concentration of SA used in the experiment (0.5-1 mM) seems pretty high. Does a lower concentration of SA induce ATG6 accumulation in the nucleus?

      Thank you for pointing this out. The NPR1 protein is known to be unstable and prone to degradation through the 26S proteasome pathway (Spoel et al., 2009; Saleh et al., 2015). Consequently, to investigate the function of NPR1, many scientists and research groups typically employ higher concentrations of SA (e.g., 0.5 mM, 1 mM, or even 5 mM) to elucidate its role (Spoel et al., 2009; Fu et al., 2012; Lee et al., 2015; Saleh et al., 2015; Skelly et al., 2019; Zavaliev et al., 2020; Chen et al., 2021a). In our study, we observed an interaction between ATG6 and NPR1. To enhance the detection of the NPR1 protein, we standardized the SA concentration used in our experiments (Arabidopsis was treated with 0.5 mM SA; tobacco was treated with 1 mM SA). Subsequently, we analyzed the nuclear accumulation of ATG6 or NPR1 at these same relatively high SA concentrations, consistent with concentrations used in previous studies (Spoel et al., 2009; Lee et al., 2015; Saleh et al., 2015; Skelly et al., 2019; Zavaliev et al., 2020; Chen et al., 2021a).

      -Does the silencing of ATG6 affect the cell death (or HR) triggered by AvrRPS4?

      Thank you for pointing this out. In this study, we examined changes in Pst DC3000/avrRps4-induced cell death in Col, amiRNAATG6 # 1, amiRNAATG6 # 2, npr1, NPR1-GFP, ATG6-mCherry and ATG6-mCherry × NPR1-GFP plants. The results of Trypan blue staining showed that Pst DC3000/avrRps4-induced cell death in npr1, amiRNAATG6 # 1 and amiRNAATG6 # 2 was significantly higher compared to Col (Fig. S15 in the revised manuscript). Conversely, Pst DC3000/avrRps4-induced cell death in ATG6-mCherry, NPR1-GFP and ATG6-mCherry × NPR1-GFP was significantly lower compared to Col. Notably, Pst DC3000/avrRps4-induced cell death in ATG6-mCherry × NPR1-GFP was significantly lower compared to ATG6-mCherry and NPR1-GFP (Fig. S15 in the revised manuscript). These results suggest that ATG6 and NPR1 cooperatively inhibit Pst DC3000/avrRps4-induced cell death. The relevant description can be found in lines 394-404 of the revised manuscript.

      -SA and NPR1 are also required for immunity and are activated by other NLRs (such as RPS2 and RPM1). Is ATG6 also involved in immunity activated by these NLRs?

      Thank you for your valuable comments. The most notable event in the NLR-mediated ETI immune response is the induction of hypersensitive response-programmed cell death (HR-PCD) (Jones and Dangl, 2006; Yuan et al., 2021). SA plays a dual role in the ETI response. On one hand, the accumulation of SA during the R gene-mediated ETI defense response is directly linked to the onset of HR-PCD (Nawrath and Metraux, 1999). SA and NPR1 can enhance the ETI response by regulating the expression of downstream target genes (Falk et al., 1999; Feys et al., 2001; Ding et al., 2018; Liu et al., 2020). On the other hand, the activation of SA signaling can have a negative regulatory effect on HR-PCD during the ETI response. High levels of SA have been shown to significantly inhibit HR-PCD triggered by the avrRpt2 effector (Rate and Greenberg, 2001; Devadas and Raina, 2002; Jurkowski et al., 2004). Rate et al. discovered that the inhibition of HR-PCD by SA relies on NPR1 (Rate and Greenberg, 2001).

      Arabidopsis AtATG6 or its homologs in other species (such as NbBECLIN1, TaATG6s, etc.) have been identified as positive regulators in plant immunity, playing a crucial role in inhibiting cell death and preventing invasion by pathogenic microorganisms (Liu et al., 2005; Patel and Dinesh-Kumar, 2008; Yue et al., 2015). Patel et al. demonstrated that, akin to autophagy-deficient mutants previously documented, AtATG6 antisense (AtATG6-AS) plants treated with Pst DC3000/avrRpm1 exhibited diffuse cell death, indicating the necessity of ATG6 in restricting cell death (Patel and Dinesh-Kumar, 2008). In tobacco, deficiencies in BECLIN 1 result in the onset of diffuse HR-PCD, underscoring the essential role of BECLIN 1 in limiting HR-PCD (Liu et al., 2005). Despite the genetic evidence supporting the critical function of ATG6 in plant immunity, the precise molecular mechanisms through which ATG6 impedes the invasion of pathogenic microorganisms remain elusive.

      In our study, we uncovered that ATG6 interacts with NPR1 to hinder pathogen invasion and inhibit the initiation of cell death. In animals, members of the NLR family have been observed to interact with the autophagy-related protein LC3 to inhibit the survival of pathogens (Zhang et al., 2019). Similar mechanisms may exist in plants. However, whether NLRs directly activate ATG6 through interaction, and how NPR1-ATG6 interactions relate to NLR-mediated plant immunity, remain to be investigated.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Zhang et al. explores the effect of autophagy regulator ATG6 on NPR1-mediated immunity. The authors propose that ATG6 directly interacts with NPR1 in the nucleus to increase its stability and promote NPR1-dependent immune gene expression and pathogen resistance. This novel role of ATG6 is proposed to be independent of its role in autophagy in the cytoplasm. The authors demonstrate through biochemical analysis that ATG6 interacts with NPR1 in yeast and very weakly in vitro. They further demonstrate using overexpression transgenic plants that in the presence of ATG6-mcherry the stability of NPR1-GFP and its nuclear pool is increased.

      However, the overall conclusions of the study are not well supported experimentally. The significance of the findings is low because of their mostly correlational nature, and lack of consistency with earlier reports on the same protein.

      Thank you for your valuable and constructive suggestions. In this article, we unveil a novel relationship in which ATG6 positively regulates NPR1 in plant immunity (Fig. 8 in the revised manuscript). ATG6 interacts with NPR1 to synergistically enhance plant resistance by regulating NPR1 protein levels, stability, nuclear accumulation, and formation of SINCs-like condensates. This may be of interest to researchers studying the regulation of plant immunity. While there may be minor flaws in our current study, the significance of these findings cannot be overstated, as they have the potential to redirect scientific attention towards uncovering novel functions for autophagy genes.

      Based on the integrity and quality of the data as well as the depth of analysis, it is not yet clear if ATG6 is a specific regulator of NPR1 or if it is affecting NPR1's stability indirectly, through inducing an elevation of SA levels in plants. As such, the current study demonstrates a correlation between overexpression of ATG6, SA accumulation, and NPR1 stability, however, whether and how these components work together is not yet demonstrated.

      Thank you for your valuable feedback. Although, as the reviewer notes, our current data may have some flaws, scientific research is an ongoing process, and we are confident that future studies will address them. At a minimum, the present results report a previously undiscovered function of ATG6 in plant immunity. We propose a direct interaction between ATG6 and a well-studied salicylic acid receptor protein, NPR1. We unveil a novel relationship in which ATG6 positively regulates NPR1 in plant immunity (Fig. 8 in the revised manuscript). ATG6 interacts with NPR1 to synergistically enhance plant resistance by regulating NPR1 protein levels, stability, nuclear accumulation, and formation of SINCs-like condensates. This may be of interest to researchers studying the regulation of plant immunity.

      Based on the provided biochemical data, it is not yet clear if the ATG6 functions specifically through NPR1 or through its paralogs NPR3 and NPR4, which are negative regulators of immunity. It is quite possible that interaction with NPR1 (or any NPR) is not the major regulatory step in the activity of ATG6 in plant immunity. The effect of ATG6 on NPR1 could well be indirect, through a change in the SA level and redox environment of the cell during the immune response. Both SA level and redox state of the cell were reported to induce accumulation of NPR1 in the nucleus and increase in stability.

      Thank you for your valuable feedback. In this study, we validated the interaction between ATG6 and NPR1 through various approaches and identified the key regions mediating their interaction. Our findings indicate that ATG6 interacts with NPR1 to synergistically enhance plant resistance by regulating NPR1 protein levels, stability, nuclear accumulation, and the formation of SINC-like condensates. These results clearly demonstrate the involvement of ATG6 in the regulation of NPR1. Furthermore, we also found that ATG6 interacts with NPR3/4 (Fig. S1 in the revised manuscript). This is particularly relevant given that NPR3 and NPR4 have been shown to act as adaptors for the ubiquitin E3 ligase Cullin 3 (CUL3) to regulate the degradation of NPR1. Therefore, whether ATG6 regulates NPR1 through its interactions with NPR3/4 is an intriguing question worth exploring in future studies. We appreciate the reviewer's concerns and are committed to addressing them in our future research to further elucidate the complex regulatory mechanisms involving ATG6, NPR1, and other key players in plant immunity.

      Another major issue is the poor quality of the subcellular analyses. In contradiction to previous studies, ATG6 in this study is not localized to autophagosome puncta, which suggests that the soluble localization pattern presented here does not reflect the true localization of ATG6. Even if the authors propose a novel, non-canonical nuclear localization for ATG6, they still should have detected the canonical autophagy-like localization of this protein.

      Thank you for your valuable feedback. We conducted predictions at NLS Mapper (https://nls-mapper.iab.keio.ac.jp/cgi-bin/NLS_Mapper_form.cgi) and identified two bipartite NLSs in ATG6, with the sequences "MRKEEIPDKSRTIPIDPNLPKWVCQNCHHS" and "DPNLPKWVCQNCHHS LTIVGVDSYAGKFFNDP". To further elucidate the nuclear localization of ATG6, we introduced Agrobacterium tumefaciens carrying ATG6-GFP into nls-mCherry tobacco leaves through transient transformation. Subsequently, we observed the localization of ATG6-GFP, along with the canonical autophagy-like patterns. Our findings revealed fluorescence signals of ATG6-GFP in both the cytoplasm and nuclei (Figure 2b). The nuclear-localized ATG6-GFP overlapped with the nuclear marker nls-mCherry (indicated by white arrows). Additionally, we observed punctate patterns indicative of canonical autophagy-like localization of ATG6-GFP fluorescence signals (indicated by red circles). Based on these results, we are more confident about the authenticity of ATG6's nuclear localization. The revised manuscript includes clearer images to support our observations.
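
      The two predicted bipartite NLSs quoted above appear to share a common segment. A minimal sketch of how this overlap can be checked is given below. It uses only the sequences as written in this response; the helper name `longest_overlap` is hypothetical, and the space inside the second sequence is assumed to be a typesetting artifact and is removed.

```python
def longest_overlap(left: str, right: str) -> str:
    """Return the longest suffix of `left` that is also a prefix of `right`."""
    for size in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:size]):
            return right[:size]
    return ""


# Sequences as quoted in the response. The space in the second sequence is
# removed on the assumption that it is a typesetting artifact.
nls_1 = "MRKEEIPDKSRTIPIDPNLPKWVCQNCHHS"
nls_2 = "DPNLPKWVCQNCHHS LTIVGVDSYAGKFFNDP".replace(" ", "")

shared = longest_overlap(nls_1, nls_2)
print(shared, len(shared))
```

      Under these assumptions, the end of the first sequence coincides with the start of the second, which suggests that the two predictions describe adjacent, overlapping regions of ATG6 rather than two distant motifs.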

      Recommendations for the Authors:

      Reviewer #2 (Recommendations For The Authors):

      The duration and concentration of SA treatments are quite variable between experiments which makes comparisons difficult.

      Thank you for pointing this out. The NPR1 protein is known to be unstable and prone to degradation through the 26S proteasome pathway (Spoel et al., 2009; Saleh et al., 2015). Consequently, to investigate the function of NPR1, many scientists and research groups typically employ higher concentrations of SA (e.g., 0.5 mM, 1 mM, or even 5 mM) to elucidate its role (Spoel et al., 2009; Fu et al., 2012; Lee et al., 2015; Saleh et al., 2015; Skelly et al., 2019; Zavaliev et al., 2020; Chen et al., 2021a). In our study, we observed an interaction between ATG6 and NPR1. To enhance the detection of the NPR1 protein, we standardized the SA concentration used in our experiments. In this study, for the treatment of Arabidopsis, we followed the protocols outlined in Saleh et al. and Spoel et al., utilizing 0.5 mM SA (Spoel et al., 2009; Saleh et al., 2015). For tobacco treatment, we adopted the methodology described in the study by Zavaliev et al., administering 1 mM SA (Zavaliev et al., 2020).

      The methods section does not explain some of the essential experimental conditions and reagents used in the study.

      Thank you for pointing this out. Due to word limitations we have placed the detailed experimental methods and reagents in Supplemental Data 1. In Supplemental Data 1, we provide a comprehensive overview of the experimental flow and conditions employed in our study.

      Lines 62-63: the C-terminal domain of all NPRs has a name (already defined as SA-binding domain (SBD)). Also, it would be worth referring to the structure of NPR1 (Kumar et al 2022, Nat) as the source of information about its domains.

      Thank you for pointing this out, we have changed this description in the revised manuscript (lines 62-63).

      Lines 66-69: NPR1 doesn't form monomers. A recent study showed that the basic functional unit of NPR1 is a dimer (Kumar et al 2022, Nat).

      Thank you for pointing this out. In the revised manuscript (line 67), "monomers" has been changed to "dimer".

      Lines 89-95 and elsewhere: the term "invasion" has a very specific meaning and it doesn't necessarily refer to disease. A pathogen can invade the plant but cause no disease (e.g. ETI). Most plant genetic immune mechanisms act after pathogen invasion, not before it. Those cited works reported the disease resistance, not the invasion resistance.

      Thank you for pointing this out. We've changed the incorrect description in the revised manuscript (line 91).

      Lines 113-119: the truncation at the aa328 includes half of the ANK domain (repeats 1 and 2), not just BTB. The C-terminal truncation variant contains the other half (repeats 3 and 4) of the ANK domain, not the entire ANK domain. It also contains the SBD, not just the NLS. So, this kind of analysis cannot determine the role of ANK domain in the interaction, nor it can conclusively determine if the interaction is through SBD. The interaction should be tested with the SBD domain only in order to make this conclusion.

      Thank you for pointing this out, we have removed the inappropriate description and made the appropriate changes in the revised manuscript (lines 114 and 115).

      In Figure S1, the equally strong interaction of atg6 is found for NPR3/NPR4. Does that mean that atg6 functions also through these other NPRs? What's the significance of these data compared to NPR1-ATG6 interaction? This is especially important, because both NPR3 and NPR4 are predominantly nuclear proteins, and they are unlikely to significantly overlap with autophagy components in the cytoplasm.

      NPR1 and its paralogues NPR3/NPR4 frequently interact with other proteins to regulate plant immune responses (Backer et al., 2019; Chen et al., 2019). To identify ATGs that interact with NPRs, we performed yeast two-hybrid (Y2H) screens using NPRs as bait. Interestingly, ATG6 interacted with NPR1, NPR3 and NPR4, and different concentrations of SA treatment did not significantly affect these interactions (Fig. S1a). NPR1 is an important positive regulator of the plant immune response (Chen et al., 2021b). In Arabidopsis and N. benthamiana, ATG6 or its homologues were reported to act as positive regulators that enhance plant disease resistance to P. syringae pv. tomato (Pst) DC3000 and Pst DC3000/avrRpm1 bacteria (Patel and Dinesh-Kumar, 2008) and to tobacco mosaic virus (TMV) (Liu et al., 2005). Therefore, in this study we focused on investigating the biological significance of the interaction between ATG6 and NPR1. Whether the interaction between ATG6 and NPR3/4 also has an effect on plant immunity is a question that remains to be explored in future studies.

      In Figure 1c and elsewhere: why not use the anti-mCherry antibody to detect atg6-mcherry? Are we seeing the correct protein band of atg6-mcherry? Also, it is not clear what antibodies they used throughout the study: the sources and specificities of antibodies are not provided.

      Thank you for pointing this out. We initially synthesized the ATG6 antibody (anti-ATG6, 1:200, peptide, C-KEKKKIEEEERK, Abmart) in order to detect the endogenous ATG6 protein, and we also tested the specificity and potency of the ATG6 antibody (results are shown in Fig. S17). Additionally, in order to determine the location of the ATG6-mCherry bands, we also detected ATG6-mCherry in ATG6-mCherry Arabidopsis using the ATG6 antibody, with Col as a control (results are shown in Fig. S4). These results show that our synthesized ATG6 antibody can effectively and clearly detect both ATG6 and ATG6-mCherry. Therefore, in this study, we used the ATG6 antibody to analyze both ATG6-mCherry and endogenous ATG6. Detailed antibody information is presented in Supplementary Data 1, Table S4.

      In Figures 1d, 2a, and 2b, the subcellular localization pattern of atg6 contradicts what was published before (Fujiki et al 2007, Plant Phys; Liu et al 2018, FPlS; Xu et al 2017, Autophagy; Li et al 2018, Nat. Comm.). As an autophagy protein, atg6 was shown to localize to cytoplasmic puncta (autophagosomes), like atg8. No nuclear localization was found in those studies. The lack of puncta and the strong nuclear accumulation are signs that the localization of atg6 reported here has to be interpreted with caution. With the data provided, I am not convinced yet that we are looking at the correct ATG6 subcellular localization. Even if the authors propose a novel, non-canonical localization for atg6, they still should have detected the canonical autophagy-like localization of this protein.

      Thank you for your valuable feedback. To further elucidate the nuclear localization of ATG6, we introduced Agrobacterium tumefaciens carrying ATG6-GFP into nls-mCherry tobacco leaves through transient transformation. Subsequently, we observed the localization of ATG6-GFP, along with the canonical autophagy-like patterns. Our findings revealed fluorescence signals of ATG6-GFP in both the cytoplasm and nuclei (Figure 2b). The nuclear-localized ATG6-GFP overlapped with the nuclear marker nls-mCherry (indicated by white arrows). Additionally, we observed punctate patterns indicative of canonical autophagy-like localization of ATG6-GFP fluorescence signals (indicated by red circles). Based on these results, we are more confident about the authenticity of ATG6's nuclear localization. The revised manuscript includes clearer images to support our observations.

      It would make more sense to include the BiFC data (fig. S2) in the main figure, instead of the co-localization (fig. 1d) which cannot serve as evidence for interaction.

      Thank you for the feedback. We accept your suggestion. In Fig.1, we have replaced the co-localization image with a BiFC (Bimolecular Fluorescence Complementation) image to better illustrate the interaction.

      In Figure S2, the bifc signals have to be quantified to qualify as evidence for interaction. also, a subcellular marker has to be used (e.g. nuclear mcherry). From the current poor-quality images, one cannot determine where in the cell the presumed interaction takes place, nucleus or cytoplasm, or both. Also, no puncta are seen in these images.

      Thank you for pointing this out. Despite the lack of clarity in the images we provided, our BiFC results unequivocally demonstrate the interaction between ATG6 and NPR1 in both the cytoplasm and nucleus. Notably, as the reviewer pointed out, punctate signals were not observed in our images. This lack of punctate signals is consistent with previous studies (Figure 2) that have also shown BiFC results between autophagy-associated proteins ATG8s and their interacting partners. For instance, Fig 1G (Marshall et al. 2019, Cell), Fig 2F (Marshall et al. 2019, Cell), Fig 4B (Macharia et al. 2019, BMC Plant Biology), and Fig 3 (Zhou et al. 2018, Autophagy) all did not exhibit punctate signals, aligning closely with our findings.

      In Figure S3a, the nuclear localization is shown for stomata. It is known that stomata are especially strong expressors of the transgenes, and localization there could be an artefact of overaccumulation of the fusion protein. Also, why do they present the localization of atg6-gfp, if the analysis and the cross were made with atg6-mcherry?

      Thank you for pointing this out. In our previous experiments, we observed the localization of ATG6 in the nucleus of Arabidopsis thaliana plants overexpressing ATG6-GFP (Fig. S3a). To clearly visualize the location of the nucleus, we used the cytosolic DAPI dye, which readily stained the nuclei of the stomatal guard cells. This allowed us to easily identify the nuclear regions for our observations. Additionally, in Fig. 2a and Fig.S3b, we detected the fluorescence signal of ATG6-mCherry within the nucleus, further confirming the nuclear localization of ATG6. Moreover, the nuclear and cytoplasmic fractions were separated. Under SA treatment, ATG6-mCherry and ATG6-GFP were detected in the cytoplasmic and nuclear fractions in N. benthamiana (Fig. 2c and d). Similarly, ATG6 was also detected in the nuclear fraction of UBQ10::ATG6-GFP and UBQ10::ATG6-mCherry overexpressing plants (Fig. 2e and f).

      In Figure S3b, the images are low resolution and of poor quality. Why atg6-mcherry is expressed in a single cell if these are transgenic plants? The nuclear co-localization with npr1-gfp has to be shown more clearly with high res. images and also be quantified, because the expression of atg6-mcherry is not as uniform as npr1-gfp.

      Thank you for pointing this out. Contrary to the reviewer's assertion, the ATG6-mCherry fluorescence signal depicted in Figure S3b was not exclusive to a single cell. In fact, this fluorescence was also evident in other cells, albeit with relatively weaker intensity. This disparity in fluorescence intensity may be attributed to the irregularities in leaf structure at the time of image capture using the microscope. To bolster our conclusion, we further examined the fluorescence signals in the cells of the root elongation zone in ATG6-mCherry x NPR1-GFP, as depicted in the figure below. Our observations revealed that the fluorescence signals of ATG6-mCherry exhibited uniform distribution, with detection in both the cytoplasm and nucleus. We have replaced the original unclear image with a high-quality image.

      Lines 138-143: In fig. S3d, it would make more sense to show the WB on the hybrid npr1-gfp/atg6-mcherry plants with both anti-gfp and anti-mcherry antibodies to detect the free mcherry/gfp. Since the analysis of the level of free FP is done, then why didn't they test the free mcherry levels in Figure S4a? This would be more important than testing the free GFP in ATG6-GFP plants, because the imaging of atg6-mcherry was done in the hybrid plants (fig. S3b).

      Thank you for pointing this out. We initially synthesized the ATG6 antibody (anti-ATG6, 1:200, peptide, C-KEKKKIEEEERK, Abmart) in order to detect the endogenous ATG6 protein, and we also tested the specificity and potency of the ATG6 antibody (results are shown in Fig. S17). Additionally, in order to determine the location of the ATG6-mCherry bands, we also detected ATG6-mCherry in ATG6-mCherry Arabidopsis using the ATG6 antibody, with Col as a control (results are shown in Fig. S4). These results show that our synthesized ATG6 antibody can effectively and clearly detect both ATG6 and ATG6-mCherry. Therefore, in this study, we used the ATG6 antibody to analyze both ATG6-mCherry and endogenous ATG6. Detailed antibody information is presented in Supplementary Data 1, Table S4. In previous experiments, we procured an mCherry antibody (mCherry-Tag Monoclonal Antibody (6B3), BD-PM2113, China) to immunolabel ATG6-mCherry. However, we encountered problems with the potency of this antibody. Considering our budget constraints and the availability of our self-synthesized ATG6 antibody, we chose not to purchase another mCherry antibody from a different company for the Western blot experiments.

      In Figure 2c, there's no atg6-mcherry detected at time 0, in either cytoplasm or nucleus, yet the microscope images in panel a show strong accumulation in both compartments.

      Thank you for pointing this out. Previous studies have shown that ATG6 can also be degraded via the 26S proteasome pathway (Qi et al., 2017). We speculate that this phenomenon might be attributed to the rapid turnover of ATG6 at time 0.

      Lines 156-160: this statement is unsupported by the data. In fig. S5, the bands for native atg6 in the nuclear fraction are extremely weak, and they do not show the reverse pattern of change along the time points compared to the cytoplasmic fraction, which would indicate that the nuclear fraction is complementary to the cytoplasmic pool of the protein. The result more likely suggests that the majority of the ATG6 is in the cytoplasm, and that the weak bands detected in the nucleus are either background signal, or a contamination from the cytoplasmic pool. At this low protein level or poor immuno-detection the background signal is inevitable due to overexposure. Even though the actin marker is not detected in the nuclear fraction, it doesn't necessarily mean that there's no contamination from the cytoplasm in the nuclear fraction. The actin is just too abundant and can be detected at lower exposure.

      Thank you for pointing this out. In Fig. S5, we detected the subcellular localization of endogenous ATG6, although the image quality was somewhat low. Nevertheless, the cytosolic and nuclear localization of ATG6 could be clearly observed. In addition, we verified the cytosolic and nuclear localization of ATG6 in Arabidopsis using confocal fluorescence microscopy and nuclear-cytoplasmic fractionation experiments, with actin and H3 used as the cytoplasmic and nuclear internal references, respectively (Fig. 2e and f). Furthermore, we observed the cytosolic and nuclear localization of ATG6 when we expressed ATG6-GFP or ATG6-mCherry in tobacco leaves through transient transformation experiments (Fig. 2a-d). These results are consistent with the prediction of the subcellular location of ATG6 in the Arabidopsis subcellular database (https://suba.live/) (Fig. S3c). The reviewer's feedback has been valuable in helping us present these findings more clearly. We acknowledge the limitations in the image quality for the endogenous ATG6 localization, but we believe the combination of multiple experimental approaches, including the use of fluorescent protein fusions, provides robust evidence for both the cytosolic and nuclear localization of ATG6 in plant cells. Moving forward, we will continue to investigate the significance of ATG6's subcellular distribution and its potential dual roles in both the nucleus and the cytosol, particularly in the context of its interaction with the key immune regulator NPR1. We appreciate the reviewer's constructive comments, as they will help us strengthen the presentation and interpretation of our findings.

      In Figure 3a the images are of too low resolution to see the co-localization. The focal planes of the top and bottom panels are quite different: the top is focused on stomata, the bottom - on pavement cells. So, the number of the NPR1-GFP nuclei between these two focal planes is dramatically different. Also, it looks like the atg6-mcherry in these plants are predominantly in the cytoplasm, not the nucleus as the authors claim. A higher resolution and higher quality of images are required to determine this.

      Thank you for pointing this out. To ensure the clarity and accuracy of our confocal images, we have supplied a clearer image as supplementary evidence. The bright-field images distinctly show that both sets of images are in the same plane of focus. Furthermore, in the figure (third one in the fourth column), the nuclear localization of ATG6-mCherry is clearly visible, and ATG6-mCherry is co-localized with NPR1-GFP in the nucleus, as indicated by the white arrow.

      In Figure 3b, it is not indicated what exactly was measured and in what condition, mock or SA. If these are numbers of nuclei, then it should be indicated what size of the area was sampled, not just "section", and both mock and SA should be included in the measurements. Also, how many independent images have been sampled? what does the error bar represent? What does "normal" mean? Shouldn't this be a mock treatment?

      Thank you for pointing this out. The term "Normal" in this context refers to mock treatment, and we have revised the description for clarity. In Figure 3b, the graph shows the number of nuclei with NPR1-GFP signal in ATG6-mCherry × NPR1-GFP and NPR1-GFP Arabidopsis plants following SA treatment. Statistical data were obtained from three independent experiments, each comprising five individual images, resulting in a total of 15 images analyzed for this comparison. Detailed descriptions were also added to the revised manuscript (Lines 568-570, 800-804).
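
      The sampling scheme described above (three independent experiments, five images each) can be sketched as follows. This is only an illustration: the counts are hypothetical placeholders, not data from our study, and the sketch assumes that the spread is summarized as the standard deviation of the per-experiment means.

```python
from statistics import mean, stdev

# Hypothetical numbers of NPR1-GFP-positive nuclei per image
# (3 independent experiments x 5 images each); not data from the study.
counts = {
    "experiment_1": [12, 15, 11, 14, 13],
    "experiment_2": [10, 13, 12, 16, 9],
    "experiment_3": [11, 12, 15, 13, 19],
}

# Pool all images to confirm the total sample size.
pooled = [count for images in counts.values() for count in images]
n_images = len(pooled)

# Summarize each experiment first, then describe spread across experiments.
per_experiment_means = [mean(images) for images in counts.values()]
overall_mean = mean(per_experiment_means)
spread = stdev(per_experiment_means)

print(n_images, per_experiment_means, overall_mean, spread)
```

      Summarizing by experiment rather than by image treats the three experiments as the independent replicates, so the 15 images are not counted as 15 independent observations.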

      Lines 167-168: the proposed increase of NPR1-GFP in the nucleus could be simply due to a higher accumulation of SA in the hybrid plants, not because of the direct interaction of atg6.

      Thank you for pointing this out. Our results confirmed that ATG6 overexpression significantly increased nuclear accumulation of NPR1 (Fig. 3). Notably, the ratio (nucleus NPR1/total NPR1) in ATG6-mCherry × NPR1-GFP was not significantly different from that in NPR1-GFP, and a similar phenomenon was observed in N. benthamiana (Fig. 3c-f). These results suggested that the increased nuclear accumulation of NPR1 by ATG6 might result from higher levels and more stable NPR1, rather than the enhanced nuclear translocation of NPR1 facilitated by ATG6. Furthermore, we found that under SA treatment, the protein levels of NPR1 were significantly higher in the ATG6-mCherry × NPR1-GFP line compared to the NPR1-GFP line (Fig. 5a). Notably, even in the absence of differences in SA levels between the two lines, we observed that ATG6 could delay the degradation of NPR1 under normal conditions (Fig. 6). These findings suggest that ATG6 employs both SA-dependent and SA-independent mechanisms to maintain the stability of the key immune regulator NPR1. In summary, we suggest that the increased nuclear accumulation of NPR1 reflects the combined effect of SA and ATG6.

      Lines 202-204: "Increased nuclear accumulation" implies increased translocation. However, they found that the ratio of NPR1-GFP does not change (Figure 3), so the reason for higher nuclear accumulation is not translocation, but abundance.

      Thank you for pointing this out. Our results confirmed that ATG6 overexpression significantly increased nuclear accumulation of NPR1 (Fig. 3). ATG6 also increases NPR1 protein levels and improves NPR1 stability (Fig. 5 and 6). Therefore, we consider that the increased nuclear accumulation of NPR1 in ATG6-mCherry x NPR1-GFP plants might result from higher levels and more stable NPR1 rather than the enhanced nuclear translocation of NPR1 facilitated by ATG6. To verify this possibility, we determined the ratio of nuclear NPR1-GFP to total NPR1-GFP. Notably, the ratio (nucleus NPR1/total NPR1) in ATG6-mCherry × NPR1-GFP was not significantly different from that in NPR1-GFP, and a similar phenomenon was observed in N. benthamiana (Fig. 3c-f). Furthermore, we analyzed whether ATG6 affects NPR1 protein levels and protein stability. Our results show that ATG6 increases NPR1 protein levels under SA treatment and that ATG6 maintains the protein stability of NPR1 (Fig. 5 and 6). These results suggested that the increased nuclear accumulation of NPR1 by ATG6 results from higher levels and more stable NPR1, rather than from enhanced nuclear translocation. The corresponding description is shown in the revised manuscript (lines 338-352).
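
      The reasoning behind the ratio comparison can be illustrated with a short worked example. The band intensities below are hypothetical values in arbitrary units, chosen only to show the logic, and the function name `nuclear_fraction` is introduced here for illustration.

```python
def nuclear_fraction(nuclear: float, total: float) -> float:
    """Return the nuclear signal as a fraction of the total signal."""
    if total <= 0:
        raise ValueError("total signal must be positive")
    return nuclear / total


# Hypothetical band intensities in arbitrary units (not measured values).
npr1_gfp = {"nuclear": 20.0, "total": 100.0}
atg6_x_npr1_gfp = {"nuclear": 40.0, "total": 200.0}

frac_single = nuclear_fraction(**npr1_gfp)
frac_cross = nuclear_fraction(**atg6_x_npr1_gfp)

# Fold change of the nuclear pool and of the total pool between lines.
fold_nuclear = atg6_x_npr1_gfp["nuclear"] / npr1_gfp["nuclear"]
fold_total = atg6_x_npr1_gfp["total"] / npr1_gfp["total"]

print(frac_single, frac_cross, fold_nuclear, fold_total)
```

      In this illustration the nuclear pool doubles while the nuclear fraction stays the same, because the total pool doubles as well. An unchanged ratio together with a larger nuclear signal is therefore what one would expect from higher abundance rather than from enhanced translocation, which would instead raise the ratio.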

      Lines 204-205: the co-localization in Figure 1d cannot be interpreted as interaction.

      Thank you for the feedback. We have replaced the co-localization image with a BiFC (Bimolecular Fluorescence Complementation) image to better illustrate the interaction in Fig 1d.

      What age of plants were used for the analysis in Figures 4 and S7? The age of the plant might significantly affect the free SA levels under control conditions.

      Thank you for the feedback. In Figures 4 and S7, 3-week-old plants were used to determine salicylic acid (SA) levels and the expression of target genes. The legends of Figures 4 and S7 provide detailed descriptions (lines 818-819).

      In Figure 5a they treat with SA, but the analysis in Figure S10 is done with the pathogen, so how can these data be correlated?

      Thank you for pointing this out. Previous studies have demonstrated that pathogen infection rapidly increases the salicylic acid (SA) content in plants, and the elevated SA then activates plant immune responses. Therefore, both pathogen treatment and direct SA treatment can activate SA-dependent plant immune responses. The NPR1 protein is known for its instability. In Figure 5a, we used a 0.5 mM SA treatment to assess the changes in NPR1 protein levels, as the impact of SA treatment is more immediate and pronounced.

      Lines 241-242: In Figure 5b, it is not clear why there's no detection of NPR1-GFP and atg6-mcherry at time 0. The levels of proteins in the transient assay are sufficiently high for detection by WB.

      Thank you for pointing this out. The NPR1 protein is known to be unstable and prone to degradation through the 26S proteasome pathway (Spoel et al., 2009; Saleh et al., 2015). In addition, previous studies have shown that ATG6 can also be degraded via the 26S proteasome pathway (Qi et al., 2017). We speculate that this phenomenon might be attributed to the rapid turnover of NPR1 and ATG6 at time 0.

      In Figures 5c-d, the quality of these images is very poor, and they do not clearly show the signs. What structure was exactly measured in these images? There are so many fluorescent bodies there, that it is not clear what are we looking at. Also, it is not clear why they did not show the mcherry channel? It would be important to see if the bodies in SA-treated plants show co-localization with atg6-mcherry autophagosomes (if these exist at all).

      Thank you for pointing this out. Interestingly, consistent with previous reports (Zavaliev et al., 2020), SA promoted the translocation of NPR1 into the nucleus, but a significant amount of NPR1 remained in the cytoplasm (Fig. 3c and e). Previous studies have shown that SA increases NPR1 protein levels and facilitates the formation of SINCs in the cytoplasm, which are known to promote cell survival (Zavaliev et al., 2020). We therefore observed the fluorescence signal of SINCs-like condensates in the cytoplasm of tobacco leaves. After 1 mM SA treatment, more SINCs-like condensate fluorescence was observed in N. benthamiana co-transformed with ATG6-mCherry + NPR1-GFP than with mCherry + NPR1-GFP (Fig. 5c-d and Supplemental movies 1-2). Supplemental movies 1-2 show this more clearly. Additionally, we observed that SINCs-like condensate signals partially co-localized with the fluorescence signals of certain ATG6-mCherry autophagosomes.

      Lines 245-247: so, is it atg6 or SA that increases the NPR1 levels? If this is due to SA, then the whole study doesn't have novelty, because we already know from previous works that SA increases the stability of npr1.

      Thank you for pointing this out. Indeed, previous studies have shown that salicylic acid (SA) increases NPR1 levels and protein stability (Spoel et al., 2009; Saleh et al., 2015). In our experiments, we found that under SA treatment, the protein levels of NPR1 were significantly higher in the ATG6-mCherry × NPR1-GFP line compared to the NPR1-GFP line (Fig. 5a). Additionally, free SA levels were also significantly elevated in the ATG6-mCherry × NPR1-GFP line under pathogen challenge (Pst DC3000/avrRps4), but not under normal conditions (Fig. 4a). Furthermore, even in the absence of differences in SA levels between the two lines, we observed that ATG6 could delay the degradation of NPR1 under normal conditions (Fig. 6). These findings represent one of our new discoveries. These findings suggest that ATG6 employs both SA-dependent and SA-independent mechanisms to maintain the stability of the key immune regulator NPR1.

      Lines 313-316: npr1 and atg6 can function independently from each other, so the term "jointly" is misleading. Based on the overall data provided in this manuscript it cannot be concluded that the two proteins work in one complex to control plant immunity.

      Thank you for pointing this out. In the revised manuscript "jointly" has been changed to “cooperatively”.

      Lines 369-374: this speculation is beyond the main hypothesis claiming that atg6 functions through npr1. If atg6 can activate the transcription alone, then what is the significance of its activation of npr1? How can one distinguish between the two?

      Thank you for pointing this out. Transcription activation by transcription factors typically requires at least two conserved structural domains: a transcription activation domain and a DNA-binding domain. ATG6 does not possess these two conserved domains found in canonical transcription factors, so it is unlikely that ATG6 can directly activate transcription on its own. Previous studies have shown that acidic activation domains (AADs) in transcriptional activators (such as Gal4, Gcn4 and VP16) play important roles in activating downstream target genes. Acidic amino acids and hydrophobic residues are the key structural elements of AADs (Pennica et al., 1984; Cress and Triezenberg, 1991; Van Hoy et al., 1993). Chen et al. found that EDS1 contains two AADs and confirmed that EDS1 is a transcriptional activator with AADs (Chen et al., 2021a). Here, we similarly found that ATG6 overexpression significantly enhanced the expression of PR1 and PR5 (Fig. 4b-c and S9), and that ATG6 also contains an AAD rich in acidic and hydrophobic amino acids (148-295 AA) (Fig. S14). We speculate that ATG6 might act as a transcriptional coactivator to activate PRs expression synergistically with NPR1.

      Lines 389-400: the cell death due to AvrRPS4 in Col-0 ecotype is extremely weak as there's no complete receptor complex for this effector. So, one has to use a very high dose to induce cell death in Col-0, certainly higher than the one used for bacterial growth. The authors used the same dose in both assays, so it is likely that what we see as "cell death" is not an effector-triggered response, but rather symptom-associated for the virulent pathogen.

      Thank you for pointing this out. Indeed, as the reviewer pointed out, most cell death assays use higher concentrations of Pst DC3000/avrRps4 or Pst DC3000/avrRpt2, but they typically treat Arabidopsis for a relatively short period, usually less than 1 day (Hofius et al., 2009; Zavaliev et al., 2020). In this study, although we injected a relatively low concentration of Pst DC3000/avrRps4 (0.001), we assessed cell death after a relatively long period of Pst DC3000/avrRps4 infection (3 days). Pst DC3000/avrRps4 multiplies significantly in host cells, and we therefore assumed that the pathogen population after 3 days of incubation would be sufficient to induce intense cell death. Consequently, we chose this concentration of Pst DC3000/avrRps4 for the experiment.

      Lines 407-416: why do you expect "delay of degradation" with autophagy inhibitor? Shouldn't it be the opposite? In Figure S14, if we compare the bands between 120min and 120min+ConA+WM, the effect of autophagy inhibitors is actually quite strong (0.47 vs 0.22), with about 50% more degradation of NPR1 in their presence. So, the conclusion that the degradation of NPR1 is autophagy-independent is wrong according to this result.

      Thank you for pointing this out. We have revised the inaccurate description, as outlined in the revised manuscript (lines 413-425).
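
      For reference, the comparison quoted by the reviewer (0.47 at 120 min versus 0.22 at 120 min with ConA + WM) can be expressed as a relative change. The sketch below assumes that both numbers are relative NPR1 band intensities read from the same blot; the helper name is hypothetical.

```python
def relative_change(reference: float, treated: float) -> float:
    """Fractional change of `treated` relative to `reference`."""
    if reference == 0:
        raise ValueError("reference must be non-zero")
    return (treated - reference) / reference


# Relative NPR1 band intensities quoted by the reviewer for Figure S14.
npr1_120min = 0.47
npr1_120min_inhibitors = 0.22  # 120 min + ConA + WM

change = relative_change(npr1_120min, npr1_120min_inhibitors)
print(f"NPR1 signal with inhibitors relative to without: {change:+.0%}")
```

      Under that assumption the remaining NPR1 signal is roughly half as large with the inhibitors as without them, which is the basis of the reviewer's "about 50%" estimate.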

      References

      Backer R, Naidoo S, van den Berg N. 2019. The NONEXPRESSOR OF PATHOGENESIS-RELATED GENES 1 (NPR1) and Related Family: Mechanistic Insights in Plant Disease Resistance. Front Plant Sci 10, 102.

      Castello MJ, Medina-Puche L, Lamilla J, et al. 2018. NPR1 paralogs of Arabidopsis and their role in salicylic acid perception. PLoS One 13, e0209835.

      Chen H, Li M, Qi G, et al. 2021a. Two interacting transcriptional coactivators cooperatively control plant immune responses. Sci Adv 7, eabl7173.

      Chen J, Mohan R, Zhang Y, et al. 2019. NPR1 Promotes Its Own and Target Gene Expression in Plant Defense by Recruiting CDK8. Plant Physiol 181, 289-304.

      Chen J, Zhang J, Kong M, et al. 2021b. More stories to tell: NONEXPRESSOR OF PATHOGENESIS-RELATED GENES1, a salicylic acid receptor. Plant Cell Environ.

      Cress WD, Triezenberg SJ. 1991. Critical structural elements of the VP16 transcriptional activation domain. Science 251, 87-90.

      Devadas SK, Raina R. 2002. Preexisting systemic acquired resistance suppresses hypersensitive response-associated cell death in Arabidopsis hrl1 mutant. Plant Physiol 128, 1234-1244.

      Ding Y, Sun T, Ao K, et al. 2018. Opposite Roles of Salicylic Acid Receptors NPR1 and NPR3/NPR4 in Transcriptional Regulation of Plant Immunity. Cell 173, 1454-1467 e1415.

      Falk A, Feys BJ, Frost LN, et al. 1999. EDS1, an essential component of R gene-mediated disease resistance in Arabidopsis has homology to eukaryotic lipases. Proc Natl Acad Sci U S A 96, 3292-3297.

      Feys BJ, Moisan LJ, Newman MA, et al. 2001. Direct interaction between the Arabidopsis disease resistance signaling proteins, EDS1 and PAD4. EMBO J 20, 5400-5411.

      Fu ZQ, Yan S, Saleh A, et al. 2012. NPR3 and NPR4 are receptors for the immune signal salicylic acid in plants. Nature 486, 228-232.

      Hofius D, Schultz-Larsen T, Joensen J, et al. 2009. Autophagic components contribute to hypersensitive cell death in Arabidopsis. Cell 137, 773-783.

      Jones JD, Dangl JL. 2006. The plant immune system. Nature 444, 323-329.

      Jurkowski GI, Smith RK, Jr., Yu IC, et al. 2004. Arabidopsis DND2, a second cyclic nucleotide-gated ion channel gene for which mutation causes the "defense, no death" phenotype. Mol Plant Microbe Interact 17, 511-520.

      Lee HJ, Park YJ, Seo PJ, et al. 2015. Systemic Immunity Requires SnRK2.8-Mediated Nuclear Import of NPR1 in Arabidopsis. Plant Cell 27, 3425-3438.

      Liu Y, Schiff M, Czymmek K, et al. 2005. Autophagy regulates programmed cell death during the plant innate immune response. Cell 121, 567-577.

      Liu Y, Sun T, Sun Y, et al. 2020. Diverse Roles of the Salicylic Acid Receptors NPR1 and NPR3/NPR4 in Plant Immunity. Plant Cell 32, 4002-4016.

      McKim SM, Stenvik GE, Butenko MA, et al. 2008. The BLADE-ON-PETIOLE genes are essential for abscission zone formation in Arabidopsis. Development 135, 1537-1546.

      Nawrath C, Metraux JP. 1999. Salicylic acid induction-deficient mutants of Arabidopsis express PR-2 and PR-5 and accumulate high levels of camalexin after pathogen inoculation. Plant Cell 11, 1393-1404.

      Patel S, Dinesh-Kumar SP. 2008. Arabidopsis ATG6 is required to limit the pathogen-associated cell death response. Autophagy 4, 20-27.

      Pennica D, Goeddel DV, Hayflick JS, et al. 1984. The amino acid sequence of murine p53 determined from a c-DNA clone. Virology 134, 477-482.

      Qi H, Xia FN, Xie LJ, et al. 2017. TRAF Family Proteins Regulate Autophagy Dynamics by Modulating AUTOPHAGY PROTEIN6 Stability in Arabidopsis. Plant Cell 29, 890-911.

      Rate DN, Greenberg JT. 2001. The Arabidopsis aberrant growth and death2 mutant shows resistance to Pseudomonas syringae and reveals a role for NPR1 in suppressing hypersensitive cell death. Plant J 27, 203-211.

      Saleh A, Withers J, Mohan R, et al. 2015. Posttranslational Modifications of the Master Transcriptional Regulator NPR1 Enable Dynamic but Tight Control of Plant Immune Responses. Cell Host Microbe 18, 169-182.

      Skelly MJ, Furniss JJ, Grey H, et al. 2019. Dynamic ubiquitination determines transcriptional activity of the plant immune coactivator NPR1. Elife 8.

      Spoel SH, Mou Z, Tada Y, et al. 2009. Proteasome-mediated turnover of the transcription coactivator NPR1 plays dual roles in regulating plant immunity. Cell 137, 860-872.

      Van Hoy M, Leuther KK, Kodadek T, et al. 1993. The acidic activation domains of the GCN4 and GAL4 proteins are not alpha helical but form beta sheets. Cell 72, 587-594.

      Yuan M, Ngou BPM, Ding P, et al. 2021. PTI-ETI crosstalk: an integrative view of plant immunity. Curr Opin Plant Biol 62, 102030.

      Yue J, Sun H, Zhang W, et al. 2015. Wheat homologs of yeast ATG6 function in autophagy and are implicated in powdery mildew immunity. BMC Plant Biol 15, 95.

      Zavaliev R, Mohan R, Chen T, et al. 2020. Formation of NPR1 Condensates Promotes Cell Survival during the Plant Immune Response. Cell 182, 1093-1108 e1018.

    1. eLife assessment

      This valuable manuscript systematically addresses the role of intracellular lipid transfer proteins on cellular lipid levels. It provides convincing evidence on the role of ORP9 and ORP11 in sphingolipid metabolism at the Golgi complex. This article will be of broad interest to cell biologists interested in lipid metabolism and membrane biology.

    2. Reviewer #1 (Public Review):

      Summary:

      In this well-designed study, the authors of the manuscript have analyzed the impact of individually silencing 90 lipid transfer proteins on the overall lipid composition of a specific cell type. They confirmed some of the evidence obtained by their own and other research groups in the past, and additionally, they identified an unreported role for ORP9-ORP11 in sphingomyelin production at the trans-Golgi. As they delved into the nature of this effect, the authors discovered that ORP9 and ORP11 form a dimer through a helical region positioned between their PH and ORD domains.

      Strengths:

      This well-designed study presents compelling new evidence regarding the role of lipid transfer proteins in controlling lipid metabolism. The discovery of ORP9 and ORP11's involvement in sphingolipid metabolism invites further investigation into the impact of the membrane environment on sphingomyelin synthase activity.

      Weaknesses:

      There are a couple of weaknesses evident in this manuscript. Firstly, there's a lack of mechanistic understanding regarding the regulatory role of ORP9-11 in sphingomyelin synthase activity. Secondly, the broader role of hetero-dimerization of LTPs at ER-Golgi membrane contact sites is not thoroughly addressed. The emerging theme of LTP dimerization through coiled domains has been reported for proteins such as CERT, OSBP, ORP9, and ORP10. However, the specific ways in which these LTPs hetero and/or homo-dimerize and how this impacts lipid fluxes at ER-Golgi membrane contact sites remain to be fully understood.

      Regardless of the unresolved points mentioned above, this manuscript presents a valuable conceptual advancement in the study of the impact of lipid transfer on overall lipid metabolism. Moreover, it encourages further exploration of the interplay among LTP actions across various cellular organelles.

    3. Reviewer #2 (Public Review):

      Summary:

      The authors set out to determine which lipid transfer proteins impact the lipids of the Golgi apparatus, and they identified a reasonable number of "hits" where the lack of one lipid transfer protein affected a particular Golgi lipid or class of lipids. They then carried out something close to a "proof of concept" for one lipid (sphingomyelin) and two closely related lipid transfer proteins (ORP9/ORP11). They looked into that example in great detail and found a previously unknown relationship between the level of phosphatidylserine in the Golgi (presumably trans-Golgi, trans-Golgi Network) and the function of the sphingomyelin synthase enzyme. This was all convincingly done - results support their conclusions - showing that the authors achieved their aims.

      Impact:

      There are likely to be 2 types of impact:

      (i) cell biology: sphingomyelin synthase and ORP9/11 will be studied in future in more informed ways to understand (a) the role of different Golgi lipids - this work opens that out and produces more questions than answers - and (b) the role of different ORPs: what distinguishes ORP11 from its paralog ORP10?

      (ii) molecular biochemistry: combining knockdown miniscreen with organelle lipidomics must be time-consuming, but here it is shown to be quite a powerful way to discover new aspects of lipid-based regulation of protein function. This will be useful to others as an example, and if this kind of workflow could be automated, then the possible power of the method could be widely applied.

      Strengths:

      Nicely controlled data;

      Wide-ranging lipidomics dataset with repeats and SDs - all data easily viewed.

      Simple take home message that PS traffic to the TGN by ORP9/11 is required for some aspect of SMS1 function.

      Weaknesses:

      Model and Discussion:

      Despite the authors saying that this has been addressed in their rebuttal, I still struggle to find any ideas about the aspect of SMS1 function that is being affected.

      As I mentioned before, even if no further experiments were carried out, the authors could discuss possibilities. One might speculate what the PS is being used for. For example, is it a co-factor for integral membrane proteins, such as flippases? Is it a co-factor for peripheral membrane proteins, such as yet more LTPs? The model could include the work of Peretti et al (2008), which linked Nir2 activity exchanging PI:PA (Yadav et al, 2015) to the eventual function of CERT. Could the PS have a role in removing/reducing DAG produced by CERT?

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors): 

      The authors should possibly discuss more the other cases when LTPs of the same type of ORP9 and ORP10 have been found to dimerise. They should definitely cite and discuss the evidence reported in February this year in CMLS (see https://link.springer.com/article/10.1007/s00018-023-04728-5). In this paper, authors reported very similar findings as those the authors have in Figures 3, 4, S6, S7, and S8. Specifically, in this CMLS paper the authors find that ORP9 and ORP10 (not ORP11) interact through a central helical region and that ORP9 localises ORP10 to the ER-Golgi MCSs by providing ORP10 with a binding site for VAPs, where the heterodimer mediates the exchange of PtdIns(4)P for PtdSer. 

      We thank the reviewer for their recommendations. The mentioned paper had gone unnoticed by us and is now cited in the revised manuscript. Various other papers reporting on LTP dimerization are already cited in our manuscript: ORP9-ORP10 dimerization (Kawasaki et al. 2022), ORP9-ORP11 dimerization (Zhou et al. 2010), and ORP9-ORP10/11 dimerization (Tan and Finkel 2022). The revised manuscript now discusses the dimerization of CERT and OSBP, citing Gehin et al. 2023, Ridgway et al. 1992 and de la Mora et al. 2021.

      Reviewer #2 (Recommendations For The Authors): 

      Model and Discussion: 

      Give an idea about the aspect of SMS1 function that is being affected. Even if no further experiments were carried out, the authors could discuss possibilities. One might speculate what the PS is being used for. For example, is it a co-factor for integral membrane proteins, such as flippases? Is it a co-factor for peripheral membrane proteins, such as yet more LTPs? The model could include the work of Peretti et al (2008), which linked Nir2 activity exchanging PI:PA (Yadav et al, 2015) to the eventual function of CERT. Could the PS have a role in removing/reducing DAG produced by CERT? 

      We thank the reviewer for their recommendations. The same recommendations were also raised in the public review, which we believe we answered sufficiently.

      Other, Minor: 

      Make clear that there is no sterol readout (Fig 1C) 

      We would like to point out that Figure 1C has a sterol readout as CE refers to cholesterol esters.

      "PH domains of ORP9 and ORP11 localized only partially to the Golgi, unlike the PH domains of OSBP and CERT" (line 154). Say here where the non-Golgi ORP9 and ORP11 PH domain pool is - presumably in the cytoplasm.

      We thank the reviewer for their suggestion and have rephrased the sentence accordingly.

      Fig 7H-J: histograms not lines as these are separate unlinked categories

      We thank the reviewer for their suggestion. However, we think the original figure represents our findings in the best possible way. Our analysis of individual lipid species is also included in Supplementary Figure 10.

      Reviewer #3 (Recommendations For The Authors): 

      (1) At the end of the intro, in summarizing their findings, the authors state (p3. lines 48-49) "These findings highlight how phospholipid and sphingolipid gradients along the secretory pathway are linked at ER-Golgi membrane contact sites." This should instead read "These findings highlight THAT phospholipid and sphingolipid gradients along the secretory pathway are linked at ER-Golgi membrane contact sites." 

      We thank the reviewer for their suggestion and have changed the sentence accordingly.

      (2) As noted in the public section, to show that ORP9/11 do indeed exchange lipids, an in vitro experiment demonstrating that ORP11 can transfer PI4P is essential. Ideally, it would be best to examine PS AND PI4P transfer by ORP9 AND 11 separately AND then by the ORP9/11 heterodimer. This could lend insights as to the function of the heterodimer. The He et al et Yu paper should provide guidelines for this. Why have the heterodimers? 

      We believe we addressed this point by showing the lipid transfer ability of the ORP9-ORP11 dimer. These findings are now part of the revised manuscript.

      (3) It would be interesting to discuss the roles of ORP9/ORP11 versus ORP9/ORP10... they seem so analogous, although this is at the discretion of the authors. 

      We thank the reviewer for their suggestion. Since the difference between ORP9-ORP10 and ORP9-ORP11 dimers was also raised by other reviewers, we decided to include this discussion in the manuscript. A section based on our answer to Reviewer #2 in Public Review is now part of the Discussions.

      (4) The authors used a melanoma cell line in their screens (p3, line 59). Could they explain why they used this cell line versus others? 

      We chose MelJuSo cells for several reasons. Mainly, MelJuSo cells are diploid, which makes generating knockouts in a screening setup easier than in polyploid cancer cell lines (e.g. HeLa). Furthermore, our CRISPR/Cas9 screening protocols are optimized for this cell line.

    1. eLife assessment

      This work presents fundamental new insights into the conductivity of freshwater cable bacteria. The evidence supporting the conclusions, which was collected using appropriate techniques, is compelling. The work will be of interest to environmental microbiologists and the microbial electrochemistry community.

    2. Reviewer #2 (Public Review):

      Summary:

      In this work, Mohamed Y. El-Naggar and co-workers present a detailed electronic characterization of cable bacteria from Southern California freshwater sediments. The cable bacteria could be reliably enriched in laboratory incubations, and subsequent TEM characterization and 16S rRNA gene phylogeny demonstrated their belonging to the genus Candidatus Electronema. Atomic force microscopy and two-point probe resistance measurements were then used to map out the characteristics of the conductive nature, followed by microelectrode four-probe measurements to quantify the conductivity.

      Interestingly, the authors observe that some freshwater cable bacteria filaments displayed a higher degree of robustness upon oxygen exposure than what was previously reported for marine cable bacteria. Finally, a single nanofiber conductivity on the order of 0.1 S/cm is calculated, which matches the expected electron current densities linking electrogenic sulphur oxidation to oxygen reduction in sediment and is consistent with hopping transport.

      Strengths and weaknesses:

      A comprehensive study is applied to characterise the conductive properties of the sampled freshwater cable bacteria. Electrostatic force microscopy and conductive atomic force microscopy provide direct evidence of the location of conductive structures. Four-probe microelectrode devices are used to quantify the filament resistance, which presents a significant advantage over commonly used two-probe measurements that include contributions from contact resistances. While the methodology is convincing, I find that some of the conclusions seem to be drawn on very limited sample sizes, which display widely different behaviour. In particular:

      The authors observe that the conductivity of freshwater filaments may be less sensitive to oxygen exposure than previously observed for marine filaments. This is indeed the case for an interdigitated array microelectrode experiment (presented in Figure 5) and for a conductive atomic force microscopy experiment (described in line 391), but the opposite is observed in another experiment (Figure S1). It is therefore difficult to assess the validity of the conclusion until sufficient experimental replications are presented.

      The calculation of single-nanofiber conductivity rests on measurements and assumptions that carry significant uncertainty, for example the number of nanofibres in a single filament, which varies with filament size (Frontiers in microbiology, 2018, 9: 3044.), and the measured CB resistance, which does not scale well with inner probe separation (Figure 5). A more rigorous consideration of these uncertainties is required.
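
      To make the reviewer's point about uncertainty concrete, the following minimal sketch shows how a per-nanofiber conductivity could be derived from a filament resistance and how independent relative errors would combine. It assumes N identical nanofibers conducting in parallel, each with cross-sectional area A over probe separation L, so that sigma = L / (R N A). All function names and input values are hypothetical and are not taken from the manuscript.

```python
import math


def nanofiber_conductivity(length_m: float, resistance_ohm: float,
                           n_fibers: int, fiber_area_m2: float) -> float:
    """Conductivity (S/m) of one fiber, assuming n identical fibers in parallel."""
    if min(length_m, resistance_ohm, n_fibers, fiber_area_m2) <= 0:
        raise ValueError("all inputs must be positive")
    return length_m / (resistance_ohm * n_fibers * fiber_area_m2)


def combined_relative_uncertainty(*relative_errors: float) -> float:
    """Combine independent relative errors of a product/quotient in quadrature."""
    return math.sqrt(sum(e * e for e in relative_errors))


# Hypothetical inputs, for illustration only.
sigma = nanofiber_conductivity(
    length_m=10e-6,                        # inner probe separation
    resistance_ohm=1e9,                    # four-probe filament resistance
    n_fibers=60,                           # assumed number of nanofibers
    fiber_area_m2=math.pi * (25e-9) ** 2,  # assumed 50 nm fiber diameter
)
# Assumed relative errors on resistance, fiber count and fiber area.
rel_err = combined_relative_uncertainty(0.5, 0.3, 0.4)
print(f"sigma = {sigma:.3g} S/m, relative uncertainty = {rel_err:.0%}")
```

      Because the fiber count and cross-section enter the denominator directly, any uncertainty in them carries over one-to-one into the conductivity estimate.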

      Comments on revised version:

      The authors address all of the comments carefully.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This work provides significant insight into freshwater cable bacteria (CB) and is an important contribution to the emerging CB literature. In this manuscript, Yang et al. describe current-voltage measurements on CB collected from two freshwater sources in Southern California. The studies use electrostatic and conductive atomic force microscopies, as well as four-probe measurements. These measurements are consistent with back-of-the-envelope calculations on conductivities needed to sustain CB function. The data shows that freshwater CB have a similar structure and function to the more studied marine cable bacteria.

      Strengths:

      Excellent measurements on a new class of cable bacteria.

      Weaknesses:

      The paper would benefit from additional analysis of the data.

      Reviewer #1 (Recommendations for The Authors):

      This work provides significant insight into freshwater cable bacteria (CB) and is an important contribution to the emerging CB literature. In this manuscript, Yang et al. describe current-voltage measurements on CB collected from two freshwater sources in Southern California. The studies use electrostatic and conductive atomic force microscopies, as well as four-probe measurements. These measurements are consistent with back-of-the-envelope calculations on conductivities needed to sustain CB function. The data shows that freshwater CB have a similar structure and function to the more studied marine cable bacteria. Minor comments follow.

      We are grateful to the reviewer for the encouraging feedback and for appreciating the central message of the preprint. Below we address the reviewer’s constructive comments.

      Additional information could be provided regarding the degraded cells where an 'empty cage' remains, as well as the polyphosphate granules, which were previously observed in marine CB (refs. 11 and 18). 

      We have edited the manuscript to note that the appearance of empty cages and the polyphosphate granules in freshwater cable bacteria is indeed consistent with these features as previously reported in marine CB. The size of polyphosphate granules in freshwater CB is comparable to or slightly smaller than that in marine CB (Sulu-Gambari et al., 2015). In the case of empty cages, these cells were previously described as ‘ghost filaments’ which had lost all cell membrane and cytoplasmic material (Cornelissen et al., 2018).

      Manuscript edits: a sentence regarding polyphosphate granules has been added into the manuscript from lines 307 - 308. “The size of polyphosphate granules in freshwater CB (70 nm – 400 nm) is comparable or slightly smaller than in marine CB (35)”.

      A sentence regarding the empty cages has been added into the manuscript (lines 303-305). “These empty cages were previously described as ‘ghost filaments’ which had lost all cell membrane and cytoplasm material (20).”

      The authors also state that the 'phase difference between the elevated ridges and interridge regions is proportional to the tip voltage squared,' and refer to Fig. 4D. This figure has only three data points with large error bars. The authors may wish to explain this finding and justify their analysis in greater detail.

      We thank the reviewer for pointing out that we presented this result but did not adequately describe its origin or significance. In general, the probe phase response of electrostatic force microscopy (EFM) can originate not only from the electrostatic interaction with the sample (i.e. the electrical properties of interest) but also from shorter range van der Waals forces (which are more reflective of probe-sample distance i.e. topography). To ensure that EFM is reporting electrical interactions, we performed these measurements using a two-pass technique, with the second pass retracing the topography measured during the first pass, but at a fixed height above the surface where the interactions are long range (electrostatic) rather than short range (vdW) or resulting from topography cross-talk. The purpose of the voltage change measurement (Fig. 4D) is to simply assess whether this procedure is successful, since electrostatic forces are proportional to the square of the voltage at a fixed height ($F = \tfrac{1}{2}\,\frac{\partial C}{\partial z}\,V^{2}$). While the error bar of that measurement is high, due to the intrinsic noise in the dynamic (high frequency) EFM phase response measurement, we note that the purpose of this measurement is simply to assess that the interaction is due to the electrical interaction with the sample, before proceeding to actual conductance measurements (Figs. 5-8).

      Manuscript edits: we previously simply cited a reference where the reader can delve deeper into the origin of the square voltage signal. To put this into better context, we now include additional information (lines 461 - 475), noting the origin and purpose of the result as described above.
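
      The check described in this response, namely whether the EFM phase difference scales with the square of the tip voltage, can be sketched as a one-parameter least-squares fit of phase = k V^2. The voltages and phase values below are hypothetical placeholders rather than the data of Fig. 4D, and the function names are assumptions made for illustration.

```python
def fit_quadratic_through_origin(voltages, phase_shifts):
    """Least-squares estimate of k in the model phase = k * V**2 (no intercept)."""
    numerator = sum((v ** 2) * p for v, p in zip(voltages, phase_shifts))
    denominator = sum(v ** 4 for v in voltages)
    if denominator == 0:
        raise ValueError("need at least one non-zero voltage")
    return numerator / denominator


def r_squared(voltages, phase_shifts, k):
    """Coefficient of determination of the fitted model phase = k * V**2."""
    mean_phase = sum(phase_shifts) / len(phase_shifts)
    ss_res = sum((p - k * v ** 2) ** 2 for v, p in zip(voltages, phase_shifts))
    ss_tot = sum((p - mean_phase) ** 2 for p in phase_shifts)
    if ss_tot == 0:
        raise ValueError("phase_shifts must not all be equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical three-point data set (tip voltage in V, phase difference in degrees).
tip_voltages = [1.0, 2.0, 3.0]
phase_differences = [0.11, 0.39, 0.92]

k_fit = fit_quadratic_through_origin(tip_voltages, phase_differences)
fit_quality = r_squared(tip_voltages, phase_differences, k_fit)
print(f"k = {k_fit:.3f} deg/V^2, R^2 = {fit_quality:.3f}")
```

      With only three noisy points such a fit can show consistency with a quadratic dependence but cannot exclude other functional forms, which is in line with the authors' framing of the measurement as a sanity check.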

      It is interesting that the freshwater CB appear to be more resilient to air compared to marine CB (or at least some freshwater filaments, as the authors note that the level of resilience is filament-dependent). The authors indicate that salt affects oxygen solubility and there is a larger oxygen content in freshwater. Do the authors have thoughts on whether or not the differences between marine and freshwater CB could fit, or not fit, with the hypothesis that conductivity in air is lowered due to oxidation of the Ni/S species (ref. 25 in manuscript)? Could the freshwater CB have greater protection against oxidation?

      We thank the reviewer for highlighting this point. Indeed, our manuscript mentions the current hypothesis that conductivity of cable bacteria may be diminished upon oxidation of the Ni/S groups (lines 101 - 105 and 498 - 504). It remains unclear how this idea may lead to variability between marine and freshwater cables. Interestingly, however, a recent comparative bioRxiv preprint (Digel et al., 2023) noted significant differences in the morphology, number, and cross-sectional area of nanofibers between a freshwater and marine CB strain. These differences may lead to a different resiliency against oxidative degradation upon exposure to air. Specifically, even though the marine CB strain was characterized by a larger cross-sectional area per nanofiber, it had significantly fewer nanofibers, leading to a 40% smaller total area than its freshwater counterpart. We have edited the manuscript to highlight these possible differences (at least in size) between freshwater and marine cables.

      Manuscript edits (lines 506 – 514) “For example, a recent comparative study (21) hints at significant differences in the morphology, number, and size of nanofibers when comparing a marine CB strain to a freshwater CB strain. Specifically, while the marine CB was characterized by a 50% larger cross-sectional area per nanofiber, the total nanofibers’ area was 40% smaller than the freshwater strain due to a smaller number of nanofibers per CB filament. Given the proposed central role of nanofibers in mediating electron transport along CB, it is possible that such differences may also lead to different degrees of tolerance against oxidative degradation upon exposure to air.”
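
      The two percentages quoted above jointly imply a ratio of nanofiber counts, which can be checked with a short calculation. This is a sketch under the assumption that the total nanofiber area is simply the number of nanofibers times the per-nanofiber area; the variable names are illustrative.

```python
# Ratios are marine / freshwater, taken from the percentages quoted above.
per_fiber_area_ratio = 1.5   # "50% larger cross-sectional area per nanofiber"
total_area_ratio = 0.6       # "total nanofibers' area was 40% smaller"

# Assumption: total area = (number of fibers) x (area per fiber),
# so the count ratio is the total-area ratio divided by the per-fiber ratio.
fiber_count_ratio = total_area_ratio / per_fiber_area_ratio
print(fiber_count_ratio)
```

      Under that assumption, the marine strain would carry well under half as many nanofibers per filament as the freshwater strain.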

      Figure 6D shows current-voltage measurements from three representative cables; there is a large variation, most notably between Cable 1 and Cables 2 and 3. Is this variation typical for different cables? Can the authors comment on the range of values observed and how many cables fit into different ranges? Any thoughts on the reasons behind the range?

      Figure 6 B and C (red and blue) are representative of most of the cable conductance measured using the point IV CAFM technique, with the Figure 6 A (green) IV curve being an example of the upper limit, which was less frequently observed. In total we measured ten cables using the point IV CAFM technique. These variations may stem from actual differences in the conductivity of separate CB filaments, the environment of the measurement, or limitations in the conductive AFM measurement techniques. These limitations include a large contact resistance due to the interaction of the small probe with the sample, which may lead to large variability depending on the contact point. For this reason, we rely on 4-probe measurements (Fig. 8) for quantitative conductive analyses, rather than conductive AFM. It is important to note, however, that the conductive AFM measurements (Fig. 6 and Fig. 7) provide other complementary information, including the demonstration of both transverse and longitudinal transport (lines 389-393) in Fig. 6 and the visualization of the current-carrying nanofibers in Fig. 7.

      Manuscript edits: we have edited the manuscript (lines 413 - 418) to make it clear that the quantitative estimate of conductivity was made only using 4 probe measurements due to the limitations of CAFM or two-probe techniques.

      Can the authors comment on how the number of fibers per CB in their samples compares with the number of fibers in marine CB? Marine CB are known to have pinwheel junctions where the fibers come together before branching out again. This pinwheel design could play a role in the function of the CB or in its survival (see Adv. Biosys. 2020, 4, 2000006). Were pinwheel structures observed in freshwater CB? If so, how do they compare?

      From previous studies, estimates of the number of fibers in marine CB appear to vary significantly, from 15 or 17 (Pfeffer et al., 2012) to 58 – 61 (Cornelissen et al., 2018). In our freshwater CB, we estimated the number of fibers at ~35 per CB (line 423), which is comparable to the count of 34 per freshwater CB recently reported by Digel et al., bioRxiv 2023. We cannot specifically comment on the pinwheel structure as we did not perform the transverse thin section TEM imaging necessary to observe the cell-cell junctions in this particular study.

      On lines 95-96, the authors discuss the fact that marine cable bacteria have a wide variance in their measured conductivities. While one may ask if the larger marine conductivities (near 80 S/cm) are representative, a conductivity of 0.1 S/cm is 2 orders of magnitude lower than this value, which the field generally refers to as a high conductivity. The authors should mention whether or not any of their specimens display the high conductivities seen in select marine cable bacteria specimens.

      It is indeed important to note that the ~80 S/cm figure refers to an upper end previously observed (ref. 22) for marine CB conductivity. In our manuscript (lines 525 - 526), we highlight that the previously observed range (including in that same study) is 10⁻² to 10¹ S/cm and we were careful to qualify the previously reported upper end with ‘reaching as high as’ (line 97). Note that this places our measurement of 0.1 S/cm within the previously reported range. We have not observed freshwater CB conductivity near the upper end of the previously reported range, and generally propose that these types of measurements are better analyzed in the context of the biological function rather than ‘high vs. low’. Towards that end, the manuscript (lines 527-537) makes the argument that the 10⁻¹ S/cm figure may be sufficient to support the electrical currents mediated by CB in sediments. We have edited the manuscript to highlight that we did not observe single CB nanofiber conductivity near the upper limit previously observed in marine CB (lines 522 - 525).

      Reviewer #2 (Public Review):

      Summary:

      In this work, Mohamed Y. El-Naggar and co-workers present a detailed electronic characterization of cable bacteria from Southern California freshwater sediments. The cable bacteria could be reliably enriched in laboratory incubations, and subsequent TEM characterization and 16S rRNA gene phylogeny demonstrated their belonging to the genus Candidatus Electronema. Atomic force microscopy and two-point probe resistance measurements were then used to map out the characteristics of the conductive nature, followed by microelectrode four-probe measurements to quantify the conductivity.

      Interestingly, the authors observe that some freshwater cable bacteria filaments displayed a higher degree of robustness upon oxygen exposure than what was previously reported for marine cable bacteria. Finally, a single nanofiber conductivity on the order of 0.1 S/cm is calculated, which matches the expected electron current densities linking electrogenic sulphur oxidation to oxygen reduction in sediment. This is consistent with hopping transport.

      Strengths and weaknesses:

      A comprehensive study is applied to characterize the conductive properties of the sampled freshwater cable bacteria. Electrostatic force microscopy and conductive atomic force microscopy provide direct evidence of the location of conductive structures. Four-probe microelectrode devices are used to quantify the filament resistance, which presents a significant advantage over commonly used two-probe measurements that include contributions from contact resistances. While the methodology is convincing, I find that some of the conclusions seem to be drawn on very limited sample sizes, which display widely different behavior. In particular:

      The authors observe that the conductivity of freshwater filaments may be less sensitive to oxygen exposure than previously observed for marine filaments. This is indeed the case for an interdigitated array microelectrode experiment (presented in Figure 5) and for a conductive atomic force microscopy experiment (described in line 391), but the opposite is observed in another experiment (Figure S1). It is therefore difficult to assess the validity of the conclusion until sufficient experimental replications are presented.

      We indeed acknowledge both in the abstract (line 23-26) and section 2.2 (lines 374-377) the variable nature of the sensitivity and filament-dependent response to air exposure. Our discussion (lines 498-506) considers the possible reasons for this variability:

      ‘While these observations showed a high degree of variability and therefore require a more detailed investigation, it is interesting to consider the possibility that the oxidative decline (or other damaging processes), thought to be a consequence of oxidation of Ni cofactors involved in electron transport (25), may not affect all sections of the cm long CB filaments simultaneously; under these conditions, IDA measurements, which probe multiple micrometer-scale electrode-crossing CB regions (e.g. 372 crossings in Figure 5 inset) may offer an advantage over techniques addressing entire CBs or specific CB regions. It is also interesting to consider an alternative possibility that the conductive properties of freshwater CB maybe intrinsically more oxygen-resistant than marine CB’.

      To summarize, the manuscript points to the likelihood that the IDA technique used here may offer an advantage for detecting currents under damaging conditions since it interrogates multiple sections simultaneously. Furthermore, in a recent preprint from Digel et al. (2023), the conductivity of the only freshwater strain investigated in that study was among the highest compared to other marine CB strains. Therefore, the freshwater CB being more resistant is one possibility to be investigated based on these observations and results. We therefore present the latter as a possibility in the discussion.

      The calculation of a single nanofiber conductivity is based on experiment and calculation with significant uncertainty. E.g. for the number of nanofibers in a single filament that varies depending on the filament size (Frontiers in microbiology, 2018, 9: 3044.), and the measured CB resistance, which does not scale well with inner probe separation (Figure 5). A more rigorous consideration of these uncertainties is required.

      The reviewer raises an important point. For these calculations, we made sure to determine the representative number of fibers per cable and thickness of the nanofibers (~50 nm) from our own samples. We indeed assessed the possible variability across our different cable filaments and found the fiber numbers varied from 30 – 44 (with 35 used as a representative figure in the paper). For the scaling of resistance with inner probe separation, our 4P results estimated that the CB resistances are 47 MΩ and 240 MΩ for the 20 µm and 200 µm lengths, respectively, rather than an expected tenfold difference if the cable has a uniform conductivity along the entire filament. This result suggests nonuniform conductivity in different sections of the CB filament. Since accounting for non-uniform conduction (and variability in fiber morphology/density) is clearly difficult, we were careful to limit our conclusion to an order of magnitude estimate (e.g. lines 522-525). Given the previously reported range of cable bacteria conductivity (10⁻² to 10¹ S/cm), this places our estimate within this range. We have further edited the manuscript to note that our reported single nanofiber conductivity cannot be constrained further than the order of 0.1 S/cm due to our estimates in nanofiber diameter and per cable amount as well as the possibility of nonuniform conductivity along the CB length (lines 522-525).
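
      The order-of-magnitude reasoning above can be sketched numerically. The snippet below is an illustration only, not the calculation used in the manuscript: it assumes the nanofibers conduct in parallel, are identical, and have a circular cross-section with a 50 nm diameter (the response states only a ~50 nm "thickness"). The function name is hypothetical.

```python
import math

def nanofiber_conductivity_S_per_cm(resistance_ohm, length_m, n_fibers, fiber_diameter_m):
    """Conductivity sigma = L / (R * A) for n identical fibers in parallel."""
    # Assumption: circular fiber cross-section; total area is n times one fiber.
    total_area_m2 = n_fibers * math.pi * (fiber_diameter_m / 2.0) ** 2
    sigma_S_per_m = length_m / (resistance_ohm * total_area_m2)
    return sigma_S_per_m / 100.0  # convert S/m to S/cm

FIBER_DIAMETER_M = 50e-9                               # ~50 nm, as quoted
measurements = [(20e-6, 47e6), (200e-6, 240e6)]        # (length in m, resistance in ohm)

# Sweep the quoted range of fiber counts (30 to 44, with 35 as representative).
estimates = [
    nanofiber_conductivity_S_per_cm(r, length, n, FIBER_DIAMETER_M)
    for n in (30, 35, 44)
    for (length, r) in measurements
]

# Uniform conductivity would give a tenfold resistance increase for a
# tenfold longer segment; the measured ratio is noticeably smaller.
resistance_ratio = 240e6 / 47e6

print([round(s, 3) for s in estimates], round(resistance_ratio, 2))
```

      Under these assumptions, every combination of fiber count and probe spacing stays within an order of magnitude of 0.1 S/cm, while the resistance ratio falls short of the tenfold value expected for a uniform conductor.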

      Reviewer #2 (Recommendations for The Authors):

      Figure 4A: Please add scale- and color bar.

      Done - new Fig. 4 included with color bars for topography and phase. The inset of Fig. 4A denotes a 200 nm scale bar (and that scale is now mentioned in the figure caption).

      Figure 5: A time series graph might be more instructive.

      Done - we indeed appreciate this suggestion and find that it improved the clarity of Figure 5. An inset has been included in Figure 5 plotting the resistance R change over time under different conditions. This inset demonstrates that the resistance of the cable on the IDA was slowly decreasing in the N2/H2 anaerobic chamber, only to start increasing upon exposure to ambient air.

      After putting the cable back into the chamber, the resistance again decreased over time.

    1. Author Response:

      We thank the reviewers for their insightful feedback. In our revised version of the manuscript, we will address all points raised.

      Regarding the preprocessing (Reviewer 1), we agree that the StandardRat pipeline is optimal for newly acquired datasets. However, since this study involves reanalyzing an already published dataset (Ionescu et al., JNM, 2023), which was preprocessed, analyzed, and published before the StandardRat paper, we aimed to maintain the same preprocessing. This approach allows for consistent interpretation of the readout regarding functional and molecular connectivity in the context of our previously published findings. Nonetheless, we agree that providing full access to the data will enable other researchers to reproduce our results using the StandardRat preprocessing pipeline and perform additional analyses on this rich dataset. Therefore, we will provide full access to the data via an open repository, as the reviewer suggested.

      Regarding anesthesia, we acknowledge that this is a limitation of our study, as more recent studies have indicated superior protocols. However, we and others have shown that, while not ideal, isoflurane at the used dose maintains stable physiology and does not cause burst suppression in rats. We will amend our discussion to reflect these points.

      Regarding the other points, we will amend the manuscript to provide more detail on the experimental design, including the tracer application as suggested by Reviewer 2, and clarify parts of the analysis that are unclear in the current version. Additionally, we agree with Reviewer 2 that our current terminology may cause confusion, and we will amend it accordingly. We will also discuss the other points raised by the reviewers, such as the reduced sample size for the pharmacological cohort as limitations in our discussion.

      Thank you for your understanding and the opportunity to improve our manuscript.

    1. Reviewer #1 (Public Review):

      Little is known about the local circuit mechanisms in the preoptic area (POA) that regulate body temperature. This carefully executed study investigates the role of GABAergic interneurons in the POA that express neurotensin (NTS). The principal finding is that GABA-release from these cells inhibits neighboring neurons, including warm-activated PACAP neurons, thereby promoting hyperthermia, whereas NTS released from these cells has the opposite effect, causing a delayed activation and hypothermia. This is shown through an elegant series of experiments that include slice recordings alongside matched in vivo functional manipulations. The roles of the two neurotransmitters are distinguished using a cell-type-specific knockout of Vgat as well as pharmacology to block GABA and NTS receptors. Overall, this is an excellent study that is noteworthy for revealing local circuit mechanisms in the POA that control body temperature and also for highlighting how amino acid neurotransmitters and neuropeptides released from the same cell can have opposing physiologic effects.

    1. eLife assessment

      Through cellular, developmental, and physiological analysis, this valuable study identifies a gene that functions to regulate the relative growth of roots and shoots under salt stress. The holistic approach taken provides solid evidence that this gene, a member of a larger tandemly duplicated gene family initially highlighted by association mapping, as well as an upstream regulator contribute to salt tolerance. More robust statistical or biological support for some conclusions could further strengthen this manuscript. The manuscript will be of interest to plant biologists studying mechanisms of abiotic stress tolerance and gene family evolution.

    2. Reviewer #1 (Public Review):

      Summary:

      The authors aim to assess the effect of salt stress on root:shoot ratio, identify the underlying genetic mechanisms, and evaluate their contribution to salt tolerance. To this end, the authors systematically quantified natural variations in salt-induced changes in root:shoot ratio. This innovative approach considers the coordination of root and shoot growth rather than exploring biomass and the development of each organ separately. Using this approach, the authors identified a gene cluster encoding eight paralog genes with a domain-of-unknown-function 247 (DUF247), with the majority of SNPs clustering into SR3G (At3g50160). In the manuscript, the authors utilized an integrative approach that includes genomic, genetic, evolutionary, histological, and physiological assays to functionally assess the contribution of their genes of interest to salt tolerance and root development.

      Strengths:

      The holistic approach and integrative methodologies presented in the manuscript are essential for gaining a mechanistic understanding of a complex trait such as salt tolerance. The authors focused on At3g50160 but included in their analyses additional DUF247 paralogs, which further contributes to the strength of their approach. In addition, the authors considered the developmental stage (young seedlings, early or late vegetative stages) and growth conditions of the plants (agar plates or soil) when investigating the role of SR3G in salt tolerance and root or shoot development.

      Weaknesses:

      The authors' claims and interpretation of the results are not fully supported by the data and analyses. In several cases, the authors report differences that are not statistically significant (e.g., Figures 4A, 7C, 8B, S14, S16B, S17C), use inappropriate statistical tests (e.g., t-test instead of Dunnett Test/ANOVA as in Figures 10B-C, S19-23), present standard errors that do not seem to be consistent with the post-hoc Tukey HSD Test (e.g., Figures 4, 9B-C, S16B), or lack controls (e.g., Figure 5C-E, staining of the truncated versions with FM4-64 is missing).

      In other cases, traits of root system architecture and expression patterns are inconsistent between different assays despite similar growth conditions (e.g., Figures S17A-B vs. 10A-C vs. 6A, and Figures S16B vs. 4A/9B), or T-DNA insertion alleles of WRKY75 that are claimed to be loss-of-function show comparable expression of WRKY75 as WT plants. Additionally, several supplemental figures are mislabeled (Figures S6-9), and some figure panels are missing (e.g., Figures S16C and S17E).

      Consequently, the authors' decisions regarding subsequent functional assays, as well as major conclusions about gene function, including SR3G function in root system architecture, involvement in root suberization, and regulation of cellular damage are incomplete.

    3. Reviewer #2 (Public Review):

      Salt stress is a significant and growing concern for agriculture in some parts of the world. While the effects of sodium excess have been studied in Arabidopsis and (many) crop species, most studies have focused on Na uptake, toxicity, and overall effects on yield, rather than on developmental responses to excess Na, per se. The work by Ishka and colleagues aims to fill this gap.

      Working from an existing dataset that exposed a diverse panel of A. thaliana accessions to control, moderate, and severe salt stress, the authors identify candidate loci associated with altering the root:shoot ratio under salt stress. Following a series of molecular assays, they characterize a DUF247 protein which they dub SR3G, which appears to be a negative regulator of root growth under salt stress.

      Overall, this is a well-executed study that demonstrates the functional role played by a single gene in plant response to salt stress in Arabidopsis.

      The abstract and beginning of the Discussion section highlight the "new tool" developed here for measuring biomass accumulation. I feel that this distracts from the central aims of the study, which is really about the role of a specific gene in root development under salt stress. I would suggest moving the tool description to less prominent parts of the manuscript.

    1. eLife assessment

      This useful study presents a real-time transcriptomics analysis, with the aim of providing rapid access to sequenced data to reduce the costs associated with Oxford Nanopore long-read technology. Although the authors illustrate the compelling utility of this approach with three diverse experimental setups, issues with study design and analysis result in incomplete supporting evidence.

    2. Reviewer #1 (Public Review):

      Summary:

      In this study, the authors developed three case studies: (1) transcriptome profiling of two human cell cultures (HEK293 and HeLa), (2) identification of experimentally enriched transcripts in cell culture (RiboMinus and RiboPlus treatments), and (3) identification of experimentally manipulated genes in yeast strains (gene knockouts or strains transformed with plasmids containing the deleted gene for overexpression). Sequencing was performed using the Oxford Nanopore Technologies (ONT), the only technology that allows for real-time analysis. The real-time transcriptomic analysis was performed using NanopoReaTA, a recent toolbox for comparative transcriptional analyses of Nanopore-seq data, developed by the group (Wierczeiko and Pastore et al. 2023). The authors aimed to show the use of the tool developed by them in data generated by ONT, evidencing the versatility of the tool and the possibility of cost reduction since the sequencing by ONT can be stopped at any time since enough data were collected.

      Strengths:

      Given that Oxford Nanopore Technologies offers real-time sequencing, it is extremely useful to develop tools that allow real-time data analysis in parallel with data generation. The authors demonstrated that this strategy is possible for both human cell lines and yeasts in the case studies presented. It is a useful strategy for the scientific community and it has the potential to be integrated into clinical applications for rapid and cost-effective quality checks in specific experiments such as overexpression of genes.

      Weaknesses:

      In relation to the RNA-Seq analyses, a greater number of replicates would have been needed for a proper statistical analysis. The experiments were conducted with a minimal number of replicates (2 replicates each for case studies 1 and 2, and 3 replicates for case study 3).

      Regarding the experimental part, some problems were observed in the conversion to double-stranded and loading for Nanopore-Seq, which were detailed in Supplementary Material 2. This fact is probably reflected in the results where a reduction in the overall sequencing throughput and detected gene number for HEK293 compared to HeLa were observed (data presented in Supplementary Figure 2). It is necessary to use similar quantities of RNA/cDNA since the sequencing occurs in real-time. The authors should have standardized the experimental conditions to proceed with the sequencing and perform the analyses.

    3. Reviewer #2 (Public Review):

      Summary:

      Transcriptomics technologies play important roles in biological studies. Technologies based on second-generation sequencing, such as mRNA-seq, face some serious obstacles, including isoform analysis, due to short read length. Third-generation sequencing technologies perfectly solve these problems by having long reads, but they are much more expensive. The authors presented a useful real-time strategy to minimize the cost of sequencing with Oxford Nanopore Technologies (ONT). The authors performed three sets of experiments to illustrate the utility of the real-time strategy. However, due to the problems in experimental design and analysis, their aims are not completely achieved. If the authors can significantly improve the experiments and analysis, the strategy they proposed will guide biologists to conduct transcriptomics studies with ONT in a fast and cost-effective way and help studies in both basic research and clinical applications.

      Strengths:

      The authors have recently developed a computational tool called NanopoReaTA to perform real-time analysis when cDNA/RNA samples are sequenced with ONT (Wierczeiko et al., 2023). The advantage of real-time analysis is that the sequencing can be stopped once enough data is collected to save cost. Here, they described three sets of experiments: a comparison between two human cell lines, a comparison among RNA preparation procedures, and a comparison between genetically modified yeasts. Their results show that the real-time strategy works for different species and different RNA preparation methods.

      Weaknesses:

      However, especially considering that the computational tool NanopoReaTA is their previous work, the authors should present more helpful guidelines to perform real-time ONT analysis and more advanced analysis methods. There are four major weaknesses:

      (1) For all three sets of experiments, the authors focused on sample clustering and gene-level differential expression analysis (DEA), and only did little analysis on isoform level and even nothing in any figures in the main text. Sample clustering and gene-level DEA can be easily and well done using mRNA-seq at a much cheaper cost. Even for initial data quality checking, mRNA-seq can be first done in Illumina MiSeq/NextSeq which is quick, before deep sequencing in HiSeq/NovaSeq. The real power of third-generation RNA sequencing is the isoform analysis due to the long read length. At least for now, PacBio Iso-seq is very expensive and one cannot analyze the data in real-time. Thus, the authors should focus on the real-time isoform analysis of ONT to show the advantages.

      (2) The sample sizes are too small in all three sets of experiments: only two for sets 1 and 2, and three for set 3. For DEA, three is the minimal number for proper statistics. But a sample size of three always leads to very poor power. Nowadays, a proper transcriptomics study usually has a larger sample size. Besides the power issue, biological samples always contain many outliers due to many reasons. It is crucial to show whether the real-time analysis also works for larger sample sizes, such as 10 per group, i.e., 20 samples in total. Will the performance still hold when the sample number increases? What is the maximum sample number for an ONT run? If the samples need to be split into multiple runs, how will the real-time analysis be adjusted? These questions are quite useful for researchers who plan to use ONT.

      (3) According to the manuscript, real-time analysis checks the sequencing data at a few time points. In statistics, this is usually called sequential analysis or interim analysis, and it is commonly performed in clinical trials to save cost. Care must be taken while performing these analyses, as repeated checks on the data can inflate the type I error rate. Thus, the authors should develop a sequential analysis procedure for real-time RNA sequencing.

      (4) The experimental set 1 (comparison between two completely different human cell lines) and experimental set 2 (comparison among RNA preparation procedures) are not quite biologically meaningful. If it is possible, it is better for the authors to perform an experiment more similar to a real situation for biological discovery. Then the manuscript can attract more researchers to follow its guidelines.
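
      The concern in point (3) can be illustrated with a small simulation. This is a minimal sketch under strong simplifying assumptions: it uses a z-test on accumulating Gaussian data with known variance rather than RNA-seq counts, and it swaps in a plain Bonferroni adjustment as the correction, which is neither the reviewer's nor the authors' procedure. Group-sequential boundaries such as Pocock or O'Brien-Fleming are the more usual choice in practice. All names and parameter values are hypothetical.

```python
import math
import random

def false_positive_rate(n_looks, n_per_look, z_crit, n_sims=2000, seed=0):
    """Fraction of null experiments that reject at ANY of the interim looks."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        total, n = 0.0, 0
        for _ in range(n_looks):
            # Accumulate a new batch of observations drawn under the null (mean 0, sd 1).
            for _ in range(n_per_look):
                total += rng.gauss(0.0, 1.0)
            n += n_per_look
            z = total / math.sqrt(n)  # z-statistic on all data seen so far
            if abs(z) > z_crit:
                rejections += 1
                break  # stop at the first "significant" look
    return rejections / n_sims

Z_NOMINAL = 1.960     # two-sided alpha = 0.05, intended for a single look
Z_BONFERRONI = 2.576  # two-sided alpha = 0.05 / 5 = 0.01 per look

naive_rate = false_positive_rate(5, 20, Z_NOMINAL)
corrected_rate = false_positive_rate(5, 20, Z_BONFERRONI)
print(naive_rate, corrected_rate)
```

      Testing at the unadjusted threshold on every look rejects a true null more often than the nominal 5%, whereas the stricter per-look threshold keeps the overall rate at or below it, at the cost of some power.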

    1. Author response:

      Reviewer #1 (Public Review):  

      Weaknesses:  

      The weakness of this study lies in the fact that many of the genomic datasets originated from novel methods that were not validated with orthogonal approaches, such as DNA-FISH. Therefore, the detailed correlations described in this work are based on methodologies whose efficacy is not clearly established. Specifically, the authors utilized two modified protocols of TSA-seq for the detection of NADs (MKI67IP TSA-seq) and LADs (LMNB1-TSA-seq). Although these methods have been described in a bioRxiv manuscript by Kumar et al., they have not yet been published. Moreover, and surprisingly, the work of Kumar et al. is not cited in the current manuscript, despite its use of all TSA-seq data for NADs and LADs across the four cell lines. Moreover, Kumar et al. did not provide any DNA-FISH validation for their methods. Therefore, the interesting correlations described in this work are not based on robust technologies.

      An attempt to validate the data was made for SON-TSA-seq of human foreskin fibroblasts (HFF) using multiplexed FISH data from IMR90 fibroblasts (from the lung) by the Zhuang lab (Su et al., 2020). However, the comparability of these datasets is questionable. It might have been more reasonable for the authors to conduct their analyses in IMR90 cells, thereby allowing them to utilize MERFISH data for validating the TSA-seq method and also for mapping NADs and LADs. 

      We disagree with the statement that the TSA-seq approach and data have not been validated by orthogonal approaches and with the conclusion that the TSA-seq approach is not robust, as summarized here and detailed below in “Specific Comments”. TSA-seq is robust because it is based only on the original immunostaining specificity provided by the primary and secondary antibodies plus the diffusion properties of the tyramide-free radical. TSA-seq has been extensively validated by microscopy and by the orthogonal genomic measurements provided by LMNB1 DamID and NAD-seq. This includes: a) the initial validation by FISH of both nuclear speckle (to an accuracy of ~50 nm) and nuclear lamina TSA-seq and the cross-validation of nuclear lamina TSA-seq with lamin B1 DamID in a first publication (Chen et al, JCB 2018, doi: 10.1083/jcb.201807108); b) the further validation of SON TSA-seq by FISH in a second publication (Zhang et al, Genome Research 2021, doi:10.1101/gr.266239.120); c) the cross-validation of nucleolar TSA-seq using NAD-seq and the validation by light microscopy of the predictions of differences in the relative distributions of centromeres, nuclear speckles, and nucleoli made from nuclear speckle, nucleolar, and pericentric heterochromatin TSA-seq in the Kumar et al, bioRxiv preprint (which is in a last revision stage involving additional formatting for the journal requirements) doi:https://doi.org/10.1101/2023.10.29.564613; d) the extensive validation of nuclear speckle, LMNB1, and nucleolar TSA-seq generated in HFF human fibroblasts using published light microscopy distance measurements of hundreds of probes generated by multiplexed immuno-FISH MERFISH data (Su et al, Cell 2020, https://doi.org/10.1016/j.cell.2020.07.032), as we described for nucleolar TSA-seq in the Kumar et al, bioRxiv preprint and to some extent for LMNB1 and SON TSA-seq in the current manuscript version (see Specific Comments with attached Author response image 2).

      Reviewer 1 raised concerns regarding this FISH validation given that the HFF TSA-seq and DamID data was compared to IMR90 MERFISH measurements.  The Su et al, Cell 2020 MERFISH paper came out well after the 4D Nucleome Consortium settled on HFF as one of the two main “Tier 1” cell lines.  We reasoned that the nuclear genome organization in a second fibroblast cell line would be sufficiently similar to justify using IMR90 FISH data as a proxy for our analysis of our HFF data. Indeed, there is a high correlation between the HFF TSA-seq and distances measured by MERFISH to nuclear lamina, nucleoli, and nuclear speckles (Author response image 1).  Comparing HFF SON-TSA-seq data with published IMR90 SON TSA-seq data (Alexander et al, Mol Cell 2021, doi.org/10.1016/j.molcel.2021.03.006), the HFF SON TSA-seq versus MERFISH scatterplot is very similar to the IMR90 SON TSA-seq versus MERFISH scatterplot.  We acknowledge the validation provided by the IMR90 MERFISH is limited by the degree to which genome organization relative to nuclear locales is similar in IMR90 and HFF fibroblasts. However, the correlation between measured microscopic distances from nuclear lamina, nucleoli, and nuclear speckles and TSA-seq scores is already quite high. We anticipate the conclusions drawn from such comparisons are solid and will only become that much stronger with future comparisons within the same cell line.

      Author response image 1.

      Scatterplots showing the correlation between TSA-seq and MERFISH microscopic distances. Top: IMR90 SON TSA-seq (from Alexander et al, Mol Cell 2021) (left) and HFF SON TSA-seq (right) (x-axis) versus distance to nuclear speckles (y-axis). Bottom: HFF Lamin B1 TSA-seq (x-axis) versus distance to nuclear lamina (y-axis) (left) and HFF MKI67IP (nucleolar) TSA-seq (x-axis) versus distance to nucleolus (y-axis) (right).

      In our revision, we will add justification of the use of IMR90 fibroblasts as a proxy for HFF fibroblasts through comparison of available data sets. 

      Reviewer #2 (Public Review):  

      Weaknesses:  

      The experiments are largely descriptive, and it is difficult to draw many cause-and-effect relationships. Similarly, the paper would be very much strengthened if the authors provided additional summary statements and interpretation of their results (especially for those not as familiar with 3D genome organization). The study would benefit from a clear and specific hypothesis.

      We acknowledge that this study was hypothesis-generating rather than hypothesis-testing in its goal. This research was funded through the NIH 4D-Nucleome Consortium, which had as its initial goal the development, benchmarking, and validation of new genomic technologies.  Our Center focused on the mapping of the genome relative to different nuclear locales and the correlation of this intranuclear positioning of the genome with functions, specifically gene expression and DNA replication timing. By its very nature, this project has taken a discovery-driven rather than hypothesis-driven scientific approach.  Our question fundamentally was whether we could gain new insights into nuclear genome organization through the integration of genomic and microscopic measurements of chromosome positioning relative to multiple different nuclear compartments/bodies and their correlation with functional assays such as RNA-seq and Repli-seq.

      Indeed, as described in this manuscript, this study resulted in multiple new insights into nuclear genome organization as summarized in our last main figure.  We believe our work and conclusions will be of general interest to scientists working in the fields of 3D genome organization and nuclear cell biology.  We anticipate that each of these new insights will prompt future hypothesis-driven science focused on specific questions and the testing of cause-and-effect relationships. 

      Given the extensive scope of this manuscript, we were limited in the extent that we could describe and summarize the background, data, analysis, and significance for every new insight. In our editing to reach the eLife recommended word count, we removed some of the explanations and summaries that we had originally included. 

      As suggested by Reviewer 2, in our revision we will add back additional summary and interpretation statements to help readers unfamiliar with 3D genome organization.

      Specific Comments in response to Reviewer 1:

      (1)  We disagree with the comment that TSA-seq has not been cross-validated by other orthogonal genomic methods.  In the first TSA-seq paper (Chen et al, JCB 2018, doi: 10.1083/jcb.201807108), we showed a good correlation between the identification of iLADs and LADs by nuclear lamin and nuclear speckle TSA-seq and the orthogonal genomic method of lamin B1 DamID, which is reproduced using our new TSA-seq 2.0 protocol in this manuscript.  Similarly, in the Kumar et al, bioRxiv preprint (doi:https://doi.org/10.1101/2023.10.29.564613), we showed a general agreement between the identification of NADs by nucleolar TSA-seq and the orthogonal genomic method of NAD-seq.  (We expect this preprint to be in press soon; it is now undergoing a last revision involving only reformatting for journal requirements.) Additionally, we also showed a high correlation between Hi-C compartments and subcompartments and TSA-seq in the Chen et al, JCB 2018 paper. Specifically, there is an excellent correlation between the A1 Hi-C subcompartment and Speckle Associated Domains as detected by nuclear speckle TSA-seq.  Additionally, the A2 Hi-C subcompartment correlated well with iLAD regions with intermediate nuclear speckle TSA-seq scores, and the B2 and B3 Hi-C subcompartments with LADs detected by both LMNB TSA-seq and LMNB1 DamID.  More generally, Hi-C A and B compartment identity correlated well with predictions of iLADs versus LADs from nuclear speckle and nuclear lamina TSA-seq.

      (2)  In the Chen et al, JCB 2018 paper we also qualitatively and quantitatively validated TSA-seq using FISH.  Qualitatively, we showed that both nuclear speckle and nuclear lamin TSA-seq correlated well with distances to nuclear speckles versus the nuclear lamina, respectively, measured by immuno-FISH.

      Quantitatively, we showed that SON TSA-seq could be used to estimate the microscopic mean distance to nuclear speckles with mean and median residuals of ~50 nm.  First, we used light microscopy to show that the spreading of tyramide-biotin signal from a point-source of TSA staining fits well with the exponential decay predicted theoretically by reaction-diffusion equations assuming a steady rate of tyramide-biotin free radical generation by the HRP enzyme and a constant probability throughout the nucleus of free-radical quenching (through reaction with protein tyrosine residues and nucleic acids).  Second, we used the exponential decay constant measured by light microscopy together with FISH measurements of mean speckle distance for several genomic regions to fit an exponential function and to predict distance to nuclear speckles genome-wide directly from SON TSA-seq sequencing reads.  Third, we used this approach to test the predictions against a new set of FISH measurements, demonstrating an accuracy of these predictions of ~50 nm.
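      The fitting procedure described above can be sketched as follows. This is a minimal illustration, not the code used in Chen et al.: it assumes the TSA enrichment falls off as A * exp(-d / d0) with distance d from the speckle, with the decay constant d0 taken as known from microscopy. The function names, the linear enrichment scale, and all numerical values are hypothetical.

```python
import math


def fit_amplitude(distances_nm, enrichments, decay_nm):
    """Estimate A in y = A * exp(-d / decay_nm), holding decay_nm fixed.

    With the decay constant fixed, the least-squares fit of
    ln(y) = ln(A) - d / decay_nm reduces to averaging ln(y) + d / decay_nm.
    """
    log_a = sum(
        math.log(y) + d / decay_nm for d, y in zip(distances_nm, enrichments)
    ) / len(distances_nm)
    return math.exp(log_a)


def predict_distance_nm(enrichment, amplitude, decay_nm):
    """Invert y = A * exp(-d / decay_nm) to estimate a mean distance."""
    return -decay_nm * math.log(enrichment / amplitude)


# Hypothetical calibration loci: FISH distances (nm) and synthetic,
# noise-free enrichment values generated from the assumed model.
DECAY_NM = 300.0  # assumed decay constant, for illustration only
calib_d = [100.0, 400.0, 900.0]
calib_y = [8.0 * math.exp(-d / DECAY_NM) for d in calib_d]

amplitude = fit_amplitude(calib_d, calib_y, DECAY_NM)
print(predict_distance_nm(2.0, amplitude, DECAY_NM))
```

      In practice the calibration and test loci would be separate sets of FISH probes, and the residuals between predicted and measured distances on the test set would give the accuracy estimate.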

      (3)  The importance of the quantitative validation by immuno-FISH of using TSA-seq to estimate mean distance to nuclear speckles is that it demonstrates the robustness of the TSA-seq approach.  Specifically, it shows how the TSA-seq signal is predicted to depend only on the specificity of the primary and secondary antibody staining and the diffusion properties of the tyramide-biotin free radicals produced by the HRP peroxidase.  This is fundamentally different from the significant dependence on antibodies and choice of marker proteins for molecular proximity assays such as DamID, ChIP-seq, and Cut and Run/Tag which depend on molecular proximity for labeling and/or pulldown of DNA.

      This robustness leads to specific predictions.  First, it predicts similar TSA-seq signals will be produced using antibodies against different marker proteins for the same nuclear compartment.  This is because the exponential decay constant (distance at which the signal drops by one half) for the spreading of the TSA is in the range of several hundred nm, as measured by light microscopy for several TSA staining conditions.  Indeed, we showed in the Chen et al, JCB 2018 paper that antibodies against two different nuclear speckle proteins produced very similar TSA-seq signals while antibodies against LMNB versus LMNA also produced very similar TSA-seq signals.  Similarly, we showed in the Kumar et al preprint that antibodies against four different nucleolar proteins showed similar TSA-seq signals, with the highest correlation coefficients for the TSA-seq signals produced by the antibodies against two GC nucleolar marker proteins and the TSA-seq signals produced by the antibodies against two FC/DFC nucleolar marker proteins.

      Author response image 2.

      Comparison of TSA-seq data from different cell lines versus IMR90 MERFISH.  The observed correlation between SON (nuclear speckle) TSA-seq versus MERFISH is nearly as high for TSA-seq data from HFF as it is for TSA-seq data from the IMR90 cell line (Alexander et al, Mol Cell 2021) in which the MERFISH was performed. The correlations for SON, LMNB1 (nuclear lamina) and MKI67IP (nucleolus) versus MERFISH are highest for HFF TSA-seq data as compared to TSA-seq data from other cell lines (H1, K562, HCT116).  Comparison of measured distances to nuclear locale (y-axis) versus TSA-seq scores (x-axis) from different cell lines labeled in red. Left to right: SON, LMNB1, and MKI67IP.  Top to bottom: SON TSA-seq versus MERFISH for two TSA-seq replicates; TSA-seq from HFF, H1, K562, and HCT116 versus MERFISH.

      Second, it predicts that the quantitative relationship between TSA-seq signal and mean distance from a nuclear compartment will depend on the convolution of the predicted exponential decay of spreading of the TSA signal produced by a point source with the more complicated staining distribution of nuclear compartments such as the nuclear lamina or nucleoli.  We successfully used this concept to explain the differences emerging between LMNB1 DamID and TSA-seq signals for flat nuclei and to recognize the polarized distribution of different LADs over the nuclear periphery.

      (4)  After our genomic data production and during our data analysis, a valuable resource from the Zhuang lab was published, using MERFISH to visualize hundreds of genomic loci in IMR90 cells. We acknowledge that the much more extensive validation of TSA-seq by the multiplexed immuno-FISH MERFISH data is dependent on the degree to which the nuclear genome organization is similar between IMR90 and HFF fibroblasts.  However, the correlation between distances to nuclear speckles, nucleoli, and the nuclear lamina measured in IMR90 fibroblasts and the nuclear speckle, nucleolar, and nuclear lamina TSA-seq measured in HFF fibroblasts is already striking (see Author response image 1).  With regard to SON TSA-seq, the MERFISH versus HFF TSA-seq correlation is close to what we observe using published IMR90 SON TSA-seq data (correlation coefficients of 0.89 for IMR90 TSA-seq versus 0.86 for HFF TSA-seq).  Moreover, this correlation is highest using TSA-seq data from HFF cells as compared to the three other cell lines (see Author response image 2).  We believe these correlations can be considered a lower bound on the actual correlations between the FISH distances and TSA-seq that we would have observed if we had performed both assays on the same cell line.

      (5)  Currently, we still require tens of millions of cells to perform each TSA-seq assay.  This requires significant expansion of cells and a resulting increase in passage numbers of the IMR90 cells before we can perform the TSA-seq. During this expansion we observe a noticeable slowing of the IMR90 cell growth, as expected for secondary cell lines as we approach the Hayflick limit.  We still do not know to what degree nuclear organization relative to nuclear locales may change as a function of cell cycle composition (i.e., percentage of cycling versus quiescent cells) and cell age.  Thus, even if we performed TSA-seq on IMR90 cells, we would be comparing MERFISH from lower passages with a higher percentage of actively proliferating cells with TSA-seq from higher passages with a higher percentage of quiescent cells.

      We are currently working on a new TSA-seq protocol that will work with thousands of cells.  We believe it is a better investment of time and resources to wait until this new protocol is optimized before we repeat TSA-seq in IMR90 cells for a better comparison with multiplexed FISH data.

      Specific Comments in response to Reviewer 2:

      (1)  As we acknowledge in our Response summary, we were limited in the degree to which we could actually follow up our findings with experiments designed to test specific hypotheses generated by our data.  However, we do want to point out that our comparison of wild-type K562 cells with the LMNA/LBR double knockout was designed to test the long-standing model that nuclear lamina association of genomic loci contributes to gene silencing.  This experiment was motivated by our surprising result that gene expression differences between cell lines correlated strongly with differences in positioning relative to nuclear speckles rather than the nuclear lamina.  Despite documenting in these double knockout cells a decreased nuclear lamina association of most LADs, and an increased nuclear lamina association of the “p-w-v” fiLADs identified in this manuscript, we saw no significant change in gene expression in any of these regions as compared to wild-type K562 cells.  Meanwhile, distances to nuclear speckles as measured by TSA-seq remained nearly constant.

      We would argue that this represents a specific example in which new insights generated by our genomics comparison of cell lines led to a clear and specific hypothesis and the experimental testing of this hypothesis.

      In response to Reviewer 2, we are modifying the text to make this clearer. We will explicitly describe how we tested the hypothesis that distance to the nuclear lamina is correlated with, but not causally linked to, gene expression: we used a DKO of LMNA and LBR to change distances relative to the nuclear lamina and then measured the effect on gene expression.

    1. eLife assessment

      This study develops a useful metric for quantifying codon usage adaptation - the Codon Adaptation Index of Species (CAIS). This metric permits direct comparisons of the strength of selection at the molecular level across species. The study is based on solid evidence, and the authors identify relationships between CAIS and the presence of disordered protein domains. Other correlations, such as the one between CAIS and body size, are weak and non-significant. In summary, the study introduces an interesting new approach to quantifying codon usage across species, which may be helpful in attempts to measure selection at the molecular level.

    2. Reviewer #2 (Public Review):

      Assessment

      This study develops a potentially useful metric for quantifying codon usage adaptation – the Codon Adaptation Index of Species (CAIS) – that is intended to allow for more direct comparisons of the strength of selection at the molecular level across species by controlling for interspecies variation in amino acid usage and GC content. As evidence to support their claim that CAIS better controls for GC content and amino acid usage across species, the authors note that CAIS has only a weak positive correlation with GC% (that does not stand up to multiple hypothesis testing correction) while CAI has a clear negative correlation with GC%. Using CAIS, they find better adapted species have more disordered protein domains; however, excitement about these findings is dampened because (1) this result is also observed using the effective number of codons (ENC) and (2) there are concerns over the interpretation of CAIS as a proxy for the effectiveness of selection.

      Public Review

      Summary

      The goal of the authors in this study is to develop a more reliable approach for quantifying codon usage such that it is more comparable across species. Specifically, the authors wish to estimate the degree of adaptive codon usage, which is potentially a general proxy for the strength of selection at the molecular level. To this end, the authors created the Codon Adaptation Index for Species (CAIS) that attempts to control for differences in amino acid usage and GC% across species. Using their new metric, the authors observe a positive relationship between CAIS and the overall “disorderedness” of a species’ protein domains. I think CAIS has the potential to be a valuable tool for those interested in comparing codon adaptation across species in certain situations. However, I have certain theoretical concerns about CAIS as a direct proxy for the efficiency of selection sNe when mutation bias changes across species.

      Strengths

      (1) I appreciate that the authors recognize the potential issues of comparing CAI when amino acid usage varies and correct for this in CAIS. I think this is sometimes an under-appreciated point in the codon usage literature, as CAI is a relative measure of codon usage bias (i.e. only considers synonyms). However, the strength of natural selection on codon usage can potentially vary across amino acids, such that comparing mean CAI between protein regions with different amino acid biases may result in spurious signals of statistical significance.

      (2) The CAIS metric presented here is generally applicable to any species that has an annotated genome with protein-coding sequences. A significant improvement over the previous version is the implementation of a software tool for applying this method.

      (3) The authors do a better job of putting their results in the context of the underlying theory of CAIS compared to the previous version.

      (4) The paper is generally well-written.

      Weaknesses

      (1) The previously observed correlation between CAIS and body size was due to a bug when calculating phylogenetic independent contrasts. I commend the authors for acknowledging this mistake and updating the manuscript accordingly. I feel that the unobserved correlation between CAIS and body size should remain in the final version of the manuscript. Although it is disappointing that it is not statistically significant, the corrected results are consistent with previous findings (Kessler and Dean 2014).

      (2) I appreciate the authors providing a more detailed explanation of the theoretical basis of the model. However, I remain skeptical that shifts in CAIS across species indicate shifts in the strength of selection. I am leaving the math from my previous review here for completeness.

      As in my previous review, let’s take a closer look at the ratio of observed codon frequencies vs. expected codon frequencies under mutation alone, which was previously notated as RSCUS in the original formulation. In this review, I will keep using the RSCUS notation, even though it has been dropped from the updated version. The key point is this is the ratio of observed and expected codon frequencies. If this ratio is 1 for all codons, then CAIS would be 0 based on equation 7 in the manuscript – consistent with the complete absence of selection on codon usage. From here on out, subscripts will only be used to denote the codon and it will be assumed that we are only considering the case of r = genome for some species s.

      I think what the authors are attempting to do is “divide out” the effects of mutation bias (as given by Ei), such that only the effects of natural selection remain, i.e. deviations from the expected frequency based on mutation bias alone represent adaptive codon usage. Consider Gilchrist et al. GBE 2015, which says that the expected frequency of codon i at selection-mutation-drift equilibrium in gene g for an amino acid with Na synonymous codons is

      where ∆M is the mutation bias, ∆η is the strength of selection scaled by the strength of drift, and φg is the gene expression level of gene g. In this case, ∆M and ∆η reflect the strength and direction of mutation bias and natural selection relative to a reference codon, for which ∆M,∆η = 0. Assuming the selection-mutation-drift equilibrium model is generally adequate to model the true codon usage patterns in a genome (as I do and I think the authors do, too), the Ei,g could be considered the expected observed frequency of codon i in gene g, E[Oi,g].

      Let’s re-write the Ei in the form of Gilchrist et al., such that it is a function of mutation bias ∆M. For simplicity, we will consider just the two codon case and assume the amino acid sequence is fixed. Assuming GC% is at equilibrium, the term gr and 1 − gr can be written as

      where µx→y is the mutation rate from nucleotides x to y. As described in Gilchrist et al. MBE 2015 and Shah and Gilchrist PNAS 2011, the mutation bias . This can be expressed in terms of the equilibrium GC content by recognizing that

      As we are assuming the amino acid sequence is fixed, the probability of observing a synonymous codon i at an amino acid becomes just a Bernoulli process.

      If we do this, then

      Recall that in the Gilchrist et al. framework, the reference codon has $\Delta M_{NNG,NNG} = 0 \implies e^{-\Delta M_{NNG,NNG}} = 1$. Thus, we have recovered the Gilchrist et al. model from the formulation of Ei under the assumption that natural selection has no impact on codon usage and codon NNG is the pre-defined reference codon. To see this, plug in 0 for ∆η in equation (1).

      We can then calculate the expected RSCUS using equation (1) (using notation E[Oi]) and equation (6) for the two codon case. For simplicity, assume we are only considering a gene of average expression (defined as ). Assume in this case that NNG is the reference codon (∆MNNG, ∆ηNNG = 0).

      This shows that the expected value of RSCUS for a two codon amino acid is expected to increase as the strength of selection ∆η increases, which is desired. Note that ∆η in Gilchrist et al. is formulated in terms of selection against a codon relative to the reference, such that a negative value represents that a codon is favored relative to the reference. If ∆η = 0 (i.e. selection does not favor either codon), then E[RSCUS] = 1. Also note that the expected RSCUS does not remain independent of the mutation bias. This means that even if sNe (i.e. the strength of natural selection) does not change between species, changes to the strength and direction of mutation bias across species could impact RSCUS. Assuming my math is right, I think one needs to be cautious when interpreting CAIS as representative of the differences in the efficiency of selection across species except under very particular circumstances.
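      The two-codon argument can be checked numerically. The sketch below assumes the two-codon form of the selection-mutation-drift model, in which the non-reference codon has equilibrium frequency exp(-∆M - ∆η φ) / (1 + exp(-∆M - ∆η φ)) and the mutation-only expectation is the same expression with ∆η = 0. The function names, the choice φ = 1 for an average-expression gene, and the parameter values are illustrative assumptions, not taken from the manuscript.

```python
import math


def codon_freq(delta_m, delta_eta, phi=1.0):
    """Equilibrium frequency of the non-reference codon (two-codon amino acid)."""
    w = math.exp(-delta_m - delta_eta * phi)
    return w / (1.0 + w)


def observed_over_expected(delta_m, delta_eta, phi=1.0):
    """Ratio of the frequency under selection to the mutation-only expectation."""
    return codon_freq(delta_m, delta_eta, phi) / codon_freq(delta_m, 0.0, phi)


# Hold selection fixed (delta_eta < 0 favors the non-reference codon)
# and vary only the mutation bias.
for delta_m in (-2.0, -1.0, 0.0, 1.0, 2.0):
    ratio = observed_over_expected(delta_m, delta_eta=-1.0)
    print(f"delta_M={delta_m:+.1f}  observed/expected={ratio:.3f}")
```

      With ∆η = 0 the ratio is exactly 1 for any ∆M, while for a fixed nonzero ∆η the ratio changes with ∆M, which is the dependence on mutation bias described above.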

      Consider our 2-codon amino acid scenario. You can see how changing GC content without changing selection can alter the CAIS values calculated from these two codons. Particularly problematic appears to be cases of extreme mutation biases, where CAIS tends toward 0 even for higher absolute values of the selection parameter. Codon usage for the majority of the genome will be primarily determined by mutation biases, with selection being generally strongest in a relatively few highly-expressed genes. Strong enough mutation biases ultimately can overwhelm selection, even in highly-expressed genes, reducing the fraction of sites subject to codon adaptation.
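      A sketch of the kind of calculation behind the 2-codon scenario follows. It rests on two simplifying assumptions: (i) the two synonymous codons differ only by A versus G at the third position, so the mutation-only expectation for NNG equals the GC content, and (ii) a Kullback-Leibler divergence between the selected and mutation-only frequencies stands in for CAIS. All names and values are hypothetical, and this is not the code used to produce the figures below.

```python
import math


def selected_freqs(gc, delta_eta, phi=1.0):
    """Frequencies of (NNA, NNG), with NNG as the reference codon.

    Mutation alone gives odds (1 - gc) : gc; selection multiplies the NNA
    odds by exp(-delta_eta * phi), so delta_eta < 0 favors NNA.
    """
    odds_a = (1.0 - gc) / gc * math.exp(-delta_eta * phi)
    p_a = odds_a / (1.0 + odds_a)
    return p_a, 1.0 - p_a


def two_codon_kl(gc, delta_eta, phi=1.0):
    """KL divergence of the selected frequencies from the mutation-only ones."""
    observed = selected_freqs(gc, delta_eta, phi)
    expected = (1.0 - gc, gc)
    return sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


# Same selection parameter throughout; only GC content changes.
for gc in (0.01, 0.25, 0.41, 0.50, 0.75, 0.99):
    print(f"GC={gc:.2f}  KL={two_codon_kl(gc, delta_eta=-1.0):.4f}")
```

      In this toy model, with selection held fixed, the divergence shrinks toward zero as GC content approaches 0 or 1.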

      Peer review image 1.

      Peer review image 2.

      CAIS (Low Expression)

      Peer review image 3.

      CAIS (Average Expression)

      Peer review image 4.

      CAIS (High Expression)

      If we treat the expected codon frequencies as genome-wide frequencies, then we are basically assuming this genome is made up entirely of a single 2-codon amino acid with selection on codon usage being uniform across all genes. This is obviously not true, but I think it shows some of the potential limitations of the CAIS approach. Based on these simulations, CAIS seems best employed under specific scenarios. One such case could be when it is known that mutation bias varies little across the species of interest. Looking at the species used in this manuscript, most of them have a GC content around 0.41, so I suspect their results are okay (assuming things like GC-biased gene conversion are not an issue). Outliers in GC content probably are best excluded from the analysis.

      Although I have not done so, I am sure this could be extended to the 4 and 6 codon amino acids. One potential challenge to CAIS is the non-monotonic changes in codon frequencies observed in some species (again, see Shah and Gilchrist 2011 and Gilchrist et al. 2015).

    3. Author response:

      The following is the authors’ response to the original reviews.

      In addition to our responses to reviewer suggestions below, a minor bug in the calculation of CAIS was brought to our attention by a reader of our preprint. We have corrected this bug and rerun analyses, whose results became slightly stronger as noise was removed. While we were doing that, someone pointed out to us that our equations were almost the same as Kullback-Leibler divergence, which explains why our metric performed so well. We have made the numerically trivial (see before vs. after figure below) mathematical change to use Kullback-Leibler divergence instead, and now have a better story, with a solid basis in information theory, as to why CAIS works.
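      To illustrate the idea, here is a minimal sketch of a Kullback-Leibler divergence between observed synonymous-codon frequencies and those expected from genomic GC content alone. It is not the CAIS implementation: the per-nucleotide expectation (gc / 2 for G or C, (1 - gc) / 2 for A or T), the restriction to a single amino acid, and all names and numbers are assumptions made for illustration.

```python
import math


def expected_codon_probs(codons, gc):
    """Expected probabilities of synonymous codons from GC content alone."""

    def weight(codon):
        # Each nucleotide contributes gc / 2 if G or C, else (1 - gc) / 2.
        return math.prod(gc / 2 if nt in "GC" else (1 - gc) / 2 for nt in codon)

    total = sum(weight(c) for c in codons)
    return {c: weight(c) / total for c in codons}


def kl_divergence(observed, expected):
    """Sum over codons of p_obs * ln(p_obs / p_exp); zero when they match."""
    return sum(p * math.log(p / expected[c]) for c, p in observed.items() if p > 0)


# Hypothetical example: the two lysine codons at an assumed GC content of 0.41.
expected = expected_codon_probs(["AAA", "AAG"], gc=0.41)
observed = {"AAA": 0.30, "AAG": 0.70}  # made-up observed frequencies
print(expected, kl_divergence(observed, expected))
```

      The divergence is zero when observed codon usage matches the GC-based expectation and grows as usage departs from it, which is the behavior wanted from a measure of codon adaptation beyond mutation bias.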

      Author response image 1.

      Unfortunately, we discovered a second bug that caused our PIC correction code to fail to perform the needed correction for phylogenetic confounding. The previously reported correlation between CAIS (or ENC) and body mass no longer survives PIC-correction. We have therefore removed this analysis from the manuscript. Our story now rests more on the theoretical basis of CAIS and ENC, and less on post facto validation, than it previously did. We now also present CAIS and ENC on a more equal footing. ENC results are slightly stronger, while CAIS has the complementary advantage of correcting for amino acid frequencies.

      The work involved in these changes, as well as some of the responses to reviews below, justifies changing the second author into a co-first author, and adding an additional coauthor (Hanon McShea) who discovered the second bug.

      Reviewer #1 (Public Review): 

      In this manuscript, the authors propose a new codon adaptation metric, Codon Adaptation Index of Species (CAIS), which they present as an easily obtainable proxy for effective population size. To permit between-species comparisons, they control for both amino acid frequencies and genomic GC content, which distinguishes their approach from existing ones. Having confirmed that CAIS negatively correlates with vertebrate body mass, as would be expected if small-bodied species with larger effective populations experience more efficient selection on codon usage, they then examine the relationship between CAIS and intrinsic structural disorder in proteins. 

      The idea of a robust species-level measure of codon adaptation is interesting. If CAIS is indeed a reliable proxy for the effectiveness of selection, it could be useful to analyze species without reliable life history- or mutation rate data (which will apply to many of the genomes becoming available in the near future). 

      A key question is whether CAIS, in fact, measures adaptation at the codon level. Unfortunately, CAIS is only validated indirectly by confirming a negative correlation with body mass. As a result, the observations about structural disorder are difficult to evaluate. 

      As discussed in the preamble above, we have replaced the body mass validation with a stronger theoretical basis in information theory.

      A potential problem is that differences in GC between species are not independent of life history. Effective population size can drive compositional differences due to the effects of GC-biased gene conversion (gBGC). As noted by Galtier et al. (2018), genomic GC correlates negatively with body mass in mammals and birds. It would therefore be important to examine how gBGC might affect CAIS, and to what extent it could explain the relationship between CAIS and body mass. 

      Suppose that gBGC drives an increase in GC that is most pronounced at 3rd codon positions in high-recombination regions in small-bodied species. In this case, could observed codon usage depart more strongly from expectations calculated from overall genomic GC in small vertebrates compared to large ones? The authors also report that correcting for local intergenic GC was unsuccessful, based on the lack of a significant negative relationship with body mass (Figure 3D). In principle, this could also be consistent with local GC providing a relatively more appropriate baseline in regions with high recombination rates. Considering these scenarios would clarify what exactly CAIS is capturing.

      Figure 3 (previously Supplementary Figures S5A and S5B) shows that CAIS is negligibly correlated with %GC (not robust to multiple comparisons correction), and ENC not at all. We believe this is evidence against the possibility brought up by the reviewer, i.e. that Ne might affect gBGC (and hence global %GC). This relationship, if present, could act as a confounding effect, but it is not present within our species dataset. 

      Note that we expect our genomic-GC-based codon usage expectations to reflect unchecked gBGC in an average genomic region, independently of whether that species has high or low Ne. Our working model is that non-selective forces, including gBGC as well as conventional mutation biases, vary among species, and that they, rather than selection, determine each species’ genome-wide %GC. By correcting for genome-wide %GC, CAIS and ENC correct for both mutation bias and gBGC, in order to isolate the effects of selection.

      This argument, based on an average genomic region, is vulnerable to gene-rich genomic regions having differentially higher recombination rates and hence GC-biased gene conversion. However, we do not see the expected positive correlation between |local GC - global GC| and CAIS (see new Figure 5), again suggesting that gene conversion strength is not a confounding factor acting on CAIS.

      Given claims about "exquisitely adapted species", the case for using CAIS as a measure of codon adaptation would also be stronger if a relationship with gene expression could be demonstrated. RSCU is expected to be higher in highly expressed genes. Is there any evidence that the equivalent GC-controlled measure behaves similarly?

      Correlations with gene expression are outside the scope of the current work, which is focused on producing and exploiting a single value of codon adaptation per species. It is indeed possible that our general approach of using Kullback-Leibler divergence to correct for genomic %GC could be useful in future work investigating differences among genes.  

      The manuscript is overall easy to follow, though some additional context may be helpful for the general reader. A more detailed discussion of how this work compares to the approach taken by Galtier et al. (2018), which accounted for GC content and gBGC when examining codon preferences, would be appropriate, for example. In addition, it would have been useful to mention past work that has attempted to explicitly quantify selection on codon usage. 

      One key difference between our work and that of Galtier et al. 2018 is that our approach does not rely on identifying specific codon preferences as a function of species. Our approach might therefore be robust to scenarios where different genes have different codon preferences (see Gingold et al. 2014 https://doi.org/10.1016/j.cell.2014.08.011). At a high level, our results are in broad agreement with those of Galtier et al., 2018, who found that gBGC affected all animal species, regardless of Ne, and who like us, found that the degree of selection on codon usage depended on Ne.

      Reviewer #2 (Public Review): 

      ## Summary 

      The goal of the authors in this study is to develop a more reliable approach for quantifying codon usage such that it is more comparable across species. Specifically, the authors wish to estimate the degree of adaptive codon usage, which is potentially a general proxy for the strength of selection at the molecular level. To this end, the authors created the Codon Adaptation Index for Species (CAIS) that controls for differences in amino acid usage and GC% across species. Using their new metric, the authors find a previously unobserved negative correlation between the overall adaptiveness of codon usage and body size across 118 vertebrates. As body size is negatively correlated with effective population size and thus the general strength of natural selection, the negative correlation between CAIS and body size is expected. The authors argue this was previously unobserved due to failures of other popular metrics such as Codon Adaptation Index (CAI) and the Effective Number of Codons (ENC) to adequately control for differences in amino acid usage and GC content across species. Most surprisingly, the authors also find a positive relationship between CAIS and the overall "disorderedness" of a species' protein domains. As some of these results are unexpected, which is acknowledged by the authors, I think it would be particularly beneficial to work with some simulated datasets. I think CAIS has the potential to be a valuable tool for those interested in comparing codon adaptation across species in certain situations. However, I have certain theoretical concerns about CAIS as a direct proxy for the efficiency of selection $sN_e$ when the mutation bias changes across species.

      ## Strengths 

      (1) I appreciate that the authors recognize the potential issues of comparing CAI when amino acid usage varies and correct for this in CAIS. I think this is sometimes an under-appreciated point in the codon usage literature, as CAI is a relative measure of codon usage bias (i.e. only considers synonyms). However, the strength of natural selection on codon usage can potentially vary across amino acids, such that comparing mean CAI between protein regions with different amino acid biases may result in spurious signals of statistical significance (see Cope et al. Biochimica et Biophysica Acta - Biomembranes 2018 for a clear example of this). 

      We now cite Cope et al. as an example of how amino acid composition can act as a confounding factor.

      (2) The authors present numerous analyses using both ENC and mean CAI as a comparison to CAIS, helping give a sense of how CAIS corrects for some of the issues with these other metrics. I also enjoyed that they examined the previously unobserved relationship between codon usage bias and body size, which has bugged me ever since I saw Kessler and Dean 2014. The result comparing protein disorder to CAIS was particularly interesting and unexpected. 

      Unfortunately, our previous PIC correction code was buggy, and in fact the relationship with body size does not survive PIC correction (although it is strong prior to PIC correction). We have therefore removed it from the paper. However, the more novel result on protein disorder remains strong.

      (3) The CAIS metric presented here is generally applicable to any species that has an annotated genome with protein-coding sequences. 

      ## Weaknesses 

      (1) The main weakness of this work is that it lacks simulated data to confirm that it works as expected. This would be particularly useful for assessing the relationship between CAIS and the overall effect of protein structure disorder, which the authors acknowledge is an unexpected result. I think simulations could also allow the authors to assess how their metric performs in situations where mutation bias and natural selection act in the same direction vs. opposite directions. Additionally, although I appreciate their comparisons to ENC and mean CAI, a comparison to other popular codon metrics for calculating the overall adaptiveness of a genome (e.g. dos Reis et al.'s $S$ statistic, which is a function of tRNA Adaptation Index (tAI) and ENC) may be more appropriate. Even if results are similar to $S$, CAIS has a noted advantage that it doesn't require identifying tRNA gene copy numbers or abundances, which I think are generally less readily available than genomic GC% and protein-coding sequences. 

      The main limitation of dos Reis’s test, in our view, is that, like the better versions of CAI, it requires comparable orthologs across species. See also the discussion below regarding the benefits of a proteome-wide approach. We now also note the advantage of not needing tRNA gene copy numbers and abundances. 

      Simulated datasets would be great, but we consider them a nice addition rather than a must-have, in particular because we are skeptical that our understanding of all relevant processes is good enough for simulations to add much to our more heuristic argument along the lines of Figure 2. For example, the complications of Gingold et al. 2014 cited above are pertinent, but incorporating them would make simulations quite involved. Instead, we now have a stronger theoretical justification for CAIS grounded in information theory. We have significantly expanded the discussion of Figure 2 to give a clearer idea of the conceptual underpinnings of CAIS and ENC.

      The authors mention the selection-mutation-drift equilibrium model, which underlies the basic ideas of this work (e.g. higher $N_e$ results in stronger selection on codon usage), but a more in-depth framing of CAIS in terms of this model is not given. I think this could be valuable, particularly in addressing the question "are we really estimating what we think we're estimating?" 

      Let's take a closer look at the formulation for RSCUS. From here on out, subscripts will only be used to denote the codon and it will be assumed that we are only considering the case of r = genome for some species s.

      I think what the authors are attempting to do is "divide out" the effects of mutation bias (as given by $E_i$), such that only the effects of natural selection remain, i.e. deviations from the expected frequency based on mutation bias alone represent adaptive codon usage. Consider Gilchrist et al. MBE 2015, which says that the expected frequency of codon i at selection-mutation-drift equilibrium in gene g for an amino acid with Na synonymous codons is

      where $\Delta M$ is the mutation bias, $\Delta\eta$ is the strength of selection scaled by the strength of drift, and $\phi_g$ is the gene expression level of gene $g$. In this case, $\Delta M$ and $\Delta\eta$ reflect the strength and direction of mutation bias and natural selection relative to a reference codon, for which $\Delta M = \Delta\eta = 0$. Assuming the selection-mutation-drift equilibrium model is generally adequate to model the true codon usage patterns in a genome (as I do and I think the authors do, too), $E_{i,g}$ could be considered the expected observed frequency of codon $i$ in gene $g$, $E[O_{i,g}]$.
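
      For reference, the equilibrium frequency that the reviewer refers to as equation (1) is usually written in the following form in the model of Gilchrist et al. 2015. This is a sketch from that model's standard presentation rather than the reviewer's original typesetting; the summation index $j$ is an assumption.

$$
E[O_{i,g}] = \frac{e^{-\Delta M_i - \Delta\eta_i \phi_g}}{\sum_{j=1}^{N_a} e^{-\Delta M_j - \Delta\eta_j \phi_g}} \tag{1}
$$

      Setting $\Delta\eta_j = 0$ for every codon leaves a frequency that depends on mutation bias alone, which is the quantity that $E_i$ is intended to capture.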

      Let’s re-write $E_i$ in the form of Gilchrist et al., such that it is a function of mutation bias $\Delta M$. For simplicity we will consider just the two-codon case and assume the amino acid sequence is fixed. Assuming GC% is at equilibrium, the terms $g_r$ and $1 - g_r$ can be written as

      where $\mu_{x \to y}$ is the mutation rate from nucleotide $x$ to $y$. As described in Gilchrist et al. MBE 2015 and Shah and Gilchrist PNAS 2011, the mutation bias . This can be expressed in terms of the equilibrium GC content by recognizing that

      As we are assuming the amino acid sequence is fixed, the probability of observing a synonymous codon i at an amino acid becomes just a Bernoulli process. 

      If we do this, then 

      Recall that in the Gilchrist et al. framework, the reference codon has $\Delta M_{NNG,NNG} = 0 \implies e^{-\Delta M_{NNG,NNG}} = 1$. Thus, we have recovered the Gilchrist et al. model from the formulation of $E_i$ under the assumption that natural selection has no impact on codon usage and codon NNG is the pre-defined reference codon. To see this, plug in 0 for $\Delta\eta$ in equation (1). 
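
      One way to reconstruct this step for the two-codon case is sketched below. It assumes that $g$ denotes the equilibrium GC fraction, that the two synonymous codons are NNA and NNG, and that $\Delta M_{NNA}$ is shorthand for the mutation bias of NNA relative to the reference codon NNG; these symbols are illustrative and may differ from the reviewer's original notation.

$$
g = \frac{\mu_{AT \to GC}}{\mu_{AT \to GC} + \mu_{GC \to AT}}, \qquad
1 - g = \frac{\mu_{GC \to AT}}{\mu_{AT \to GC} + \mu_{GC \to AT}}
$$

$$
\frac{E_{NNA}}{E_{NNG}} = \frac{1 - g}{g} = \frac{\mu_{GC \to AT}}{\mu_{AT \to GC}} = e^{-\Delta M_{NNA}}
\quad\Longrightarrow\quad
E_{NNA} = \frac{e^{-\Delta M_{NNA}}}{e^{-\Delta M_{NNA}} + 1}, \qquad
E_{NNG} = \frac{1}{e^{-\Delta M_{NNA}} + 1}
$$

      Under these assumptions, the mutation-only expectation has the same functional form as the selection-mutation-drift model with $\Delta\eta = 0$.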

      We can then calculate the expected RSCUS using equation (1) (using notation $E[O_i]$) and equation (6) for the two-codon case. For simplicity, assume we are only considering a gene of average expression (defined as ). Assume in this case that NNG is the reference codon ($\Delta M_{NNG}, \Delta\eta_{NNG} = 0$).

      This shows that the expected value of RSCUS for a two-codon amino acid is expected to increase as the strength of selection $\Delta\eta$ increases, which is desired. Note that $\Delta\eta$ in Gilchrist et al. is formulated in terms of selection *against* a codon relative to the reference, such that a negative value represents that a codon is favored relative to the reference. If $\Delta\eta = 0$ (i.e. selection does not favor either codon), then $E[RSCUS] = 1$. Also note that the expected RSCUS does not remain independent of the mutation bias. This means that even if $sN_e$ (i.e. the strength of natural selection) does not change between species, changes to the strength and direction of mutation bias across species could impact RSCUS. Assuming my math is right, I think one needs to be cautious when interpreting CAIS as representative of the differences in the efficiency of selection across species except under very particular circumstances. One such case could be when it is known that mutation bias varies little across the species of interest. Looking at the species used in this manuscript, most of them have a GC content ranging around 0.41, so I suspect their results are okay. 
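
      The dependence on mutation bias described above can be checked numerically. The sketch below is illustrative only: it assumes the two-codon form of the selection-mutation-drift model with NNG as the reference codon and a gene of expression level `phi = 1.0` (an assumed value for "average expression"), and the function names are hypothetical rather than taken from the CAIS code.

```python
import math


def expected_freq(delta_m, delta_eta, phi=1.0):
    """Equilibrium frequency of the non-reference codon (e.g. NNA) for a
    two-codon amino acid; the reference codon (NNG) has delta_m = delta_eta = 0."""
    weight = math.exp(-delta_m - delta_eta * phi)
    return weight / (weight + 1.0)


def expected_rscus(delta_m, delta_eta, phi=1.0):
    """Expected observed frequency divided by the mutation-only expectation."""
    observed = expected_freq(delta_m, delta_eta, phi)
    mutation_only = expected_freq(delta_m, 0.0, phi)
    return observed / mutation_only


if __name__ == "__main__":
    # Hold selection fixed (codon favoured: delta_eta < 0) and vary mutation bias.
    for delta_m in (-1.0, 0.0, 1.0):
        print(delta_m, expected_rscus(delta_m, delta_eta=-0.5))
```

      With selection held fixed, the printed ratio changes with `delta_m`, whereas with `delta_eta = 0` it equals 1 for any mutation bias, in line with the reviewer's argument.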

      Although I have not done so, I am sure this could be extended to the 4 and 6 codon amino acids. 

      We thank Reviewer 2 for explicitly laying out the math that was implicit in our Figures 1 and 2. While we keep our more heuristic presentation, our revised manuscript now more clearly acknowledges that the per-site codon adaptation bias depicted in Figure 1 has limited sensitivity to s*Ne. We believe our approach worked despite this because the phenomenon is driven by what is shown in Figure 2. That is, Ne makes a difference by determining the proteome-wide fraction of codons subject to significant codon adaptation, rather than by determining the strength of codon adaptation at any particular site or gene. We have made multiple changes to the text to make this point clearer.

      Another minor weakness of this work is that although the method is generally applicable to any species with an annotated genome and the code is publicly available, the code itself contains hard-coded values for GC% and amino acid frequencies across the 118 vertebrates. The lack of a more flexible tool may make it difficult for less computationally-experienced researchers to take advantage of this method. 

      Genome-wide %GC values are hard-coded because they were taken from the previous study of James et al. (2023) https://doi.org/10.1093/molbev/msad073. As summarized in the manuscript, genome-wide %GC was a byproduct of a scan of all six reading frames across genic and intergenic sequences available from NCBI with access dates between May and July 2019. The more complicated code used to calculate the intergenic %GC and the code used to calculate amino acid frequencies are located at https://github.com/MaselLab/CodonAdaptation-Index-of-Species. Luckily, someone else just wrote a simpler end-to-end pipeline for us, on the basis of our preprint. We now note this in the Acknowledgements, and link to it: https://github.com/gavinmdouglas/handy_pop_gen/blob/main/CAIS.py.
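
      To illustrate what a non-hard-coded version of these two inputs could look like, the sketch below computes a GC fraction and amino acid frequencies directly from sequences. It is a minimal illustration, not the authors' six-frame scan and not the linked CAIS.py pipeline; the function names and the decision to ignore ambiguous characters are assumptions.

```python
from collections import Counter

STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def gc_fraction(sequences):
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    gc = 0
    total = 0
    for seq in sequences:
        counts = Counter(seq.upper())
        gc += counts["G"] + counts["C"]
        total += counts["A"] + counts["C"] + counts["G"] + counts["T"]
    if total == 0:
        raise ValueError("no unambiguous bases found")
    return gc / total


def amino_acid_frequencies(proteins):
    """Relative frequency of each of the 20 standard amino acids,
    ignoring stop symbols and ambiguous residues."""
    counts = Counter()
    for protein in proteins:
        counts.update(aa for aa in protein.upper() if aa in STANDARD_AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids found")
    return {aa: counts[aa] / total for aa in sorted(STANDARD_AMINO_ACIDS)}
```

      For example, `gc_fraction(["GGCC", "AATT"])` gives 0.5. A real analysis would need to decide which genomic regions feed the GC estimate, which this sketch leaves open.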

    1. eLife assessment

      This is a valuable study in which the authors provide an expression profile of the human blood fluke, Schistosoma mansoni. A strength of this solid study is in its inclusion of in situ hybridisation to validate the predictions of the transcript analysis.

    2. Reviewer #1 (Public Review):

      In this work, the authors provide a valuable transcriptomic resource for the intermediate free-living transmission stage (miracidium larva) of the blood fluke. The single-cell transcriptome inventory is beautifully supplemented with in situ hybridization, providing spatial information and absolute cell numbers for many of the recovered transcriptomic states. The identification of sex-specific transcriptomic states within the populations of stem cells was particularly unexpected. The work comprises a rich resource to complement the biology of this complex system.

      Comments on revised version:

      I have read through the responses and the revised manuscript. I think together this results in an improved version.

    3. Reviewer #2 (Public Review):

      Summary:

      In this manuscript the authors have generated a single-cell atlas of the miracidium, the first free-living stage of an important human parasite, Schistosoma mansoni. Miracidia develop from eggs produced in the mammalian (human) host and are released into freshwater, where they can infect the parasite's intermediate snail host to continue the life cycle. This study adds to the growing single-cell resources that have already been generated for other life-cycle stages and, thus, provides a useful resource for the field.

      Strengths:

      Beyond generating lists of genes that are differentially expressed in different cell types, the authors validated many of the cluster-defining genes using in situ hybridization chain reaction. In addition to providing the field with markers for many of the cell types in the parasite at this stage, the authors use these markers to count the total number of various cell types in the organism. Because the authors realized that their cell isolation protocols were biasing the cell types they were sequencing, they applied a second method to help them recover additional cell types.

      Schistosomes have ZW sex chromosomes and the authors make the interesting observation that the stem cells at this stage are already expressing sex (i.e. W)-specific genes.

      Comments on revised version:

      The manuscript has been improved after revisions. The methods, data and analyses broadly support the claims with only minor weaknesses.

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This is a valuable study in which the authors provide an expression profile of the human blood fluke, Schistosoma mansoni. A strength of this solid study is in its inclusion of in situ hybridisation to validate the predictions of the transcript analysis.

      We thank the reviewers and the editor for their effort and expertise in reviewing our manuscript. We have made changes based on the reviews and believe this has greatly strengthened our manuscript. We appreciate their insightful comments and suggestions.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this work, the authors provide a valuable transcriptomic resource for the intermediate free-living transmission stage (miracidium larva) of the blood fluke. The single-cell transcriptome inventory is beautifully supplemented with in situ hybridization, providing spatial information and absolute cell numbers for many of the recovered transcriptomic states. The identification of sex-specific transcriptomic states within the populations of stem cells was particularly unexpected. The work comprises a rich resource to complement the biology of this complex system; however, it falls short in some technical aspects of the bioinformatic analyses of the generated sequence data.

      (1) Four sequencing libraries were generated and then merged for analysis, however, the authors fail to document any parameters that would indicate that the clustering does not suffer from any batch effects.

      We thank the reviewer for this comment which has given us the opportunity to elaborate on this interesting point. Consequently, we have added evidence to show that the data do not suffer from batch effects between samples (e.g. between sorted samples 1 and 4, and unsorted samples 2 and 3). We now show that there are contributions to all clusters from sorted and unsorted samples and highlight the benefits to using both conditions in a cell atlas with unknown cell types.

      Accordingly, we have now added the following paragraph to line 153:

      There were contributions from sorted and unsorted samples in almost all clusters (except ciliary plates). We found that some cell/tissue types had similar recovery from both methods (e.g. Stem A, Muscle 2, and Tegument), others were preferentially recovered by sorting (e.g. Neuron 1, Neuron 4, and Stem E), and some were depleted by sorting (e.g. Parenchyma 1, Protonephridia, and Ciliary plates) (Supplementary Figure 1, Supplementary Table 4). This variation in recovery, therefore, enabled us to maximise the discovery and inclusion of different cell types in the atlas.

      We have now added a Supplementary Figure 1 showing the contribution of sorted and unsorted cells to the Seurat clusters. We have also included a Supplementary Table 4 detailing the cell number contribution for both conditions and the percentages in order to easily compare differential recovery between cell types.

      These are added to the manuscript.
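
      The kind of per-cluster tally described for Supplementary Table 4 can be sketched as follows. This is an illustration only: the input format (one `(cluster, sample_id)` pair per cell) and the function name are assumptions, and the sample grouping follows the statement above that samples 1 and 4 were sorted and samples 2 and 3 were unsorted.

```python
from collections import defaultdict

# Samples 1 and 4 were sorted; samples 2 and 3 were unsorted (per the response above).
SORTED_SAMPLES = {1, 4}


def contribution_by_cluster(cells):
    """cells: iterable of (cluster_name, sample_id) pairs, one per cell.
    Returns per-cluster counts and the percentage of cells from sorted samples."""
    counts = defaultdict(lambda: {"sorted": 0, "unsorted": 0})
    for cluster, sample_id in cells:
        condition = "sorted" if sample_id in SORTED_SAMPLES else "unsorted"
        counts[cluster][condition] += 1

    table = {}
    for cluster, c in counts.items():
        total = c["sorted"] + c["unsorted"]
        table[cluster] = {
            "sorted": c["sorted"],
            "unsorted": c["unsorted"],
            "pct_sorted": 100.0 * c["sorted"] / total,
        }
    return table
```

      A cluster with `pct_sorted` of 0 or 100 would be recovered from only one condition, which is the pattern reported above for ciliary plates.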

      (2) Additionally, the authors switch between analysis platforms without a clear motivation or explanation of what the fundamental differences between these platforms are. While in theory, any biologically robust observation should be recoverable from any permutation of analysis parameters, it has been recently documented that the two popular analysis platforms (Seurat - R and scanPy python) indeed do things slightly differently and can give different results (https://www.biorxiv.org/content/10.1101/2024.04.04.588111v1). For this reason, I don't think that one can claim that Seurat fails to find clusters resolved by SAM without running a similar pipeline on the cluster alone as was done with SAM/scanPy here. The manuscript itself needs to be checked carefully for misleading statements in this regard.

      We thank the reviewer for this comment and agree that it’s important to increase the clarity on this matter. We have added additional detail to explain that results of subclustering Neuron 1 using Seurat and SAM/ScanPy were broadly similar, but that we presented the results from the SAM/ScanPy analysis due to the strengths of SAM in detecting small differences in gene expression (Tarashansky et al., 2019 PMID: 31524596). We have included here the UMAP showing subclustering of Neuron 1 in Seurat for comparison.

      Author response image 1.

      UMAP showing subclustering of Neuron 1 cluster in Seurat (SCT normalisation, PC = 19, resolution = 0.3).

      We’ve added this additional text to the ‘Neuron abundance and diversity’ section on line 220:

      We explored whether Neuron 1 could be further subdivided into transcriptionally distinct cells by subclustering (Supplementary Figure 2; Supplementary Table 6) using the self-assembling manifold (SAM) algorithm (Tarashansky et al., 2019) with ScanPy (Wolf et al., 2018), given its reported strength in discerning subtle variation in gene expression (Tarashansky et al., 2019), although a similar topology was subsequently found using Seurat.

      (3) Similarly, the manuscript contains many statements regarding clusters being 'connected to', or forming a 'bridge' on the UMAP projection. One must be very careful about these types of statements, as the relative position of cells on a reduced-dimension cell map can be misleading (see Chari and Pachter 2023). To support these types of interpretations, the authors should provide evidence of gene expression transitions that support connectivity as well as stability estimates of such connections under different parameter conditions. Otherwise, these descriptors hold little value and should be dropped and the transcriptomic states simply defined as clusters with no reference to their positions on the UMAP.

      We thank the reviewer for this thoughtful comment. We agree and have rephrased those statements accordingly e.g. line numbers 218, 439, 543, and 557.

      (4) The underlying support for the clusters as transcriptomically unique identities is not well supported by the dot plots provided. The authors used very permissive parameters to generate marker lists, which hampers the identification of highly specific marker genes. While this permissive approach can allow for extensive lists of upregulated genes for input into STRING/GO analyses, it is less useful for evaluating the robustness of the cluster states. Running the Seurat::FindAllMarkers with more stringent parameters would give a more selective set of genes to display and thereby increase the confidence in the reader as to the validity of profiles selected as being transcriptomically unique.

      The Reviewer is correct in noting that we used a permissive approach to enable a better understanding of the biology of each cluster, based on analysing enriched functions. However, we disagree about the suitability of the approach for finding markers. First, the permissive approach produced longer candidate lists, but those with the best AUC scores for each cluster are at the top of the list for each cluster. Second, some of the markers with lower expression also revealed interesting biology (e.g. Notum in the muscles). Furthermore, we used filtering on the marker genes lists to increase the minimum marker gene scores for analyses such as the GO analyses (details in the GO section of the methods). It’s important to stress that our approach also utilised validation by FISH for top marker genes, as well as biologically informative genes that were lower down the marker gene list.

      (5) Figure 5B shows a UMAP representation of cell positions with a statement that the clustering disappears. As a visual representation of this phenomenon, the UMAP is a very good tool, however, to make this statement you need to re-cluster your data after the removal of this gene set and demonstrate that the data no longer clusters into A/B and C/D.

      We’ve added Supplementary Figure 13 to show that after removing WSR and ZSR genes and reclustering, the data no longer clusters in A/B and C/D, even at a higher resolution where clusters appear oversplit.

      Also, as a reader, these data beg the question: which genes are removed here? Is there an over-representation of any specific 'types' of genes that could lead to any hypotheses of the function? Perhaps the STRING/GO analyses of this gene set could be informative.

      We have performed GO-enrichment analyses on W-specific genes, Z-specific genes and both together compared to the rest of the genome, but we did not find very informative results (see Supplementary Table 13 that we have now added, line 464). This may be due to the large difference in size: there are approximately 900 Z-specific genes (two copies in males, one copy in females) but only approximately 30 W-specific genes, many of which have homologs in the Z-specific region of the genome. Instead, we suggest that tissue-specific regulation of gene dosage compensation is the more likely explanation, as reported for other species (Valsecchi et al. 2018).

      (6) How do the proportions of cell types characterized via in situ here compare to the relative proportions of clusters obtained? It does not correspond to the percentages of the clusters captured (although this should be quantified in a similar manner in order to make this comparison direct: 10,686/20,478 = ~50% vs. 7%). How do you interpret this discrepancy? While this is mentioned in the discussion, there is no sufficient postulation as to why you have an overabundance of the stem cells compared to their presence in the tissue. While it is true that you could have a negative selection of some cell types, for example as stated the size of the penetration glands exceeds both that of the 10x capabilities (40 µm), and the 30 µm filters used in the protocol, this does not really address why over half of the captured cells represent 'stem cells'. A more realistic interpretation would be biological rather than merely technical. For example, while the composition of the muscle cells and the number of muscle transcriptomes captured are quite congruent at ~20%, the organism is composed of more than 50% of neurons, but only 15% of the transcriptomic states are assigned to neuronal. Could it be that a large fraction of the stem cells are actually neural progenitors? Are there other large inconsistencies between the cluster sizes and the fraction of expected cells? Could you look specifically at early transcription factors that are found in the neurons (or other cell types) within the various stem cell populations to help further refine the precursor/cell type relationships?

      Yes, it is really interesting that more than 50% of cells in the animal are neurons whereas more than 50% of cells in scRNAseq data are stem cells. This dataset provides a unique opportunity to compare tissue composition in the whole animal to the corresponding single cell RNAseq dataset.

      To understand this phenomenon, Supplementary Table 17 shows the percentage of cells from each tissue type in the miracidium (identified via in situ hybridisation of tissue-type marker genes) and in the scRNAseq data.

      This table shows that the single cell protocol used in this study negatively selected for nerves and tegument, and positively selected for stem and parenchyma. The composition of the muscle and protonephridia cells and the number of muscle and protonephridia transcriptomes captured are quite congruent.
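
      The direction of selection for each tissue can be summarised as a log2 ratio of its share in the scRNAseq data to its share in the animal. The sketch below is illustrative: the function name is hypothetical, and the percentages are the approximate figures quoted in this exchange (stem cells 10,686/20,478 captured vs. 7% in situ, neurons about 15% vs. 57%, muscle about 20% in both), not values read from Supplementary Table 17.

```python
import math


def representation_log2(pct_in_scrnaseq, pct_in_situ):
    """log2 of (share in scRNAseq data / share in the animal).
    Positive: over-represented after dissociation; negative: under-represented."""
    if pct_in_scrnaseq <= 0 or pct_in_situ <= 0:
        raise ValueError("percentages must be positive")
    return math.log2(pct_in_scrnaseq / pct_in_situ)


# Approximate percentages quoted in the review and response text.
tissues = {
    "stem": (100 * 10686 / 20478, 7.0),
    "neuron": (15.0, 57.0),
    "muscle": (20.0, 20.0),
}

ratios = {name: representation_log2(seq, situ) for name, (seq, situ) in tissues.items()}

if __name__ == "__main__":
    for name, value in ratios.items():
        print(f"{name}: {value:+.2f}")
```

      A value near zero, as for muscle, indicates congruent recovery, while the signs for stem cells and neurons match the positive and negative selection described above.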

      This technical finding is also biologically consistent. For instance, the tegument cells span the body wall muscles, with the cell bodies below and a syncytial layer above. It is not known how the tegument fragments during the dissociation process, and which parts of the cells get packaged by the 10X GEMs. Because of tegumental structure, the cells are likely prone to damage, and therefore we speculate that is why the tegument cells are under-represented in our 10X data. Unusually shaped fragments may not have been captured in 10X GEMs and of those that were, damaged or distressed tegument cells/fragments may have been excluded post-sequencing, by QC filters including cell calling, mitochondrial percentage and low transcript count (e.g. if there was a tegumental fragment with 100 transcripts it would not have passed QC). Stem cells are spherical with a large nucleus:cytoplasm ratio, likely making them more robust during dissociation and more likely to be captured in 10X GEMs.

      We don’t think that a large fraction of the stem cells are actually neural progenitors because:

      (1) we used previously reported marker genes of different tissue types to identify the single cell RNAseq clusters, e.g. Ago2-1 for stem cells, which has been used in multiple life stages.

      (2) The stem cell transcriptomes express many previously reported stem cell marker genes.

      (3) We found that the stem cells from the single cell data generally had higher numbers of transcripts than the other cell types which is consistent with the Wang et al. 2013 observation that RNA marker POPO-1 could distinguish germinal (stem) cells from other cell types as they are RNA rich.

      (4) We also found higher numbers of ribosomal related transcripts in our stem cell transcriptomes, which is consistent with Pan’s observation that part of the distinct morphology of stem cells is densely packed ribosomes in the cytoplasm.

      In order to elaborate on this discussion we have generated new visualisations:

      (1) A UMAP of the stem cell marker ago2-1 (Supplementary Figure 10), to further illustrate our evidence in classifying the stem cell clusters.

      (2) A co-expression plot of the stem cell marker ago2-1 with neural marker complexin to confirm that there is little coexpression (the most coexpression being in Neuron 1 and Stem F). We identified that 15.56% of cells in the Stem F cluster show some expression of complexin (neural marker), suggesting that a small fraction of Stem F may be early/precursor neurons, but the gene expression indicates that the majority of cells in Stem F are more likely to be stem cells than any other tissue type. There is little to no complexin expression in the other stem clusters.

      (3) Expression plots of the 5 neurogenins (TFs involved in neuronal differentiation) we could identify using WormBase ParaSite in these data. Four of the five showed very little expression, and not in specific clusters. The fifth (Smp_072470) showed slightly more expression, though still sparse, mostly across the stem and neural clusters, but not enough to indicate that any of the stem clusters are neural progenitors.

      Author response image 2.

      Coexpression UMAP showing the expression of stem cell marker Ago2-1 and neural marker complexin.

      Author response image 3.

      UMAPs showing the expression of five putative neurogenins of S. mansoni.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript the authors have generated a single-cell atlas of the miracidium, the first free-living stage of an important human parasite, Schistosoma mansoni. Miracidia develop from eggs produced in the mammalian (human) host and are released into freshwater, where they can infect the parasite's intermediate snail host to continue the life cycle. This study adds to the growing single-cell resources that have already been generated for other life-cycle stages and, thus, provides a useful resource for the field.

      Strengths:

      Beyond generating lists of genes that are differentially expressed in different cell types, the authors validated many of the cluster-defining genes using in situ hybridization chain reaction. In addition to providing the field with markers for many of the cell types in the parasite at this stage, the authors use these markers to count the total number of various cell types in the organism. Because the authors realized that their cell isolation protocols were biasing the cell types they were sequencing, they applied a second method to help them recover additional cell types.

      Schistosomes have ZW sex chromosomes and the authors make the interesting observation that the stem cells at this stage are already expressing sex (i.e. W)-specific genes.

      Weaknesses:

      The sample sizes upon which the in situ hybridization results and cell counts are based are either not stated (in most cases) or are very small (n=3). This lack of clarity about biological replicates and sample sizes makes it difficult for the reader to assess the robustness of the results and the extremely small sample sizes (when provided) are a missed opportunity to explore the variability of the system, or lack thereof.

      We have now added more details about the methods we used for validating cell type marker genes by in situ hybridisation. We have added to the methods that ‘We carried out at least three in situ hybridisation experiments for each marker gene we validated (each experiment was a biological replicate). From each experiment we imaged (by confocal microscopy) at least 10 miracidia (technical replicates) per marker gene experiment.’ on line 1036.

      In the figure legends we have added the number of miracidia that were screened, and documented the percentage of the screened larvae that showed the in situ gene expression pattern that is seen in the images in the figures, and that we described in the text.

      We manually segmented the nuclei of cells expressing pan-tissue marker genes. We did this in one miracidium for all tissues except stem cells, which we segmented in five larvae. Manual segmentation of gene expression in a confocal z-stack is very time consuming. We consider that the variability of different cell and tissue types (stereotypy) between miracidia is beyond the scope of this paper and can be investigated in future work.

      Although assigning transcripts to a given cell type is usually straightforward via in situ experiments, the authors fail to consider the potential difficulty of assigning the appropriate nuclei to cells with long cytoplasmic extensions, like neurons. In the absence of multiple markers and a better understanding of the nervous system, it seems likely that the authors have overestimated the number of neurons and misassigned other cell types based on their proximity to neural projections.

      This is a valid point, and we acknowledge the difficulties of assigning a nucleus to a cell using mRNA expression only and in the absence of a cell membrane marker. We tried to address this issue by labelling the cell membranes using an antibody against beta catenin after the HCR in situ protocol. This method has been used successfully on sections on slides (Schulte et al., 2024), but we failed to get usable results in our miracidia whole-mounts. The beta catenin localisation marked the membranes of the gland cells but didn’t do the same for the neurons or other cell types (see image below).

      Author response image 4.

      Image showing a maximum intensity projection of a subvolume of a confocal z-stack of a miracidia wholemount in situ hybridisation (by HCR) for paramyosin counterstained with a beta catenin antibody (1:600 concentration of Sigma C2206). The cell membrane of a lateral gland is clearly labelled, but those of the neurons of the brain and the paramyosin+ muscle cells are not.

      Our observation that 57% of the cells in a miracidium are nerves is high compared to the C. elegans hermaphrodite adult, in which 302 out of 959 cells are neurons (Hobert et al., 2016), but few studies have equivalent data with which to make comparisons. Despite this, and the limitation described above, we believe that we have not overestimated the number of neural cells. During the process of validating the marker genes and closely examining gene expression in hundreds of miracidia, we noted that the nuclei of different tissue types are distinct and recognisable (see figure below). The nuclei of stem, tegument and parenchymal cells are comparatively large and spherical with obvious nucleoli (i). The four nuclei of the apical gland cell are angular, pentagonal in shape and sit adjoining each other (inside red dashed circle, i-iii); those of the two lateral glands are bilaterally symmetrical and surrounded by flask-shaped cytoplasm (arrows, iv). The nuclei of the body wall muscle cells are peripheral and flattened on the outer edge (iii). The notum+ muscle cell nuclei are anterior of the apical gland (manuscript Figure 2E). The only other two tissue types are the nerves and protonephridia, and their nuclei are smaller and more compact/condensed. In situ expression of the protonephridia marker suggests that 6 cells make up the protonephridial system (manuscript Figure 4 B&E). Therefore, by process of elimination, the remaining nuclei should belong to neurons. The complexin expression pattern supports this, and we counted 209 nuclei that were surrounded by cpx transcript expression. To help readers interpret this for themselves, we have added confocal z-stacks of miracidia where tissue-level markers have been multiplexed (supplementary videos 18-20). We counted the cells of each tissue type individually, and the tissue type cell numbers added up to the overall cell count.
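      The comparison with C. elegans can be made explicit with a short calculation. The following is only an illustrative sketch: it uses the two figures quoted above (302 neurons out of 959 cells, and 57% for the miracidium), and the variable names are hypothetical.

```python
# Figures quoted in the response above (Hobert et al., 2016, for C. elegans).
c_elegans_neurons = 302
c_elegans_cells = 959

# Neural fraction of the adult C. elegans hermaphrodite.
c_elegans_fraction = c_elegans_neurons / c_elegans_cells

# Neural fraction reported for the miracidium in the response.
miracidium_fraction = 0.57

# How many times larger the miracidium neural fraction is.
fold_difference = miracidium_fraction / c_elegans_fraction

print(f"C. elegans neural fraction: {c_elegans_fraction:.1%}")
print(f"Miracidium neural fraction: {miracidium_fraction:.1%}")
print(f"Fold difference: {fold_difference:.2f}")
```

      On these numbers, roughly a third of C. elegans cells are neurons, so the miracidium proportion is a little under twice as high.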

      Author response image 5.

      Image showing the diversity of nucleus morphology between tissue types in the miracidium.

      Biologically, it is not surprising that this larva is dominated by neural cells. It must navigate a complex aquatic environment and identify a suitable mollusc host in less than 12 hours. It is a non-feeding vehicle that must deliver the stem cells to a suitable environment where they can develop into the subsequent life cycle stage. Accordingly, the cell type composition reflects this challenge.

      The conclusion that germline genes are expressed in the miracidia stem cells seems greatly overstated in the absence of any follow-up validation. The expression scales for genes like eled and boule are more than 3 orders of magnitude smaller than those used for any of the robustly expressed genes presented throughout the paper. These scales are undefined, so it isn't entirely clear what they represent, but neither of these genes is detected at levels remotely high (or statistically significant) enough to survive filters for cluster-defining genes.

      Given that germ cells often develop early in embryogenesis and arrest the cell cycle until later in development, and that these transcripts reveal no unspliced forms, it seems plausible that the authors are detecting some maternally supplied transcripts that have yet to be completely degraded.

      We agree that the expression of genes such as eled and boule is low. We made this clear in the figure legends and text, and have now added scale information to the figure legends. We did not explore these genes as cluster-defining genes, partly due to their comparatively low levels of expression, but as genes already reported to be important in germ line specification. We found the expression of these genes to be consistent with our hypothesis that the Kappa stem cells may include germ line segregated cells, but our hypothesis does not rest on these lower-expressed genes.

      It is certainly possible that we have detected some maternally supplied transcripts in the miracidia stem cells. However, experiments to distinguish between zygotic and maternal transcripts using metabolic labelling of zygotic transcripts (e.g. Fishman et al. 2023) would be difficult in this species because of the hard egg capsule and its ectolecithal embryogenesis. This is therefore out of scope for this work, but it would be a very interesting topic to follow up on and develop tools for.

      We have added these sentences to the Discussion ln 746 ‘Intriguingly, the presence of spliced-only copies of the germline defining genes eled and boule could suggest that they are maternal transcripts that have been restricted to the primordial germ cells during embryogenesis, as is the case in Zebrafish embryos (Fishman et al., 2023). An alternative explanation is that unspliced transcripts exist for these lowly expressed genes but their abundance was below our threshold for detection.’

      Reviewer #1 (Recommendations For The Authors):

      Ln 138: specify the version of Seurat used, and reference the primary papers for this software. Also, from the dot plot shown here, these do not all appear to be supported by unique gene sets. How was the final clustering determined? This information is in the methods section, but a summary here could make it more robust for the readership.

      In addition to the details in the methods section, we have added the version and referenced the version-specific primary paper for Seurat when it is first mentioned. We have also summarised the methods used to select the final clustering when we first present the results to aid in clarity.

      We added to line 140 ‘Using Seurat (version 4.3.0) (Hao et al., 2021), 19 distinct clusters of cells were identified, along with putative marker genes best able to discriminate between the populations (Figure 1C & D and Supplementary Table 2 and 3). We used Seurat’s JackStraw and ElbowPlot, along with molecular cross-validation to select the number of principal components, and Seurat’s clustree to select a resolution where clusters were stable (Hao et al., 2021).’

      Ln 147: isn't seven stem cell clusters a lot? See comment in public review.

      We did not have preconceived expectations of the number of stem cell clusters, and were guided by the data and gene expression. In doing so we also discovered that four of those clusters were likely only two ‘biologically or functionally distinct’ clusters, but these split into four clusters based on the expression of genes on the sex-specific regions of the chromosomes, which was both unexpected and interesting.

      Figure 1D: gene model names are un-informative for the general reader. Can you provide any putative gene identities here to render this plot interpretable? For example in the main text you state that Smp-085540 is paramyosin; please use this annotation in all your visual material (as is used in Figure 2A).

      We have added gene names to the dotplots in all figures with the locus identifier (minus the ‘Smp’ prefix) in brackets after the gene name.

      Ln 191:196 Identification of the two muscle clusters as circular and longitudinal muscles is very well supported. However, it would be interesting to look specifically at the genes that are different here. Did the authors attempt to specifically pull out genes differentially expressed between these two groups, or only examine the output of FindAllMarkers at this point?

      We did indeed look specifically for genes differentially expressed between the muscle clusters, the results of which can be found in Supplementary Table 5 (Line 206). This analysis revealed “Wnt-11-1 (circular) and MyoD (longitudinal) were among the most differentially expressed genes”, which were important findings in our understanding of the muscle cells in the miracidium.

      Ln 207: "connected to stem F" - does this refer specifically to their relative positions on the UMAP in Figure 1C? One must be very careful about these types of statements, as the relative position of cells on a reduced-dimension cell map can be misleading (public review).

      We agree, and have rephrased accordingly.

      Ln 209:211: Here the authors switch from Seurat (R) as an analysis package, to SAM (python) for subset analysis of one large neural cluster. The results indicate that there may be small populations of transcriptomically distinct neural subtypes also within the neural1 cluster, but that the vast majority of these cells do not express unique transcriptomic profiles. Also in the supplementary material for this (SF1) there is a question of whether or not there is any clustering according to batch effects.

      In general, I find the neuronal section a little difficult to follow and it is unclear how many unique profiles are present and which are documented with in situ. I would recommend re-running the analysis on the entire neural subset (n1:5: complexin positive) and generating an inventory of putatively unique neural states with the associated in situ validation altogether in a main figure.

      In response to comments above we have both clarified our reasoning for using SAM analysis, and presented more details on possible batch effects. We have gone through the neural system results in order to make it clearer for the reader to follow.

      Ln 236: here the authors introduce a STRING analysis for the first time. Also, this method requires some introduction for the general audience in terms of its goals and general functionality and output.

      We used STRING analysis on some well defined clusters to provide additional clues about function. At the first mention of STRING (neuron 3 results) we have added the following statement to give more introduction to the reader: “STRING analysis of the top 100 markers of Neuron 3 predicted two protein interaction networks with functional enrichment: ….”

      Ln 280:281: It is unclear why Steger et al is referenced here. In what way does a description of neural and glandular cell transcriptomic similarity in a Cnidarian inform your data on a member of the Platyhelminthes? (This should also be referenced in the introduction: to which phylogenetic lineage does Schistosoma belong?)

      We have now added that the Schistosoma belong to the Platyhelminths on the first line of the introduction.

      Ln 295 we have added ‘We expected to find a discrete cluster(s) for the penetration glands, and that it would show similarities to the neural clusters (as glandular cells arise from neuroglandular precursor cells in other animals, such as the sea anemone, Nematostella vectensis, Steger et al., 2022).’

      Ln 339: explain the motivation for generating a further plate-based scRNA of the ciliary plates.

      We wished to include the ciliary plates alongside the gland cells for plate-based RNAseq as they are unique to the miracidium stage, and we wanted to make sure we had captured them in this study.

      Ln 345: Define the tegumental cells for the general reader.

      We have added further description of tegument cells in the introduction and tegument results section (e.g. on lines 61 and 366).

      Ln 365: "this cluster" is imprecise. Which cluster are we looking at here? Also: were flame cells already described morphologically at this stage, or is this the first description of the protonephridial system for this stage of the life cycle?

      We have now clarified which cluster we are talking about in the text. The flame cells have been described using TEM before (Pan, 1980).

      Stem Cells: also here you refer to cells as 'bridge' which refers to the configuration of the UMAP. While this is likely a biological representation of a different differentiation state, the nomination of this based solely on the UMAP representation should be avoided.

      We have rephrased this.

      Figure 5B: What is neuron 6? This was Neuron 3 in Figure 1.

      Thank you for spotting these mistakes in the labelling, we have corrected them now.

      Ln 421:438 - Here you represent a UMAP representation of the cell positions, but state that the clustering disappears. See comment in Public Review.

      Modified accordingly, see response in public review.

      Ln 472 "Cells in stem E, F, and G in silico clusters might be stressed/damaged/dying cells or cells in transcriptionally transitional states." Is there any evidence supporting either of these conclusions?

      We found that 15.56% of the cells in Stem F expressed the neural marker complexin, leading us to consider the possibility that a fraction of these cells may be neural precursors. Stem F also had some cells with a mitochondrial % near the maximum threshold we set, suggesting they could be experiencing some stress. Since we could not identify clear markers for these clusters, their function and a more specific identity, beyond ‘stem’, is not yet known.
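      The two per-cluster quantities mentioned here, the percentage of cells expressing a marker and the mitochondrial percentage per cell, can be illustrated as follows. This is a minimal sketch on toy counts, not the analysis actually run: the function names, the detection threshold of one transcript, and the example values are all assumptions made for illustration.

```python
def percent_expressing(marker_counts, min_count=1):
    """Percentage of cells with at least `min_count` transcripts of a marker.

    `marker_counts` holds one raw count per cell (hypothetical input format).
    """
    if not marker_counts:
        raise ValueError("marker_counts must contain at least one cell")
    positive = sum(1 for count in marker_counts if count >= min_count)
    return 100.0 * positive / len(marker_counts)


def percent_mitochondrial(mito_counts, total_counts):
    """Percentage of a cell's transcripts that map to mitochondrial genes."""
    if total_counts == 0:
        return 0.0
    return 100.0 * mito_counts / total_counts


# Toy cluster of 8 cells; two of them contain transcripts of the marker.
toy_marker_counts = [0, 0, 3, 0, 1, 0, 0, 0]
marker_percent = percent_expressing(toy_marker_counts)

# Toy cell with 30 mitochondrial transcripts out of 1000 in total.
mito_percent = percent_mitochondrial(30, 1000)

print(f"{marker_percent:.2f}% of toy cells express the marker")
print(f"{mito_percent:.2f}% mitochondrial transcripts in the toy cell")
```

      A cell whose mitochondrial percentage sits close to the chosen maximum threshold would pass quality filtering while still being a candidate for a stressed cell, which is the situation described for Stem F.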

      That the two stem cell populations contribute to different parts of the next life cycle stage is interesting. The combined analysis suffers from the same issues as the previous analysis in terms of sample distribution; are the 'grey' sporocyst cells also contributing to the stem A/B (kappa) C/D (delta/phi) clusters? This is not possible to tell from the plot as the miracidia may simply be plotted on the top. A different representation of sample contribution to clusters is warranted.

      We have made an alternative visualisation here to demonstrate that the miracidia cells are not plotted on top of the sporocyst stem cells. Unfortunately this visual is hampered as there is not a straightforward way to split the panels. In the figure below, the left pane shows the miracidia cells, and the right pane shows the sporocyst cells. Below that, we have included the original figure for comparison. It can be clearly seen that there are three miracidia tegument cells in the sporocyst tegument cluster, and one sporocyst cell in the miracidia stem cells (Stem E), but the miracidia A/B and C/D stem cells are not plotted on top of any sporocyst cells.

      Author response image 6.

      Methods: Why is the multiplet rate estimate at >50% for the unsorted sample?

      We have added more detail on this: “The estimated doublet rate was calculated based on 10X loading guidelines and adjusted for our sample concentrations”.

      Reviewer #2 (Recommendations For The Authors):

      (1) The manuscript would benefit from a more careful consideration of what was already known based on previous literature, which would help the authors to better put their results in context. For example, previous work suggested that one of the sporocyst stem cell populations (phi) gives rise to tegument and other temporary larval structures; this appears not to be mentioned here. The model in Figure 7 suggests that two of the stem cell populations are gone at day 15 post-infection; the literature shows that those cells can still be detected at this stage (there are just far fewer of them).

      We have added the definition of Kappa, Delta and Phi as per Wang et al (2018) in the stem cell results p13 ln 428.

      We have amended Figure 7 to include further elements from the Wang et al (2018) paper that show that mother sporocyst stem cells classified as delta and phi are still detectable on day 15 post-infection in mother sporocysts.

      We intentionally didn’t put too much emphasis on fitting our data to the model of Wang et al (2018), because a) it’s a different life cycle stage, b) the single cell data the model was based on was from 35 stem cells and gathered using a different method, and c) more recent data (Diaz, Attenborough et al. 2024) with 119 stem cells from sporocysts did not recover the same populations of stem cells. We therefore linked our data to previous literature where it was relevant but focused on being led by the data we gathered (>10,000 stem cells).

      (2) To add some detail to the public comment about the lack of clarity about sample sizes and biological replicates, and how this leads to questions about the robustness of the results, Figures 4 B and F show the expression pattern for the same parenchyma marker (Smp_318890) in two different samples. The patterns appear quite distinctive. In B, the cell bodies are so clearly labeled that the signal appears oversaturated. In F the cell bodies are barely apparent. Based on the single-cell clustering, it should be possible to distinguish between Parenchyma clusters 1 and 2 based on the levels of this transcript. Careful quantification of signal intensity from multiple samples across multiple experiments might enable the authors to detect such differences.

      The reason the expression patterns look different between panels 4Bii and 4F is that in 4Bii we have manually segmented the nuclei of the parenchymal cells in order to count them, whereas in the images in 4F there is no segmentation. We have now made this clearer in this legend, and also in the legends of Figures 2, 3, and 5. If there was any signal intensity difference between parenchyma 1 and 2 cells based on expression of the marker gene, Smp_318890, it was not obvious. We carried out 6 experiments for parenchyma markers, multiplexing the pan-parenchyma marker, Smp_318890, with markers for parenchyma 2, but we were unable to distinguish between the two populations.

      (3) The authors find that the "somatic" stem cells in miracidia seem to combine attributes of the previously defined delta and phi stem cells from sporocysts. Because the 3 classes of sporocyst stem cells were defined by expression of nanos-2 and fgfrA, using those probes in in-situ experiments could have helped them resolve whether or not the miracidial cells represent precursors that can adopt either fate or if the heterogeneity is already present in miracidia.

      In silico expression of the marker genes for the 3 classes of sporocyst stem cells didn’t support those three classes in the miracidia stem cells (See supplementary table 10). We further subclustered the delta/phi cells to see if we could recover separate delta and phi populations but we were unable to do so. We therefore did not pursue in situ experiments of these genes. We instead prioritised cluster-defining genes in the miracidia stem cell populations rather than cluster defining genes in the sporocyst (defined by Wang et al., 2018), but we still explored these in silico. For example, instead of using klf to define Kappa (Wang et al 2018), we used UPPA to validate the Kappa population as it showed similar expression to klf but higher expression levels and was specific to that population. However, like Wang et al 2018, we did use p53, which is a cluster marker of delta and phi in sporocysts, as it showed clear and high expression in our miracidia delta/phi population. We were guided by our data and our knowledge of the literature. More in depth single cell RNAseq is needed from the mother and daughter sporocyst stages to understand the heterogeneity and fates of these stem populations.

      (4) Scale bars should be included throughout the figures and the scale should be defined either on the figure or in the legend. Similarly, all the scales used for velocity and expression analysis should be defined.

      We have added scale bars to all figures and legends.

      One of the following statements has been added to each figure legend as appropriate: “Gene expression has been log-normalised and scaled using Seurat (v. 4.3.0)”, “Gene expression has been normalised (CPM) and log-transformed using scvelo (v. 0.2.4)”, or “Library size was normalised and gene expression values were log-normalised using SAM (v1.0.1) and Scanpy (v1.8.2)”.
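      For readers unfamiliar with these scales, the sketch below shows what counts-per-million (CPM) and log-normalisation mean for the transcript counts of a single cell. It is a plain-Python illustration of the general formulas, not a reproduction of the Seurat, scvelo, SAM or Scanpy code paths. The function names are hypothetical, and the scale factor of 10,000 and the use of log(1 + x) are assumed here as common conventions.

```python
import math


def counts_per_million(counts):
    """Scale one cell's raw counts so that they sum to one million."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [count * 1e6 / total for count in counts]


def log_normalise(counts, scale_factor=1e4):
    """Library-size normalise one cell, then apply log(1 + x).

    `scale_factor` is an assumed convention; use whatever the analysis
    being reproduced actually specified.
    """
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [math.log1p(count * scale_factor / total) for count in counts]


# Toy cell with three genes and four transcripts in total.
toy_counts = [1, 1, 2]
toy_cpm = counts_per_million(toy_counts)
toy_log = log_normalise(toy_counts)

print(toy_cpm)
print([round(value, 3) for value in toy_log])
```

      Because both transformations divide by the cell's total count, values are comparable between cells with different sequencing depths, and the logarithm compresses the range so that lowly and highly expressed genes can share a colour scale.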

      (5) The table entitled In situ hybridization probes (Supplementary Table 15) contains no probe sequences, so any interested reader wishing to use these probes would have to design their own. To ensure the reproducibility of the results presented here, the authors should provide the probe sequences they used.

      In Supplementary Table 15 we have added the Molecular Instruments Lot number of all the probes used. Anyone wanting to repeat the experiment can order the same probes from the company.

      (6) It is unclear how useful the supplemental figures showing the STRING enrichment analyses will be for readers. Unannotated Smp gene identifiers provide no way to help readers digest the information in these hairballs. It would probably be best to replace the Smp names with useful annotations based on their orthologs; if not, these figures could probably be dropped entirely. (Also, the bottom panel of Supplementary Figure 7 has the word "Lorem" embedded on one of the connecting nodes.)

      “Lorem” has been removed.

      Many of the genes in these analyses do not have short descriptions, therefore we have used Smp gene identifiers in the STRING analysis supplementary figures. These ‘Smp_’ numbers can be used to search WormBase Parasite, where a description can be found and the history of the gene ID traced. This latter function facilitates searching for these genes in the literature and consistency between versions as gene models are updated.

      Minor edits

      (1) Figures 4A-D aren't cited in the text until after 4E-F are. It seems like moving the section on protonephridial cells (line 364) before the section on tegumental cells (line 345) better reflects the order of the figures.

      Thank you for flagging this, we have updated the in-text citations of Figure 4.

      (2) In-text references to Sarfati et al, 2021 should be to Nanes Sarfati, as listed in the references. Poteaux et al 2023 is cited in the text, but not in the reference list.

      Both of these have been fixed.

    1. eLife assessment

      This important work provides evidence that glutamate and GABA are released from different synaptic vesicles at supramammillary axon terminals onto granule cells of the dentate gyrus. The study uses complementary electrophysiological and anatomical experimental approaches. Together, these provide solid evidence that the co-release of glutamate and GABA from different vesicles within the same terminal could modulate granule cell firing in a frequency-dependent manner, although thorough elimination of alternative mechanisms would have strengthened the study. The work will be of interest to neuroscientists investigating co-release of neurotransmitters in various synapses in the brain and those interested in subcortical control of hippocampal function.

    2. Reviewer #1 (Public Review):

      This study of mixed glutamate/GABA transmission from axons of the supramammillary nucleus to dentate gyrus seeks to sort out whether the two transmitters are released from the same or different synaptic vesicles. This conundrum has been examined in other dual-transmission cases and even in this particular pathway, there are different views. The authors use a variety of electrophysiological and immunohistochemical methods to reach the surprising (to me) conclusion that glutamate and GABA-filled vesicles are distinct yet released from the same nerve terminals. The strength of the conclusion rests on the abundance of data (approaches) rather than the decisiveness of any one approach, and I came away believing that the boutons may indeed produce and release distinct types of vesicles, but have reservations. Accepting the conclusion, one is now left with another conundrum, not addressed even in the discussion: how can a single bouton sort out VGLUTs and VIAATs to different vesicles, position them in distinct locations with nm precision, and recycle them without mixing? And why do it this way instead of with single vesicles having mixed chemical content? For example, could a quantitative argument be made that separate vesicles allow for higher transmitter concentrations? I feel the paper needs to address these problems with some coherent discussion, at minimum.

      Major concerns:

      (1) Throughout the paper, the authors use repetitive optogenetic stimulation to activate SuM fibers and co-release glutamate and GABA. There are several issues here: first, can the authors definitively assure the reader that all the short-term plasticity is presynaptic and not due to ChR2 desensitization? This has not been addressed. Second, can the authors also say that all the activated fibers release both transmitters? If for example 20% of the fibers retained a one-transmitter identity and had distinct physiological properties, could that account for some of the physiological findings?

      (2) PPR differences in Figures 1F-I are statistically significant but still quite small. You could say they are more similar than different in fact, and residual differences are accounted for by secondary factors like differential receptor saturation.

      (3) The logic of the GPCR experiments needs a better setup. I could imagine different fibers released different transmitters and had different numbers of mGluRs, so that one would get different modulations. On the assumption that all the release is from a single population of boutons, then either the mGluRs are differentially segregated within the bouton, or the vesicles have differential responsiveness to the same modulatory signal (presumably a reduced Ca current). This is not developed in the paper.

      (4) The biphasic events of Figures 3 and S3: I find these (unaveraged) events a bit ambiguous. Another way to look at them is that they are not biphasic per se but rather are not categorizable. Moreover, these events are really tiny, perhaps generated by only a few receptors whose open probability is variable, thus introducing noise into the small currents.

      (5) Figure 4 indicates that the immunohistochemical analysis is done on SuM terminals, but I do not see how the authors know that these terminals come from SuM vs other inputs that converge in DG.

      (6) Figure 4E also shows many GluN1 terminals not associated with anything, not even Vglut, and the apparent numbers do not mesh with the statistics. Why?

      (7) Do the conclusions based on the fluorescence immuno mesh with the apparent dimensions of the EM active zones and the apparent intermixing of labeled vesicles in immuno EM?

      (8) Figure 6 is not so interesting to me and could be removed. It seems to test the obvious: EPSPs promote firing and IPSPs oppose it.

    3. Reviewer #2 (Public Review):

      Summary:

      In this study, the authors investigated the release properties of glutamate/GABA co-transmission at the supramammillary nucleus (SuM)-granule cell (GC) synapses using in vitro electrophysiology and anatomical approaches at the light and electron microscopy level. They found that SuM to dentate granule cell synapses, which co-release glutamate and GABA, exhibit distinct differences in paired-pulse ratio, Ca2+ sensitivity, presynaptic receptor modulation, and Ca2+ channel-vesicle coupling configuration for each neurotransmitter. The study shows that glutamate/GABA co-release produces independent glutamatergic and GABAergic synaptic responses, with postsynaptic targets segregated. They show that most SuM boutons form distinct glutamatergic and GABAergic synapses in close proximity, characterized by GluN1 and GABAAα1 receptor labeling, respectively. Furthermore, they demonstrate that glutamate/GABA co-transmission exhibits distinct short-term plasticity, with glutamate showing frequency-dependent depression and GABA showing frequency-independent stable depression.

      Their findings suggest that these distinct modes of glutamate/GABA co-release by SuM terminals serve as frequency-dependent filters of SuM inputs.

      Strengths:

      The conclusions of this paper are mostly well supported by the data.

      Weaknesses:

      Some aspects of Supplementary Figure 1A and the table need clarification. Specifically, the claim that the authors have stimulated an axon fiber rather than axon terminals is not convincingly supported by the diagram of the experimental setup. Additionally, the antibody listed in the primary antibodies section recognizes the gamma2 subunit of the GABAA receptor, not the alpha1 subunit mentioned in the results and Figure 4.

    4. Reviewer #3 (Public Review):

      Summary:

      In this manuscript, Hirai et al investigated the release properties of glutamate/GABA co-transmission at SuM-GC synapses and reported that glutamate/GABA co-transmission exhibits distinct short-term plasticity with segregated postsynaptic targets. Using optogenetics, whole-cell patch-clamp recordings, and immunohistochemistry, the authors reveal distinct transmission modes of glutamate/GABA co-release as frequency-dependent filters of incoming SuM inputs.

      Strengths:

      Overall, this study is well-designed and executed; conclusions are supported by the results. This study addressed a long-standing question of whether GABA and glutamate are packaged in the same vesicles and co-released in response to the same stimuli in the SuM-GC synapses (Pedersen et al., 2017; Hashimotodani et al., 2018; Billwiller et al., 2020; Chen et al., 2020; Li et al., 2020; Ajibola et al., 2021). Knowledge gained from this study advances our understanding of neurotransmitter co-release mechanisms and their functional roles in the hippocampal circuits.

      Weaknesses:

      No major issues are noted. Some minor issues related to data presentation and experimental details are listed below.

    1. eLife assessment

      This study provides a novel and valuable alternative explanation for volatility-induced changes in choice behavior, commonly attributed to learning-rate adaptations. Through rigorous and comprehensive computational modeling of previously published data, the authors provide convincing support for the claim that apparent learning-rate adaptations may instead reflect a mixture of decision strategies. Furthermore, they demonstrate that differential weighting of the optimal decision strategy is predicted by psychopathology common to depression and anxiety. This work should be of interest to a wide range of scientists, including psychologists, neuroscientists, computer scientists, and clinicians.